Parallel/Distributed FFT

Parallel FFT

Suppose we are given a polynomial P(X)=i=022n1ciXiP(X) = \sum_{i=0}^{2^{2n} - 1} c_i X^i. The Discrete Fourier Transform (DFT) of this polynomial is defined as:

c^j=i=022n1ciωij,where 0j<22n\hat{c}_j = \sum_{i=0}^{2^{2n} - 1} c_i \cdot \omega^{ij}, \quad \text{where } 0 \le j < 2^{2n}

Here, ω\omega is a primitive 22n2^{2n}-th root of unity.

We can decompose this equation using index splitting as follows (where 0j0,j1<2n0 \le j_0, j_1 < 2^n):

c^j12n+j0=i0=02n1i1=02n1ci12n+i0ω(i12n+i0)(j12n+j0)=(i0=02n1(ωi0j0(i1=02n1ci12n+i0(ω2n)(j12n+j0)i1)Step 1)Step 2(ω2n)i0j1)Step 3\begin{align*} \hat{c}_{j_1 \cdot 2^n + j_0} &= \sum_{i_0 = 0}^{2^n - 1} \sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot \omega^{(i_1 \cdot 2^n + i_0)(j_1 \cdot 2^n + j_0)} \\ &= \underbrace{ \left(\sum_{i_0 = 0}^{2^n - 1} \underbrace{\left( \omega^{i_0 j_0} \underbrace{ \left( \sum_{i_1 = 0}^{2^n - 1} c_{i_1 \cdot 2^n + i_0} \cdot (\omega^{2^n})^{(j_1 \cdot 2^n + j_0)i_1} \right)}_{\text{Step 1}} \right)}_{\text{Step 2}} \cdot (\omega^{2^n})^{i_0 j_1}\right)}_{\text{Step 3}} \end{align*}

Example (n=2n=2)

Let P(X)=c0+c1X++c15X15P(X) = c_0 + c_1 X + \cdots + c_{15} X^{15}, and suppose ω16=1\omega^{16} = 1. Define the following input vectors:

c(0)=(c0,c4,c8,c12)c(1)=(c1,c5,c9,c13)c(2)=(c2,c6,c10,c14)c(3)=(c3,c7,c11,c15)\begin{aligned} \mathbf{c}^{(0)} &= (c_0, c_4, c_8, c_{12}) \\ \mathbf{c}^{(1)} &= (c_1, c_5, c_9, c_{13}) \\ \mathbf{c}^{(2)} &= (c_2, c_6, c_{10}, c_{14}) \\ \mathbf{c}^{(3)} &= (c_3, c_7, c_{11}, c_{15}) \\ \end{aligned}

Step 1: Perform FFT on each c(i0)\mathbf{c}^{(i_0)}

c(0)^=(c(0)^0,c(0)^1,c(0)^2,c(0)^3)c(1)^=(c(1)^0,c(1)^1,c(1)^2,c(1)^3)c(2)^=(c(2)^0,c(2)^1,c(2)^2,c(2)^3)c(3)^=(c(3)^0,c(3)^1,c(3)^2,c(3)^3)\begin{aligned} \widehat{\mathbf{c}^{(0)}} &= (\widehat{c^{(0)}}_0, \widehat{c^{(0)}}_1, \widehat{c^{(0)}}_2, \widehat{c^{(0)}}_3) \\ \widehat{\mathbf{c}^{(1)}} &= (\widehat{c^{(1)}}_0, \widehat{c^{(1)}}_1, \widehat{c^{(1)}}_2, \widehat{c^{(1)}}_3) \\ \widehat{\mathbf{c}^{(2)}} &= (\widehat{c^{(2)}}_0, \widehat{c^{(2)}}_1, \widehat{c^{(2)}}_2, \widehat{c^{(2)}}_3) \\ \widehat{\mathbf{c}^{(3)}} &= (\widehat{c^{(3)}}_0, \widehat{c^{(3)}}_1, \widehat{c^{(3)}}_2, \widehat{c^{(3)}}_3) \end{aligned}

Each transform is defined by:

c(i0)^j0=i1=03c4i1+i0(ω4)(4j1+j0)i1\widehat{c^{(i_0)}}_{j_0} = \sum_{i_1 = 0}^3 c_{4 \cdot i_1 + i_0} \cdot (\omega^4)^{(4j_1 + j_0)i_1}

Step 2: Construct z[j0] \mathbf{z}^{[j_0]}

z[0]=(c(0)^0,c(1)^0,c(2)^0,c(3)^0)z[1]=(c(0)^1,ω1c(1)^1,ω2c(2)^1,ω3c(3)^1)z[2]=(c(0)^2,ω2c(1)^2,ω4c(2)^2,ω6c(3)^2)z[3]=(c(0)^3,ω3c(1)^3,ω6c(2)^3,ω9c(3)^3)\begin{aligned} \mathbf{z}^{[0]} &= (\widehat{c^{(0)}}_0, \widehat{c^{(1)}}_0, \widehat{c^{(2)}}_0, \widehat{c^{(3)}}_0) \\ \mathbf{z}^{[1]} &= (\widehat{c^{(0)}}_1, \omega^1 \widehat{c^{(1)}}_1, \omega^2 \widehat{c^{(2)}}_1, \omega^3 \widehat{c^{(3)}}_1) \\ \mathbf{z}^{[2]} &= (\widehat{c^{(0)}}_2, \omega^2 \widehat{c^{(1)}}_2, \omega^4 \widehat{c^{(2)}}_2, \omega^6 \widehat{c^{(3)}}_2) \\ \mathbf{z}^{[3]} &= (\widehat{c^{(0)}}_3, \omega^3 \widehat{c^{(1)}}_3, \omega^6 \widehat{c^{(2)}}_3, \omega^9 \widehat{c^{(3)}}_3) \end{aligned}

Each z[j0] \mathbf{z}^{[j_0]} can be defined generally as:

z[j0]:=(c(0)^j0,ωj0c(1)^j0,ω2j0c(2)^j0,ω3j0c(3)^j0)\mathbf{z}^{[j_0]} := \left( \widehat{c^{(0)}}_{j_0}, \omega^{j_0} \widehat{c^{(1)}}_{j_0}, \omega^{2j_0} \widehat{c^{(2)}}_{j_0}, \omega^{3j_0} \widehat{c^{(3)}}_{j_0} \right)

Step 3: Final FFT on each z[j0] \mathbf{z}^{[j_0]}

z[0]^=(c^0,c^4,c^8,c^12)z[1]^=(c^1,c^5,c^9,c^13)z[2]^=(c^2,c^6,c^10,c^14)z[3]^=(c^3,c^7,c^11,c^15)\begin{aligned} \widehat{\mathbf{z}^{[0]}} &= (\hat{c}_0, \hat{c}_4, \hat{c}_8, \hat{c}_{12}) \\ \widehat{\mathbf{z}^{[1]}} &= (\hat{c}_1, \hat{c}_5, \hat{c}_9, \hat{c}_{13}) \\ \widehat{\mathbf{z}^{[2]}} &= (\hat{c}_2, \hat{c}_6, \hat{c}_{10}, \hat{c}_{14}) \\ \widehat{\mathbf{z}^{[3]}} &= (\hat{c}_3, \hat{c}_7, \hat{c}_{11}, \hat{c}_{15}) \end{aligned}

Each transform is again a standard FFT:

z[j0]^j1=i0=03z[j0]i0(ω4)i0j1\widehat{z^{[j_0]}}_{j_1} = \sum_{i_0 = 0}^3 {z^{[j_0]}}_{i_0} \cdot (\omega^4)^{i_0 j_1}

Verification Examples

  • z[1]^2=c^9 \widehat{z^{[1]}}_2 = \hat{c}_9 (j1=2,j0=1)(j_1 = 2, j_0 = 1)

z[1]^2=z[1]0+z[1]1ω8+z[1]2ω16+z[1]3ω24=c(0)^1+c(1)^1ω9+c(2)^1ω18+c(3)^1ω27=i1=03c4i1(ω4)9i1+i1=03c4i1+1(ω4)9i1ω9+i1=03c4i1+2(ω4)9i1ω18+i1=03c4i1+3(ω4)9i1ω27=c^9 \begin{align*} \widehat{z^{[1]}}_2 &= {z^{[1]}}_0 + {z^{[1]}}_1 \omega^{8} + {z^{[1]}}_2 \omega^{16} + {z^{[1]}}_3 \omega^{24} \\ &= \widehat{c^{(0)}}_1 + \widehat{c^{(1)}}_1 \omega^{9} + \widehat{c^{(2)}}_1 \omega^{18} + \widehat{c^{(3)}}_1 \omega^{27} \\ &= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{9i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{9i_1}\omega^9 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{9i_1}\omega^{18} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{9i_1}\omega^{27} \\ &= \hat{c}_9 \end{align*}
  • z[2]^1=c^6\widehat{z^{[2]}}_1 = \hat{c}_6 (j1=1,j0=2)(j_1 = 1, j_0 = 2)

z[2]^1=z[2]0+z[2]1ω4+z[2]2ω8+z[2]3ω12=c(0)^2+c(1)^2ω6+c(2)^2ω12+c(3)^2ω18=i1=03c4i1(ω4)6i1+i1=03c4i1+1(ω4)6i1ω6+i1=03c4i1+2(ω4)6i1ω12+i1=03c4i1+3(ω4)6i1ω18=c^6\begin{align*} \widehat{z^{[2]}}_1 &= {z^{[2]}}_0 + {z^{[2]}}_1 \omega^{4} + {z^{[2]}}_2 \omega^{8} + {z^{[2]}}_3 \omega^{12} \\ &= \widehat{c^{(0)}}_2 + \widehat{c^{(1)}}_2 \omega^{6} + \widehat{c^{(2)}}_2 \omega^{12} + \widehat{c^{(3)}}_2 \omega^{18} \\ &= \sum_{i_1=0}^3 c_{4i_1} (\omega^4)^{6i_1} + \sum_{i_1=0}^3 c_{4i_1 + 1} (\omega^4)^{6i_1} \omega^6 + \sum_{i_1=0}^3 c_{4i_1 + 2} (\omega^4)^{6i_1} \omega^{12} + \sum_{i_1=0}^3 c_{4i_1 + 3} (\omega^4)^{6i_1} \omega^{18} \\ &= \hat{c}_6 \end{align*}

The inverse FFT is structurally similar, but we do not cover it in this article.

Distributed FFT

The Parallel FFT algorithm described above can be implemented as a Distributed FFT by mapping it onto the MapReduce model. The diagram above illustrates how the same example from the parallel FFT is executed within the MapReduce paradigm.

  • Map phase: Each executor performs the first-stage FFTs over c(i0)\mathbf{c}^{(i_0)}

  • Shuffle (transpose + twist): Results are reorganized into z[j0]\mathbf{z}^{[j_0]}

  • Reduce phase: Second-stage FFTs are computed over each z[j0]\mathbf{z}^{[j_0]}

This structure enables FFT computations over extremely large input sizes to be parallelized and scaled effectively in distributed systems like Sparkarrow-up-right or Hadooparrow-up-right.

Reference

Written by Ryan Kim of Fractalyze

Last updated