Speeding Up Sumcheck

Presentation: https://www.youtube.com/watch?v=YXANnCp5118

The sum-check protocol is a fundamental tool in the design of modern succinct interactive proofs. It allows us to verify claims of the form

$$\sum_{\bm{x}\in\{0,1\}^\ell}g(\bm{x})=C_0$$

for an $\ell$-variate polynomial $g$ of degree at most $d$ in each variable, by reducing the claim to the evaluation of $g$ at a random challenge $\bm{r}\in\mathbb{F}^\ell$, checked with a single oracle query:

$$g(\bm{r}) = v$$

Recap of the Protocol

Setup: Assume that the verifier has oracle access to the polynomial $g$, and aims to verify:

$$\sum_{\bm{x}\in\{0,1\}^\ell}g(\bm{x})=C_0$$

For each round $i = 1, \dots, \ell$:

  1. $\mathcal{P}$ sends the univariate polynomial $s_i(X)$ claimed to equal:

$$\sum_{(x_{i+1},\dots,x_{\ell})\in\{0,1\}^{\ell-i}} g(r_1,\dots, r_{i-1}, X, x_{i+1}, \dots, x_\ell)$$

$\mathcal{P}$ does so by sending the evaluations $\{s_i(u) : u \in \widehat{U}_d\}$ to $\mathcal{V}$.


Here, the original protocol evaluates $s_i$ over some evaluation domain $U_d$ of size $d+1$, but we can take $\widehat{U}_d = U_d \setminus \{1\}$, since the verifier can compute $s_i(1)$ from the other evaluations.

  2. $\mathcal{V}$ sends a random challenge $r_i \leftarrow \mathbb{F}$ to $\mathcal{P}$.

  3. $\mathcal{V}$ derives $s_i(1) := C_{i-1} - s_i(0)$, then sets $C_i := s_i(r_i)$. After $\ell$ rounds, we reduce to the claim that $g(r_1,\dots,r_\ell) = C_\ell$. $\mathcal{V}$ checks this claim by making a single oracle query to $g$.
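To make the round structure concrete, here is a minimal end-to-end sketch of the protocol for a multilinear $g$ (so each $s_i$ is linear and two evaluations suffice), over a toy prime field. The prover and verifier are interleaved in one function; the field size and all function names are my own choices, not from the paper:

```python
import random

P = 2**31 - 1  # toy prime field; real systems use cryptographically sized fields

def fold(evals, r):
    """Bind the first variable of a multilinear evaluation table to r."""
    half = len(evals) // 2
    return [(evals[j] + r * (evals[half + j] - evals[j])) % P for j in range(half)]

def sumcheck(g_evals):
    """Interleaved prover/verifier for a multilinear g given by its table."""
    claim = sum(g_evals) % P              # C_0, the claimed sum
    evals = list(g_evals)
    while len(evals) > 1:
        half = len(evals) // 2
        s0 = sum(evals[:half]) % P        # s_i(0)
        s1 = sum(evals[half:]) % P        # s_i(1); s_i is linear since g is multilinear
        if (s0 + s1) % P != claim:        # verifier's round check
            return False
        r = random.randrange(P)           # verifier's challenge r_i
        claim = (s0 + r * (s1 - s0)) % P  # C_i := s_i(r_i)
        evals = fold(evals, r)            # prover binds x_i to r_i
    return evals[0] == claim              # final oracle query: g(r) == C_ell
```

Each round halves the table, so the honest prover does linear total work in the table size.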

Typically, the initial polynomials involve "small" values (e.g., from a base field $\mathbb{F}_p$), but for security, the protocol's challenges are drawn from a much larger set (e.g., an extension field $\mathbb{F}_{p^k}$). Below, I will denote this challenge field by $\mathbb{F}_c$. This creates a significant performance gap between different types of multiplication:

  • $\mathfrak{ss}$ (small-small): base field times base field. Very fast.

  • $\mathfrak{sl}$ (small-large): base field times extension field. Fast.

  • $\mathfrak{ll}$ (large-large): extension field times extension field. Very slow.

Therefore, one of the key optimization ideas is to reduce the number of $\mathfrak{ll}$ multiplications.

Existing Algorithms

Consider the case where the polynomial $g(X)$ is given as

$$g(X) = p_1(X) \times p_2(X) \times \cdots \times p_d(X)=\prod^d_{k=1}p_k(X) \tag{9}$$

where $p_1,\dots,p_d$ are $d$ multilinear polynomials in $\ell$ variables such that $p_i(\bm{x})$ has small values for all $i \in \{1,\dots,d\}$ and $\bm{x}\in\{0,1\}^\ell$.

Algorithm 1:

The prover maintains $d$ arrays $P_1,\dots,P_d$ which initially store all evaluations of $p_1,\dots,p_d$ over $\{0,1\}^\ell$, i.e., we have $P_k[\bm{x}] = p_k(\bm{x})$ for all $k\in[1,d]$ and $\bm{x}\in\{0,1\}^\ell$. The prover halves the size of the arrays after each round $i$, via "binding" the arrays $\{P_k\}_{k\in[1,d]}$ to the challenge $r_i$ received from the verifier.


Setup: arrays $P_1^{(0)},\dots,P_d^{(0)}$ of size $2^\ell$, corresponding to the evaluations of each polynomial over the boolean hypercube $\{0,1\}^\ell$.

For each round $i = 1,\dots,\ell$:

  1. For $u \in \widehat{U}_d$, compute $s_i(u)$ using the formula:

$$s_i(u) := \sum_{\bm{x}'\in\{0,1\}^{\ell-i}} \prod^d_{k=1}\Big(\big(P^{(i-1)}_k[1,\bm{x}'] - P^{(i-1)}_k[0,\bm{x}']\big)\cdot u + P^{(i-1)}_k[0,\bm{x}']\Big)$$

  2. Send evaluations $\{s_i(u) : u \in \widehat{U}_d\}$ to the verifier.

  3. Receive challenge $r_i \in \mathbb{F}_c$ from the verifier.

  4. For $k = 1,\dots,d$ and $\bm{x}' \in \{0,1\}^{\ell-i}$, update arrays:

$$P^{(i)}_k[\bm{x}'] := \big(P^{(i-1)}_k[1,\bm{x}'] - P^{(i-1)}_k[0,\bm{x}']\big)\cdot r_i + P^{(i-1)}_k[0,\bm{x}']$$

Lemma. When the evaluations of $p_1,\dots,p_d$ are small, Algorithm 1's dominant cost is $d\cdot 2\cdot 2^{\ell-1}$ $\mathfrak{ll}$ multiplications, followed by $d\cdot 2^{\ell-1}$ $\mathfrak{sl}$ multiplications, $d(d-1)\cdot 2^{\ell-1}$ $\mathfrak{ss}$ multiplications, and $O(n)$ field additions.
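A round of Algorithm 1 can be sketched as follows for a product of $d$ multilinear polynomials, including the $\widehat{U}_d$ trick of skipping $u=1$ (toy field and function names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def round_poly_evals(tables, us):
    """s_i(u) = sum over x' of prod_k ((P_k[1,x'] - P_k[0,x']) * u + P_k[0,x'])."""
    half = len(tables[0]) // 2
    out = []
    for u in us:
        acc = 0
        for x in range(half):
            term = 1
            for T in tables:
                lo, hi = T[x], T[half + x]
                term = term * ((hi - lo) * u + lo) % P
            acc = (acc + term) % P
        out.append(acc)
    return out

def bind(tables, r):
    """One binding step: halve every array by fixing the current variable to r."""
    half = len(tables[0]) // 2
    return [[(T[x] + r * (T[half + x] - T[x])) % P for x in range(half)] for T in tables]
```

For $d = 2$ the prover sends $\{s_i(0), s_i(2)\}$ and the verifier recovers $s_i(1) = C_{i-1} - s_i(0)$; after round 1, the bound arrays contain large field elements, which is exactly why later rounds cost $\mathfrak{ll}$ multiplications.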

Algorithm 2: Quasilinear Time and Square-Root Space

The main idea is to avoid updating the arrays $P_1,\dots,P_d$ in each round $i \leq \ell/2$, and instead compute $\prod^d_{k=1} p_k(r_1,\dots,r_{i-1},u,\bm{x}')$ directly using the following lemma.

Lemma. Let $p: \mathbb{F}^\ell \to \mathbb{F}$ be a multilinear polynomial. Then for any $0 \leq i \leq \ell$, any $(r_1,\dots,r_i) \in \mathbb{F}_c^i$, and any $\bm{x}' \in \{0,1\}^{\ell-i}$,

$$p(\bm{r}_{[1,i]}, \bm{x}') = \sum_{\bm{y}\in\{0,1\}^i} \widetilde{\mathsf{eq}}(\bm{r}_{[1,i]}, \bm{y})\cdot p(\bm{y}, \bm{x}')$$
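The lemma is easy to check numerically. A quick sketch with a toy prime field, where $\widetilde{\mathsf{eq}}$ and multilinear-extension evaluation are implemented directly (helper names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def eq(a, b):
    """Multilinear equality polynomial eq~(a,b) = prod_j (a_j b_j + (1-a_j)(1-b_j))."""
    out = 1
    for x, y in zip(a, b):
        out = out * (x * y + (1 - x) * (1 - y)) % P
    return out

def mle_eval(evals, point):
    """Evaluate the multilinear extension of `evals` at `point` by repeated folding."""
    for r in point:
        half = len(evals) // 2
        evals = [(evals[j] + r * (evals[half + j] - evals[j])) % P for j in range(half)]
    return evals[0]
```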

With this formula, we can calculate $s_i(u)$ as follows:

$$\begin{aligned} s_i(u)&=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod^d_{k=1} p_k(\bm{r}_{[1,i-1]},u,\bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d\sum_{\bm{y}'\in\{0,1\}^{i-1}}\widetilde{\mathsf{eq}}(\bm{r}_{[1,i-1]}, \bm{y}')\cdot p_k(\bm{y}',u, \bm{x}') \end{aligned}$$

where the second equality applies the lemma to the first $i-1$ variables of $p_k(\cdot,u,\bm{x}')$, which is multilinear in those variables.

After $\ell/2$ rounds, as both the time and space needed for the $\widetilde{\mathsf{eq}}$ evaluations grow past $2^{\ell/2}$, we switch to Algorithm 1, which incurs a negligible cost of $d^2\cdot 2^{\ell/2}$ $\mathfrak{ll}$ multiplications. Note that in practice, one could switch as soon as the prover is no longer space-constrained, which may happen well before round $\ell/2$.


For each round $i=1,\dots,\ell/2$:

  1. For $u\in\widehat{U}_d$, compute $s_i(u)$ using the formula:

$$s_i(u)=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d\sum_{\bm{y}'\in\{0,1\}^{i-1}}\widetilde{\mathsf{eq}}(\bm{r}_{[1,i-1]}, \bm{y}')\cdot p_k(\bm{y}',u, \bm{x}') \tag{10}$$

  2. Send evaluations $\{s_i(u) : u \in \widehat{U}_d\}$ to the verifier.

  3. Receive challenge $r_i \in \mathbb{F}_c$ from the verifier.

During round $i=\ell/2$:

  1. Store the evaluations of $p_k(\bm{r}_{[1,\ell/2-1]},b,\bm{x}')$ for all $k=1,\dots,d$, $b\in\{0,1\}$, and $\bm{x}'\in\{0,1\}^{\ell/2}$, obtained while computing round $\ell/2$.

For the leftover rounds:

  1. Follow Algorithm 1.
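The key point of formula (10) is that it reads only the original (small) tables plus the $\widetilde{\mathsf{eq}}$ weights, never a bound array. A sketch that checks it against the Algorithm 1 computation, for $d=2$, $\ell=4$, round $i=3$ (toy field and names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def eq(a, b):
    out = 1
    for x, y in zip(a, b):
        out = out * (x * y + (1 - x) * (1 - y)) % P
    return out

def bind(evals, r):
    half = len(evals) // 2
    return [(evals[j] + r * (evals[half + j] - evals[j])) % P for j in range(half)]

def s_i_streaming(tables, rs, u, ell):
    """Formula (10): compute s_i(u) from the original tables, without binding."""
    i = len(rs) + 1                          # current round index
    total = 0
    for xp in range(2 ** (ell - i)):         # x' in {0,1}^{ell-i}
        term = 1
        for T in tables:
            acc = 0
            for y in range(2 ** (i - 1)):    # y' in {0,1}^{i-1}, big-endian bits
                ybits = [(y >> (i - 2 - j)) & 1 for j in range(i - 1)]
                lo = T[(y << (ell - i + 1)) | xp]                     # p_k(y', 0, x')
                hi = T[(y << (ell - i + 1)) | (1 << (ell - i)) | xp]  # p_k(y', 1, x')
                # p_k(y', u, x') by linearity in the current variable
                acc = (acc + eq(rs, ybits) * ((hi - lo) * u + lo)) % P
            term = term * acc % P
        total = (total + term) % P
    return total
```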


Lemma. When the evaluations of $p_1,\dots,p_d$ are small, Algorithm 2's cost is dominated by $d\cdot\ell_0\cdot 2^{\ell-1}$ $\mathfrak{sl}$ multiplications (spread over the first $\ell/2$ rounds, where $\ell_0$ denotes the round at which we switch to Algorithm 1), and $d\cdot(d-1)\cdot 2^\ell$ $\mathfrak{ll}$ multiplications. The lower-order costs include $d\cdot\ell_0\cdot(d-1)\cdot 2^{\ell-1}$ $\mathfrak{ss}$ multiplications and $O(n)$ field additions.

Algorithm 3: Warm-up Attempt

In Algorithm 1, the prover performs exclusively $\mathfrak{ll}$ multiplications starting from round 2, due to the need to bind the multilinear polynomials after round 1. If we want to reduce the number of $\mathfrak{ll}$ multiplications, we must delay this binding process. A natural attempt at this leads to Algorithm 3, which is typically faster than Algorithm 1 in the first few rounds, and uses significantly less memory.

If we rearrange Equation (10), we get:

$$\begin{aligned} s_i(u)&=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d\sum_{\bm{y}'\in\{0,1\}^{i-1}}\widetilde{\mathsf{eq}}(\bm{r}_{[1,i-1]}, \bm{y}')\cdot p_k(\bm{y}',u, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{k=1}^d\widetilde{\mathsf{eq}}(\bm{r}_{[1,i-1]}, \bm{y}'_k)\cdot p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{k=1}^d\widetilde{\mathsf{eq}}(\bm{r}_{[1,i-1]}, \bm{y}'_k)\cdot \prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{k=1}^d\Bigg(\prod_{j=1}^{i-1}r_j^{\bm{y}'_k[j]}\cdot (1-r_j)^{1-\bm{y}'_k[j]} \Bigg)\cdot\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{j=1}^{i-1}\Big(r_j^{\sum_k\bm{y}'_k[j]}\cdot (1-r_j)^{d-\sum_k\bm{y}'_k[j]}\Big)\cdot\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{j=1}^{i-1}\Big(r_j^{\sum_k\bm{y}'_k[j]}\cdot (1-r_j)^{d-\sum_k\bm{y}'_k[j]}\Big)\cdot\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{j=1}^{i-1}\Big(r_j^{\sum_k\bm{y}'_k[j]}\cdot (1-r_j)^{d-\sum_k\bm{y}'_k[j]}\Big)\cdot\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}') \end{aligned}$$
  1. Exchanging the product with the inner sum

  2. Distributing $\prod$

  3. Expanding $\widetilde{\mathsf{eq}}$

  4. Switching the inner $\prod$ with the outer $\prod$

  5. Switching the order of the outer sums

  6. The $\widetilde{\mathsf{eq}}$ part is invariant over $\bm{x}'$, so we can factor it out.

At this point, we have rewritten the computation of $s_i(u)$ as an inner product between two vectors of length $2^{d(i-1)}$ (indexed by $\bm{y}'_1,\dots,\bm{y}'_d$): one dependent on the challenges $r_1,\dots,r_{i-1}$, and the other dependent on the multilinear polynomials $p_1,\dots,p_d$.

We can do slightly better by noticing that the challenge-dependent terms only depend on $\bm{v} = \big(\sum_{k=1}^d \bm{y}'_k[j]\big)_{j\in[1,i-1]}$ and not on the individual $\{\bm{y}'_k\}_{k\in[1,d]}$; there are only $(d+1)^{i-1}$ distinct such terms. By re-indexing based on $\bm{v}\in[0,d]^{i-1}$, we can further rewrite:

$$\begin{aligned} s_i(u)&=\sum_{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}}\prod_{j=1}^{i-1}\Big(r_j^{\sum_k\bm{y}'_k[j]}\cdot(1-r_j)^{d-\sum_k\bm{y}'_k[j]}\Big)\cdot\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{v}\in\{0,1,\dots,d\}^{i-1}}\prod_{j=1}^{i-1}\Big(r_j^{\bm{v}_j}\cdot(1-r_j)^{d-\bm{v}_j}\Big)\cdot\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k \bm{y}'_k}}\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}') \end{aligned}$$

Notice that the above sum over the hypercube $\bm{v}\in\{0,\dots,d\}^{i-1}$ can be viewed as an inner product of two vectors of size $(d+1)^{i-1}$:

$$\Braket{\bigotimes_{j=1}^{i-1}\Big(r_j^{v_j}\cdot(1-r_j)^{d-v_j}\Big)_{v_j=0}^d,\ \bigg(\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k\bm{y}'_k}}\prod_{k=1}^d p_k(\bm{y}'_k,u,\bm{x}')\bigg)_{\bm{v}\in[0,d]^{i-1}}}$$

$r_j^{v_j}\cdot(1-r_j)^{d-v_j}$: this can take $d+1$ different values depending on $v_j$, which ranges from $0$ to $d$. If we store them in a vector, we can denote it as:

$$\Big(r_j^{v_j}\cdot(1-r_j)^{d-v_j}\Big)_{v_j=0}^d$$

The outer sum and product then take products of all possible $(i-1)$-fold combinations of entries of this vector and sum them. These product combinations can be thought of as a Kronecker product:

$$\bigotimes_{j=1}^{i-1}\Big(r_j^{v_j}\cdot(1-r_j)^{d-v_j}\Big)_{v_j=0}^d$$
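A small sketch of building that challenge vector by iterated Kronecker products, using the per-coordinate convention $\widetilde{\mathsf{eq}}(r,y)=r^y(1-r)^{1-y}$ (toy field and names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def kron(a, b):
    """Kronecker product of two vectors: entry (i, j) maps to a[i] * b[j]."""
    return [x * y % P for x in a for y in b]

def challenge_tensor(rs, d):
    """Flat vector for the tensor of (r_j^{v_j} (1-r_j)^{d-v_j})_{v_j=0..d}."""
    vec = [1]
    for r in rs:
        vec = kron(vec, [pow(r, v, P) * pow(1 - r, d - v, P) % P for v in range(d + 1)])
    return vec
```

Entry $\bm{v}$ of the result sits at the base-$(d+1)$ index of $\bm{v}$, which is how the inner product with the accumulators is laid out.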

Let us denote the second component of the inner product as:

$$\mathsf{A}_i(\bm{v},u)=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k\bm{y}'_k}}\prod_{k=1}^d p_k(\bm{y}'_k,u,\bm{x}')$$

Then we have:

$$s_i(u)=\Braket{\bigotimes_{j=1}^{i-1}\Big(r_j^{v_j}\cdot(1-r_j)^{d-v_j}\Big)_{v_j=0}^d,\ \Big(\mathsf{A}_i(\bm{v},u)\Big)_{\bm{v}\in[0,d]^{i-1}}}$$

Since the accumulators $\mathsf{A}_i(\bm{v},u)$ do not depend on the challenges $r_1,\dots,r_{i-1}$, the prover can precompute them before the protocol begins. Moreover, we do not have to recompute the evaluations $\{p_k(\bm{y}'_k,u,\bm{x}')\}_{k\in[1,d]}$ for each accumulator; we can reuse them across accumulators. To do so, we take the multilinear extension of $p_k$ over the variable $u$ to get:

$$\begin{aligned} \mathsf{A}_i(\bm{v},u)&=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k\bm{y}'_k}}\prod_{k=1}^d p_k(\bm{y}'_k,u, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k\bm{y}'_k}}\prod_{k=1}^d\sum_{y^*_k\in\{0,1\}}u^{y^*_k}(1-u)^{1-y^*_k}\, p_k(\bm{y}'_k,y^*_k, \bm{x}')\\ &=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\substack{\bm{y}'_1,\dots,\bm{y}'_d\in\{0,1\}^{i-1}\\ \bm{v}=\sum_k\bm{y}'_k}}\sum_{(y^*_1,\dots,y^*_d)\in\{0,1\}^d}u^{\sum_k y^*_k}(1-u)^{d-\sum_k y^*_k}\cdot\prod_{k=1}^d p_k(\bm{y}'_k,y^*_k, \bm{x}') \end{aligned}$$

and now we only need the evaluations of $p_k$ over the boolean hypercube.

Cost Analysis. Algorithm 3 uses $O((d+1)^{\ell_0})$ space for rounds $1$ to $\ell_0$ (accumulators and challenge vectors) and $d\cdot 2^{\ell-\ell_0}$ space thereafter (cached results). Since $(d+1)^{\ell_0}\ll d\cdot 2^{\ell-\ell_0}$ in practice, the overall space complexity is dominated by the latter term, representing a space saving of roughly a factor of $2^{\ell_0}$ compared to Algorithm 1.

Regarding runtime, Algorithm 3 is bottlenecked by the $(d-1)\cdot 2^{d\ell_0}\cdot 2^{\ell-\ell_0}$ $\mathfrak{ss}$ multiplications needed to compute the products $\{\prod_{k=1}^d p_k(\bm{y}'_k,\bm{x}')\}_{\bm{y},\bm{x}'}$. We also incur roughly $d\cdot 2^\ell$ $\mathfrak{sl}$ multiplications when using Algorithm 2 for the $(\ell_0+1)$-th round, and roughly $d^2\cdot 2^{\ell-\ell_0-1}$ $\mathfrak{ll}$ multiplications for the remaining rounds using Algorithm 1.

Algorithm 4: Optimizing with Toom-Cook Multiplication

While Algorithm 3 almost eliminates $\mathfrak{ll}$ multiplications for the first few rounds and slashes memory usage, we still end up performing a lot of $\mathfrak{ss}$ multiplications (namely, by a factor of $\approx 2^{(d-1)\cdot\ell_0}$ compared to the number of $\mathfrak{ll}$ multiplications in Algorithm 1). To reduce this cost, we need another way to rewrite the products $\prod_{k=1}^d p_k(\bm{r}_{[1,i-1]},u,\bm{x}')$.

The idea is to see the product as the evaluation of the polynomial:

$$F(Y_1,\dots,Y_{i-1}) = \prod_{k=1}^d p_k(Y_1,\dots,Y_{i-1},u,\bm{x}')$$

at the challenge point $\bm{r}_{[1,i-1]}$.

Since $F$ has individual degree $d$, we can apply Lagrange interpolation to each variable to rewrite $F(Y_1,\dots,Y_{i-1})$ as

$$\sum_{v_1\in U_d}\mathcal{L}_{U_d,v_1}(Y_1)\cdots\sum_{v_{i-1}\in U_d}\mathcal{L}_{U_d,v_{i-1}}(Y_{i-1})\prod_{k=1}^d p_k(\bm{v},u,\bm{x}')\tag{14}$$
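The one-variable version of this decomposition, $F(r)=\sum_{v\in U_d}\mathcal{L}_{U_d,v}(r)\cdot F(v)$ for $\deg F\leq d$, can be checked numerically. A sketch over a toy prime field (names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def lagrange_coeffs(domain, r):
    """L_{U,v}(r) = prod_{w != v} (r - w) / (v - w) over F_P, for each v in U."""
    coeffs = []
    for v in domain:
        num, den = 1, 1
        for w in domain:
            if w != v:
                num = num * (r - w) % P
                den = den * (v - w) % P
        coeffs.append(num * pow(den, P - 2, P) % P)  # division via Fermat inverse
    return coeffs
```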

As a result, when we plug in the evaluation point $(r_1,\dots,r_{i-1})$, we can rewrite the $i$-th sum-check polynomial $s_i(X)$ as:

$$\begin{aligned} s_i(u)&=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\sum_{\bm{v}\in U_d^{i-1}}\prod_{j=1}^{i-1}\mathcal{L}_{U_d,\bm{v}_j}(r_j)\prod_{k=1}^d p_k(\bm{v},u, \bm{x}')\\ &=\sum_{\bm{v}\in U_d^{i-1}}\prod_{j=1}^{i-1}\mathcal{L}_{U_d,\bm{v}_j}(r_j)\cdot\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\prod_{k=1}^d p_k(\bm{v},u,\bm{x}') \end{aligned}$$

then we have:

$$s_i(u)=\Braket{\bigotimes_{j=1}^{i-1}\big(\mathcal{L}_{U_d,v_j}(r_j)\big)_{v_j\in U_d},\ \Big(\mathsf{A}_i(\bm{v},u)\Big)_{\bm{v}\in U_d^{i-1}}}$$

Just like with Algorithm 3, we can precompute the accumulators $\mathsf{A}_i(\bm{v},u)$ by computing each of the $\approx(d+1)^{\ell_0}$ products

$$\bigg\{\prod_{k=1}^d p_k(\bm{v},u,\bm{x}')\bigg\}$$

over $\bm{v}\in U_d^{i-1}$, $u\in\widehat{U}_d$, and $\bm{x}'\in\{0,1\}^{\ell-i}$ once, then inserting the result into the appropriate $\mathsf{A}_i(\bm{v},u)$ terms over all $i\in[1,\ell_0]$ rounds. For details, see Section A.3 of the paper.
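A minimal numeric check of the inner-product identity for round $i=2$ with $d=2$, $\ell=3$, $U_2=\{0,1,2\}$. For brevity the accumulators are computed on the fly rather than precomputed; toy field and names are my own:

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def mle_eval(evals, point):
    for r in point:
        half = len(evals) // 2
        evals = [(evals[j] + r * (evals[half + j] - evals[j])) % P for j in range(half)]
    return evals[0]

def lagrange_coeffs(domain, r):
    out = []
    for v in domain:
        num, den = 1, 1
        for w in domain:
            if w != v:
                num = num * (r - w) % P
                den = den * (v - w) % P
        out.append(num * pow(den, P - 2, P) % P)
    return out

def s2_via_accumulators(tables, r1, u, U):
    """Round i = 2: s_2(u) as < (L_{U,v}(r1))_v , (A_2(v, u))_v >."""
    total = 0
    for lv, v in zip(lagrange_coeffs(U, r1), U):
        A = 0                      # accumulator A_2(v, u): challenge-independent
        for x in (0, 1):           # x' in {0,1}^{ell-i} with ell = 3
            prod = 1
            for T in tables:
                prod = prod * mle_eval(T, [v, u, x]) % P
            A = (A + prod) % P
        total = (total + lv * A) % P
    return total
```

The identity holds because $\prod_k p_k(Y,u,\bm{x}')$ has degree $d=2$ in $Y$, so interpolating on the three grid points of $U_2$ is exact.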

Cost Analysis. Similar to Algorithm 3, Algorithm 4 also saves a factor of $2^{\ell_0}$ in space. Unlike Algorithm 3, however, it incurs only $(d-1)\cdot(d+1)^{\ell_0}\cdot 2^{\ell-\ell_0}$ $\mathfrak{ss}$ multiplications, instead of $(d-1)\cdot 2^{d\ell_0}\cdot 2^{\ell-\ell_0}$. This factor can be roughly 4 times smaller for common parameters (e.g., $d=2$ and $\ell_0=5$, where $2^{10}/3^5\approx 4.2$). The costs of $\mathfrak{sl}$ and $\mathfrak{ll}$ multiplications are the same as in Algorithm 3.

Optimizations for Sum-Check involving EqPoly

In this section, the optimizations apply only to sum-checks where $g$ is of the form:

$$\sum_{\bm{x}\in\{0,1\}^\ell}g(\bm{x})=\sum_{\bm{x}\in\{0,1\}^\ell}\widetilde{\mathsf{eq}}(\bm{w},\bm{x})\cdot\prod_{k=1}^d p_k(\bm{x})$$

with $\bm{w}\in\mathbb{F}_c^\ell$ a vector of verifier's challenges, and $p_1(X),\dots,p_d(X)$ multilinear polynomials (e.g., the $Q_{\bm{io}}$ zero-check in Spartan).

Gruen's Optimization

The main idea of this optimization by Gruen is to reduce the computation of the degree-$(d+1)$ polynomial $s_i(X)$ sent by the prover in each round $i$ to the computation of a degree-$d$ factor $t_i(X)$ and a linear factor $\mathsf{l}_i(X)$. This way, the linear factor is "free" to compute, and the prover saves an evaluation when computing $t_i(X)$ instead of $s_i(X)$.

This is possible due to the special structure of the $\widetilde{\mathsf{eq}}$ polynomial, which for every $i\in[1,\ell]$ can be decomposed as:

$$\widetilde{\mathsf{eq}}\big(\bm{w},(\bm{r}_{[1,i-1]},X,\bm{x}')\big)=\widetilde{\mathsf{eq}}(\bm{w}_{[1,i-1]},\bm{r}_{[1,i-1]})\cdot\widetilde{\mathsf{eq}}(w_i,X)\cdot\widetilde{\mathsf{eq}}(\bm{w}_{[i+1,\ell]},\bm{x}')$$
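This decomposition is easy to verify numerically, since $\widetilde{\mathsf{eq}}$ is just a product of per-coordinate factors. A sketch over a toy prime field (names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in

def eq(a, b):
    """eq~(a, b) = prod_j (a_j b_j + (1 - a_j)(1 - b_j)); valid for field arguments."""
    out = 1
    for x, y in zip(a, b):
        out = out * (x * y + (1 - x) * (1 - y)) % P
    return out

def eq_factored(w, r, X, xp):
    """The three factors eq~(w_[1,i-1], r) * eq~(w_i, X) * eq~(w_[i+1,ell], x')."""
    i = len(r) + 1
    return eq(w[:i - 1], r) * eq([w[i - 1]], [X]) % P * eq(w[i:], xp) % P
```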

If we adapt the original definition of $s_i(u)$ to this case, we get:

$$s_i(u) := \sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\widetilde{\mathsf{eq}}\big(\bm{w},(\bm{r}_{[1,i-1]},u,\bm{x}')\big)\cdot \prod^d_{k=1}\Big(\big(P^{(i-1)}_k[1,\bm{x}'] - P^{(i-1)}_k[0,\bm{x}']\big)\cdot u + P^{(i-1)}_k[0,\bm{x}']\Big)$$

Thus, if we set $\mathsf{l}_i(X):=\widetilde{\mathsf{eq}}(\bm{w}_{[1,i-1]},\bm{r}_{[1,i-1]})\cdot\widetilde{\mathsf{eq}}(w_i,X)$ and

$$t_i(X):=\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\widetilde{\mathsf{eq}}(\bm{w}_{[i+1,\ell]},\bm{x}')\cdot\prod_{k=1}^d p_k(\bm{r}_{[1,i-1]},X,\bm{x}')$$

then we have $s_i(X)=\mathsf{l}_i(X)\cdot t_i(X)$.

Using this fact, the sum-check prover now instead computes $d$ evaluations $\{t_i(u):u\in\widehat{U}_d\}$, then relies on the equality $s_i(0)+s_i(1)=C_{i-1}$ to compute:

$$t_i(1):=\mathsf{l}_i(1)^{-1}\cdot\big(C_{i-1}-\mathsf{l}_i(0)\cdot t_i(0)\big)$$

The prover now has $d+1$ evaluations of $t_i$, and thus can interpolate $t_i(X)$ and compute $s_i(X)$. Note that computing $\{t_i(u):u\in\widehat{U}_d\}$ can be done generically using either Algorithm 1 or 2; since our focus is on optimizing time rather than space, we assume that Algorithm 1 is used.
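A small sketch of the trick for $\ell=2$, $d=1$: the prover evaluates only $t_1(0)$, and the identity $s_1(0)+s_1(1)=C_0$ recovers $t_1(1)$ for free (toy field and names are my own):

```python
import random

P = 2**31 - 1  # toy prime field stand-in
inv = lambda a: pow(a, P - 2, P)

def eq1(a, b):
    """Single-coordinate eq~: a*b + (1-a)*(1-b)."""
    return (a * b + (1 - a) * (1 - b)) % P

random.seed(8)
w = [random.randrange(P) for _ in range(2)]     # verifier challenges defining eq~(w, .)
p = [random.randrange(100) for _ in range(4)]   # table of p(x1, x2), big-endian

# initial claim C_0 = sum_x eq~(w, x) * p(x)
C0 = sum(eq1(w[0], x1) * eq1(w[1], x2) % P * p[2 * x1 + x2]
         for x1 in (0, 1) for x2 in (0, 1)) % P

def t1(X):
    """Degree-d factor t_1(X) = sum_{x'} eq~(w_2, x') * p(X, x')."""
    out = 0
    for x2 in (0, 1):
        lo, hi = p[x2], p[2 + x2]               # p(0, x2) and p(1, x2)
        out = (out + eq1(w[1], x2) * ((hi - lo) * X + lo)) % P
    return out

l1 = lambda X: eq1(w[0], X)                      # linear factor l_1(X); prefix eq is 1

# s_1 = l_1 * t_1 and s_1(0) + s_1(1) = C_0, so t_1(1) follows from t_1(0) alone
recovered_t1_at_1 = inv(l1(1)) * ((C0 - l1(0) * t1(0)) % P) % P
```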

Algorithm 5

We can further reduce the $2^{\ell-1}$ $\mathfrak{ll}$ multiplications associated with the $\widetilde{\mathsf{eq}}$ polynomial. Our insight is that we can further split off the $\widetilde{\mathsf{eq}}$ terms for the first $i<\ell/2$ rounds, so that:

$$t_i(u)=\sum_{\bm{x}_R\in\{0,1\}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)\cdot\sum_{\bm{x}_L\in\{0,1\}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)\cdot\prod^{d}_{k=1}p_k(\bm{r}_{[1,i-1]},u,\bm{x}_L,\bm{x}_R)\tag{17}$$

for all $u\in\widehat{U}_d$, where $\bm{w}_L=\bm{w}_{[i+1,\ell/2]}$ and $\bm{w}_R=\bm{w}_{[\ell/2+1,\ell]}$ (so that the lengths match $\bm{x}_L\in\{0,1\}^{\ell/2-i}$ and $\bm{x}_R\in\{0,1\}^{\ell/2}$). The sum-check prover now computes all inner sums over $\bm{x}_L$, then all outer sums over $\bm{x}_R$. For rounds $i\geq\ell/2$, we no longer have the inner sum, and proceed similarly to Algorithm 1. Note that we put the left part $\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)$ in the inner sum, since it leads to better memory locality in practice, assuming the evaluations of $p_k$ are streamed in big-endian order (so that $\bm{x}_L$ are low-order bits compared to $\bm{x}_R$, which means their chunks are contiguous in memory).


Initialize: Length-$2^\ell$ arrays $P^{(0)}_1,\dots,P^{(0)}_d$ such that

$$P^{(0)}_k[\bm{x}] = p_k(\bm{x}) \ \text{ for all } k = 1,\dots,d \ \text{ and } \bm{x}\in\{0,1\}^\ell$$

Pre-computation: Use Procedure 3 from the paper to compute the evaluations:

$$\{\widetilde{\mathsf{eq}}(\bm{w}_{[i+1,\ell/2]},\bm{x}_L),\ \widetilde{\mathsf{eq}}(\bm{w}_{[\ell/2+1,\ell]},\bm{x}_R)\}$$

for all $i\in[1,\ell/2]$, $\bm{x}_L\in\{0,1\}^{\ell/2-i}$, and $\bm{x}_R\in\{0,1\}^{\ell/2}$.

For each round $i=1,\dots,\ell$:

  1. For $u\in\widehat{U}_d$, if $i<\ell/2$, compute $t_i(u)$ using the formula:

$$\sum_{\bm{x}_R\in\{0,1\}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)\cdot \sum_{\bm{x}_L\in\{0,1\}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)\cdot\prod_{k=1}^{d}\Big(\big(P_k^{(i-1)}[1,\bm{x}_L,\bm{x}_R]-P_k^{(i-1)}[0,\bm{x}_L,\bm{x}_R]\big)\cdot u + P_k^{(i-1)}[0,\bm{x}_L,\bm{x}_R]\Big)$$

If $i\geq\ell/2$, compute $t_i(u)$ using the formula:

$$\sum_{\bm{x}'\in\{0,1\}^{\ell-i}}\widetilde{\mathsf{eq}}(\bm{w}_{[i+1,\ell]},\bm{x}')\cdot\prod_{k=1}^{d}\Big(\big(P_k^{(i-1)}[1,\bm{x}']-P_k^{(i-1)}[0,\bm{x}']\big)\cdot u + P_k^{(i-1)}[0,\bm{x}']\Big)$$

  2. Use Procedure 8 from the paper to compute $s_i(X)$ from $\big(\{t_i(u)\},\bm{w},\bm{r}_{[1,i-1]}\big)$.

  3. Send the evaluations of $s_i$ over $\widehat{U}_d$ to the verifier.

  4. Receive challenge $r_i\in\mathbb{F}_c$ from the verifier.

  5. For $k\in[1,d]$ and $\bm{x}'\in\{0,1\}^{\ell-i}$, update arrays:

$$P^{(i)}_k[\bm{x}']:=\big(P^{(i-1)}_k[1,\bm{x}'] - P^{(i-1)}_k[0,\bm{x}']\big)\cdot r_i + P^{(i-1)}_k[0,\bm{x}']$$

Cost Analysis. Our optimization avoids computing the $2^{\ell-i}$-sized table of evaluations $\{\widetilde{\mathsf{eq}}(\bm{w}_{[i+1,\ell]},\bm{x}'):\bm{x}'\in\{0,1\}^{\ell-i}\}$ for each round $i$, and instead only computes two square-root-sized tables, which cost $2^{\ell/2}$ $\mathfrak{ll}$ multiplications for $\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)$ and $2^{\ell/2-i}$ $\mathfrak{ll}$ multiplications for $\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)$. With memoization, these evaluations can be computed for all rounds $i$ in $2^{\ell/2}$ $\mathfrak{ll}$ multiplications total. The (negligible) downside is that we need an extra $2^{\ell/2}$ $\mathfrak{ll}$ multiplications when computing the outer sum over $\bm{x}_R$.

Using the same process for cost analysis as in Lemma 3.2, we have the following cost breakdown for Algorithm 5.

Lemma 5.1. Algorithm 5's dominant cost is $d(d+1)/2\cdot N$ $\mathfrak{ll}$ multiplications (where $N=2^\ell$), reducing Algorithm 1's cost in the same setting by roughly $N$ $\mathfrak{ll}$ multiplications.

Algorithm 6: Combining SVO (Small-Value Optimization) with the Eq-Poly Optimization

Our starting point is Equation (17) in the exposition of Algorithm 5. We then apply the multivariate Lagrange decomposition formula of Equation (14) to the products $\prod_{k=1}^d p_k(\bm{r},u,\bm{x}_L,\bm{x}_R)$, which gives:

$$\begin{aligned} t_i(u)&=\sum_{\bm{x}_L\in\{0,1\}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)\cdot\sum_{\bm{x}_R\in\{0,1\}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)\cdot\sum_{\bm{v}\in U_d^{i-1}}\prod_{j=1}^{i-1}\mathcal{L}_{U_d,\bm{v}_j}(r_j)\cdot\prod_{k=1}^d p_k(\bm{v},u,\bm{x}_L,\bm{x}_R)\\ &=\Braket{\bigotimes_{j=1}^{i-1}\big(\mathcal{L}_{U_d,v}(r_j)\big)_{v\in U_d},\ \Big(\mathsf{A}_i(\bm{v},u)\Big)_{\bm{v}\in U_d^{i-1}}} \end{aligned}$$

where the accumulators $\mathsf{A}_i(\bm{v},u)$ are now of the form:

$$\mathsf{A}_i(\bm{v},u)=\sum_{\bm{x}_L\in\{0,1\}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)\sum_{\bm{x}_R\in\{0,1\}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)\prod_{k=1}^d p_k(\bm{v},u,\bm{x}_L,\bm{x}_R)$$

Unlike the small-value-only case, here, because of the presence of the $\widetilde{\mathsf{eq}}(\bm{w},\cdot)$ terms, we need to use $\mathfrak{sl}$ (instead of $\mathfrak{ss}$) multiplications in computing the accumulators, and the additions are over the big field as well. We also note that, compared to Algorithm 5, the inner sum in the accumulator formula now ranges over the fixed-length chunk $\bm{x}_R$. This is so that we can compute the same inner sums across the accumulators over all rounds, and then dispatch these inner sums to different accumulators in different rounds, with negligible cost. See Section A.4 of the paper for more details.


Pre-computation: Use Procedure 9 from the paper to compute the accumulators:

$$\mathsf{A}_i(\bm{v},u)=\sum_{\bm{x}_L\in\{0,1\}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}_L,\bm{x}_L)\sum_{\bm{x}_R\in\{0,1\}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}_R,\bm{x}_R)\prod_{k=1}^d p_k(\bm{v},u,\bm{x}_L,\bm{x}_R)$$

for all $i\in[1,\ell_0]$, $\bm{v}\in U_d^{i-1}$, and $u\in\widehat{U}_d$.

Initialize: $\mathsf{R}_1=(1)$.

For each round $i=1,\dots,\ell_0$:

  1. For $u\in\widehat{U}_d$, compute $t_i(u)$ using the formula:

$$t_i(u):=\sum_{\bm{v}\in[0,d]^{i-1}}\mathsf{R}_i[\mathsf{nat}_{d+1}(\bm{v})]\cdot\mathsf{A}_i(\bm{v},u)$$

where $\mathsf{nat}_{d+1}(\bm{v})$ interprets $\bm{v}$ as an integer written in base $d+1$.

  2. Use Procedure 8 from the paper to compute $s_i(X)$ from $\big(\{t_i(u)\},\bm{w},\bm{r}_{[1,i-1]}\big)$.

  3. Send the evaluations of $s_i$ over $\widehat{U}_d$ to the verifier.

  4. Receive challenge $r_i\in\mathbb{F}_c$ from the verifier.

  5. Compute $\mathsf{R}_{i+1}=\mathsf{R}_i\otimes\big(\mathcal{L}_{U_d,k}(r_i)\big)^{d}_{k=0}$.

For round $\ell_0+1$: Follow Algorithm 2, storing the evaluations $p_k(\bm{r}_{[1,\ell_0]},b,\bm{x}')$ for all $k\in[1,d]$, $b\in\{0,1\}$, and $\bm{x}'\in\{0,1\}^{\ell-\ell_0-1}$.

Receive challenge $r_{\ell_0+1}\in\mathbb{F}_c$ from the verifier, and perform updates to get $p_k(\bm{r}_{[1,\ell_0+1]},\bm{x}')$ for all $k$ and $\bm{x}'$, similar to Algorithm 1.

For rounds $\ell_0+2,\dots,\ell$: Follow Algorithm 5.

Cost Analysis. Besides the extra subtleties noted above, the cost estimate of Algorithm 6 is fairly straightforward and follows directly from combining the estimates of Algorithm 4 and Algorithm 5. In particular, the small-value precomputation process is now dominated by $(d+1)^{\ell_0}\cdot 2^{\ell-\ell_0}$ $\mathfrak{sl}$ multiplications. Next, the streaming round takes roughly $d\cdot 2^\ell$ $\mathfrak{sl}$, and the rest of the linear-time sum-check rounds cost roughly $d^2\cdot 2^{\ell-\ell_0-1}$ $\mathfrak{ll}$. We thus summarize as follows.

Theorem 6.1. Algorithm 6's cost is dominated by $\big(\big(\tfrac{d+1}{2}\big)^{\ell_0}+d\big)\cdot N$ $\mathfrak{sl}$ and $\tfrac{d^2}{2^{\ell_0+1}}\cdot N$ $\mathfrak{ll}$ multiplications.

Summary

| Algorithm | Multiplications | Space Complexity |
| --- | --- | --- |
| Algorithm 1 | $d\cdot 2^{\ell-1}\,\mathfrak{sl}+d\cdot 2^{\ell}\,\mathfrak{ll}$ | $d\cdot 2^{\ell}$ |
| Algorithm 2 | $d\cdot\ell_0\cdot 2^{\ell-1}\,\mathfrak{sl}+d\cdot 2^{\ell}\,\mathfrak{ll}$ | $d\cdot 2^{\ell-\ell_0}$ |
| Algorithm 3 | $(d-1)\cdot 2^{d\ell_0}\cdot 2^{\ell-\ell_0}\,\mathfrak{ss}+d\cdot 2^{\ell}\,\mathfrak{sl}+d^2\cdot 2^{\ell-\ell_0-1}\,\mathfrak{ll}$ | $d\cdot 2^{\ell-\ell_0}$ |
| Algorithm 4 | $(d-1)\cdot(d+1)^{\ell_0}\cdot 2^{\ell-\ell_0}\,\mathfrak{ss}+d\cdot 2^{\ell}\,\mathfrak{sl}+d^2\cdot 2^{\ell-\ell_0-1}\,\mathfrak{ll}$ | $d\cdot 2^{\ell-\ell_0}$ |
| Algorithm 5 | $d(d+1)/2\cdot N\,\mathfrak{ll}$ | $d\cdot 2^{\ell}$ |
| Algorithm 6 | $\big(\big(\tfrac{d+1}{2}\big)^{\ell_0}+d\big)\cdot N\,\mathfrak{sl}+\tfrac{d^2}{2^{\ell_0+1}}\cdot N\,\mathfrak{ll}$ | $d\cdot 2^{\ell-\ell_0}$ |

(writing $N=2^\ell$)

References:

  • Bagad, Dao, Domb, Thaler. "Speeding Up Sum-Check Proving".

  • Gruen. "Some Improvements for the PIOP for ZeroCheck".

Written by Batzorig Zorigoo of Fractalyze
