# Speeding Up Sumcheck

The sum-check protocol is a fundamental tool in the design of modern succinct interactive proofs. It allows us to verify claims of the form

$$
\sum\_{\bm{x}\in{0,1}^\ell}g(\bm{x})=C\_0
$$

for a $$\ell$$-variate polynomial $$g$$ of degree at most $$d$$ in each variable by checking the evaluation of $$g$$ at a random point with a single oracle query at random challenge $$\bm{r}\in\mathbb{F}^\ell$$:

$$
g(\bm{r}) = v
$$

### Recap of the Protocol

**Setup**: Assume that the verifier has oracle access to the polynomial $$g$$, and aims to verify:

$$
\sum\_{\bm{x}\in{0,1}^\ell}g(\bm{x})=C\_0
$$

**For each round** $$i = 1, \dots,\ell$$**:**

1. $$\mathcal{P}$$ sends the univariate polynomial $$s\_i(X)$$ claimed to equal:

$$
\sum\_{(x\_{i+1},...,x\_{\ell})\in{0,1}^{\ell−i}} g(r\_1,\dots, r\_{i−1}, X, x\_{i+1}, . . . , x\_\ell)
$$

&#x20;    $$\mathcal{P}$$ does so by sending the evaluations $${s\_i(u) : u \in \widehat{U}\_d}$$ to $$\mathcal{V}$$.&#x20;

{% hint style="info" %}
Here, the original protocol typically evaluates from some evaluation domain $$U\_d$$ but we can take $$\widehat{U}\_d=U\_d/{1}$$ and since verifier can calculate $$s\_i(1)$$ using other evaluations.
{% endhint %}

2. $$\mathcal{V}$$ sends a random challenge $$r\_i \leftarrow \mathbb{F}$$ to $$\mathcal{P}$$.
3. $$\mathcal{V}$$ derives $$s\_i(1) := C\_{i−1} − s\_i(0)$$, then sets $$C\_i := s\_i(r\_i)$$. After $$\ell$$ rounds, we reduce to the claim that $$g(r\_1,\dots,r\_{\ell}) = C\_\ell$$. $$\mathcal{V}$$ checks this claim by making a single oracle query to $$g$$.

Typically, the initial polynomial involve "small" values (e.g., from a base field $$\mathbb{F}*p$$), but for security, the protocol's challenges are drawn from a much larger possible values (e.g., from a extension field $$\mathbb{F}*{p^k}$$). Below, I will denote this challenge field as $$\mathbb{F}\_c$$. This creates a significant performance gap between different types of multiplication:

* $$\mathfrak{ss}$$(small-small): Base field by base field. **Very fast.**
* $$\mathfrak{sl}$$ (small-large): Base field by extension field. **Fast.**
* $$\mathfrak{ll}$$ (large-large): Extension field by extension field. **Very slow.**

Therefore, the one of the optimization idea is reducing the number of $$\mathfrak{ll}$$ multiplications.

### Existing Algorithms

Consider the case where the polynomial $$g(x)$$ is given as

$$
g(X) = p\_1(X) \times p\_2(X) \times \cdots \times p\_d(X)=\prod^d\_{k=1}p\_k(X) \tag{9}
$$

where $$p\_1, . . . , p\_d$$ are $$d$$ multilinear polynomials in $$\ell$$ variables such that $$p\_i(\bm{x})$$ has small values for all $$i \in {1, . . . , d}$$ and $$\bm{x} \in {0, 1}^\ell$$.

#### Algorithm 1:

The prover maintains $$d$$ arrays $$P\_1,\dots, P\_d$$ which initially store all evaluations of $$p\_1, . . . , p\_d$$ over $${0, 1}^\ell$$, e.g., we have $$P\_k\[\bm{x}] = p\_k(\bm{x})$$ for all $$k \in \[1,d]$$ and $$\bm{x}\in{0, 1}^\ell$$. The prover halves the size of the arrays after each round $$i$$, via “binding” the arrays $${P\_k}\_{k\in\[0,d]}$$ to the challenge $$r\_i$$ received from the verifier.

{% hint style="info" %}
**Setup:** arrays $$P\_1^{(0)},\dots,P\_d^{(0)}$$ of size $$2^\ell$$, corresponding to evaluations of each polynomial over the boolean hypercube $${0,1}^\ell$$

**For each round** $$i = 1, \dots,\ell$$**:**

1. For $$u \in \widehat{U}\_d$$, compute $$s\_i(u)$$ using the formula:

$$
s\_i(u) := \sum\_{\bm{x}'\in{0,1}^{\ell−i}} \prod^d\_{k=1}\bigg(\big(P^{(i)}\_k\[1,\bm{x}'] − P^{(i)}\_k \[0,\bm{x}']\big)\cdot u + P^{(i)}\_k\[0, \bm{x}'] \bigg)
$$

2. Send evaluations $${s\_i(u) : u \in \widehat{U}\_d}$$ to the verifier.
3. Receive challenge $$r\_i \in \mathbb{F}\_c$$ from the verifier.
4. For $$k = 1, . . . , d$$ and $$\bm{x}' \in {0, 1}^{\ell−i}$$, update arrays:

$$
P^{(i+1)}\_k\[\bm{x}']:=(P^{(i)}\_k\[1, \bm{x}'] − P^{(i)}\_k\[0,\bm{x}'])\cdot r\_i + P^{(i)}\_k\[0,\bm{x}']
$$
{% endhint %}

{% hint style="warning" %}
**Cost Analysis:**&#x20;

1. In round 1, since the initial evaluations $${p\_k(x) : k \in \[0,d], \bm{x} \in {0, 1}^{\ell}}$$ are small, all multiplications are $$\mathfrak{ss}$$. Since we need to compute $$d$$ evaluations $${s\_1(u) : u \in \widehat{U}\_d}$$, and each evaluation is a sum over $$2^{\ell−1}$$ products of $$d$$ small values, the total cost is $$d · (d − 1) · 2^{\ell-1}$$ $$\mathfrak{ss}$$ multiplications. We then take $$d · 2 ^{\ell−1}$$ $$\mathfrak{sl}$$ multiplications for updating the arrays.
2. From round $$i \geq 2$$, all array values are large,  so we incur $$d · (d − 1) · 2^{\ell −i}$$ $$\mathfrak{ll}$$ multiplications for computing all evaluations, and $$d · 2^{\ell−i}$$ $$\mathfrak{ll}$$ multiplications for updating the arrays.
   {% endhint %}

**Lemma**. When the evaluations of $$p\_1, . . . , p\_d$$ are small, Algorithm 1’s dominant cost is $$d\cdot 2\cdot 2^{\ell -1}\mathfrak{ll}$$\
multiplications, followed by $$d · 2^{ℓ−1}\mathfrak{sl}$$ multiplications, $$d(d − 1) · 2^{\ell-1} \mathfrak{ss}$$, and $$O(n)$$ field additions.

#### Algorithm 2: Quasilinear Time and Square-Root Space

The main idea is to avoid updating the arrays $$P\_1,\dots, P\_d$$ in each round $$i  \leq \ell/2$$, and instead compute\
$$\prod^d\_{k=1} p\_k(r\_1,\dots, r\_{i−1}, u, x')$$ directly using the following lemma.

**Lemma**. Let $$p: \mathbb{F}^\ell \rightarrow \mathbb{F}$$ be a multilinear polynomial. Then for any $$0 \leq i \leq \ell$$, any $$(r\_1, . . . , r\_i) \in \mathbb{F}\_c^i$$,\
and any $$x' ∈ {0, 1}^{\ell-i}$$,

$$
p(\bm{r}*{\[1,i]}, \bm{x}'
) = \sum*{\bm{y}\in{0,1}^i}
\widetilde{\sf{eq}}(\bm{r}\_{\[1,i]}, \bm{y})\cdot p(\bm{y}, \bm{x}')
$$

With this formula, we can calculate $$s\_i(u)$$ as follows:

$$
\begin{aligned}
s\_i(u)&=\sum\_{\bm{x}'\in{0,1}^{\ell-i}}\prod^d\_{k=1} p\_k(\bm{r}*{\[1,i-1]},u,\bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod\_{k=1}^d\sum\_{\bm{y}\in{0,1}^i}\widetilde{\sf{eq}}((\bm{r}*{\[1,i-1]},u), \bm{y})\cdot p\_k(\bm{y}, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod\_{k=1}^d\sum\_{\bm{y}'\in{0,1}^{i-1}}\widetilde{\sf{eq}}((\bm{r}*{\[1,i-1]},u), (\bm{y}',u))\cdot p\_k(\bm{y}',u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod\_{k=1}^d\sum\_{\bm{y}'\in{0,1}^{i-1}}\widetilde{\sf{eq}}(\bm{r}\_{\[1,i-1]}, \bm{y}')\cdot p(\bm{y}',u, \bm{x}')\\
\end{aligned}
$$

After $$\ell/2$$ rounds, as both the time and space needed for the $$\widetilde{\sf{eq}}$$ evaluations goes past $$2^{\ell/2}$$, we will switch to Algorithm 1, which incurs a negligible cost of $$d^2\cdot2^{ℓ/2}$$ $$\mathfrak{ll}$$. Note that in practice, one could switch as soon as the prover is no longer space-constrained, which may happen well before round $$\ell/2$$.

{% hint style="info" %}
**For each round** $$i=1,\dots,\ell/2$$:

1. For $$u\in\widehat{U}\_d$$, compute $$s\_i(u)$$ using the formula:

$$
s\_i(u)=\sum\_{\bm{x}'\in{0,1}^{\ell-i}}\prod\_{k=1}^d\sum\_{\bm{y}'\in{0,1}^{i-1}}\widetilde{\sf{eq}}(\bm{r}\_{\[1,i-1]}, \bm{y}')\cdot p\_k(\bm{y}',u, \bm{x}') \tag{10}
$$

2. Send evaluations $${s\_i(u) : u \in \widehat{U}\_d}$$ to the verifier.
3. Receive challenge $$r\_i \in \mathbb{F}\_c$$ from the verifier.

**During round** $$i=\ell/2$$:&#x20;

1. Store the evaluations of $$p\_k(\bm{r}\_{\[1,\ell/2]},b,\bm{x}')$$ for all $$k=1,\dots,d$$, $$b\in{0,1}$$ and $$\bm{x}'\in{0,1}^{\ell/2}$$, obtained while computing round $$\ell/2$$.

**For the leftover rounds:**

1. Follow algorithm 1.
   {% endhint %}

{% hint style="warning" %}
**Cost Analysis**:

1. Inner sum has $$2^{i-1}$$ $$\mathfrak{sl}$$ multiplications between the $$\widetilde{\sf{eq}}$$ terms and the evaluations $$p\_k(\bm{y}', u, \bm{x}')$$.
2. $$(d − 1)$$ $$\mathfrak{ll}$$ multiplications for the product $$\prod^d\_{k=1} p\_k(\bm{r}\_{\[1,i−1]}, u, \bm{x}')$$.
3. Outer summation then gives us, $$2^{l-i}\cdot(2^{i-1}\mathfrak{sl} + (d-1)\mathfrak{ll})=2^{l-1}\mathfrak{sl}+2^{l-i}(d-1)\mathfrak{ll}$$ multiplications for each $$s\_i(u)$$ evaluation and we need to evaluate over $$d$$ points.
4. &#x20;Besides these, we have lower-order costs of $$d(d − 1)· 2^{\ell−1}$$ $$\mathfrak{ss}$$ multiplications to compute the evaluations $${p\_k(\bm{y}', u, \bm{x}')}*{k,\bm{y}',u,\bm{x}'}$$ and $$2^{i−1}$$$$\mathfrak{ll}$$ multiplications to compute the evaluations $$\big{\widetilde{\sf{eq}}(\bm{r}*{\[0,i-1]} , \bm{y}')\big}\_{\bm{y}'\in{0,1}^{i−1}}$$ and store them in an array.
   {% endhint %}

**Lemma**. When the evaluations of $$p\_1,\dots, p\_d$$ are small, Algorithm 2’s cost is dominated by $$d · \ell\_0 \cdot 2^{\ell-1} \mathfrak{sl}$$ multiplications (spread over the first l/2 rounds), and $$d\cdot(d − 1) · 2^\ell \mathfrak{ll}$$ multiplications. The lower-order costs include $$d\cdot \ell\_0\cdot(d − 1)\cdot2^{l−1} \mathfrak{ss}$$ multiplications and $$O(n)$$ field additions.

#### Algorithm 3: Warm-up Attempt

In the Algorithm 1, the prover performs exclusively $$\mathfrak{ll}$$ multiplications starting from round 2, due to the need to bind the multilinear polynomials after round 1. If we want to reduce the number of $$\mathfrak{ll}$$ multiplications, we must delay this binding process. A natural attempt at this leads to Algorithm 3, which is typically faster than Algorithm 1 in the first few rounds, and uses significantly less memory.

If we rearrange the Equation (10), we can get:

$$
\begin{aligned}
s\_i(u)&=\sum\_{\bm{x}'\in{0,1}^{\ell-i}}\prod\_{k=1}^d\sum\_{\bm{y}'\in{0,1}^{i-1}}\widetilde{\sf{eq}}(\bm{r}*{\[1,i-1]}, \bm{y}')\cdot p\_k(\bm{y}',u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\bm{y}'*1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{k=1}^d\widetilde{\sf{eq}}(\bm{r}*{\[1,i-1]}, \bm{y}'*k)\cdot p\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum*{\bm{y}'*1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{k=1}^d\widetilde{\sf{eq}}(\bm{r}*{\[1,i-1]}, \bm{y}'*k)\cdot \prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\bm{y}'*1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{k=1}^d\bigg(\prod*{j=1}^{i-1}(1-r\_j)^{\bm{y}'*k\[j]}\cdot r\_j^{1-\bm{y}'*k\[j]} \bigg)\cdot\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum*{\bm{y}'\_1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{j=1}^{i-1}\bigg((1-r\_j)^{\sum\bm{y}'\_k\[j]}\cdot r\_j^{d-\sum\bm{y}'*k\[j]}\bigg)\cdot\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{y}'*1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod*{j=1}^{i-1}\bigg((1-r\_j)^{\sum\bm{y}'\_k\[j]}\cdot r\_j^{d-\sum\bm{y}'*k\[j]}\bigg)\cdot\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{y}'\_1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{j=1}^{i-1}\bigg((1-r\_j)^{\sum\bm{y}'*k\[j]}\cdot r\_j^{d-\sum\bm{y}'*k\[j]}\bigg)\cdot\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod*{k=1}^dp\_k(\bm{y}'\_k,u, \bm{x}')\\
\end{aligned}
$$

1. Exchanging product with the inner sum
2. Distribute $$\prod$$
3. Expand eq
4. Switch inner $$\prod$$ with outer $$\prod$$
5. Switch the order of outer sums
6. $$\widetilde{\mathsf{eq}}$$ part is invariant over $$\bm{x}'$$ so we can factor it out.

At this point, we have rewritten the computation of $$s\_i(u)$$ as an inner product between two vectors of length $$2^{d(i−1)}$$ (indexed by $$\bm{y}'\_1, . . . , \bm{y}'*d$$): one dependent on the challenges $$r\_1, . . . , r*{i−1}$$, and the other dependent on the multilinear polynomials $$p\_1, . . . , p\_d$$.

We can do slightly better by noticing that the challenge-dependent terms only depend on $$\bm{v} \in {\sum\_{k=1}^d \bm{y}'*k\[j]}*{j\in\[1,i−1]}$$ and not on the individual $${\bm{y}'*k}*{k\in\[1,d]}$$—there are only $$(d + 1)^{i−1}$$ distinct such terms. By re-indexing based on $$\bm{v} \in \[0, d]^{i−1}$$, we can further rewrite:

$$
\begin{aligned}
s\_i(u)&=\sum\_{\bm{y}'*1,\dots,\bm{y}'*d\in{0,1}^{i-1}}\prod*{j=1}^{i-1}\bigg((1-r\_j)^{\sum\bm{y}'*k\[j]}\cdot r\_j^{d-\sum\bm{y}'*k\[j]}\bigg)\cdot\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{v}\in{0,1\dots,d}^{i-1}}\prod*{j=1}^{i-1}\bigg((1-r\_j)^{\bm{v}*j}\cdot r\_j^{d-\bm{v}*j}\bigg)\cdot\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum*{\bm{y}'\_1,\dots,\bm{y}'\_d\in{0,1}^{i-1}; \bm{v}=\sum \bm{y}*k}\prod*{k=1}^dp\_k(\bm{y}'\_k,u, \bm{x}')
\end{aligned}
$$

Notice that above sum over the hypercube $$\bm{v}\in{0,\dots,d}^{i-1}$$ can be thought of as a inner product of a two vector of size $$(d+1)^{i-1}$$:&#x20;

$$
\Braket{\bigotimes\_{j=1}^{i-1}\bigg((1-r\_j)^{\bm{v}*j}\cdot r\_j^{d-\bm{v}*j}\bigg)*{v\_j=0}^d, \bigg(\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\substack{\bm{y}'\_1,\dots,\bm{y}'\_d\in{0,1}^{i-1}\ \bm{v}=\sum \bm{y}*k}}\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\bigg)*{\bm{v}\in\[0,d]^{i−1}}}
$$

{% hint style="info" %}
$$(1-r\_j)^{\bm{v}\_j}\cdot r\_j^{d-\bm{v}\_j}$$: this can have $$d$$ different values depending on the value of $$v\_j$$ which can range from $$0$$ to $$d$$. If we store them in a vector, we can denote as:

$$
\bigg((1-r\_j)^{\bm{v}\_j}\cdot r\_j^{d-\bm{v}*j}\bigg)*{v\_j=0}^d
$$

The outer sum and product is then taking product of all possible $$i-1$$ combinations of entries within this vector and summing them. These product combinations can be thought of as [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product):

$$
\bigotimes\_{j=1}^{i-1}\bigg((1-r\_j)^{\bm{v}\_j}\cdot r\_j^{d-\bm{v}*j}\bigg)*{v\_j=0}^d
$$
{% endhint %}

{% hint style="info" %}
Let us denote the right hand side of the inner product as:

$$
\mathsf{A}*i(\bm{v},u)=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\substack{\bm{y}'\_1,\dots,\bm{y}'\_d\in{0,1}^{i-1}\ \bm{v}=\sum \bm{y}*k}}\prod*{k=1}^dp\_k(\bm{y}'\_k,u, \bm{x}')
$$

Then we have:

$$
s\_i(u)=\Braket{\bigotimes\_{j=1}^{i-1}\bigg((1-r\_j)^{\bm{v}\_j}\cdot r\_j^{d-\bm{v}*j}\bigg)*{v\_j=0}^d, \bigg(\mathsf{A}*i(\bm{v},u)\bigg)*{\bm{v}\in\[0,d]^{i−1}}}
$$
{% endhint %}

Since the accumulators $$\mathsf{A}*i(\bm{v}, u)$$ do not depend on the challenges $$r\_1, . . . , r*{i−1}$$, the prover can precompute them before the protocol begins. For each of these, we don't have to recompute $${p\_k(\bm{y}'*k, u, \bm{x}')}*{k\in\[1,d]}$$ and instead reuse these across them. For this, we can do multilinear extension on $$p\_k$$ over variable $$u$$ to get:

$$
\begin{aligned}
\mathsf{A}*i(\bm{v},u)&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\bm{y}'\_1,\dots,\bm{y}'*d\in{0,1}^{i-1}; \bm{v}=\sum \bm{y}*k}\prod*{k=1}^dp\_k(\bm{y}'*k,u, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum*{\bm{y}'\_1,\dots,\bm{y}'*d\in{0,1}^{i-1}; \bm{v}=\sum \bm{y}*k}\prod*{k=1}^d\sum*{y^*\_k\in{0,1}}u^{y^**k}(1-u)^{1-y^*\_k}p\_k(\bm{y}'\_k,y^**k, \bm{x}')\\
&=\sum*{\bm{x}'\in{0,1}^{\ell-i}}\sum*{\bm{y}'\_1,\dots,\bm{y}'\_d\in{0,1}^{i-1}; \bm{v}=\sum \bm{y}*k}\sum*{{y^*\_1, \dots, y^*\_d}\in{0,1}}u^{\sum y^*\_j}(u-1)^{d-\sum y^**j}\cdot\prod*{k=1}^dp\_k(\bm{y}'\_k,y^\*\_k, \bm{x}')
\end{aligned}
$$

and now we only need to evaluations of $$p\_k$$ over the boolean hypercube.

**Cost Analysis.** Algorithm 3 uses $$O((d + 1)^{\ell\_0} )$$ space for rounds $$1$$ to $$\ell\_0$$ (accumulators and challenge vectors) and $$d · 2^{\ell−\ell\_0}$$ space thereafter (cached results). Since $$(d + 1)^{\ell\_0} \ll d · 2^{\ell−\ell\_0}$$ in practice, the overall space complexity is dominated by the latter term, representing a space saving factor of roughly $$2^{\ell\_0}$$ compared to Algorithm 1.&#x20;

Regarding runtime, Algorithm 3 is bottlenecked by the $$(d − 1) · 2^{d\ell\_0} · 2^{\ell−\ell\_0} \mathfrak{ss}$$ multiplications needed to compute the products $${\prod\_{k=1}^d p\_k(\bm{y}'*k, \bm{x}')}*{\bm{y},\bm{x}'}$$. We also incur roughly $$d · 2^\ell \mathfrak{sl}$$ mults when using Algorithm 2 for the $$(\ell\_0 + 1)$$-th round, and roughly $$d^2 · 2^{\ell−\ell\_0−1}\mathfrak{ll}$$ mults for the remaining rounds using Algorithm 1.

#### Algorithm 4: Optimizing with Toom-Cook Multiplication

While Algorithm 3 almost eliminates $$\mathfrak{ll}$$ multiplications for the first few rounds and slashes memory usage, we still end up performing a lot of $$\mathfrak{ss}$$ multiplications (namely, by a factor of $$\approx 2^{(d−1)·\ell\_0}$$ compared to the number of $$\mathfrak{ll}$$ mults in Algorithm 1). To reduce this cost, we need another way to rewrite the products $$\prod\_{k=1}^d p\_k(\bm{r}\_{\[1,i-1]},u, \bm{x}')$$.

The idea is to see the product as the evaluation of the polynomial:

$$
F(Y\_1, . . . , Y\_{i−1}) = \prod\_{k=1}^d p\_k(Y\_1, . . . , Y\_{i−1}, u, \bm{x}')
$$

at the challenge point $$\bm{r}\_{\[1,i-1]}$$.

Since $$F$$ has individual degree $$d$$, we can apply Lagrange interpolation to each variable to rewrite $$F(Y\_1,\dots, Y\_{i−1})$$ as

$$
\sum\_{v\_1\in U\_d}\mathcal{L}*{U\_d,v\_1}(Y\_1)\dots\sum*{v\_{i-1}\in U\_d}\mathcal{L}*{U\_d,v*{i-1}}(Y\_{i-1})\prod\_{k=1}^dp\_k(\bm{v},u,\bm{x}')\tag{14}
$$

As a result, when we plug in the evaluation $$(r\_1, \dots , r\_{i−1})$$, we can rewrite the $$i$$-th sum-check polynomial $$s\_i(X)$$ as:

$$
\begin{aligned}
s\_i(u)&=\sum\_{\bm{x}'\in{0,1}^{\ell-i}}\sum\_{\bm{v}\in U\_d}\prod\_{j=1}^{i-1}\mathcal{L}*{U\_d,\bm{v}*j}(Y*{i-1})\prod*{k=1}^dp\_k(\bm{v},u, \bm{x}')\\
&=\sum\_{\bm{v}\in U\_{d}^{i-1}}\prod\_{j=1}^{i-1}\mathcal{L}*{U\_d,\bm{v}*j}(r\_j)\cdot\sum*{\bm{x}'\in{0,1}^{\ell-i}}\prod*{k=1}^d p\_k(\bm{v},u,\bm{x}')
\end{aligned}
$$

then we have:

$$
s\_i(u)=\Braket{\bigotimes\_{j=1}^{i-1}\big(\mathcal{L}*{U\_d,v\_j}(r\_j)\big)*{v\_j\in U\_d}, \bigg(\mathsf{A}*i(\bm{v},u)\bigg)*{\bm{v}\in U\_d^{i−1}}}
$$

Just like with Algorithm 3, we can precompute the accumulators $$\mathsf{A}\_i(\bm{v}, u)$$ by computing each of the $$\approx (d + 1)^{\ell\_0}$$ product

$$
\bigg{\prod\_{k=1}^d p\_k(\bm{v},u,\prod{x}'):\bm{v}\bigg}
$$

over $$\bm{v}\in U\_d^{i-1}$$, $$u \in \widehat{U}\_d$$ and $$\bm{x}'\in{0,1}^{\ell-i}$$ once, then inserting the result into the appropriate $$\mathsf{A}\_i(\bm{v}, u)$$ terms over all $$i\in \[1,\ell\_0]$$ rounds. For details, see Section A.3.

**Cost Analysis**. Similar to Algorithm 3, Algorithm 4 also saves a factor of $$2^{l\_0}$$ in space. Unlike Algorithm 3, however, it incurs only $$(d − 1) · (d + 1)^{\ell\_0} · 2^{\ell−\ell\_0}\mathfrak{ss}$$, instead of $$(d − 1) · 2^{d\cdot \ell\_0} · 2^{\ell−\ell\_0}$$. This factor can be roughly 5 times smaller for common parameters (e.g. $$d = 2$$ and $$\ell\_0 = 5$$). The cost of $$\mathfrak{sl}$$ and $$\mathfrak{ll}$$ are the same as in Algorithm 3.

### Optimizations for Sum-Check involving EqPoly

In this section, the optimization only applies to sum-check where $$g$$ is of the form:

$$
\sum\_{\bm{x}\in{0,1}^\ell}g(\bm{x})=\sum\_{\bm{x}\in{0,1}^\ell}\widetilde{\mathsf{eq}}(\bm{w},\bm{x})\cdot\prod\_{k=1}^dp\_k(\bm{x})
$$

with $$w \in \mathbb{F}*c^\ell$$ a vector of verifier’s challenges, and $$p\_1(X), \dots, p\_d(X)$$ are multilinear polynomials. (e.g $$Q*{\bm{io}}$$ zero-check in Spartan).

#### Gruen's Optimization

The main idea of this optimization by Gruen et al. is to reduce the computation of the degree-$$(d + 1)$$ polynomial $$s\_i(X)$$ sent by the prover in each round $$i$$, to the computation of a degree-$$d$$ factor $$t\_i(X)$$ and a linear factor $$\mathsf{l}\_i(X)$$. This way, the linear term is “free” to compute, and the prover saves an evaluation when computing $$t\_i(X)$$ instead of $$s\_i(X)$$.&#x20;

This is possible due to the special structure of the $$\widetilde{\mathsf{eq}}$$ polynomial, which for every $$i \in \[1,\ell]$$ can be decomposed as:

$$
\widetilde{\mathsf{eq}}(\bm{w},(\bm{r}*{\[1,i-1]},X,\bm{x}'))=\widetilde{\mathsf{eq}}(\bm{w}*{\[1,i-1]},\bm{r}*{\[1,i-1]})\cdot\widetilde{\mathsf{eq}}(w\_i,X)\cdot\widetilde{\mathsf{eq}}(\bm{w}*{\[i+1,\ell]},\bm{x}')
$$

If we adapt the original definition for $$s\_i(u)$$ to this case, we get:

$$
s\_i(u) := \sum\_{x'\in{0,1}^{\ell−i}}\widetilde{\mathsf{eq}}(\bm{w},\bm{r}*{\[1,i-1]},u,\bm{x}') \prod^d*{k=1}\bigg(\big(P^{(i)}\_k\[1, x'] − P^{(i)}\_k \[0,x']\big)\cdot u + P^{(i)}\_k\[0, x'] \bigg)
$$

Thus, if we set $$\mathsf{l}*i(X):=\widetilde{\mathsf{eq}}(\bm{w}*{\[1,i-1]},\bm{r}\_{\[1,i-1]})\cdot\widetilde{\mathsf{eq}}(w\_i,X)$$ and

$$
t\_i(X):=\sum\_{\bm{x}'\in{0,1}^{\ell-i}}\widetilde{\mathsf{eq}}(\bm{w}*{\[i+1,\ell]},\bm{x}')\cdot\prod*{k=1}^dp\_k(\bm{r}\_{\[1,i-1]},X,\bm{x}')
$$

then we have $$s\_i(X)=\mathsf{l}\_i(X)\cdot t\_i(X)$$.&#x20;

Using this fact, the sum-check prover now instead computes $$d$$ evaluations of $$t\_i(u)$$ at $$u\in\widehat{U}\_d$$, then rely on the equality $$s\_i(0) + s\_i(1)=C\_i$$ to compute:

$$
t\_i(1):=\mathsf{l}\_i(1)^{-1}\cdot(C\_i-\mathsf{l}\_i(0)\cdot t\_i(0))
$$

The prover now has $$d+1$$ evaluations of $$t\_i$$, and thus can interpolate $$t\_i(X)$$ and compute $$s\_i(X)$$. Note that computing $${t\_i(u):u\in\widehat{U}\_d}$$ can be done generically using either Algorithm 1 or 2; since our focus is on optimizing time rather than space, we assume that Algorithm 1 is used.

#### Algorithm 5

We can further reduce $$2^{\ell-1}\mathfrak{ll}$$ associated with the $$\widetilde{\mathsf{eq}}$$ polynomial. Our insight is that we can further split off the $$\widetilde{\mathsf{eq}}$$ terms for the first $$i<\ell/2$$ rounds, so that:

$$
t\_i(u)=\sum\_{\bm{x}\_R\in{0,1}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}\_R,\bm{x}*R)\cdot\sum*{\bm{x}\_L\in{0,1}^{\ell/2 -i}}\widetilde{\mathsf{eq}}(\bm{w}\_L,\bm{x}\_L)\cdot \tag{17}
$$

$$
\cdot \prod^{d}*{k=1}p\_k(\bm{r}*{\[1,i-1]},u,\bm{x}\_L,\bm{x}\_R)\tag{18}
$$

for all $$u\in\widehat{U}\_d$$, where $$\bm{w}*R=\bm{w}*{\[i+1,\ell/2]}$$ and $$\bm{w}*L=\bm{w}*{\[\ell/2 + 1,n]}$$. The sum-check prover now computes all inner sums over $$\bm{x}\_L$$, then all outer sums over $$\bm{x}\_R$$. For rounds $$i\geq \ell/2$$, we no longer have the inner sum, and proceed similarly to Algorithm 1. Note that we put the left part $$\widetilde{\mathsf{eq}}(\bm{w}\_L, \bm{x}\_L)$$ in the inner sum, since it leads to better memory locality in practice, assuming the evaluations of $$p\_k$$ are streamed in big-endian order (so that $$\bm{x}\_L$$ are low-order bits compared to $$\bm{x}\_R$$, which means their chunks are contiguous in memory).

{% hint style="info" %}
**Initialize:** Length-$$2^\ell$$ arrays $$P^{(0)}\_1,\dots, P^{(0)}\_d$$ such that&#x20;

$$
P^{(0)}\_k\[x] = p\_k(x) \text{ for all }k = 1,\dots,d \text{ and }\bm{x}\in{0, 1}^l
$$

**Pre-computation:** Use Procedure 3 from the paper to compute the evaluations:

$$
{\widetilde{\mathsf{eq}}(\bm{w}\_{\[1,i]},\bm{x}*L), \widetilde{\mathsf{eq}}(\bm{w}*{\[\ell/2,\ell/2+i]},\bm{x}\_R)}
$$

for all $$i\in\[1,\ell/2],\bm{x}\_L\in{0,1}^{\ell/2-i}$$, and $$\bm{x}\_R\in{0,1}^{\ell/2-i}$$.

**For each round** $$i=1,\dots,\ell$$**:**

1. For $$u\in\widehat{U}\_d$$, if $$i < \ell/2$$, compute $$t\_i(u)$$ using the formula

$$
\sum\_{\bm{x}\_R\in{0,1}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}\_R,\bm{x}*R)\cdot \sum*{\bm{x}\_L\in{0,1}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}\_L,\bm{x}*L)\cdot\prod*{k=1}^{d}\bigg(\big(P\_k^{(i)}\[1,\bm{x}\_L,\bm{x}\_R]-P\_k^{(i)}\[0,\bm{x}\_L,\bm{x}\_R]\big)\cdot u + P\_k^{(i)}\[0,\bm{x}\_L,\bm{x}\_R]\bigg)
$$

&#x20;      If $$i\geq\ell/2$$, compute $$t\_i(u)$$ using the formula:

$$
\sum\_{\bm{x}\_R\in{0,1}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}\_R,\bm{x}*R)\cdot\prod*{k=1}^{d}\bigg(\big(P\_k^{(i)}\[1,\bm{x}\_R]-P\_k^{(i)}\[0,\bm{x}\_R]\big)\cdot u + P\_k^{(i)}\[0,\bm{x}\_R]\bigg)
$$

2. Use Procedure 8 form the paper to compute $$s\_i(X)$$ from $$\big({t\_i(u)},\bm{w},\bm{r}\_{\[1,i-1]}\big)$$
3. Send evaluations of $$s\_i$$ over $$\widehat{U}\_d$$
4. Receive challenge $$r\_i\in\mathbb{F}$$ from the verifier.
5. For $$k \in \[1,d]$$ and $$\bm{x}' \in {0, 1}^{\ell−i}$$, update arrays:

$$
P^{(i+1)}\_k\[\bm{x}']:=(P^{(i)}\_k\[1, \bm{x}'] − P^{(i)}\_k\[0,\bm{x}'])\cdot r\_i + P^{(i)}\_k\[0,\bm{x}']
$$
{% endhint %}

**Cost Analysis.** Our optimization avoids computing the $$2^{\ell−i}$$-sized table of evaluations $${\widetilde{\mathsf{eq}}(\bm{w}\_{\[i+1,\ell]}, \bm{x}') : x' \in {0, 1}^{\ell−i}}$$ for each round $$i$$, and instead only compute two square-root-sized tables, which costs $$2^{\ell/2}\mathfrak{ll}$$ mults for $$\widetilde{\mathsf{eq}}(\bm{w}\_R, \bm{x}\_R)$$ and $$2^{\ell/2−i}\mathfrak{ll}$$ mults for $$\widetilde{\mathsf{eq}}(\bm{w}\_L, \bm{x}\_L)$$. With memoization, these evaluations can be computed for all rounds $$i$$ in $$2^{\ell/2}\mathfrak{ll}$$ multiplications total. The (negligible) downside is that we need to compute an extra $$2^{\ell/2}\mathfrak{ll}$$ multiplications when computing the outer sum over $$\bm{x}\_R$$.&#x20;

Using the same process for cost analysis as in Lemma 3.2, we have the following cost breakdown for Algorithm 5.

**Lemma 5.1.** Algorithm 5’s dominant cost is $$d(d + 1)/2 · N\ \mathfrak{ll}$$, reducing from Algorithm 1’s cost in the same setting by roughly $$N\ \mathfrak{ll}$$.

#### Algorithm 6: Combining SVO with Eq-Poly optimization

Our starting point is Equation (17) in the exposition of **Algorithm 5**. We then apply the multivariate Lagrange decomposition formula in Equation (14) to the evaluations $$\prod\_{k=1}^d p\_k(\bm{r}, u, \bm{x}\_L, \bm{x}\_R)$$, which gives:

$$
\begin{aligned}
t\_i(u)&=\sum\_{\bm{x}\_R\in{0,1}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}*R,\bm{x}*R)\cdot\sum*{\bm{x}*L\in{0,1}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}*L,\bm{x}*L)\cdot \\
&\cdot \sum*{\bm{v}\in U\_d^{i-1}}\prod*{j=1}^{i-1}\mathcal{L}*{U\_d,\bm{v}*j}(r\_j)\cdot\prod*{k=1}^d p\_k(\bm{v},u,\bm{x}*L,\bm{x}*R) \\
&=\Braket{\bigotimes*{j=1}^{i-1}\big(\mathcal{L}*{U\_d,v}(r\_j)\big)*{v\in U\_d}, \bigg(\mathsf{A}*i(\bm{v},u)\bigg)*{\bm{v}\in U\_d^{i−1}}}
\end{aligned}
$$

where the accumulators $$\mathsf{A}\_i(\bm{v}, u)$$ are now of the form:

$$
\mathsf{A}*i(\bm{v}, u)=\sum*{\bm{x}\_R\in{0,1}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}\_R,\bm{x}*R)\sum*{\bm{x}\_L\in{0,1}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}\_L,\bm{x}*L)\prod*{k=1}^d p\_k(\bm{v},u,\bm{x}\_L,\bm{x}\_R)
$$

Unlike the small-value only case, here because of the presence of the $$\widetilde{\mathsf{eq}}(\bm{w}, \cdot)$$ terms, we need to use $$\mathfrak{sl}$$ (instead of $$\mathfrak{ss}$$) multiplications in computing the accumulators, and the additions are over the big field as well. We also note that we have swapped the lengths of the vectors $$\bm{x}\_R$$ and $$\bm{x}\_L$$ in the accumulator formula. This is so that we can compute the same inner sum across the accumulators over all rounds, and we can dispatch these inner sums to different accumulators in different rounds afterwards, with negligible cost. See Section A.4 from the paper for more details.

{% hint style="info" %}
**Pre-computation:** Use Procedure 9 from the paper to compute the accumulators:

$$
\mathsf{A}*i(\bm{v}, u)=\sum*{\bm{x}\_R\in{0,1}^{\ell/2-i}}\widetilde{\mathsf{eq}}(\bm{w}\_R,\bm{x}*R)\sum*{\bm{x}\_L\in{0,1}^{\ell/2}}\widetilde{\mathsf{eq}}(\bm{w}\_L,\bm{x}*L)\prod*{k=1}^d p\_k(\bm{v},u,\bm{x}\_L,\bm{x}\_R)
$$

for all $$i\in\[1,\ell\_0],\bm{v}\in U\_d^{i-1}$$, and $$u\in \widehat{U}\_d$$.

**Initialize:** $$\mathsf{R}\_1=(1)$$.

**For each round** $$i=1,\dots,\ell\_0$$**:**

1. For $$u\in\widehat{U}\_d$$, compute $$t\_i(u)$$ using the formula

$$
t\_i(u):=\sum\_{\bm{v}\in \[0,d]^{i-1}}\mathsf{R}*i\[\mathsf{nat}*{d+1}(\bm{v})]\cdot \mathsf{A}\_i(\bm{v},u)
$$

2. Use Procedure 8 form the paper to compute $$s\_i(X)$$ from $$\big({t\_i(u)},\bm{w},\bm{r}\_{\[1,i-1]}\big)$$
3. Send evaluations of $$s\_i$$ over $$\widehat{U}\_d$$
4. Receive challenge $$r\_i\in\mathbb{F}\_c$$ from the verifier.
5. Compute $$\mathsf{R}*{i+1}=\mathsf{R}*i \otimes(\mathcal{L}*{U\_d,k}(r\_j))^{d}*{k=0}$$.

**For round** $$\ell\_0+1$$: Follow Algorithm 2, storing the evaluations $$p\_k(\bm{r}\_{\[1,\ell\_0-1]},b,\bm{x}')$$ for all $$k\in\[1,d],b\in{0,1}$$, and $$\bm{x}'\in{0,1}^{\ell-\ell\_0}$$.

**Receive challenge** $$r\_{\ell+1}\in\mathbb{F}*c$$ from the verifier, and perform updates to get $$p\_k(\bm{r}*{\[1,\ell\_0]},\bm{x}')$$ for all $$k$$ and $$\bm{x}'$$, similar to Algorithm 1.

**For rounds** $$\ell+2,\dots,\ell$$: Follow Algorithm 5.
{% endhint %}

**Cost Analysis.** Besides the extra subtleties as noted above, the cost estimate of Algorithm 6 is fairly straightforward and follows directly from combining the estimates of Algorithm 4 and Algorithm 5. In particular, the small-value precomputation process is now dominated by $$(d + 1)^{\ell\_0} · 2^{\ell−\ell\_0}\mathfrak{sl}$$. Next, the streaming round takes roughly $$d · 2^\ell \mathfrak{sl}$$, and the rest of the linear-time sum-check rounds cost roughly $$d^2 · 2^{\ell−\ell\_0−1}\mathfrak{ll}$$. We thus summarize as follows.

**Theorem 6.1.** Algorithm 6’s cost is dominated by $$((\frac{d+1}{2})^{\ell\_0} + d) N\ \mathfrak{sl}$$ and $$d^2/2^{\ell\_0+1}N\ \mathfrak{ll}$$.

### Summary

<table><thead><tr><th width="148.484375"></th><th width="421.6875">Multiplications</th><th>Space Complexity</th></tr></thead><tbody><tr><td>Algorithm 1</td><td><span class="math">d · 2^{ℓ−1}\mathfrak{sl}+d\cdot 2^{\ell}\mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell}</span></td></tr><tr><td>Algorithm 2</td><td><span class="math">d · \ell_0 \cdot 2^{\ell-1} \mathfrak{sl}+d\cdot 2^{\ell}\mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell−\ell_0}</span></td></tr><tr><td>Algorithm 3</td><td><span class="math">(d − 1) · 2^{d\ell_0} · 2^{\ell−\ell_0} \mathfrak{ss} +d · 2^\ell \mathfrak{sl}+d^2 · 2^{\ell−\ell_0−1}\mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell−\ell_0}</span></td></tr><tr><td>Algorithm 4</td><td><span class="math">(d − 1) · (d + 1)^{\ell_0} · 2^{\ell−\ell_0}\mathfrak{ss}+d · 2^\ell \mathfrak{sl}+d^2 · 2^{\ell−\ell_0−1}\mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell-\ell_0}</span></td></tr><tr><td>Algorithm 5</td><td><span class="math">d(d + 1)/2 · N\ \mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell}</span></td></tr><tr><td>Algorithm 6</td><td><span class="math">((\frac{d+1}{2})^{\ell_0} + d) N\ \mathfrak{sl} + d^2/2^{\ell_0+1}N\ \mathfrak{ll}</span></td><td><span class="math">d · 2^{\ell-\ell_0}</span></td></tr></tbody></table>

### References:

* [Speeding Up Sum-Check Proving](https://eprint.iacr.org/2025/1117.pdf) by Suyash et al, 2025.
* [Optimizing the Sumcheck Protocol](https://hackmd.io/@tcoratger/SJjJBfWWlg) by Thomas Coratger, 2025.

> Written by [Batzorig Zorigoo](mailto:undefined) of Fractalyze


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://fractalyze.gitbook.io/intro/primitives/sumcheck/speeding-up-sumcheck.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
