# GKR

## Overview

Although interactive proofs like [Sumcheck](https://fractalyze.gitbook.io/intro/primitives/sumcheck) enabled the verifier to efficiently check the validity of solutions to problems in complexity classes beyond NP, real world computing entities cannot solve a large instance of problems beyond NP practically.&#x20;

Still, it can be pretty useful regarding more solvable problems in the real world (even for the verifier) **if the verifier's cost is much less than solving it by itself** and the **prover's work is not increased asymptotically** compared to what it takes to merely solve it.&#x20;

Let $$\mathcal{C}$$ be the circuit in layered form whose number of gates (or circuit size) is $$S$$ and circuit depth is $$d$$. The GKR protocol is important in a sense that :&#x20;

* $$\mathcal{P}$$-cost is *almost* $$O(S)$$; it is not much more than the cost to solve the circuit.
* $$\mathcal{V}$$-cost is $$O(d \log S)$$; it's even less than just reading through the entire circuit.&#x20;
  * Note: This needs the assumption that the circuit is [logspace uniform](https://en.wikipedia.org/wiki/Circuit_complexity#Logspace_uniform), which implies that it has a succinct implicit description. That's why $$\mathcal{V}$$ can verify even without looking into the entire $$\mathcal{C}$$.&#x20;
* It can function as a general-purpose protocol for any $$\mathcal{P}$$ and $$\mathcal{V}$$ to run an IP on any arithmetic circuit evaluation instance.

The protocol starts from $$\mathcal{P}$$ sending the claimed output value to $$\mathcal{V}$$ as the first message.&#x20;

In a high level view, the protocol iterates from the first layer (output layer) to the bottom (input layer), where in each iteration $$i$$ the goal is to **reduce the claim of the output values of the gates in layer** $$i$$ **to the claim of the output values of the gates in layer** $$i+1$$.

Going through all the way down, the first claim is reduced to that of input values, which are known to $$\mathcal{V}$$ and thus can be checked by itself. Reduction in each iteration is done by the [sumcheck protocol](https://fractalyze.gitbook.io/intro/primitives/sumcheck).&#x20;

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2FBNKFNgzxwqPS4f2Z1444%2Fimage.png?alt=media&#x26;token=b4ac1327-7c69-4df7-baa5-3d4703e9dd31" alt=""><figcaption></figcaption></figure>

> When a circuit is in layered form, it means every leaf appears in the last layer (at depth $$d$$). For a circuit that is not layered, there is a transformation with a blowup in size of factor $$d$$.&#x20;

***

## Protocol details

Let $$\mathcal{C}$$ be a fan-in 2 circuit over a finite field $$\mathbb{F}$$ in a layered form, with

* $$S$$ : the size of the circuit (number of total gates)
* $$S\_i$$ : the number of gates in layer $$i$$ (hence each gates in layer $$i$$ is labeled by $$0, \dots, S\_i - 1$$)
* $$d$$ : the depth
* $$n$$ : the input length (i.e. $$S\_d$$)&#x20;

and each gate is either addition or multiplication.

Without loss of generality let $$S\_i = 2^{k\_i}$$. Let $$W\_i : {0,1}^{k\_i} \rightarrow \mathbb{F}$$ to be a mapping which takes as input the binary representation of gate labels in layer $$i$$ and gives the gate output.&#x20;

Define "wiring predicate" $$\mathsf{in}*{1,i}, \mathsf{in}*{2,i} : {0,1}^{k\_i} \rightarrow {0,1}^{k\_{i+1}}$$, which takes as input the gate label and gives the label of one of its input gates in the lower layer.&#x20;

Lastly, define "gate switch" $$\mathsf{add}\_i, \mathsf{mult}*i : {0,1}^{k\_i + 2k*{i+1}} \rightarrow {0,1}$$ which takes as input the in/out gate labels and gives its instruction type. For example, supposing that gate $$a$$ in layer $$i$$ is an addition gate,

$$
\mathsf{add}*i(a,b,c) =
\begin{cases}
1 & \text{if } (b,c) = (\mathsf{in}*{1,i}(a), \mathsf{in}\_{2,i}(a))
\ 0 & \text{otherwise}
\end{cases}
$$

Functions with a tilde on top of the function name indicates the multilinear extension of the mapping.

(TODO: Make polynomial extension page and set backlinks to it in pages [Spartan](https://fractalyze.gitbook.io/intro/zk/snark/spartan), [Lasso](https://fractalyze.gitbook.io/intro/zk/lookup/lasso), etc.)

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2F3soA4tWO6XYW2C8niVXj%2Fimage.png?alt=media&#x26;token=a15fd14d-bfe4-479f-abb0-aa06d1c47379" alt=""><figcaption></figcaption></figure>

### Lemma 1

The multilinear extension of the gate output polynomial can be represented as below.&#x20;

$$
\widetilde{W}*i(z) = \sum*{b,c \in {0,1}^{k\_{i+1}}} \widetilde{\mathsf{add}}*i(z,b,c)(\widetilde{W}*{i+1}(b) +\widetilde{W}*{i+1}(c)) + \widetilde{\mathsf{mult}}*i(z,b,c)(\widetilde{W}*{i+1}(b) \cdot \widetilde{W}*{i+1}(c))
$$

#### Proof.&#x20;

Obviously both hands match at $${0,1}^{k\_i}$$. As both hands are multilinear in $$z$$ and a multlinear extension of such mapping is uniquely defined at domain $${0,1}^{k\_i}$$, both hands are identical.

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2FTJDvqLKVVMKrEh0nM7fv%2Fimage.png?alt=media&#x26;token=63133ca5-b447-4c81-80a1-4c2bea8d435e" alt=""><figcaption><p>Circuit over <span class="math">\mathbb{F}_5</span> with only multiplication gates. </p></figcaption></figure>

### Protocol

1. $$\mathcal{P}$$ sends a function $$D : {0,1}^{k\_0} \rightarrow \mathbb{F}$$ claimed to equal $$W\_0$$ (the function that maps output gate labels to output values).&#x20;
2. $$\mathcal{V}$$ picks a random $$r\_0 \stackrel{$}{\leftarrow} \mathbb{F}^{k\_0}$$, and lets $$m\_0 \leftarrow \widetilde{D}(r\_0)$$. The remainder of the protocol is devoted to confirming that $$m\_0 = \widetilde{W}\_0(r\_0)$$.&#x20;
3. Now iterate for $$i = 0, 1, \dots, d-1$$. Each iteration is devoted to reduce the claim on $$\widetilde{W}*i(r\_i)$$ to the claim on $$\widetilde{W}*{i+1}(r\_{i+1})$$.&#x20;
   1. Define the $$(2k\_{i+1})$$-variate polynomial:$$f\_{r\_i}^{(i)}(b,c) = \widetilde{\mathsf{add}}*i(r\_i, b, c) \left( \widetilde{W}*{i+1}(b) + \widetilde{W}*{i+1}(c) \right) + \widetilde{\mathsf{mult}}*i(r\_i, b, c) \left( \widetilde{W}*{i+1}(b) \cdot \widetilde{W}*{i+1}(c) \right)$$
   2. $$\mathcal{P}$$'s claim is identical to that $$\sum\_{b, c \in {0,1}^{k\_{i+1}}} f\_{r\_i}^{(i)}(b, c) = m\_i$$.
   3. So that $$\mathcal{V}$$ may check this claim, $$\mathcal{P}$$ and $$\mathcal{V}$$ apply the sumcheck protocol to $$f\_{r\_i}^{(i)}$$, up until $$\mathcal{V}$$'s final check in that protocol, where $$\mathcal{V}$$ must evaluate $$f\_{r\_i}^{(i)}$$ at a randomly chosen point $$(b^*, c^*) \in \mathbb{F}^{k\_{i+1}} \times \mathbb{F}^{k\_{i+1}}$$.&#x20;
      1. Note that $$\mathcal{V}$$ only needs to know $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}*i$$ and not the entire polynomial $$f*{r\_i}^{(i)}$$.
   4. Now we need to reduce the two claims on $$\widetilde{W}*{i+1}(b^\*)$$ and $$\widetilde{W}*{i+1}(c^*)$$ to one claim. Let $$\ell$$ be the unique line satisfying $$\ell(0) = b^*$$ and $$\ell(1) = c^\*$$. $$\mathcal{P}$$ sends a univariate polynomial $$q$$ of degree at most $$k\_{i+1}$$ to $$\mathcal{V}$$, claimed to equal $$\widetilde{W}\_{i+1}$$ restricted to $$\ell$$.&#x20;
   5. $$\mathcal{V}$$ now performs the final check in the sumcheck protocol, using $$q(0)$$ and $$q(1)$$ in place of $$\widetilde{W}*{i+1}(b^\*)$$ and $$\widetilde{W}*{i+1}(c^\*)$$.&#x20;
      1. Note we assume here that for each layer $$i$$, $$\mathcal{V}$$ can evaluate the multilinear extensions $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}\_i$$ in polylogarithmic time. This is necessary for keeping the verifier cost sublinear to circuit size $$S$$.&#x20;
   6. $$\mathcal{V}$$ chooses $$r^\* \in \mathbb{F}$$ at random and sets $$r\_{i+1} \leftarrow \ell(r^\*)$$ and $$m\_{i+1} \leftarrow q(r\_{i+1})$$.
      1. Note that $$q(r\_{i+1}) = \widetilde{W}*{i+1}(r*{i+1})$$. The claim is successfully reduced to that of the next layer.&#x20;
4. After the final round, $$\mathcal{V}$$ directly checks $$m\_d = \widetilde{W}\_d(r\_d)$$. As $$\mathcal{V}$$ knows the input vector, it can evaluate $$\widetilde{W}\_d$$ at any point with $$O(n)$$ time complexity where $$n = 2^{k\_d}$$, i.e. the length of input.

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2FwXMUPZ7t31wP31fuTcLB%2Fimage.png?alt=media&#x26;token=9226d105-f374-4a3e-8d77-70f8c21f4a9c" alt=""><figcaption></figcaption></figure>

For a more detailed explanation and analysis on costs and soundness, please refer to [Section 4.1 and 4.2 of the original paper](https://people.cs.georgetown.edu/jthaler/gkrnotes.pdf).

In particular, the prover cost is reduced from $$O(S^3)$$ of the naive method of sumcheck protocol to $$O(S^2)$$ of the optimized method which exploits the fact that $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}\_i$$ are sparse and structured and thus can be evaluated very efficiently. That is, as most of the evaluations are guaranteed to be zero, instead of passing over the entire evaluation domain, it passes only over the values that are known to be nonzero. Furthermore, the prover cost can be reduced to $$O(S)$$ when the computation can be parallelized by data, as shown in the next section.

Similarly in $$\mathcal{V}$$'s cost analysis, the cost for evaluating $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}\_i$$ is considered to be lower order and omitted. This owes to repeatedly structured wiring patterns that commonly arise in circuits of real world problems in interest. Even without structured wiring patterns, we also have an option to let the verifier itself commit to $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}\_i$$ (with $$O(S)$$ preprocessing time) using polynomial commitment scheme and let the prover open it each time in need, which turns the interactive proof to an interactive argument.&#x20;

## Leveraging Data Parallelism&#x20;

When the circuit has the data parallelism property; that is, the same sub-computation is applied independently on different pieces of data and then possibly aggregated, the performance of GKR is greatly improved. Fortunately, data parallel computation is pervasive in real-world computing, including the use cases in ZK era such as [LogUp-GKR](https://fractalyze.gitbook.io/intro/zk/lookup/logup-gkr), [Lasso](https://fractalyze.gitbook.io/intro/zk/lookup/lasso), [Libra](https://eprint.iacr.org/2019/317.pdf) and [Ceno](https://eprint.iacr.org/2024/387.pdf).

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2F1GjWOKw0ZOdNNgkaU93j%2Fimage.png?alt=media&#x26;token=5ea7b448-f9c9-45cd-a9ce-d6a7bbb8a9d0" alt=""><figcaption></figcaption></figure>

Let $$\mathcal{C}$$ be a circuit of size $$S$$ with an arbitrary wiring pattern, and let $$\mathcal{C}'$$ be a "super-circuit" of size $$B \cdot S$$ that applies $$\mathcal{C}$$ independently to $$B = 2^b$$ input data before aggregating results. Labels of gates consists of two indices, one for the location in the layer and the other for which circuit among the copies to choose. For example, $$a = (a\_1, a\_2) \in {0,1}^{k\_{i}} \times {0,1}^b$$.&#x20;

Let $$h$$ denote the polynomial $$\mathbb{F}^{k\_i \times b} \rightarrow \mathbb{F}$$ defined by&#x20;

$$
h(a\_1, a\_2) :=\sum\_{b\_1, c\_1 \in {0,1}^{k\_{i+1}}} g(a\_1, a\_2, b\_1, c\_1)
$$

where&#x20;

$$
g(a\_1, a\_2, b\_1, c\_1) :=\widetilde{\mathsf{add}}*i(a\_1, b\_1, c\_1) \left( \widetilde{W}*{i+1}^\prime(b\_1, a\_2) + \widetilde{W}*{i+1}^\prime(c\_1, a\_2) \right)+ \widetilde{\mathsf{mult}}*i(a\_1, b\_1, c\_1) \cdot \widetilde{W}*{i+1}^\prime(b\_1, a\_2) \cdot \widetilde{W}*{i+1}^\prime(c\_1, a\_2)
$$

Then $$h$$ extends $$W\_i'$$, just as in the single data case. However since $$h$$ is not multilinear like before, we multi-linearize $$h$$, and represent the multilinear extension of $$W\_i'$$ as below. For any $$z \in {0,1}^{k\_i + b}$$,&#x20;

$$
\widetilde{W}*i^\prime(z) =
\sum*{(a\_1, a\_2, b\_1, c\_1) \in {0,1}^{k\_i + b + 2k\_{i+1}}} g\_z^{(i)}(a\_1, a\_2, b\_1, c\_1) \\
$$

where

$$
g\_z^{(i)}(a\_1, a\_2, b\_1, c\_1) :=
\widetilde{\mathsf{eq}}*{k\_i + b}(z, (a\_1, a\_2)) \cdot \left\[
\widetilde{\textsf{add}}*i(a\_1, b\_1, c\_1) \left( \widetilde{W}*{i+1}^\prime(b\_1, a\_2) + \widetilde{W}*{i+1}^\prime(c\_1, a\_2) \right) +
\widetilde{\textsf{mult}}*i(a\_1, b\_1, c\_1) \cdot \widetilde{W}*{i+1}^\prime(b\_1, a\_2) \cdot \widetilde{W}\_{i+1}^\prime(c\_1, a\_2)
\right]
$$

Note that as the same circuit structure is repeated, $$\widetilde{\mathsf{add}}\_i$$ and $$\widetilde{\mathsf{mult}}\_i$$ do not depend on $$a\_2$$, which significantly reduces the cost of evaluating them for both parties and mitigates the bottleneck.&#x20;

The protocol is the same as the single data version, except for that at layer $$i$$, the sumcheck protocol is applied to $$g\_{r\_i}^{(i)}$$ instead.&#x20;

<figure><img src="https://755218234-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2Frwz1ZAZJtK5FHz4Y1esA%2Fuploads%2FAndRbZXWjm2e7boRtFBS%2Fimage.png?alt=media&#x26;token=ae24d73d-437e-4365-8ed3-2e2ff70f634f" alt=""><figcaption></figcaption></figure>

Check [Section 5 of the paper](https://people.cs.georgetown.edu/jthaler/gkrnotes.pdf) for detailed cost analysis. &#x20;

***

## References

* <https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/2008-DelegatingComputation.pdf>
* <https://people.cs.georgetown.edu/jthaler/gkrnotes.pdf>
* <https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.pdf>
