GKR
Presentation: https://youtu.be/OTCxQ9qIzDY
Overview
Although interactive proofs like Sumcheck enabled the verifier to efficiently check the validity of solutions to problems in complexity classes beyond NP, real world computing entities cannot solve a large instance of problems beyond NP practically.
Still, it can be pretty useful regarding more solvable problems in the real world (even for the verifier) if the verifier's cost is much less than solving it by itself and the prover's work is not increased asymptotically compared to what it takes to merely solve it.
Let C be the circuit in layered form whose number of gates (or circuit size) is S and circuit depth is d. The GKR protocol is important in a sense that :
P-cost is almost O(S); it is not much more than the cost to solve the circuit.
V-cost is O(dlogS); it's even less than just reading through the entire circuit.
Note: This needs the assumption that the circuit is logspace uniform, which implies that it has a succinct implicit description. That's why V can verify even without looking into the entire C.
It can function as a general-purpose protocol for any P and V to run an IP on any arithmetic circuit evaluation instance.
The protocol starts from P sending the claimed output value to V as the first message.
In a high level view, the protocol iterates from the first layer (output layer) to the bottom (input layer), where in each iteration i the goal is to reduce the claim of the output values of the gates in layer i to the claim of the output values of the gates in layer i+1.
Going through all the way down, the first claim is reduced to that of input values, which are known to V and thus can be checked by itself. Reduction in each iteration is done by the sumcheck protocol.

When a circuit is in layered form, it means every leaf appears in the last layer (at depth d). For a circuit that is not layered, there is a transformation with a blowup in size of factor d.
Protocol details
Let C be a fan-in 2 circuit over a finite field F in a layered form, with
S : the size of the circuit (number of total gates)
Si : the number of gates in layer i (hence each gates in layer i is labeled by 0,…,Si−1)
d : the depth
n : the input length (i.e. Sd)
and each gate is either addition or multiplication.
Without loss of generality let Si=2ki. Let Wi:{0,1}ki→F to be a mapping which takes as input the binary representation of gate labels in layer i and gives the gate output.
Define "wiring predicate" in1,i,in2,i:{0,1}ki→{0,1}ki+1, which takes as input the gate label and gives the label of one of its input gates in the lower layer.
Lastly, define "gate switch" addi,multi:{0,1}ki+2ki+1→{0,1} which takes as input the in/out gate labels and gives its instruction type. For example, supposing that gate a in layer i is an addition gate,
Functions with a tilde on top of the function name indicates the multilinear extension of the mapping.
(TODO: Make polynomial extension page and set backlinks to it in pages Spartan, Lasso, etc.)

Lemma 1
The multilinear extension of the gate output polynomial can be represented as below.
Proof.
Obviously both hands match at {0,1}ki. As both hands are multilinear in z and a multlinear extension of such mapping is uniquely defined at domain {0,1}ki, both hands are identical.

Protocol
P sends a function D:{0,1}k0→F claimed to equal W0 (the function that maps output gate labels to output values).
V picks a random r0←$Fk0, and lets m0←D(r0). The remainder of the protocol is devoted to confirming that m0=W0(r0).
Now iterate for i=0,1,…,d−1. Each iteration is devoted to reduce the claim on Wi(ri) to the claim on Wi+1(ri+1).
Define the (2ki+1)-variate polynomial:fri(i)(b,c)=addi(ri,b,c)(Wi+1(b)+Wi+1(c))+multi(ri,b,c)(Wi+1(b)⋅Wi+1(c))
P's claim is identical to that ∑b,c∈{0,1}ki+1fri(i)(b,c)=mi.
So that V may check this claim, P and V apply the sumcheck protocol to fri(i), up until V's final check in that protocol, where V must evaluate fri(i) at a randomly chosen point (b∗,c∗)∈Fki+1×Fki+1.
Note that V only needs to know addi and multi and not the entire polynomial fri(i).
Now we need to reduce the two claims on Wi+1(b∗) and Wi+1(c∗) to one claim. Let ℓ be the unique line satisfying ℓ(0)=b∗ and ℓ(1)=c∗. P sends a univariate polynomial q of degree at most ki+1 to V, claimed to equal Wi+1 restricted to ℓ.
V now performs the final check in the sumcheck protocol, using q(0) and q(1) in place of Wi+1(b∗) and Wi+1(c∗).
Note we assume here that for each layer i, V can evaluate the multilinear extensions addi and multi in polylogarithmic time. This is necessary for keeping the verifier cost sublinear to circuit size S.
V chooses r∗∈F at random and sets ri+1←ℓ(r∗) and mi+1←q(ri+1).
Note that q(ri+1)=Wi+1(ri+1). The claim is successfully reduced to that of the next layer.
After the final round, V directly checks md=Wd(rd). As V knows the input vector, it can evaluate Wd at any point with O(n) time complexity where n=2kd, i.e. the length of input.

For a more detailed explanation and analysis on costs and soundness, please refer to Section 4.1 and 4.2 of the original paper.
In particular, the prover cost is reduced from O(S3) of the naive method of sumcheck protocol to O(S2) of the optimized method which exploits the fact that addi and multi are sparse and structured and thus can be evaluated very efficiently. That is, as most of the evaluations are guaranteed to be zero, instead of passing over the entire evaluation domain, it passes only over the values that are known to be nonzero. Furthermore, the prover cost can be reduced to O(S) when the computation can be parallelized by data, as shown in the next section.
Similarly in V's cost analysis, the cost for evaluating addi and multi is considered to be lower order and omitted. This owes to repeatedly structured wiring patterns that commonly arise in circuits of real world problems in interest. Even without structured wiring patterns, we also have an option to let the verifier itself commit to addi and multi (with O(S) preprocessing time) using polynomial commitment scheme and let the prover open it each time in need, which turns the interactive proof to an interactive argument.
Leveraging Data Parallelism
When the circuit has the data parallelism property; that is, the same sub-computation is applied independently on different pieces of data and then possibly aggregated, the performance of GKR is greatly improved. Fortunately, data parallel computation is pervasive in real-world computing, including the use cases in ZK era such as LogUp-GKR, Lasso, Libra and Ceno.

Let C be a circuit of size S with an arbitrary wiring pattern, and let C′ be a "super-circuit" of size B⋅S that applies C independently to B=2b input data before aggregating results. Labels of gates consists of two indices, one for the location in the layer and the other for which circuit among the copies to choose. For example, a=(a1,a2)∈{0,1}ki×{0,1}b.
Let h denote the polynomial Fki×b→F defined by
where
Then h extends Wi′, just as in the single data case. However since h is not multilinear like before, we multi-linearize h, and represent the multilinear extension of Wi′ as below. For any z∈{0,1}ki+b,
where
Note that as the same circuit structure is repeated, addi and multi do not depend on a2, which significantly reduces the cost of evaluating them for both parties and mitigates the bottleneck.
The protocol is the same as the single data version, except for that at layer i, the sumcheck protocol is applied to gri(i) instead.

Check Section 5 of the paper for detailed cost analysis.
References
Last updated