Lasso
Lasso by Srinath Setty, Justin Thaler and Riad Wahby
Overview
This paper introduces Lasso, a lookup argument that allows an untrusted prover to commit to a vector a ∈ F^m and prove that all entries of a reside in some predetermined table t ∈ F^n.
Lasso's performance characteristics unlock the so-called "Lookup singularity".
A key difference of Lasso from other lookup arguments is that, if the table t is structured (as we will explain later), then no party needs to commit to t, enabling the use of enormous tables (e.g. of size 2^128 or larger). The most common use cases of lookup arguments, such as range checks, bitwise operations or big-number arithmetic, benefit enormously in cost from this characteristic of Lasso.
Lookup arguments.
Imagine that a prover wishes to establish that at no point in a program's execution did any integer ever exceed 2^128.
A naive approach to accomplishing this inside a circuit-SAT instance is to have the circuit take 128 field elements as advice inputs, which constitute the binary representation of x. The circuit checks that all 128 advice elements b_i are in {0,1} and that they indeed equal the binary representation of x, i.e., x = ∑_{i=0}^{127} 2^i·b_i. A single overflow check thus turns into at least 129 constraints and an additional 128 field elements in the witness.
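As a concrete illustration, here is a minimal Python sketch (my own naming, not from the paper) of what the naive circuit enforces:

```python
# Naive range check x < 2^128 via advice bits: 128 booleanity constraints
# plus 1 recomposition constraint, and 128 extra witness elements.
def naive_range_check(x: int, bits: list[int]) -> bool:
    assert len(bits) == 128
    booleanity = all(b * (b - 1) == 0 for b in bits)  # each b_i in {0,1}
    recomposition = x == sum(2**i * b for i, b in enumerate(bits))
    return booleanity and recomposition

x = 123456789
bits = [(x >> i) & 1 for i in range(128)]
assert naive_range_check(x, bits)
```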
Lookup tables offer a better approach. The verifier initializes a lookup table containing all integers between 0 and 2^128 − 1. Then the overflow check above amounts to simply confirming that x is in the table. Of course, a table of size 2^128 is far too large to be represented, even by the prover. This paper is devoted to describing the techniques that enable such a table lookup without requiring the table to ever be explicitly materialized by either party. Indeed, the table in this example is highly structured, in a way that can be exploited by the techniques we will introduce.
Lasso's starting point is Spark, an optimal polynomial commitment scheme for sparse multilinear polynomials from Spartan. Just like Spartan, Lasso can be instantiated with any multilinear PCS. In particular, for SNARKs that have the prover commit to the witness using a multilinear PCS, Lasso can be applied seamlessly to provide a lookup argument for either R1CS or Plonkish circuit satisfiability.
Jolt and Lasso.
Jolt is a RISC-V zkVM that achieves substantial improvements in front-end design by exploiting Lasso. Jolt replaces the instruction-execution part of a zkVM circuit with Lasso, letting the circuit look up into a gigantic evaluation table rather than evaluating each instruction in a circuit. For each of the primitive RISC-V instructions, the resulting evaluation table has the structure we require to apply Lasso, yielding a significant reduction in cost. Thaler states that Lasso and Jolt together essentially achieve a vision outlined by Barry WhiteHat called the Lookup singularity: transforming arbitrary computer programs into circuits that only perform lookups.
Preliminaries
As these topics were already covered in previous presentations, I skip the sections on multilinear extensions, SNARKs, polynomial commitment schemes, polynomial IOPs and the sum-check protocol.
Sparsity.
m-sparse refers to multilinear polynomials g: F^l → F in l variables such that g(x) ≠ 0 for at most m values of x ∈ {0,1}^l. In other words, g has at most m non-zero coefficients in its multilinear Lagrange polynomial basis. If m ≪ 2^l, only a tiny fraction of the possible coefficients are non-zero. If m = Θ(2^l), we refer to such g as a dense polynomial.
Dense representation of multilinear polynomials.
The fact that the MLE of a function is unique offers the following method of representing any multilinear polynomial, which is briefer than its point-value or coefficient representation.
A multilinear polynomial g: F^l → F can be represented uniquely by the list of tuples L such that for all i ∈ {0,1}^l, (to-field(i), g(i)) ∈ L if and only if g(i) ≠ 0.
Here, to-field is the canonical injection from {0,1}^l to F and to-bits is the inverse mapping. You can think of them as transformations between a field element and its binary representation.
We refer to L as the dense representation of a sparse polynomial g.
Definition 1.
A multilinear polynomial g in l variables is a sparse multilinear polynomial if |DenseRepr(g)| is sub-linear in 2^l. Otherwise, it is a dense multilinear polynomial.
Technical overview
For a lookup argument from a ∈ F^m to t ∈ F^n, it suffices for the prover to show that it knows a sparse matrix M ∈ F^{m×n} such that (1) each row of M has exactly one cell equal to 1 and the rest equal to zero, and (2) M·t = a.
Below is a simple example demonstrating that a = [2, 3, 0] is a valid lookup into the table t = [0, 1, 2, 3]. You can see that M is indeed sparse.
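Since the matrix is easy to reconstruct from a and t, here is the example spelled out as a quick Python check (the one-hot rows of M below are chosen to match a):

```python
# M selects t[2], t[3], t[0]; each row is one-hot, so M is sparse.
t = [0, 1, 2, 3]
M = [
    [0, 0, 1, 0],  # row 0 selects t[2] = 2
    [0, 0, 0, 1],  # row 1 selects t[3] = 3
    [1, 0, 0, 0],  # row 2 selects t[0] = 0
]
a = [sum(m_ij * t_j for m_ij, t_j in zip(row, t)) for row in M]
assert a == [2, 3, 0]  # M * t = a, so a is a valid lookup into t
```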
Proving the matrix-vector multiplication above is equivalent to confirming that
Equation (1):

∑_{y∈{0,1}^{log n}} M̃(r, y) · t̃(y) = ã(r)

for an r ∈ F^{log m} randomly sampled by the verifier.
To prove this equation with the sum-check protocol, the verifier must be able to evaluate M̃(r, y)·t̃(y) at some random y succinctly, i.e., with a cost independent of n, given that the lookup table might be very large. Clearly, we cannot achieve this by committing to M̃ and t̃ using a normal PCS.
M is a sparse matrix: only m of its m×n entries are non-zero. Hence the above is efficiently provable by having the prover commit to the sparse polynomial M̃ using Surge, a generalization of Spark for Lasso, and, instead of committing to the enormous t, sending only its succinct description, assuming t is structured in a way we will define. Once we have such a commitment scheme, we instantly obtain the desired lookup argument.
Spark
Spark, as explained above, is a polynomial commitment scheme for sparse polynomials. Let g be a (log N)-variate multilinear polynomial with sparsity m. The prover commits to the unique dense representation of the sparse polynomial g, i.e., the list specifying only the multilinear Lagrange bases with non-zero coefficients. When the verifier requests an evaluation g(r), the prover returns the claimed evaluation v along with an evaluation proof.
Let c be such that N = m^c, without loss of generality. There is a simple algorithm that takes as input the dense representation of g and outputs g(r) in O(c·m) time.
Algorithm 1.
Decompose the log N = c·log m variables of r into c blocks, each of size log m, writing r = (r_1, ..., r_c) ∈ (F^{log m})^c. Then any (log N)-variate Lagrange basis polynomial evaluated at r can be expressed as a product of c smaller Lagrange basis polynomials. After an O(c·m)-cost pre-computation of a write-once memory, each basis polynomial can be evaluated with c memory lookups and a few extra field operations.
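Below is a runnable Python sketch of Algorithm 1 over a toy prime field; the helper names (eq_table, sparse_eval) and the field modulus are my own choices, not the paper's. It uses the example that follows as a sanity check.

```python
import random

P = 2**61 - 1  # a toy prime modulus

def eq_table(r):
    # All 2^len(r) Lagrange basis evaluations eq(i, r), computed in O(2^len(r));
    # the first variable of r corresponds to the most significant bit of i.
    table = [1]
    for rj in r:
        table = [t * x % P for t in table for x in (1 - rj, rj)]
    return table

def sparse_eval(dense_repr, r, c, log_m):
    # dense_repr: list of (index, value) pairs for the m nonzero coefficients.
    # Pre-compute one size-2^log_m table per block of r: the write-once memory.
    blocks = [r[j * log_m:(j + 1) * log_m] for j in range(c)]
    mems = [eq_table(b) for b in blocks]  # O(c * m) pre-computation
    mask = (1 << log_m) - 1
    acc = 0
    for idx, val in dense_repr:           # c memory reads per nonzero term
        term = val
        for j in range(c):
            chunk = (idx >> ((c - 1 - j) * log_m)) & mask
            term = term * mems[j][chunk] % P
        acc = (acc + term) % P
    return acc

# Sanity check on the example below: g = 1 at x = 000001, ..., 000100.
log_m, c = 2, 3
dense_repr = [(1, 1), (2, 1), (3, 1), (4, 1)]
r = [random.randrange(P) for _ in range(c * log_m)]
full = eq_table(r)  # direct O(2^6)-sized evaluation for comparison
assert sparse_eval(dense_repr, r, c, log_m) == sum(full[i] for i, _ in dense_repr) % P
```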
Example.
Let N = 2^6, c = 3 and m = 2^2.
Let g(x) = 1 for x = 000001, ..., 000100, and g(x) = 0 for all other x ∈ {0,1}^6. Subscripts 1, 2 and 3 denote the three 2-bit partitions of the 6-bit binary string, i.e., x = (x_1, x_2, x_3) and r = (r_1, r_2, r_3). X_i(r) denotes eq(i, r).
Observe that the evaluation of g at some r can be decomposed in this way:

g(r) = X_{0,0}(r_1)·X_{0,0}(r_2)·X_{0,1}(r_3) + X_{0,0}(r_1)·X_{0,0}(r_2)·X_{1,0}(r_3) + X_{0,0}(r_1)·X_{0,0}(r_2)·X_{1,1}(r_3) + X_{0,0}(r_1)·X_{0,1}(r_2)·X_{0,0}(r_3)
The memory is composed of c (= 3) subtables t_1, t_2 and t_3, where each t_i is a table of size m populated by the values X_{0,0}(r_i) through X_{1,1}(r_i). Each subtable can be computed by the prover in O(m) time, for a total pre-computation cost of O(c·m).
t_1          | t_2          | t_3
X_{0,0}(r_1) | X_{0,0}(r_2) | X_{0,0}(r_3)
X_{0,1}(r_1) | X_{0,1}(r_2) | X_{0,1}(r_3)
X_{1,0}(r_1) | X_{1,0}(r_2) | X_{1,0}(r_3)
X_{1,1}(r_1) | X_{1,1}(r_2) | X_{1,1}(r_3)
Notice that this is cheaper by a factor of log m than the naive approach, whose cost is O(m·log N), since there are m Lagrange basis polynomials, each over log N variables.
The efficiency of the algorithm above comes from the sparsity of g. Spark is essentially a SNARK arguing that the prover correctly ran this algorithm on the committed description of g. For simplicity, in the demonstrations below we treat the special case c = 2 and then extend it to the general result.
Detailed explanation
Let D be an m-sparse (2·log m)-variate multilinear polynomial, as we fixed the value of c to 2. For now, you can think of D as an m×m matrix with only m non-zero entries.
Given its dense representation, one can express its evaluation as below, by decomposing each (2·log m)-variate Lagrange basis polynomial into two (log m)-variate Lagrange basis polynomials:

D(r_x, r_y) = ∑_{(i,j): D(i,j)≠0} D(i, j) · eq(i, r_x) · eq(j, r_y)
Let's introduce three (log m)-variate multilinear polynomials val, row and col, which extend the dense representation of D. That is, k ∈ {0,1}^{log m} iterates over the m non-zero entries of D in some canonical order, and for each k there exist i, j such that row(k) = to-field(i), col(k) = to-field(j) and val(k) = D(i, j).
val maps k to the value of each non-zero entry, while row and col map k to its location in D (or, equivalently, to the indices into the two subtables of Algorithm 1).
The above equation can be simplified to:

Equation (2):

D(r_x, r_y) = ∑_{k∈{0,1}^{log m}} val(k) · eq(to-bits(row(k)), r_x) · eq(to-bits(col(k)), r_y)
Let E_{rx}(k) = eq(to-bits(row(k)), r_x) and E_{ry}(k) = eq(to-bits(col(k)), r_y), each of which is (log m)-variate. We define a PIOP for proving the evaluation of D at (r_x, r_y). We assume that row, col and val are committed beforehand.
P → V : E_{rx} and E_{ry} as oracles.
P ↔ V : run the sum-check reduction on Equation (2) to reduce the claim D(r_x, r_y) = v to the following checks:

val(r_z) =? v_val

E_{rx}(r_z) =? v_{E_rx} and E_{ry}(r_z) =? v_{E_ry},

where the claimed values are provided by P and r_z is randomly sampled by V over the course of the sum-check protocol.
V : check if the three equalities above hold with an oracle query to each of the polynomials.
A key part of achieving a sound argument with the PIOP above is enabling the verifier to check that the committed E_{rx} and E_{ry} are built correctly, i.e.

∀k ∈ {0,1}^{log m}, E_{rx}(k) = eq(to-bits(row(k)), r_x)

∀k ∈ {0,1}^{log m}, E_{ry}(k) = eq(to-bits(col(k)), r_y).
Checking the above is equivalent to checking that an untrusted prover correctly ran Algorithm 1 (in the case c = 2, where the two subtables correspond to E_{rx} and E_{ry}), i.e., that the prover initialized the memory with the correct values and that the subsequent memory read operations occurred correctly. We introduce offline memory checking, a protocol that reduces such a memory consistency argument to a multiset equality check.
Offline memory checking
Offline memory checking is a protocol between an untrusted memory and a trusted checker, which in our protocol's context are the prover and the verifier, respectively.
The checker maintains two local state sets: RS and WS. RS is initially empty. For a memory of size M (in our context, M = N^{1/c}), WS is initialized as follows: for all i ∈ [M], the tuple (i, v_i, 0) is included. Here the first element is the address, the second the value, and the third the access count of each memory slot.
For each read operation at address a, suppose that the untrusted memory responds with a value-count pair (v,t). Then the checker updates its local state as follows:
RS←RS∪{(a,v,t)}
store (v,t+1) at address a in the untrusted memory, and
WS←WS∪{(a,v,t+1)}.
Claim 1.
If, for every read operation, the untrusted memory returns the tuple last written to that location, then there exists a set S of cardinality M, consisting of tuples of the form (k, v_k, t_k) for all k ∈ [M], such that WS = RS ∪ S, where S is simply the current memory view. Conversely, if the untrusted memory ever returns a value v for a memory read at k ∈ [M] that does not equal the value initially written to cell k, then there does not exist any set S such that WS = RS ∪ S.
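Claim 1 is easy to play with in code. The following is a small simulation of the checker's rules (my own Python, not from the paper), run on the example below:

```python
from collections import Counter

def offline_check(init_values, reads):
    mem = {a: (v, 0) for a, v in init_values.items()}        # untrusted memory
    WS = Counter((a, v, 0) for a, v in init_values.items())  # write set
    RS = Counter()                                           # read set
    for a in reads:
        v, t = mem[a]            # memory responds with a value-count pair
        RS[(a, v, t)] += 1
        mem[a] = (v, t + 1)      # write back with incremented count
        WS[(a, v, t + 1)] += 1
    S = Counter((a, v, t) for a, (v, t) in mem.items())      # final memory view
    return WS == RS + S          # Claim 1: WS = RS ∪ S as multisets

# The example below: memory {1: 100, 2: 101, 3: 102}, reads at 2, 2, 3.
assert offline_check({1: 100, 2: 101, 3: 102}, [2, 2, 3])
```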
I skip the proof of the claim, but the example below demonstrates how the forward direction holds. Three memory slots with addresses 1, 2 and 3 are initialized with values 100, 101 and 102, and three read operations occur, at addresses 2, 2 and 3.
Initialization: WS = {(1, 100, 0), (2, 101, 0), (3, 102, 0)}, RS = ∅; memory view: (1, 100, 0), (2, 101, 0), (3, 102, 0).

Read at address 2: RS gains (2, 101, 0), WS gains (2, 101, 1); memory view: (1, 100, 0), (2, 101, 1), (3, 102, 0).

Read at address 2: RS gains (2, 101, 1), WS gains (2, 101, 2); memory view: (1, 100, 0), (2, 101, 2), (3, 102, 0).

Read at address 3: RS gains (3, 102, 0), WS gains (3, 102, 1); memory view: (1, 100, 0), (2, 101, 2), (3, 102, 1).

Finally, WS = {(1, 100, 0), (2, 101, 0), (2, 101, 1), (2, 101, 2), (3, 102, 0), (3, 102, 1)} and RS = {(2, 101, 0), (2, 101, 1), (3, 102, 0)}.
You can see that there exists an S such that WS = RS ∪ S, and this S equals the final memory view: S = {(1, 100, 0), (2, 101, 2), (3, 102, 1)}.
Now we define the counter polynomials. Given a memory of size M and the list of m addresses involved in the read operations, one can compute two vectors C_r ∈ F^m and C_f ∈ F^M, where C_r[k] stores the 'correct' count that would have been returned for the k-th read operation and C_f[j] stores the 'correct' final count at memory location j after the m read operations. Computing these requires O(m) operations over F.
Let read_ts = C_r, write_cts = C_r + 1 and final_cts = C_f. We refer to (the MLEs of) these vectors as the counter polynomials.
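A sketch of how the counter vectors are computed in O(m) time (variable names are mine; addresses are 0-indexed here):

```python
def counter_vectors(M, addresses):
    C_f = [0] * M            # final count per memory slot
    C_r = []                 # count returned at each of the m reads
    for a in addresses:
        C_r.append(C_f[a])   # the 'correct' count for this read...
        C_f[a] += 1          # ...which is then incremented in memory
    return C_r, C_f

# Reads at addresses 2, 2, 3, as in the example above (0-indexed):
C_r, C_f = counter_vectors(4, [2, 2, 3])
assert C_r == [0, 1, 0] and C_f == [0, 0, 2, 1]
```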
The evaluation proof
Claim 2.
Consider a (2·log M)-variate multilinear polynomial D with sparsity m. Suppose that (row, col, val) are committed in advance, and that

E_{rx}, E_{ry}, read_ts_row, read_ts_col, final_cts_row and final_cts_col

are the polynomials sent by the prover at the beginning of the evaluation proof. Notice that the verifier can derive write_cts_row and write_cts_col on its own, since write_cts = read_ts + 1.
If E_{rx} is correctly computed, i.e. ∀k ∈ {0,1}^{log m}, E_{rx}(k) = eq(to-bits(row(k)), r_x), then the following holds: WS = RS ∪ S, where

WS = {(to-field(i), eq(i, r_x), 0) : i ∈ {0,1}^{log M}} ∪ {(row(k), E_{rx}(k), write_cts_row(k)) : k ∈ {0,1}^{log m}};

RS = {(row(k), E_{rx}(k), read_ts_row(k)) : k ∈ {0,1}^{log m}}; and

S = {(to-field(i), eq(i, r_x), final_cts_row(i)) : i ∈ {0,1}^{log M}}.
The high-level conclusion of the memory checking above: when E_{rx} is the polynomial produced by an untrusted memory, the checker (verifier) initializes the memory with the values eq(i, r_x), which are the expected 'correct' values, and then, by checking that the memory lookups were performed correctly for each E_{rx}(k), the verifier is convinced that the provided E_{rx} is correct. The same logic applies to E_{ry}.
In reality, the verifier doesn't perform the check above manually. Instead, to check that H_{τ,γ}(WS) = H_{τ,γ}(RS) · H_{τ,γ}(S), the two parties run a grand product argument: a multiset equality check performed via a GKR protocol whose circuit is simply a binary tree of multiplication gates. Since running the GKR protocol reduces to a single evaluation of the input vector at a random point sampled by the verifier, and the verifier effectively has oracle access to all the required polynomials (they are either committed by the prover or evaluable by the verifier itself), the memory check above can be done efficiently with a few evaluation proofs. For a more detailed explanation of the grand product argument, refer to Appendix E of the paper.
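For intuition, here is a sketch of the multiset fingerprint H_{τ,γ}. The packing h_γ(a, v, t) = a·γ² + v·γ + t with H_{τ,γ}(multiset) = ∏ (h_γ(·) − τ) is the standard construction which I believe matches the paper's, but consult Appendix E for the authoritative definition:

```python
import random

P = 2**61 - 1

def H(multiset, tau, gamma):
    # Fingerprint: pack each (addr, value, count) tuple into one field element,
    # then take the product of (packed - tau) over the whole multiset.
    acc = 1
    for a, v, t in multiset:
        acc = acc * ((a * gamma * gamma + v * gamma + t - tau) % P) % P
    return acc

# Multisets from the offline memory-checking example above:
WS = [(1, 100, 0), (2, 101, 0), (3, 102, 0), (2, 101, 1), (2, 101, 2), (3, 102, 1)]
RS = [(2, 101, 0), (2, 101, 1), (3, 102, 0)]
S = [(1, 100, 0), (2, 101, 2), (3, 102, 1)]

tau, gamma = random.randrange(P), random.randrange(P)
assert H(WS, tau, gamma) == H(RS, tau, gamma) * H(S, tau, gamma) % P
```

With τ and γ sampled at random, this equality fails with negligible probability unless WS = RS ∪ S as multisets (by Schwartz-Zippel), which is exactly why the grand product check suffices.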
Generalization and Specialization
We will generalize the above protocol, Spark, so that it supports an arbitrary value of c rather than the fixed value 2, which means the gigantic lookup table can be decomposed into even smaller subtables of size N^{1/c}.
Subsequently we will specialize Spark for Lasso with some cost optimizations that exploit the properties of lookup arguments. Finally, we will further generalize Spark by relaxing the conditions that make a table 'structured' enough for Lasso to apply. With these modifications altogether, we obtain Surge, the sparse polynomial commitment scheme used by Lasso, and its final PIOP.
Supporting log N = c·log m variables, rather than 2·log m.
c can be interpreted as the number of subtables resulting from the decomposition of the original table of size N. We simply factor eq, now (c·log m)-variate, into a product of c terms instead of two, each of which is a (log m)-variate multilinear Lagrange basis polynomial. Hence, from this point on, we use the notation dim_i(x) for i ∈ [c] to represent the injections from the boolean hypercube to indices into each subtable, in place of row(x) and col(x).
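Concretely, dim_i just reads off the i-th chunk of a table index. A tiny sketch (hypothetical names):

```python
def dims(y, c, log_sub):
    # Split a table index y in [N] into c chunks of log_sub = (log N)/c bits,
    # each addressing a subtable of size N^(1/c); most significant chunk first.
    mask = (1 << log_sub) - 1
    return [(y >> ((c - 1 - i) * log_sub)) & mask for i in range(c)]

# e.g. N = 2^16, c = 4: a 16-bit index splits into four 4-bit subtable indices.
assert dims(0xBEEF, 4, 4) == [0xB, 0xE, 0xE, 0xF]
```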
Specializing for Lasso
As we saw in Equation (1), the sparse m×N matrix M that we want to commit to can only have 1 as its non-zero values. This means the polynomial val(x) is identically 1, so committing to it is unnecessary. This saves one committed polynomial per lookup argument.
Furthermore, if we let c = 2 and sample r_x from F^{log m} and r_y from F^{log N} (so that r_x and r_y represent a row index and a column index of the matrix M), then since each row has exactly one non-zero entry, to-bits(row(k)) is simply k. E_{rx}(k) = eq(k, r_x) can thus be evaluated by the verifier on its own, so there is no need to commit to it or prove its well-formedness.
Here M is a 4×4 matrix with sparsity 4. Let k be a two-bit binary string indexing the 4 non-zero entries, i.e. k = 00, 01, 10, 11. The table below describes the mappings row and col. The mapping row(k) is simply an iteration from 0 to 3, which is virtually equal to k, because there is exactly one non-zero entry per row. Therefore it is unnecessary to commit to the polynomial E_{rx}(k), as well as val(k). Check the table below.
k  | row(k) | col(k) | val(k)
00 | 0      | 2      | 1
01 | 1      | 3      | 1
10 | 2      | 0      | 1
11 | 3      | 0      | 1
Generalizing the structure of lookup tables
We redefine the term 'structured' to be more comprehensive and well-defined with respect to the properties lookup tables may have. If (1) a lookup table t can be expressed as a tensor product of c ≥ 2 smaller tables (i.e., it is decomposable), and (2) the multilinear extension of each subtable can be evaluated quickly (i.e., it is MLE-structured), we regard such a table t as structured, and Lasso can be applied.
Suppose that there is an integer k ≥ 1 and α = k·c tables T_1, ..., T_α of size N^{1/c}, along with an α-variate multilinear polynomial g, such that the following holds for all y = (y_1, ..., y_c) ∈ ({0,1}^{(log N)/c})^c:

T[y] = g(T_1[y_1], ..., T_k[y_1], T_{k+1}[y_2], ..., T_{2k}[y_2], ..., T_{α−k+1}[y_c], ..., T_α[y_c])
This means each element of T can be expressed succinctly via the corresponding elements of T_1, ..., T_α.
Example.
The table for the range check in the first example ([0, ..., 2^128 − 1]) can easily be expressed in the way above with k = 1 and c = 32, for instance.
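A sketch of this decomposition (the 4-bit chunk width, so that N^{1/c} = 2^4, is an illustrative assumption): each subtable T_i is just the identity table on 4-bit chunks, and g recombines the chunks with powers of 2.

```python
def g(chunks):
    # g(z_1, ..., z_32) = sum over i of 2^(4*(32-i)) * z_i, written as shifts;
    # note g is linear in its 32 inputs.
    acc = 0
    for v in chunks:
        acc = (acc << 4) + v
    return acc

T_identity = list(range(16))   # every subtable is the identity on [0, 2^4)
y = 0xDEADBEEFCAFEBABE         # a value in the 128-bit range (here 64 bits)
chunks = [(y >> (4 * (31 - i))) & 0xF for i in range(32)]
assert g([T_identity[ch] for ch in chunks]) == y  # T[y] = g(T_1[y_1], ..., T_32[y_32])
```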
Going back, we rewrite Equation (1) as below:

∑_{y∈{0,1}^{log N}} M̃(r, y) · T[y] = ã(r)
The equation above holds if and only if M·t = a, up to a soundness error of log m/|F|. In Lasso, after committing to M̃ (by committing to dim_1, ..., dim_c) and ã, the verifier sends a randomly sampled r to check whether the above holds.
Let nz(i) denote the unique non-zero column in row i of M. We can rewrite the LHS as follows:

∑_{y∈{0,1}^{log N}} M̃(r, y) · T[y] = ∑_{i∈{0,1}^{log m}} eq(i, r) · T[nz(i)]
And with the decomposability of T,

∑_{i∈{0,1}^{log m}} eq(i, r) · T[nz(i)] = ∑_{i∈{0,1}^{log m}} eq(i, r) · g(T_1[dim_1(i)], ..., T_k[dim_1(i)], ..., T_α[dim_c(i)]),

where dim_j(i) denotes the j-th chunk of nz(i).
Now the prover should prove that:

1. for some committed E_1, ..., E_α, the expression above equals ã(r), using the sum-check protocol, and

2. E_1, ..., E_α are built from correct lookups into the subtables T_1, ..., T_α, using offline memory checking.
Assuming the verifier can evaluate the MLE of each subtable quickly, both parties can run the protocol with only short descriptions of the subtables, without ever fully materializing the entire lookup table T.
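To tie the pieces together, here is a self-contained sketch of the quantity the Surge prover proves via sum-check, instantiated with the identity-chunk g of the range check (toy field, my helper names):

```python
import random

P = 2**61 - 1

def eq_table(r):
    # all 2^len(r) evaluations eq(i, r); first variable = most significant bit
    table = [1]
    for rj in r:
        table = [t * x % P for t in table for x in (1 - rj, rj)]
    return table

def surge_claim(nz, r, c, log_sub):
    # nz[i]: the table index looked up by row i of M.
    eqs = eq_table(r)                 # eq(i, r) for all i in {0,1}^{log m}
    mask = (1 << log_sub) - 1
    acc = 0
    for i, y in enumerate(nz):
        chunks = [(y >> ((c - 1 - j) * log_sub)) & mask for j in range(c)]
        val = 0
        for ch in chunks:             # g recombines identity-subtable reads
            val = (val << log_sub) + ch
        acc = (acc + eqs[i] * (val % P)) % P
    return acc

# With T[y] = y, the claim must equal a~(r), the MLE of a = (nz values) at r.
r = [random.randrange(P) for _ in range(2)]   # log m = 2, i.e. m = 4 lookups
nz = [5, 9, 2, 12]                            # lookups into a table of size 2^4
assert surge_claim(nz, r, c=2, log_sub=2) == sum(e * v for e, v in zip(eq_table(r), nz)) % P
```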
The final PIOP is described below.
P has committed to c multilinear polynomials dim_1, ..., dim_c, each over log m variables.
P → V : 2α different (log m)-variate multilinear polynomials E_1, ..., E_α, read_ts_1, ..., read_ts_α and α different ((log N)/c)-variate multilinear polynomials final_cts_1, ..., final_cts_α.
// E_i is purported to specify the values of each of the m reads into T_i.
// read_ts_i and final_cts_i are the "counter polynomials" for the memory slots of the subtables after the computation.
V and P apply the sum-check protocol to the polynomial h(k) := eq(r, k) · g(E_1(k), ..., E_α(k)), which reduces checking that ∑_{k∈{0,1}^{log m}} h(k) = ã(r) to the following checks: E_i(r_z) =? v_{E_i} for i = 1, ..., α.
V : check if the above equalities hold with one oracle query to each E_i.
// Up to this point, the correctness of the E_i is not yet guaranteed. V and P run offline memory checking to verify that E_i(j) = T_i[dim_i(j)] for all j ∈ {0,1}^{log m}.
V → P : τ, γ ∈_R F.
// In practice, the α instances of the protocol below can be reduced to a single run of a sum-check applied to a random linear combination of the polynomials.
V ↔ P : For i = 1, ..., α, run a GKR-based grand product protocol to reduce the check H_{τ,γ}(WS) =? H_{τ,γ}(RS) · H_{τ,γ}(S) to the following checks:

E_i(r′) =? v_{E_i}, dim_i(r′) =? v_{dim_i}, read_ts_i(r′) =? v_{read_ts_i} and final_cts_i(r″) =? v_{final_cts_i},

where r′ ∈ F^{log m} and r″ ∈ F^{(log N)/c} are sampled by V during the protocol.
V : check the equalities above with an oracle query to each of E_i, dim_i, read_ts_i and final_cts_i.
Lasso itself is then the following protocol. Input: a polynomial commitment to the multilinear polynomial ã: F^{log m} → F, and a description of a structured table T of size N.
P sends a Surge commitment to the multilinear extension M̃ of a matrix M ∈ {0,1}^{m×N}. This consists of c different (log m)-variate multilinear polynomials dim_1, ..., dim_c.
V picks a random r ∈ F^{log m} and sends it to P. V makes one evaluation query to ã to learn ã(r).
P and V apply Surge, allowing P to prove that ∑_{y∈{0,1}^{log N}} M̃(r, y) · T[y] = ã(r).
Cost
Cost of Spark

It's worth noting that while at first glance increasing c seems to increase the commitment size and verification cost by a factor linear in c, most PCSs have efficient batching for evaluation proofs, so the factor of c can be omitted (i.e., the prover and verifier costs for verifying polynomial evaluations do not grow with c).
Prover time
The prover can compute its messages for the sum-check protocol with O(b·k·α·m) field operations, where α = k·c and b is the number of monomials in g. For many tables of practical interest, the b and k factors can be eliminated (namely when g has total degree 1 or 2, which is the case for the range-check example above), giving O(c·m) total cost.
For the offline memory checking argument, the prover's costs are similar to Spark's: O(α·m + α·N^{1/c}) field operations, plus committing to a low-order number of field elements.
Verifier cost
The verifier performs O(k·log m) field operations. The costs of the memory checking argument, which can be batched, are identical to Spark's.
References
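Srinath Setty, Justin Thaler and Riad Wahby. Unlocking the lookup singularity with Lasso. Cryptology ePrint Archive, Paper 2023/1216, 2023.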
Written by Carson Lee of Fractalyze