Pianist
Introduction
Pianist (Plonk vIA uNlimited dISTribution) is a distributed proof generation system based on Plonk.
Protocol Explanation
Notation
M: number of machines
N: size of the entire circuit
T: size of each sub-circuit, where T = N/M, a power of 2
Arithmetic Constraint System for Each Party
Each participant is responsible for a portion Ci of the global circuit C, and constructs a local constraint system to prove correctness of that sub-circuit.
This system consists of two main parts:
Gate Constraint — ensures that gate computations are correct
Copy Constraint — ensures that wire connections are consistent
Gate Constraint
Plonk targets fan-in 2 arithmetic circuits, where each gate has at most two inputs (left and right) and one output.
For each sub-circuit Ci assigned to machine Pi, the following univariate polynomials are defined for a circuit size T:
ai(X): left input
bi(X): right input
oi(X): output
Each of these is expressed in the Lagrange basis, e.g. a_i(X) = Σ_{j=0}^{T−1} a_{i,j}·L_j(X),
where L_j(X) is the j-th Lagrange polynomial over the evaluation domain, and the conversion between evaluations and coefficients can be computed in O(T log T) time using the Number Theoretic Transform (NTT).
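As a concrete illustration of the conversion between Lagrange-basis values and coefficients, here is a minimal radix-2 NTT sketch. The field F_257, generator, and wire values are toy choices for illustration, not the parameters Pianist actually uses:

```python
# Minimal radix-2 NTT over F_p with p = 257 (a Fermat prime), illustrating
# the O(T log T) conversion between domain evaluations and coefficients.
# Field, generator, and wire values are toy choices.
P = 257   # p - 1 = 256 = 2^8, so all needed 2^k-th roots of unity exist
G = 3     # primitive root of F_257

def ntt(coeffs, omega):
    """Evaluate a coefficient-form polynomial on {omega^0, ..., omega^{n-1}}."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], omega * omega % P)
    odd = ntt(coeffs[1::2], omega * omega % P)
    out, w = [0] * n, 1
    for j in range(n // 2):
        t = w * odd[j] % P
        out[j] = (even[j] + t) % P
        out[j + n // 2] = (even[j] - t) % P
        w = w * omega % P
    return out

def intt(values, omega):
    """Interpolate: recover coefficients from values on the subgroup."""
    n = len(values)
    inv_n = pow(n, P - 2, P)
    return [v * inv_n % P for v in ntt(values, pow(omega, P - 2, P))]

T = 8
w = pow(G, (P - 1) // T, P)              # primitive T-th root of unity
wire_values = [5, 1, 4, 1, 9, 2, 6, 0]   # a_i's evaluations on the domain
coeffs = intt(wire_values, w)            # a_i(X) in coefficient form
assert ntt(coeffs, w) == wire_values     # round trip: back to the values
```

The value list here plays the role of the Lagrange-coefficient vector of a_i(X); the forward and inverse NTT move between that vector and monomial coefficients in O(T log T).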
All gate operations are encoded in a single polynomial equation:
The q coefficients vary depending on the gate type:
Addition Gate
(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (1, 1, −1, 0, 0) ⇒ g_i(X) = a_i(X) + b_i(X) − o_i(X)
Multiplication Gate
(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (0, 0, −1, 1, 0) ⇒ g_i(X) = a_i(X)·b_i(X) − o_i(X)
Public Input Gate
(q_{a,i}, q_{b,i}, q_{o,i}, q_{ab,i}, q_{c,i}) = (0, 0, −1, 0, in_{i,j}) ⇒ g_i(X) = −o_i(X) + q_{c,i}(X)
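The selector encoding above can be sanity-checked with plain modular arithmetic; the field, wire values, and public input below are arbitrary toy choices:

```python
# One relation g = qa*a + qb*b + qo*o + qab*a*b + qc covers every gate type;
# choosing the selector tuple specializes it. Toy field and witness values.
P = 257

def gate(qa, qb, qo, qab, qc, a, b, o):
    return (qa * a + qb * b + qo * o + qab * a * b + qc) % P

a, b = 7, 11
# Addition gate (1, 1, -1, 0, 0): satisfied iff o = a + b
assert gate(1, 1, -1, 0, 0, a, b, (a + b) % P) == 0
# Multiplication gate (0, 0, -1, 1, 0): satisfied iff o = a * b
assert gate(0, 0, -1, 1, 0, a, b, a * b % P) == 0
# Public input gate (0, 0, -1, 0, in_j): forces o to equal the public input
in_j = 42
assert gate(0, 0, -1, 0, in_j, a, b, in_j) == 0
# A wrong output violates the addition constraint
assert gate(1, 1, -1, 0, 0, a, b, (a + b + 1) % P) != 0
```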
Finally, the gate constraint must hold at all evaluation points in the domain: g_i(x) = 0 for all x ∈ Ω_X,
where Ω_X = {ω_X^0, …, ω_X^{T−1}}.
Copy Constraint
One of the core ideas in Plonk is modeling wire consistency using a permutation argument. To enable this, the verifier sends two random field elements η,γ∈F. The prover then constructs the following running product polynomial:
Where:
f_i(X): actual wiring values
f_i(X) := (a_i(X) + η·σ_{a,i}(X) + γ)(b_i(X) + η·σ_{b,i}(X) + γ)(o_i(X) + η·σ_{c,i}(X) + γ)
f_i′(X): permuted reference wiring
f_i′(X) := (a_i(X) + η·k_A·X + γ)(b_i(X) + η·k_B·X + γ)(o_i(X) + η·k_O·X + γ)
Here, k_A = 1, and k_B, k_O are distinct non-residue constants.
What are non-residue constants? In general, when constructing permutation or lookup arguments over a finite field F, non-residue constants are used to apply distinct "shifts". Specifically, if ω is a primitive n-th root of unity generating a multiplicative subgroup H ⊂ F, then any k ∉ H is considered a non-residue, or a coset shift. One can then define a new coset domain kH = {k·h ∣ h ∈ H} that is disjoint from H.
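The disjointness claim is easy to check numerically; the toy field F_257, subgroup size, and shift below are illustrative:

```python
# The subgroup H = <omega> and its coset kH are disjoint whenever k is not
# in H. Toy parameters: F_257, an order-8 subgroup, and k = 3 (a generator
# of the full multiplicative group, so k cannot lie in H).
P = 257
n = 8
omega = pow(3, (P - 1) // n, P)          # primitive n-th root of unity

H = {pow(omega, j, P) for j in range(n)}
assert len(H) == n                        # omega really has order n

k = 3
assert k not in H                         # k is a valid coset shift
kH = {k * h % P for h in H}
assert H.isdisjoint(kH)                   # the coset domain avoids H entirely
```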
The prover must prove that the running product of f_i(x)/f_i′(x) over all x ∈ Ω_X equals 1.
This is enforced via the following constraints: z_i(ω_X^0) = 1 and z_i(ω_X·X)·f_i′(X) = z_i(X)·f_i(X) for all X ∈ Ω_X.
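The grand-product idea can be sanity-checked numerically. Below is a simplified one-column version (the full Plonk check runs over the three columns a, b, o with the k-shifts): values invariant under the permutation σ make the two products match, and corrupting one copy breaks the match. The field, challenges, values, and σ are toy choices:

```python
# One-column grand-product check: prod_j (v_j + eta*sigma(j) + gamma) equals
# prod_j (v_j + eta*j + gamma) iff the values are invariant under sigma.
# Field, challenges (eta, gamma), values, and sigma are toy choices.
P = 257
eta, gamma = 101, 77

def grand_products(values, sigma):
    num = den = 1
    for j, v in enumerate(values):
        num = num * ((v + eta * sigma[j] + gamma) % P) % P
        den = den * ((v + eta * j + gamma) % P) % P
    return num, den

values = [9, 4, 9, 4, 13, 13]            # wire values on the domain
sigma = [2, 3, 0, 1, 5, 4]               # permutation pairing equal wires

num, den = grand_products(values, sigma)
assert num == den                         # consistent copies: products agree

values[0] = 8                             # break one copy constraint
num, den = grand_products(values, sigma)
assert num != den                         # the product argument catches it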
Quotient Polynomial
To consolidate all constraints into a single polynomial relation, we prove the existence of a polynomial hi(X) satisfying:
where the vanishing polynomial over the evaluation domain is V_X(X) = X^T − 1.
Constraint System for Data-parallel Circuit
This section presents the core technique that aggregates locally proven gate and copy constraints from each machine into a single unified SNARK proof.
Assuming the sub-circuits C0,C1,…,CM−1 are independent, the goal is to:
Aggregate local proof polynomials into a single bivariate polynomial, and
Transform the distributed proof system into a globally verifiable SNARK
Naive Approach
The most straightforward idea is to aggregate the sub-polynomials using powers of Y, e.g. A(Y, X) = Σ_{i=0}^{M−1} Y^i·a_i(X).
A multiplication gate would then be expressed via the product A(Y, X)·B(Y, X).
This introduces cross-terms such as Y^{i+j}·a_i(X)·b_j(X), making it difficult to reason about and distribute the proof efficiently.
Bivariate Lagrange Polynomial Aggregation
To address this, inspired by Caulk, we use Lagrange basis polynomials R_i(Y) over the Y-domain: A(Y, X) = Σ_{i=0}^{M−1} R_i(Y)·a_i(X).
Now, since R_i(ω_Y^j) = 1 if i = j and 0 otherwise, a multiplication gate evaluated at Y = ω_Y^i reduces to a_i(X)·b_i(X).
This removes any cross-terms and allows clean aggregation of constraints.
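The selector property of the Lagrange basis is what kills the cross-terms; a toy check over F_257 (the machine count and wire values are illustrative):

```python
# Lagrange basis over the Y-domain: R_i(omega_Y^j) is 1 when i == j and 0
# otherwise, so A(Y, X) * B(Y, X) restricted to the domain has no mixing
# between machines. Toy field F_257, M = 4 machines, arbitrary values.
P = 257
M = 4
wY = pow(3, (P - 1) // M, P)             # primitive M-th root of unity

def R(i, y):
    """i-th Lagrange basis polynomial over {wY^0, ..., wY^{M-1}} at y."""
    num = den = 1
    for j in range(M):
        if j != i:
            num = num * ((y - pow(wY, j, P)) % P) % P
            den = den * ((pow(wY, i, P) - pow(wY, j, P)) % P) % P
    return num * pow(den, P - 2, P) % P

a = [5, 6, 7, 8]                          # a_i(X) at some fixed X, per machine
b = [9, 10, 11, 12]                       # b_i(X) likewise

for i in range(M):
    y = pow(wY, i, P)
    assert [R(j, y) for j in range(M)] == [int(j == i) for j in range(M)]
    A = sum(R(j, y) * a[j] for j in range(M)) % P
    B = sum(R(j, y) * b[j] for j in range(M)) % P
    assert A * B % P == a[i] * b[i] % P   # no cross-terms on the domain
```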
Global Constraint Definitions
Gate Constraint:
Copy Constraint:
Quotient Polynomial:
Verifier Strategy
The verifier checks that for every i ∈ [0, M−1], the constraint holds at Y = ω_Y^i:
where V_Y(Y) = Y^M − 1 is the vanishing polynomial for the Y-domain.
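Divisibility by the vanishing polynomial is the mechanism behind every quotient-polynomial step in the protocol; here is a toy long-division check (the field, domain size, and quotient 3 + 2X are arbitrary choices):

```python
# A constraint polynomial that vanishes on the whole domain is divisible by
# V(X) = X^T - 1; the quotient h(X) is what the prover sends. Toy field.
P = 257
T = 4
w = pow(3, (P - 1) // T, P)               # primitive T-th root of unity

def poly_divmod(num, div):
    """Long division of polynomials over F_P (coefficients low-to-high)."""
    num = num[:]
    q = [0] * (len(num) - len(div) + 1)
    inv_lead = pow(div[-1], P - 2, P)
    for k in range(len(q) - 1, -1, -1):
        q[k] = num[k + len(div) - 1] * inv_lead % P
        for j, d in enumerate(div):
            num[k + j] = (num[k + j] - q[k] * d) % P
    return q, num[:len(div) - 1]

def evaluate(poly, x):
    acc = 0
    for c in reversed(poly):
        acc = (acc * x + c) % P
    return acc

V = [P - 1, 0, 0, 0, 1]                   # X^4 - 1, low-to-high
# Build g(X) = V(X) * (3 + 2X), so g vanishes on the domain by construction
g = [0] * 7
for i, vc in enumerate(V):
    for j, hc in enumerate([3, 2]):
        g[i + j] = (g[i + j] + vc * hc) % P

assert all(evaluate(g, pow(w, i, P)) == 0 for i in range(T))
h, rem = poly_divmod(g, V)                # recover h(X) = 3 + 2X, rem = 0
assert rem == [0] * (len(V) - 1) and h[:2] == [3, 2]
```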
Constraint System for General Circuit
In the data-parallel setting, there are no connections between sub-circuits, so each machine can independently prove its portion. However, real-world circuits such as those in zkEVMs have highly interconnected gates.
In this general setting, the same wire value may appear across multiple sub-circuits, requiring a cross-machine copy constraint. To support this in a distributed setting, we must modify the permutation argument. The key idea is to include not just the position within a sub-circuit (X), but also the machine index (Y).
Modified Copy Constraint
The global permutation check requires each machine to construct a running product that satisfies:
Where:
f_i(X): actual wire expression
f_i(X) = (a_i(X) + η_Y·δ_{Y,a,i}(X) + η_X·δ_{X,a,i}(X) + γ)(b_i(X) + η_Y·δ_{Y,b,i}(X) + η_X·δ_{X,b,i}(X) + γ)(o_i(X) + η_Y·δ_{Y,o,i}(X) + η_X·δ_{X,o,i}(X) + γ)
f_i′(X): sorted reference wiring
f_i′(X) = (a_i(X) + η_Y·Y + η_X·k_A·X + γ)(b_i(X) + η_Y·Y + η_X·k_B·X + γ)(o_i(X) + η_Y·Y + η_X·k_O·X + γ)
Now, the final value z_i^* of the running product for each machine is no longer guaranteed to be 1. Thus, the constraints must be modified as follows:
The master node then collects the final values (z_0^*, …, z_{M−1}^*) from each machine and checks that their product equals 1: Π_{i=0}^{M−1} z_i^* = 1.
To do this, a new polynomial zlast(X) is constructed as:
This leads to the following constraints:
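The cross-machine product check can be illustrated with a toy example: a wire shared between two machines makes each local product differ from 1 while the master's product over all machines is exactly 1. The field, challenges, and position encoding below are illustrative simplifications:

```python
# Two machines, two slots each; one wire (value 21) is shared across the
# machine boundary, so sigma swaps positions (0,0) and (1,1). Each local
# grand product z_i* differs from 1, but their product is 1. Toy values.
P = 257
etaY, etaX, gamma = 59, 101, 77           # stand-ins for verifier challenges

values = {(0, 0): 21, (0, 1): 5, (1, 0): 8, (1, 1): 21}
sigma = {(0, 0): (1, 1), (0, 1): (0, 1), (1, 0): (1, 0), (1, 1): (0, 0)}

def enc(pos):
    """Encode a global position (machine, slot) as a field element."""
    m, s = pos
    return (etaY * m + etaX * s) % P

z = {}
for m in (0, 1):
    num = den = 1
    for s in (0, 1):
        v = values[(m, s)]
        num = num * ((v + enc((m, s)) + gamma) % P) % P
        den = den * ((v + enc(sigma[(m, s)]) + gamma) % P) % P
    z[m] = num * pow(den, P - 2, P) % P   # this machine's final value z_m*

assert z[0] != 1 and z[1] != 1            # local products are not 1 ...
assert z[0] * z[1] % P == 1               # ... but the master's product is
```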
Quotient Polynomial
All constraints are combined into a single relation by proving the existence of hi(X) such that:
where V_X(X) = X^T − 1 is the vanishing polynomial for the X-domain.
Global Constraint Definitions
Copy Constraints:
Here, F(Y,X) and F′(Y,X) are defined as:
Quotient Polynomial: All constraints are bundled into a single bivariate relation using HY(Y,X):
Polynomial IOP for Data-parallel and General Circuits
P→V: Send oracles for {A(Y,X),B(Y,X),C(Y,X)}
V→P: Send random challenges ηY,ηX,γ
P→V: Send oracles for {Z(Y,X),W(Y)}
V→P: Send random challenge λ
P→V: Send oracle for HX(Y,X)
H_X(Y, X) = Σ_{i=0}^{M−1} R_i(Y) · (g_i(X) + λ·p_{i,0}(X) + λ²·p_{i,1}(X) + λ⁴·p_{i,3}(X)) / V_X(X)
V→P: Send random challenge α
P→V: Send oracle for HY(Y,α)
H_Y(Y, α) = (G(Y, α) + λ·P_0(Y, α) + λ²·P_1(Y, α) + λ³·P_2(Y) + λ⁴·P_3(Y, α) − V_X(α)·H_X(Y, α)) / V_Y(Y)
V queries all oracles at X = α, Y = β and checks:
G(β, α) + λ·P_0(β, α) + λ²·P_1(β, α) + λ³·P_2(β) + λ⁴·P_3(β, α) − V_X(α)·H_X(β, α) =? V_Y(β)·H_Y(β, α)
🔶 The terms marked in orange are only required in the General Circuit case.
The soundness of this protocol is formally proven in Appendix A.
Communication Analysis
Except for HY(Y,X) and W(Y), all other polynomials can be distributed across M machines, with only their commitments needing to be sent.
For W(Y), it can be computed by the master node using the final values (z0∗,…,zM−1∗) received from all machines.
For HY(Y,X), the prover does not need to send the full polynomial. Instead, the prover collects evaluations at X=α from each machine and reconstructs HY(Y,α). For example:
A(Y, α) = Σ_{i=0}^{M−1} R_i(Y)·a_i(α)
Therefore, the overall communication complexity is O(M).
Proving Time Analysis
Each prover Pi computes z_i(X) and h_i(X) in O(T log T)
The master node P0 computes W(Y) and H_Y(Y, α) in O(M log M)
So the maximum latency per machine is O(T log T + M log M)
The total proving time across all machines is O(N log T + M log M)
This is a practical improvement over the original Plonk prover time complexity of O(N log N).

Distributed KZG
The original KZG commitment scheme is designed for univariate polynomials. Since Pianist models the entire proof system as a bivariate polynomial f(Y,X), we need to extend KZG to support multiple variables. Furthermore, due to the distributed nature of the system, each machine must compute only a portion of the commitment and proof, while the master node assembles the full result.
Suppose we have a bivariate polynomial of the form f(Y, X) = Σ_{i=0}^{M−1} R_i(Y)·f_i(X).
Each machine P_i holds the slice f_i(X) of this polynomial.
DKZG.KeyGen(1^λ, M, T)
Generates the public parameters pp, including the G1 elements needed to commit to each slice R_i(Y)·f_i(X) and the G2 elements [τ_X]_2, [τ_Y]_2 used for verification.
DKZG.Commit(f,pp)
Each prover Pi computes its share of the commitment, com_f^{(i)} = [R_i(τ_Y)·f_i(τ_X)]_1.
The master node P0 aggregates: com_f = Σ_{i=0}^{M−1} com_f^{(i)}.
DKZG.Open(f,β,α,pp)
Each Pi computes:
fi(α)
q_0^{(i)}(X) = (f_i(X) − f_i(α)) / (X − α)
π_0^{(i)} = [R_i(τ_Y)·q_0^{(i)}(τ_X)]_1
Then sends fi(α),π0(i) to P0
P0 computes:
π_0 = Σ_{i=0}^{M−1} π_0^{(i)}
f(Y, α) = Σ_{i=0}^{M−1} R_i(Y)·f_i(α)
Then computes:
f(β,α)
q_1(Y) = (f(Y, α) − f(β, α)) / (Y − β)
π1=[q1(τY)]1
Sends (z=f(β,α),πf=(π0,π1)) to V
DKZG.Verify(comf,β,α,z,πf,pp)
The verifier V parses π_f = (π_0, π_1) and checks the pairing equation e(com_f − [z]_1, [1]_2) = e(π_0, [τ_X − α]_2) · e(π_1, [τ_Y − β]_2).
Proof:
The exponent in e(com_f − [z]_1, [1]_2) is f(τ_Y, τ_X) − f(β, α).
The exponent in e(π_0, [τ_X − α]_2) is Σ_{i=0}^{M−1} R_i(τ_Y)·(f_i(τ_X) − f_i(α)) = f(τ_Y, τ_X) − f(τ_Y, α).
The exponent in e(π_1, [τ_Y − β]_2) is f(τ_Y, α) − f(β, α).
The last two exponents sum to the first, so the pairing equation holds exactly when both openings are consistent.
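The algebra behind the pairing check can be verified with scalars standing in for group exponents (no pairings involved). The toy field, trapdoors, and polynomial below are insecure illustrative stand-ins:

```python
# The pairing check in DKZG.Verify enforces the scalar identity
#   f(tY, tX) - f(beta, alpha) == q0 * (tX - alpha) + q1 * (tY - beta)
# where q0 handles the X-direction opening and q1 the Y-direction one.
# Toy field F_257 with toy trapdoors tY, tX (illustration only, insecure).
P = 257
M, T = 2, 2

c = [[3, 7], [2, 5]]                      # arbitrary coefficients of Y^i X^j

def f(y, x):
    """Evaluate the toy bivariate polynomial f(Y, X)."""
    return sum(c[i][j] * pow(y, i, P) * pow(x, j, P)
               for i in range(M) for j in range(T)) % P

tY, tX = 13, 29                           # toy trapdoors (the taus)
beta, alpha = 91, 44                      # verifier's evaluation point

# q0 is the quotient (f(Y, X) - f(Y, alpha)) / (X - alpha) at (tY, tX):
q0 = (f(tY, tX) - f(tY, alpha)) * pow(tX - alpha, P - 2, P) % P
# q1 is the quotient (f(Y, alpha) - f(beta, alpha)) / (Y - beta) at tY:
q1 = (f(tY, alpha) - f(beta, alpha)) * pow(tY - beta, P - 2, P) % P

lhs = (f(tY, tX) - f(beta, alpha)) % P
rhs = (q0 * ((tX - alpha) % P) + q1 * ((tY - beta) % P)) % P
assert lhs == rhs                         # the two opening proofs compose
```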
Robust Collaborative Proving System for Data-parallel Circuit
Pianist introduces a new proof model to ensure security even in malicious settings.
This is called the Robust Collaborative Proving System (RCPS) — a protocol that ensures the final proof can still be correctly constructed and verified even when some machines are faulty or adversarial.
📌 For technical details, see Section 5 of the paper.
Conclusion

While standard Plonk (on a single machine) hits a memory limit at 32 transactions, Pianist can scale up to 8192 transactions simply by increasing the number of machines. With 64 machines, Pianist is able to prove 8192 transactions in just 313 seconds. This demonstrates that the number of transactions included in a single proof scales proportionally with the number of machines. Moreover, the time taken by the master node to gather and finalize the proof is just 2–16 ms, which is negligible compared to the overall proving time. This highlights that communication overhead in Pianist's parallel architecture is minimal.

While Figure 1 shows the performance on data-parallel circuits (e.g., zkRollups), Figure 2 extends the evaluation to general circuits with arbitrary subcircuit connections. In all circuit sizes tested, the proving time decreases nearly linearly with the number of machines, demonstrating strong scalability even in general circuits. In smaller circuits, the marginal benefit of scaling beyond 4–8 machines may diminish, but for larger circuits, the benefits of parallelization become increasingly significant. In other words, Pianist performs even better on larger circuits, showcasing a desirable scalability property.

As shown above, the memory usage per machine decreases significantly as the number of machines increases. For small circuits, memory savings are minor since the total memory requirement is already low, but for large circuits, scaling the number of machines directly determines whether proof generation is even feasible.
In conclusion, Pianist transforms the original Plonk proving system into a fully distributed, scalable framework that:
Enables parallel proving of large circuits across multiple machines,
Maintains constant proof size, verifier time, and communication overhead (all O(1)), and
Offers high practical utility for real-world zk systems like zkRollups and zkEVMs.
Written by Ryan Kim of Fractalyze