Signed Bucket Index
AKA "NAF (non-adjacent form) method"
Note: this method is best used when negation for a group is cheap such as for an elliptic curve.
Motivation
When following a bucket-based method like Pippenger's algorithm, the amount of memory required for the buckets is demanding. How can we reduce the number of buckets per window from 2s−1 to 2s−1, where s is the number of bits per window, and reduce overhead?
Explanation
Naive Pippenger's
Let's take a look at an example of creating buckets with Pippenger's algorithm. Let's say the number of windows ⌈sλ⌉=2 and the bit size per window s=3. We'll look at the following example where k denotes scalars and P refers to points on an elliptic curve for the MSM problem.
With buckets , we create our windows W:
Introducing NAF for buckets
Note that any computation kP can be reduced to (k−2s)P+2sP where s is the bit width of k.
This means we can define the following NAF bucket method k′=NAF(k,P) for window Wj as such:
If 0<k<2s−1:
k′=k
Accumulate P into the k-th bucket Bk,j
If 2s−1≤k<2s:
k′=k−2s, then kP=−k′(−P)+2sP
Accumulate −P into the −k′-th bucket B−k′,j
Add P into the next window
Step 2c computes the 2sP part of the kP reduction and is also known as the "carry."
Let's change our naive Pippenger's to follow this method and generate our buckets and windows:


Notice that this method requires the presence of an extra carry bucket B1,3. This extra carry bucket should not exist. We'll take a look later at how to prevent this after this Naive Pippenger's to NAF method is fully shown.
Ashley Jeong: I'm unsure why they don't just add ki,j=2s−1 into the Bk,j bucket. Perhaps this is a source of optimization when implementing MSM.
Bucket Accumulation
B1,1
P1−P7
B2,1
P2−P6
B3,1
P3−P5
B4,1
−P4
B1,2
−P1
B2,2
P7−P2
B3,2
P6−P4−P3
B4,2
−P5
B1,3
P5+P4+P3+P2+P1
Bucket Reduction
Window Reduction
Preventing the formation of an extra bucket
1. Ensuring last window bit size is within limit
The formation of an extra carry bucket depends on the true bit size slast of the final window in relation to λ, the total bits of k, and s, the bits per windows.
Safe Case: slast<s−1
Example: λ=8, s=5 ⇒ k=k1+25k2
Then slast=3, and 0≤k2<23
Since k2+1<24, carry cannot occur ⇒ It’s safe to assume the final carry is 0.
Carry Case: slast≥s−1
Example: λ=8, s=4 ⇒ k=k1+24k2
Here slast=4, and 0≤k2<24
Since it is possible that k2+1≥23, carry can occur
Thus, if the restriction slast<s−1 is held, we can ensure there will be no new carry bucket created.
2. Naive Fix: Use Larger Bucket Only for Final Window
Use a larger bit-width bucket for the final window to absorb potential carry.
Cons: This is more complex and can be inefficient due to extra bucket space.
3. Use a Larger λ Than the Actual Modulus Bit Size
Let’s say we are working with BN254, where λ=256 and s=16. The actual scalar field is 254 bits.
We express k as:
Let M be the modulus of BN254. Since the maximum value of k is M−1, we have:
This ensures that no overflow occurs in the final window.
4. Use Yrrid’s Trick
Yrrid, the Zprize winner in 2022 for the category "Prize 1a: Accelerating MSM Operations on GPU" explains one more optimization they used for their solution to guarantee that no carry occurs in the final window. When the MSB of k is set to 1, they apply the following transformation to reset the MSB to 0, thereby preventing any carry:
Here, M is the modulus of the scalar field (i.e., curve order), and k∈[0,M).
Let mlast denote the s-bit chunk in the final window. Yrrid's trick is said to ensure that we don't overflow to a mlast+1 chunk.
For example, in F11 with λ=4, the values where the MSB of k is set to 1 are 8, 9, and 10. This means that -8, -9, and -10 will be transformed into 3, 2, and 1, respectively. With this transformation, the new maximum value of k becomes 23−1=7. If s=2, then:
In this case with the maximum value of k as 7, k2 will be at most 1.
Code Example
Here's a code snippet from Tachyon implementing NAF for bucket MSM:
References
Written by Ryan Kim & Ashley Jeong of Fractalyze
Last updated