Zero-Knowledge Proofs: An Introduction to the Fundamentals
A big thanks to Matt, Porter, Nick, Swen, and bl0ckpain for reviewing the articles in this series.
Introduction
Zero-knowledge proofs are among the most powerful tools devised by cryptographers. Unfortunately, the average person does not understand them. This article seeks to remedy this by providing a comprehensive overview of zero-knowledge proofs from their first principles. We cover the theory, math, and cryptography behind zero-knowledge proofs so that anyone can understand the latest developments on Solana, namely ZK Compression and the future of interoperability.
This article assumes knowledge of Solana’s programming model and cryptographic primitives inherent in blockchain systems (i.e., hash functions, hash pointers, Merkle trees, concurrent Merkle trees). If these concepts are new, I recommend reading these previous blog posts first:
- Cryptographic Tools 101 — Hash Functions and Merkle Trees Explained
- The Solana Programming Model: An Introduction to Developing on Solana
Note this article is designed with modularity in mind. While it is recommended that readers new to these topics read each section and subsection sequentially, if you are familiar with specific topics or want to learn about a particular topic, you will have no issue diving into a specific section directly.
This is also the first article in a two-part series on zero-knowledge proofs. It is highly recommended to read this article first before moving on to Zero-Knowledge Proofs: Its Applications on Solana.
The Theory Behind Zero-Knowledge Proofs
In 1989, MIT researchers Shafi Goldwasser, Silvio Micali (the founder of Algorand), and Charles Rackoff published The Knowledge Complexity of Interactive Proof Systems. They were working on systems where one party (i.e., the prover) exchanges messages with a second party (i.e., the verifier) to convince them that some mathematical statement is true. They were the first to ask: “What if the prover and verifier both don’t trust each other?” The concern here is how much information the verifier will learn during the course of these messages apart from the fact that the statement is true. For example, the prover might want to convince the verifier that they know a solution to a complex puzzle without revealing the solution itself.
What Kind of Problems Are We Trying to Solve in the First Place?
Graph Three-Coloring
Graph three-coloring is a classic problem in computer science and graph theory. It involves coloring the vertices of a graph using three colors so that no adjacent vertices share the same color. For a graph with three vertices, this is straightforward. However, as we increase the number of vertices, it gets increasingly harder to do.
A real-world application of this would be university timetabling. In a large university, timetables need to be created such that no student has overlapping classes. Each class can be represented as a vertex in a graph, and edges represent shared students between classes. This ensures that no two classes that share a common student are scheduled at the same time. Additionally, constraints involve room capacities, preferred times for professors, and spreading out classes evenly over the week. Thus, time slots and rooms should be assigned to classes such that no two adjacent classes share the same time slot. This can be done using graph three-coloring.
Now, imagine the final schedule has to be verified by an external auditing firm. Due to specific privacy regulations, the university cannot share detailed student enrollment information with the auditors. Instead, the university must prove that the final schedule meets the required constraints without revealing the specifics of which students are enrolled in which classes.
To do so, the university must create a graph where each vertex represents a class. Edges would be drawn between two vertices if the corresponding classes have at least one student in common. The university would assign time slots to each class to ensure no two adjacent classes are scheduled simultaneously. The university would commit to its finished schedule using a cryptographic commitment scheme. This involves creating a cryptographic hash of each class’s assigned time slot and sharing the hashes with the verifier without revealing the time slots. The external auditing firm would then randomly select pairs of adjacent classes to challenge the assigned time slot. The university would reveal the committed time slots for the selected pair of adjacent classes and provide the original commitments (i.e., hashes) so the external auditing firm can verify the revealed values. The last few steps of challenging, revealing, and verifying are repeated until the external auditing firm is convinced no classes overlap. I highly recommend MIT’s interactive zero-knowledge 3-colorability demonstration to see these steps play out in real time.
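To make these steps concrete, here is a minimal sketch of a single commit-challenge-reveal round in Python. The class names, shared-student edges, and time-slot assignment are all hypothetical, and a real protocol would also permute the time-slot labels before every round so that repeated reveals leak nothing beyond the fact that the two slots differ:

```python
# A minimal sketch of one commit-challenge-reveal round for the timetabling
# example: classes are vertices, shared students are edges, and the "coloring"
# assigns each class one of three time slots. All values are made up.
import hashlib
import secrets

schedule = {"Math": 0, "Physics": 1, "Biology": 2, "Chemistry": 1}   # hidden from the auditor
edges = [("Math", "Physics"), ("Math", "Biology"), ("Physics", "Chemistry")]

def commit(value):
    """Hash the value together with a random salt so it cannot be guessed."""
    salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + str(value).encode()).hexdigest()
    return digest, salt

# 1. The university (prover) commits to every class's time slot.
commitments = {}
openings = {}
for cls, slot in schedule.items():
    digest, salt = commit(slot)
    commitments[cls] = digest       # sent to the auditor
    openings[cls] = (slot, salt)    # kept secret until challenged

# 2. The auditor (verifier) challenges one randomly chosen edge.
a, b = secrets.choice(edges)

# 3. The prover reveals only the two challenged commitments.
slot_a, salt_a = openings[a]
slot_b, salt_b = openings[b]

# 4. The auditor checks the openings match the commitments and the slots differ.
ok_a = hashlib.sha256(salt_a + str(slot_a).encode()).hexdigest() == commitments[a]
ok_b = hashlib.sha256(salt_b + str(slot_b).encode()).hexdigest() == commitments[b]
print("commitments valid:", ok_a and ok_b, "| slots differ:", slot_a != slot_b)
```

Repeating this round many times drives the probability of an invalid schedule slipping through toward zero, which is exactly the interactive structure the MIT demonstration walks through.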
The university wants to prove to the external auditing firm that they know of a correct schedule. That is, they want to prove they know something to another party. The nice thing about this problem is that it’s NP-complete.
NP-Complete
In computational complexity theory, a problem is NP-complete when:
- For any input to the problem, the output is either “yes” or “no”
- When the answer is “yes,” it can be demonstrated with a short solution
- The correctness of any proposed solution can be verified quickly, and a brute-force algorithm can find a solution by trying all possible candidates
NP-complete problems are important because they represent the hardest problems within the class NP (i.e., the class of very difficult puzzles where verifying a guessed solution is easy, taking polynomial time, but finding a solution is hard). These problems are notable for their universality: if we could solve one NP-complete problem quickly, we could reduce or transform any other NP problem into it and find that problem's solution in polynomial time as well. Verifying solutions to NP-complete problems is also easy.
Thus, we have an entire class of problems that we can prove efficiently with zero-knowledge proofs. For example:
- Traveling Salesman Problem — Given a list of cities and distances between each pair, find the shortest possible route that visits each city once and returns to the origin city. This problem has numerous applications in logistics, route planning, manufacturing, and supply chain management
- Knapsack Problem — Given a set of items, determine the number of each item to include in a collection so the total weight is less than or equal to a given limit, and the total value is as large as possible. This is a common problem in the realm of finance and resource allocation
- Job Scheduling — Given a set of jobs with specific durations and deadlines, schedule them on a single machine to minimize the total penalty for late jobs. This has wide-ranging applications in computing, manufacturing, and project management
Moreover, the Cook-Levin theorem states that the Boolean satisfiability problem is NP-complete. That is, any problem in NP can be reduced to the question of whether a Boolean formula has an assignment of true and false values that makes it evaluate to true. This implies that any problem we can express as a series of true and false questions can be proved efficiently with zero-knowledge proofs.
Properties of a Zero-Knowledge Proof
Given the complexity and importance of NP-complete problems, proving solutions to this class of problems efficiently and securely becomes critical. Zero-knowledge proofs provide a method to do so without compromising the privacy of the information involved. Goldwasser, Micali, and Rackoff proposed that all zero-knowledge proofs must satisfy the following properties:
- Completeness — the Prover will eventually convince the Verifier if they are honest
- Soundness — a cheating Prover cannot convince the Verifier of a false statement, except with negligible probability
- Zero-knowledge(ness) — the interaction between the Prover and Verifier only reveals if a statement is true and nothing else
By leveraging the robust properties of zero-knowledge proofs, we can prove facts or knowledge of certain information in a variety of contexts in a way that is both privacy-preserving and precise in its correctness. In future sections, we’ll explore why this is so valuable for applications requiring high security and efficiency levels, such as blockchains.
Interactive versus Non-Interactive
Zero-knowledge proofs tend to have the same three-step structure:
- The prover generates a solution to the computation (i.e., the witness) and sends a commitment to that witness to the verifier
- The verifier responds with a challenge value generated randomly
- The prover computes the final proof based on the commitment and the challenge
This structure is inherently interactive — the prover says they know something, and the verifier challenges them continuously until the likelihood of the prover tricking them is negligible. This isn’t ideal for most applications, as the prover requires one or more responses from the verifier before the completed proof can be generated. This setup inherently presents the following challenges:
- The verifier could collude with the prover, allowing them to fake proofs
- The verifier could create fake proofs
- The verifier must store their secret values somewhere, which could be vulnerable to leaks or attacks
The Fiat-Shamir heuristic is a technique for taking an interactive proof of knowledge and creating a digital signature based on it. This way, some fact can be proven publicly without revealing the underlying information. The idea is that, instead of having the verifier send a random challenge value to the prover, the prover can compute this digital signature themselves by using a random function, such as a good cryptographic hash function. So, instead of having the verifier peek at the computation in 500 different places to check they’re all correct, the prover computes a Merkle root of the computation, uses the Merkle root to choose 500 indices pseudo-randomly, and provides the 500 corresponding Merkle branches of data. The key idea is that the prover does not know which branches they must reveal until after the data has been committed.
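Here is a minimal sketch of that idea, assuming the prover has already committed to its computation with some root hash (the value root below is a stand-in for a real Merkle root): the challenge indices are derived by hashing the commitment itself, so no message from the verifier is needed.

```python
# A minimal sketch of the Fiat-Shamir idea: derive pseudo-random challenge
# indices from the prover's own commitment instead of from verifier messages.
import hashlib

def fiat_shamir_indices(root, num_challenges, domain_size):
    """Derive challenge indices by hashing the commitment with a counter."""
    indices = []
    for counter in range(num_challenges):
        digest = hashlib.sha256(root + counter.to_bytes(4, "big")).digest()
        indices.append(int.from_bytes(digest, "big") % domain_size)
    return indices

# Hypothetical commitment (in practice, the Merkle root of the computation).
root = hashlib.sha256(b"committed computation").digest()
print(fiat_shamir_indices(root, num_challenges=5, domain_size=500))
```

Because the indices are a deterministic function of the commitment, anyone can recompute and check them, yet the prover cannot know them before committing.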
The astute reader may notice a fatal flaw in applying random sampling to spot-check computations — computation is inherently fragile. A malicious prover could flip a single bit in the middle of the computation, and the verifier could never find out. How can a verifier check every single piece of the computation without looking at each piece of the computation individually? Polynomials.
However, there’s a fair bit of math that we need to understand before we can talk about polynomials.
The Math Behind Zero-Knowledge Proofs
This is not meant to be an extensive introduction to the following mathematical fields — these sections could be entire articles in themselves. This is meant to be a brief introduction so that you can start to understand the mathematical fundamentals behind zero-knowledge proofs and how they actually work at a high level.
This article will also introduce you to proper mathematical notation. For example, in the following subsection on Set Theory, we introduce the symbols ∈, ∉, and ⊆. At the end of the day, these symbols are all placeholders for something else. Zero-knowledge proofs are not a beginner topic; thus, the vast majority of articles on the subject are not beginner-friendly. They will not delve into what these symbols mean, and it will be assumed that the reader understands this notation. Introducing this notation now is critical to making it seem less scary for those interested in diving deeper into zero-knowledge proofs. Try not to get lost in notation. Push through, as there will come a time when you’ll look at these symbols and see the underlying concepts instead of some Greek letter.
Set Theory
Set theory is a branch of mathematics that studies collections of objects. A set is a collection of distinct objects. These distinct objects are referred to as elements, or members, of the set. For example, consider a collection of fruits:
\(\text{Fruit} = \{\text{apple}, \text{orange}, \text{pear}, \text{banana}\}\)
Curly brackets are used in set notation to enclose a collection of elements to represent a set. This way, we know that apple, orange, pear, and banana are part of the set, and something like “potato” is not. The symbol ∈ is used to indicate set membership and is read as “is an element of.” Likewise, ∉ denotes that an element is not a member of a given set. So, we can say:
\(\text{apple} \in \text{Fruit} \text{ and } \text{potato} \notin \text{Fruit}\)
We would read this as “apple is an element of the set Fruit and potato is not an element of the set Fruit.”
Subsets
We can also have sets made up of other sets. A subset is a set that contains only elements found in another set. For instance, if we had:
\(\text{Citrus} = \{\text{orange}, \text{lemon}\}\)
\(\text{AllFruits} = \{\text{apple}, \text{orange}, \text{pear}, \text{banana}, \text{lemon}, \text{grapefruit}\}\)
We can say that the set Citrus is a subset of the larger set AllFruits. We could also use our Fruit set from earlier to say that Fruit is a subset of the larger set AllFruits. We would write this in set notation as
\(\text{Citrus} \subseteq \text{AllFruits}\)
\(\text{Fruit} \subseteq \text{AllFruits}\)
Why Should I Care?
Set theory is vital to understanding the concept of ranges and constraints. In the next sections on number theory and modular arithmetic, we’ll explore the idea of numbers being within a certain range. For example, we might have a set of possible values for a cryptographic key:
\(K = \{k_1, k_2, k_3, \ldots, k_n\}\)
Here, K defines the range of all possible keys. We can craft our zero-knowledge proofs such that certain constraints are applied to this set so that only certain values are valid. For example, we could say that the key must be a number between 1 and 5.
Thus, set theory provides the foundational language, tools, and notation for defining and analyzing sets of possible inputs, outputs, and states in cryptographic protocols. In zero-knowledge proofs, we often need to prove that an element belongs to a specific set or range without revealing the element itself.
I recommend checking out Khan Academy for some excellent practice questions on basic set notation.
Number Theory
Number theory is a branch of mathematics that studies integers and arithmetic functions. We can define integers as a set of whole numbers (i.e., numbers that aren’t fractions), including positive and negative numbers and zero. We can define the set of integers more formally as:
\(\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}\)
Here, ℤ is used to denote the set of integers, and the ellipses show that integers go from negative infinity to positive infinity. For example, the number 12 is an integer, and -1978649832794275987235987239572395235355416847 is also an integer.
Rational Numbers
Rational numbers are numbers that we can express as a fraction of two integers where the denominator (i.e., the number below the line in a common fraction, a divisor) is not zero. For example, \(\frac{1}{2}\), 74, and \(\frac{2}{3}\) are all rational numbers. More formally, a rational number is any number that can be expressed as the fraction \(\frac{p}{q}\) where p is the numerator, q is the denominator, both are integers, and q is not 0. The symbol ℚ is used to denote rational numbers. In set notation, we would write:
\(\mathbb{Q} = \left\{ \frac{p}{q} \mid p, q \in \mathbb{Z}, q \ne 0 \right\}\)
While this may look scary at first glance, this is exactly what the previous sentence describes. We would read this weird-looking mathematical jargon as “Q is the set of all fractions p over q, where p and q are integers, and q is not equal to zero.”
Real Numbers
Real numbers encompass both rational and irrational numbers. Irrational numbers are those that cannot be expressed as a simple fraction and have non-repeating, non-terminating decimal expansions. For example, pi (i.e., π) and \(\sqrt{2}\) (i.e., 1.41421…) are irrational numbers. We’ll skip over the set notation for now, but note that real numbers are denoted by the symbol ℝ.
Why Should I Care?
Number theory is deeply connected to set theory as it involves the study of specific sets of numbers (e.g., rational numbers). These sets are often the basis for defining ranges and constraints in mathematical and cryptographic problems.
We can also see how both number theory and set theory are connected. For example, we can say that the set of all integers ℤ is a subset of the rational numbers ℚ. This is evident in our definition of rational numbers in set notation above, where we state that the numerator and denominator are integers.
Modular Arithmetic
Modular arithmetic, also known as clock arithmetic, is a system of numerical operations for integers where numbers “wrap around” after reaching a specific value, known as the modulus. The idea here is that, instead of working with an infinite set of numbers, we only work with the first n non-negative integers (i.e., 0 through n − 1).
Clocks
Consider an analog clock (I hate that in this day and age I have to specify it has hands and isn’t digital) with the numbers 1 through 12. If it was 11 o’clock and we wanted to know the time two hours from now, we wouldn’t get 13 o’clock. Instead, we would wrap around to 1 o’clock. This can be expressed as \(11 + 2 \equiv 1 \pmod{12}\). The proper mathematical expression here would be \(13 \mod 12 = 1\). Programmers would be familiar with using the modulo operation in the format \(13 \% 12 = 1\).
Modulo Operation
When we write n mod k, it means we want the remainder when n is divided by k. This is known as the modulo operation. For example:
- 25 mod 3 means dividing 25 by 3, which gives a remainder of 1 because \(25 = 8 \times 3 + 1\)
- 15 mod 4 means dividing 15 by 4, resulting in a remainder of 3 because \(15 = 3 \times 4 + 3\)
In modular arithmetic, the remainder is always non-negative.
Why Should I Care?
Understanding modular arithmetic is crucial because it provides insight into the behavior of numbers under constraints, which is valuable to cryptography. Modular arithmetic is the foundation of many cryptographic algorithms and is used throughout computer science, engineering, and any field that requires the secure handling and encryption of data.
Consider the computation x + y = z. Suppose we’re working with a finite field defined by the prime p = 17 (we’ll cover finite fields very shortly; for now, think of it as the set of all integers from 0 to 16 that wraps around at 17). In this field, the computation becomes (x + y) mod p = z. If x = 12 and y = 15, the computation would be:
\((12 + 15) \mod 17 = z\)
\(27 \mod 17 = z\)
\(10 = z\)
The use of modular arithmetic here allows us to perform computations within a manageable range of values, defined by the prime p. This is especially important because computers and processors have limited space, so we typically work with fixed-size integers like u32 or u64. Modular arithmetic ensures that our values stay within these bounds. Additionally, using prime numbers adds a layer of complexity. This is vital from a cryptographic standpoint because it enhances security and makes certain mathematical properties more predictable and reliable.
For example, modular arithmetic is used in zk-SNARKs to ensure computed values remain within specific, manageable bounds. It’s also used to create arithmetic circuits over a given set of numbers. This is done so we can express computations while ensuring they can be verified efficiently. Here, the prover would have to prove that they performed this computation without revealing the values of x, y, and z.
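The worked example above is a one-liner in most languages; the sketch below simply repeats it in Python, reducing each result modulo p just as a finite field implementation would.

```python
# A quick check of the worked example above: arithmetic modulo the prime p = 17.
p = 17
x, y = 12, 15

z = (x + y) % p           # addition wraps around at 17
print(z)                  # 10, matching (12 + 15) mod 17

# The same wrap-around applies to any operation in the field, e.g. multiplication:
print((x * y) % p)        # 180 mod 17 = 10
```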
I highly recommend checking out the Art of Problem Solving’s problem set and Joseph Zoller’s Modular Arithmetic Practice for more hands-on experience with solving modular arithmetic problems.
Group Theory
Group theory is a branch of mathematics that studies algebraic structures known as groups. A group is a set of elements with an operation satisfying the following conditions, known as the group axioms:
- Closure — Combining any two elements of the set with the group operation produces another element of the set
- Associativity — When performing the same operation on three or more elements, the order in which the elements are grouped doesn’t matter; the result will be the same
- Identity Element — There is a special element that, when combined with any other element, leaves that element unchanged
- Inverse Element — For every element, there is another element such that combining the two results in the identity element
Formally, they are defined as:
- Closure — If a and b are members of the group, then the result of the operation (often denoted as \(a \times b\), a*b, \(a \circ b\), or ab) is also a member of the group. Formally, we can write \(\forall a, b \in G \mid a \circ b \in G\). We could read this as: “for all values of elements a and b in the set G, the result of the operation between a and b is in G”
- Associativity — If a, b, and c are members of the group, then (ab)c = a(bc). Formally, we can write \(\forall a, b, c \in G \mid (a \circ b) \circ c = a \circ (b \circ c)\). We could read this as: “for all values of the elements a, b, and c in the set G, the operation of a and b followed by c is equal to the operation of a followed by the operation of b and c
- Identity Element — There exists an element e in the group such that, for every element a in the group, the operation \(e \circ a = a \circ e = a\). Formally, we would write this as \(\exists e \in G \mid \forall a \in G \mid e \circ a = a \circ e = a\). We could read this as “there exists an element e in the set G, where for every element of a in the set G, the operation of e followed by a is equal to the operation of a followed by e, which is equal to a
- Inverse Element — For each element a in the group, there exists an element b in the group such that \(a \circ b = b \circ a = e\), where e is the identity element. Formally, we would write this as \(\forall a \in G \mid \exists b \in G \mid a \circ b = b \circ a = e\). We could read this as “for all values of the elements a in the set G, there exists an element b in the set G, where the operation of a followed by b is equal to the operation of b followed by a, which is equal to the identity element
We can break down all of this mathematical jargon into something more digestible with an example. Consider the set of integers under addition. We can say this set forms a group because it satisfies all four group axioms:
- Closure — If you add two integers together, you get another integer
- Associativity — \((5 + 4) + 3 = 5 + (4 + 3)\)
- Identity Element — The number zero is considered the identity element because any integer plus zero does not change its value. For example, \(7 + 0 = 0 + 7 = 7\)
- Inverse Element — The inverse of any integer is its negation because adding the two together would return the identity element. For example, \(5 + (-5) = 0\). We can generalize this to \(n + (-n) = 0\)
We can also expand this to a harder example, such as the set of non-zero rational numbers \(\mathbb{Q}^* = \left\{ \frac{a}{b} \mid a, b \in \mathbb{Z}, a \ne 0, b \ne 0 \right\}\) with the operation of multiplication. This also forms a group:
- Closure — Multiplying two non-zero rational numbers will result in a non-zero rational number
- Associativity — \(\left(\frac{a}{b} \times \frac{c}{d}\right) \times \frac{e}{f} = \frac{a}{b} \times \left(\frac{c}{d} \times \frac{e}{f}\right)\)
- Identity Element — The number 1 is considered the identity element because any non-zero rational number multiplied by one does not change its value. For example, \(\frac{1}{2} \times 1 = 1 \times \frac{1}{2} = \frac{1}{2}\)
- Inverse Element — The inverse of any non-zero rational number is its reciprocal (i.e., flip the numerator and denominator) because this will equal 1, the identity element. For example, \(\frac{3}{5} \times \frac{5}{3} = 1\)
Subgroups
A subgroup is a group within a group. If we were to say that subgroup H of group G is a subset of G, we would need to satisfy the following group axioms:
- Closure — If a and b are in H, then the result of the operation between the two must also be in H
- Associativity — This axiom is inherited from the larger group G
- Identity Element — The identity element of G must also be in H
- Inverse Element — For every element a in H, there must be some element b also in H such that ab and ba both equal the identity element
The classic example here is the set of even integers under addition being a subgroup of the set of integers under addition:
- Closure — Adding two even integers results in another even integer
- Associativity — This axiom is inherited from integers. For example, \(2 + (4 + 6) = (2 + 4) + 6\)
- Identity Element — The number zero is considered the identity element because any even integer plus zero does not change its value. Zero is also found in the set of integers
- Inverse Element — The inverse of any even number is also an even number. For example, the inverse of 4 is -4 because \(4 + (-4) = 0\), which is the identity element
We can apply this to a harder example. Consider the set of all non-zero rational numbers (i.e., ℚ*) under multiplication. We can prove that ℚ* is a subgroup of the set of non-zero real numbers (i.e., ℝ*) under multiplication:
- Closure — If a and b are non-zero rational numbers, their product ab is also a non-zero rational number. For example, \(\frac{1}{2} \times \frac{3}{4} = \frac{3}{8}\) which is a non-zero rational number
- Associativity — The multiplication of rational numbers is associative. For example, \((\frac{1}{2} \times \frac{3}{4}) \times \frac{5}{6} = \frac{1}{2} \times (\frac{3}{4} \times \frac{5}{6})\)
- Identity Element — The number 1 is considered the identity element because multiplying any non-zero rational number by 1 leaves it unchanged. For example, \(\frac{1}{2} \times 1 = 1 \times \frac{1}{2} = \frac{1}{2}\)
- Inverse Element — Every non-zero rational number \(a = \frac{p}{q}\) has a multiplicative inverse \(a^{-1} = \frac{q}{p}\), which is also a non-zero rational number, and their product is the identity element. For example, let a = \(\frac{2}{3}\). The inverse is \(\frac{3}{2}\) because \(\frac{2}{3} \times \frac{3}{2} = 1\)
Since ℚ* satisfies all the group axioms, it forms a group. Additionally, because ℚ* is a subset of ℝ* and inherits its properties, we can state that ℚ* is a subgroup of ℝ*.
Why Should I Care?
Groups form the foundation of various mathematical and cryptographic concepts and structures. For example, cryptosystems like RSA and elliptic curve cryptography rely heavily on the properties of groups and their operations. Understanding subgroups helps us understand the structure of larger groups by examining their smaller, more manageable subsets. Groups provide a basic framework for understanding symmetry, operations, and transformations, which will be vital as we move into the next section on fields.
Fields
A field is a set of elements that satisfies the field axioms for addition and multiplication and in which division, except by zero, is always possible. The field axioms are generally written in additive and multiplicative pairs:
- Addition
- Associativity — \((a + b) + c = a + (b + c)\)
- Commutativity — \(a + b = b + a\)
- Distributivity — \(a(b + c) = ab + ac\)
- Identity Element — \(a + 0 = 0 + a = a\)
- Inverse Element — \(a + (-a) = 0\)
- Multiplication
- Associativity — \((ab)c = a(bc)\)
- Commutativity — \(ab = ba\)
- Distributivity — \((a + b)c = ac + bc\)
- Identity Element — \(a \times 1 = 1 \times a = a\)
- Inverse Element — \(a \times a^{-1} = a^{-1} \times a = 1, \text{ if } a \ne 0\)
Finite Fields and Generators
A finite field is a field with a limited set of elements. Finite fields are also referred to as Galois Fields. The number of elements is called the order, or the cardinality, of the field. The number of elements will always be a prime power. The nice thing about finite fields is that any arithmetic operation performed on elements within the field will stay in the field. This is because all operations are performed modulo the order of the field, causing values to wrap around.
Every finite field has a generator. A generator can produce all the non-zero elements of the field via exponentiation. This means we can take the generator and increment its exponent by one until we have generated every non-zero element. Thus, a generator is an element of the field that, through its powers, can produce every non-zero element of the field.
For example, imagine we take the set of integers modulo p = 7 and have the field \(\mathbb{Z}_7 = \{0, 1, 2, 3, 4, 5, 6\}\). If we want to find a generator g of \(\mathbb{Z}_7^*\) (i.e., the multiplicative group of non-zero elements of \(\mathbb{Z}_7\)), we need to check that \(g^1, g^2, g^3\), and so on can generate all the non-zero elements of the field.
Let’s check if 3 is a generator:
- \(3^1 \equiv 3 \pmod{7}\)
- \(3^2 \equiv 2 \pmod{7}\)
- \(3^3 \equiv 6 \pmod{7}\)
- \(3^4 \equiv 4 \pmod{7}\)
- \(3^5 \equiv 5 \pmod{7}\)
- \(3^6 \equiv 1 \pmod{7}\)
The powers of 3 generate all non-zero elements of \(\mathbb{Z}_7\). Therefore, 3 is a generator of the multiplicative group \(\mathbb{Z}_7^*\).
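The same check is easy to automate. The sketch below, a toy example over p = 7, lists the powers of each candidate and tests whether they cover every non-zero element of the field.

```python
# Re-create the check above: list the powers of a candidate generator g modulo a
# prime p and see whether they cover every non-zero element.
def is_generator(g, p):
    powers = {pow(g, k, p) for k in range(1, p)}     # g^1 ... g^(p-1) mod p
    return powers == set(range(1, p))                # all non-zero elements?

p = 7
for g in range(1, p):
    print(g, "generator" if is_generator(g, p) else "not a generator")
# For p = 7, this reports 3 and 5 as generators, matching the table above.
```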
Why Should I Care?
Cryptography is a science that deals with finite sets. Understanding finite fields and generators is crucial to tackling topics like the Discrete Log Problem, encryption, the Diffie-Hellman Exchange, and elliptic curves. Generators allow for arithmetic on encrypted polynomials without decrypting them (i.e., homomorphic encryption). That is, we can compute over encrypted data while preserving the privacy of the underlying values. Understanding this section is critical to the zero-knowledge part of zero-knowledge proofs.
I recommend visiting Bill’s Security Site, which provides an interactive example of generating finite fields with specific parameters and the underlying theory in Python.
Functions
A function is an expression, rule, or law that defines a relationship between two variables — the independent variable and the dependent variable. These two variables are often described as the cause and the effect, respectively. This relationship is often denoted as y = f(x), which is read as “f of x.” For every x value, there is a unique y value, meaning that f(x) cannot have more than one value for the same x.
Imagine a line defined by \(y = 3x + 4\). This is a linear function where plugging in a value of x will return a corresponding y value. These two values together form a point on the line. For example, we can rewrite the equation as \(f(x) = 3x + 4\) and evaluate for when x = 1, which is \(f(1) = 7\). Functions can also have multiple variables. For example, take the formula for the area of a triangle: \(A = \frac{bh}{2}\). Here, A (i.e., area) is defined as a function of both b (i.e., base) and h (i.e., height).
Domain and Range
The domain of a function is the set of all possible input values (i.e., independent variables) that the function can accept. The range of the function is the set of all possible output values (i.e., dependent variables) that the function can produce.
For the function \(y = 2x + 2\):
- The domain is all real numbers because any number from negative infinity to positive infinity will work. For example:
- If x = 2.5, then \(y = 2(2.5) + 2 = 7\)
- If x = -9234525, then \(y = 2(-9234525) + 2 = -18469048\)
- The range is also all real numbers because any number from negative infinity to positive infinity can be produced. For example:
- To find y = -50, we solve -50 = 2x + 2, which gives x = -26
- To find y = 0, we solve 0 = 2x + 2, which gives x = -1
Why Should I Care?
Functions are crucial to understanding polynomials, which are special functions involving variables raised to various powers and multiplied by coefficients. Polynomials are fundamental algebraic structures that provide the basis for constructing cryptographic protocols. In the next section, we’ll explore polynomials in detail, looking at their properties and significance in zero-knowledge proofs.
I recommend checking out Paul’s Online Notes and working through their practice problems to gain a better understanding of functions.
Polynomials
A polynomial is a function consisting of variables and coefficients that involves only the operations of addition, subtraction, multiplication, and non-negative integer exponentiation of variables. Polynomials are typically written in the form:
\(P(x) = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0\)
Where \(a_n, a_{n-1}, \ldots, a_1, a_0\) are coefficients, and x is the variable. The highest power of the variable x with a non-zero coefficient is called the degree of the polynomial.
Polynomials can be classified as univariate, involving a single variable (which is the form written above), or multivariate, involving multiple variables (e.g., \(P(x, y) = a_nx^ny^n + a_{n-1}x^{n-1}y^{n-1} + \cdots + a_1xy + a_0\)). Sum-Check is an example of a protocol that uses multivariate polynomials. However, most of the time, zero-knowledge proofs only require a single variable.
Common names assigned to polynomials based on their degree are:
- Degree 0 — Non-zero constant (e.g., \(P(x) = 6\))
- Degree 1 — Linear (e.g., \(P(x) = 2x - 7\))
- Degree 2 — Quadratic (e.g., \(P(x) = 8x^2 - 3x + 1\))
- Degree 3 — Cubic (e.g., \(P(x) = 3x^3 - 4x\))
If we have two non-equal polynomials of at most degree d, they can intersect at no more than d points (e.g., if we equate a linear function to a cubic function, they can intersect up to three times). This property flows from how we find shared points. If we want to find where two polynomials intersect, we equate them. In the following subsection, we’ll practice finding a polynomial’s roots, which is finding where a given polynomial intersects with the x-axis. The Fundamental Theorem of Algebra states that a degree d polynomial can have at most d solutions, and, therefore, at most d shared points.
Roots of Polynomials
The roots, or zeroes, of a polynomial are the values of x for which the polynomial equals zero. In other words, if P(x) is a polynomial, a root r is a solution to the equation P(r) = 0. To find the roots, we must be familiar with factoring polynomials. Factoring is solving what we multiply to get a given quantity. For example, there are several ways to factor 12:
- \(0.5 \times 24\)
- \(1 \times 12\)
- \(2 \times 6\)
- \((-2) \times (-6)\)
- \(3 \times 4\)
- \(3 \times (-2) \times (-2)\)
- \(2 \times 2 \times 3\)
A common method of factoring is completely factoring the number into positive prime factors. When factoring, it’s always best to start with the greatest common factor (GCF) that all terms have in common. For example,
\(6x + 3 \rightarrow 3(2x + 1)\)
In the example above, both terms (i.e., 6x and 3) are divisible by 3, so they share a GCF of 3. Thus, the factors are 3 and \(2x + 1\). We reverse the distributive property: \(3 \times 2x\) and \(3 \times 1\). Finding the roots is solving for x when \(P(x) = 0\).
Factoring is straightforward for polynomials with two terms. And it’s even more straightforward when given a graph, as the roots are where the polynomial intersects the x-axis. However, polynomials of degree three or higher can be much more complex to factor. I recommend reading the article How to Factor Polynomials Explained for a more in-depth explanation.
Knowing the exact nuances of factoring different polynomials isn’t crucial to reading the rest of this article. For our purposes, we are interested in when a polynomial is equated to another value. In this instance, we are interested in when a polynomial equals zero. Later on, we’ll be interested in when one polynomial is equal to another polynomial or the difference between two polynomials is identically zero (i.e., all the coefficients are zero), which involves checking whether a given polynomial has certain roots.
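As a quick illustration of what it means to find the roots, the sketch below evaluates a hypothetical polynomial \(P(x) = x^2 - 5x + 6 = (x - 2)(x - 3)\) at a few candidate values and prints zero exactly at its roots.

```python
# Evaluate a polynomial from its coefficients (highest degree first) with
# Horner's method and test a few candidate roots.
def evaluate(coeffs, x):
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# P(x) = x^2 - 5x + 6 = (x - 2)(x - 3), so its roots are 2 and 3.
coeffs = [1, -5, 6]
for candidate in [1, 2, 3, 4]:
    print(candidate, evaluate(coeffs, candidate))   # prints 0 only for 2 and 3
```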
The Schwartz-Zippel Lemma
The Schwartz-Zippel Lemma is a probabilistic tool for checking whether a polynomial equation is always true. It evaluates the polynomial at random points and checks whether the result is zero.
Imagine a complex equation involving the variables \(x_1, x_2, \ldots, x_n\). If this equation is a polynomial and not just some random collection of terms, the Schwartz-Zippel Lemma helps us verify whether it holds true for all possible values of these variables.
This is how it works:
- Let \(P(x_1, x_2, \ldots, x_n)\) be a polynomial with total degree d (i.e., the highest sum of exponents in any term)
- Choose a finite set S from the field (like picking a set of numbers)
- Randomly select values for each variable \(x_1, x_2, \ldots, x_n\) from the set S
The lemma states that the probability of P being zero at these randomly chosen points is at most \(\frac{d}{|S|}\). This means that if the polynomial is not identically zero, it’s very unlikely that it will appear to be zero by random chance. This is particularly useful for zero-knowledge proofs, where we need to verify polynomial identities efficiently.
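A minimal sketch of the lemma in action: to test whether two polynomials are identical, evaluate both at a random point drawn from a large set and compare. The polynomials and the prime below are arbitrary choices for illustration.

```python
# Schwartz-Zippel in practice: random evaluation catches differing polynomials
# with overwhelming probability when the field is much larger than the degree.
import random

p = 2**31 - 1            # a large prime field keeps d/|S| tiny

def f(x):
    return (x + 1) * (x + 2) % p          # (x + 1)(x + 2)

def g(x):
    return (x * x + 3 * x + 2) % p        # x^2 + 3x + 2, the same polynomial expanded

def h(x):
    return (x * x + 3 * x + 5) % p        # a different polynomial

r = random.randrange(p)
print("f == g at r:", f(r) == g(r))       # True: identical polynomials always agree
print("f == h at r:", f(r) == h(r))       # almost certainly False: caught with probability >= 1 - d/|S|
```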
Lagrange Interpolation
Lagrange Interpolation is a method of constructing a polynomial that passes through a given set of points. The Lagrange polynomial is the polynomial of the least degree that passes through each given point. For n points, a polynomial of degree n − 1 that goes through all of them can be created. For example, if you have two points on a plane, we can define a straight line that goes through both points. If we have three points on a plane, we can define a quadratic polynomial (i.e., \(y = ax^2 + bx + c\)) that goes through all the points. And so forth.
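Here is a small sketch of Lagrange interpolation: given three hypothetical points that happen to lie on \(y = x^2 + 1\), the interpolated degree-2 polynomial reproduces that curve exactly.

```python
# Rebuild the unique lowest-degree polynomial through a set of points and
# evaluate it at a new x, using exact rational arithmetic.
from fractions import Fraction

def lagrange_evaluate(points, x):
    """Evaluate the interpolating polynomial through `points` at `x`."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)   # the Lagrange basis factor
        total += term
    return total

# Three points lying on y = x^2 + 1; the degree-2 interpolation recovers it.
points = [(0, 1), (1, 2), (2, 5)]
print(lagrange_evaluate(points, 3))            # 10, i.e. 3^2 + 1
```

In cryptographic protocols, the same formula is evaluated over a finite field instead of the rationals, but the idea is identical.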
Why Should I Care?
Polynomials are a single mathematical object that can contain an unbounded amount of information — think of a polynomial as a list of integers, and this becomes self-evident. Thus, a single equation between polynomials can represent an unbounded number of equations between numbers. If someone can verify a given equation between polynomials, they are implicitly verifying all the possible equations simultaneously. This is how we secure non-interactive proofs from the perils of a malicious prover and having to rely on randomly spot-checking a given computation.
Polynomials also have several properties that make them useful for creating proofs:
- Given enough points for a given polynomial, the entire polynomial can be reconstructed
- A small change in a polynomial’s input can lead to a significant change in its output, making errors easier to detect
- Polynomials can detect and correct errors in computations, similar to how erasure coding makes data fault-tolerant (which is crucial to how Turbine works)
Zero-knowledge proofs are in the business of proving certain computations. Polynomials are invaluable to this business as we can craft them with specific characteristics in mind. Say you have a computation or a set of data points that you’d want to prove. The easiest way to go about this is to encode it into a polynomial and use its properties to create a proof:
- Encode the data into a polynomial \(P(x)\) such that evaluating \(P(x)\) at certain points results in the original data or the result of a given computation
- To ensure the polynomial meets the given criteria (e.g., all values being within a range), create a constraint polynomial \(C(x)\). For example, \(C(x) = (P(x) - 0)(P(x) - 1)\) ensures \(P(x)\) is either 0 or 1
- Transform the problem into proving \(P(x)\) satisfies certain conditions for your data set or computation
- Create a known polynomial \(H(x)\) that is a multiple of \(P(x)\) that encodes these conditions
- The prover commits values of \(P(x)\) and any related polynomials by creating a Merkle tree of the evaluations and sends the root hash to the verifier
- The verifier randomly selects a few points and asks the prover to provide the values of \(P(x)\) and \(C(x)\) at those points
- The verifier checks the provided values against the committed root hash and the expected polynomial relationships
It doesn’t matter how large the polynomial is; since we’re using a polynomial commitment, we can verify equations between polynomials in a short amount of time. This is a highly succinct and efficient way to create proofs. Any errors are amplified, and, using techniques such as the Fiat-Shamir heuristic, these proofs can become non-interactive so anyone can verify them without further interaction.
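The following sketch stitches the steps above together end to end. Every parameter is made up for illustration: P(x) is a toy "boolean" polynomial over a 64-point domain, the commitment is a SHA-256 Merkle tree over its evaluations, and the spot-check indices are derived from the root via Fiat-Shamir.

```python
# Commit to the evaluations of P(x) with a Merkle tree, derive spot-check
# indices from the root (Fiat-Shamir), then open and verify them.
import hashlib

p = 97                                             # a toy prime field
domain = list(range(64))                           # evaluation domain
P = [x % 2 for x in domain]                        # a "boolean" P(x): every value is 0 or 1

def H(data):
    return hashlib.sha256(data).digest()

def merkle_tree(leaves):
    """Return all layers of a Merkle tree over the given leaf hashes."""
    layers = [leaves]
    while len(layers[-1]) > 1:
        layer = layers[-1]
        layers.append([H(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)])
    return layers

def merkle_branch(layers, index):
    branch = []
    for layer in layers[:-1]:
        branch.append(layer[index ^ 1])            # sibling at this level
        index //= 2
    return branch

def verify_branch(root, leaf, index, branch):
    node = leaf
    for sibling in branch:
        node = H(node + sibling) if index % 2 == 0 else H(sibling + node)
        index //= 2
    return node == root

# Prover: commit to the evaluations of P(x).
leaves = [H(str(v).encode()) for v in P]
layers = merkle_tree(leaves)
root = layers[-1][0]

# Fiat-Shamir: the challenge indices are derived from the root itself.
indices = [int.from_bytes(H(root + bytes([i])), "big") % len(domain) for i in range(5)]

# Verifier: check each opened value against the root and the constraint C(x) = P(x)(P(x) - 1) = 0.
for idx in indices:
    value = P[idx]                                 # revealed by the prover
    branch = merkle_branch(layers, idx)            # revealed by the prover
    assert verify_branch(root, H(str(value).encode()), idx, branch)
    assert value * (value - 1) % p == 0            # the boolean constraint holds at this point
print("all spot checks passed")
```

A production proof system commits to far more structure (constraint and quotient polynomials, low-degree tests, and so on), but the commit, derive-challenges, open, and verify rhythm is the same.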
To further your understanding, I recommend checking out the following practice problems:
- Polynomial Practice Problems
- Finding Zeroes of Polynomials Practice Problems
- Factoring Polynomials: Very Difficult Problems with Solutions
Khan Academy also has an extensive unit on polynomial expressions, equations, and functions.
Now, to better understand polynomial commitments, it’s necessary to explore the cryptography behind zero-knowledge proofs.
The Cryptography Behind Zero-Knowledge Proofs
Symmetric Encryption
Symmetric encryption is an encryption technique where the same key is used to encrypt plaintext and decrypt ciphertext. This key is often referred to as the secret key, or the private key because the use of a single key necessitates that it must be kept secret. However, this also means that the secret key must be shared with the two parties before they can communicate securely. Thus, managing and distributing the secret key in a secure manner can be challenging and is susceptible to leakage if not handled properly. Despite this drawback, symmetric encryption is fast, efficient, and requires less computational power and memory than other encryption schemes.
Common symmetric encryption algorithms include:
- Advanced Encryption Standard (AES) — a variant of the Rijndael block cipher that is widely used around the world for securing data. It supports key sizes of 128, 192, and 256 bits
- ChaCha20 — a modern, efficient stream cipher developed by Daniel J. Bernstein. It is a variant of the Salsa20 stream cipher, which takes advantage of the add-rotate-XOR (ARX) operations. It maps a 256-bit key, a 64-bit nonce, and a 64-bit counter to a 512-bit block of the keystream, meaning a user can efficiently seek any position on the keystream in constant time
While symmetric encryption is robust and efficient, it requires a secure method to exchange keys. One such method is the Diffie-Hellman key exchange, which allows two parties to securely share a secret key over an insecure channel. However, this method is based on asymmetric encryption principles, which we’ll cover in the next section.
Asymmetric Encryption
Asymmetric encryption, also known as public key encryption, is a method that uses a pair of related keys (i.e., a public key and a private key) to encrypt and decrypt information. The public key is shared openly, while the private key is kept secret. When a sender wants to encrypt a message, they use the recipient’s public key. Upon receipt, the recipient decrypts the message using their corresponding private key. Data encrypted with the public key can only be decrypted with the private key. Thus, asymmetric encryption allows for secure communication over insecure channels, as the decryption key is never shared.
Asymmetric encryption is advantageous because it provides a high level of security since the private key is never shared. It also simplifies key distribution, since the public key can be shared openly, and it enables digital signatures. However, asymmetric encryption is more computationally intensive and slower than symmetric encryption. Managing key pairs can also become complex, especially in systems with a large number of users.
Common asymmetric encryption algorithms include:
- Rivest-Shamir-Adleman (RSA) — one of the oldest and most widely used public key cryptosystems for secure data transmission. It was developed in the 1970s and relies on the practical difficulty of factoring the product of two large prime numbers
- Elliptic Curve Cryptography (ECC) — an approach to public key cryptography based on the algebraic structure of elliptic curves over finite fields. It offers similar security to RSA but with smaller key sizes, leading to faster computations and reduced storage requirements. Solana uses the Ed25519 elliptic curve for its keypair generation
Digital Signatures
Digital signatures are a key aspect of public key cryptography, providing a way to verify the authenticity and integrity of a message, software, or digital document. A digital signature is created using the sender’s private key and can be verified by anyone with access to the corresponding public key. This ensures that the message was sent by a legitimate sender and has not been altered.
Common algorithms used for digital signatures include:
- Digital Signature Algorithm (DSA) — an approach based on modular exponentiation (i.e., exponentiation performed over a modulus) and the discrete logarithm problem
- Elliptic Curve Digital Signature Algorithm (ECDSA) — a variant of DSA that uses elliptic curve cryptography to provide a higher level of security with smaller key sizes
Discrete Logarithm Problem
The discrete logarithm problem is the task of finding the exponent k in the equation \(g^k \equiv h \pmod{p}\), where:
- g is a known base (i.e., a generator)
- h is a known result (i.e., an element of the group)
- p is a prime number (i.e., the modulus defining the group)
- k is the unknown exponent (i.e., the discrete logarithm of h to the base g)
To break this down, if you know the values of g, h, and p, the discrete logarithm problem is to find k. For example, given the equation \(2^k \equiv 9 \pmod{23}\), the goal is to find k.
The discrete logarithm problem is considered difficult to solve efficiently, especially for large numbers. Because of this difficulty, it forms the basis for the security of various cryptographic systems, including Solana, ElGamal Encryption, Digital Signature Algorithms (i.e., DSA and ECDSA), and the Diffie-Hellman Key Exchange.
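A sketch of why this is hard: absent additional structure, the only generic attack is to try exponents one by one, and the loop below, which recovers k = 5 for the toy example above, would take on the order of \(2^{256}\) steps for a 256-bit prime.

```python
# Brute-force search for the discrete logarithm: find k with g^k = h (mod p).
def brute_force_dlog(g, h, p):
    value = 1                           # g^0
    for k in range(p):
        if value == h:
            return k
        value = (value * g) % p
    return None

print(brute_force_dlog(2, 9, 23))       # 5, since 2^5 = 32 ≡ 9 (mod 23)
```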
Diffie-Hellman Key Exchange
The Diffie-Hellman key exchange is a method of securely exchanging cryptographic keys over a public channel. The simplest and original implementation (i.e., the Finite Field Diffie-Hellman) is as follows:
- Alice and Bob publicly agree on two numbers — a large prime number p (i.e., the modulus) and a base g (i.e., the generator), which is a primitive root modulo p
- Alice selects a secret integer a and then sends Bob \(A \equiv g^a \pmod{p}\)
- Bob selects a secret integer b and then sends Alice \(B \equiv g^b \pmod{p}\)
- Alice computes \(s \equiv B^a \pmod{p}\)
- Bob computes \(s \equiv A^b \pmod{p}\)
Alice and Bob now have the same secret value. This is because both computations result in the same secret s because:
\(s \equiv (g^b)^a \pmod{p} \equiv (g^a)^b \pmod{p} \equiv g^{ab} \pmod{p}\)
This shared secret s can then be used as a key for symmetric encryption, allowing both Alice and Bob to communicate securely. The security of the Diffie-Hellman key exchange relies on the difficulty of the discrete logarithm problem. Without knowing the secret values a and b, it is computationally infeasible for someone eavesdropping to derive the shared secret. This is known as a one-way function—it is relatively easy to compute but extremely difficult to invert.
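Here is a toy run of the exchange with deliberately tiny, made-up numbers; a real deployment would use a prime of 2048 bits or more (or an elliptic curve group, as discussed next).

```python
# Toy Diffie-Hellman exchange following the steps above.
p, g = 23, 5                      # public: modulus and generator

a = 6                             # Alice's secret
b = 15                            # Bob's secret

A = pow(g, a, p)                  # Alice sends A = g^a mod p
B = pow(g, b, p)                  # Bob sends   B = g^b mod p

s_alice = pow(B, a, p)            # Alice computes B^a mod p
s_bob = pow(A, b, p)              # Bob computes   A^b mod p
print(s_alice, s_bob, s_alice == s_bob)   # both equal g^(ab) mod p
```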
While the Finite Field Diffie-Hellman key exchange is secure and widely used, it requires large key sizes to ensure security. For example, if Alice and Bob both publicly chose a modulus of 23, then it would be a lot easier to crack since there are only 23 possible results of n mod 23. Thus, this can be computationally intensive and less efficient. To address these challenges, elliptic curve cryptography (ECC) provides a more efficient alternative since it offers the same level of security with significantly smaller key sizes and faster computation.
Elliptic Curves
An elliptic curve is defined by the equation \(y^2 = x^3 + ax + b\), where a and b are constants. Elliptic curve cryptography is simply working with points on a given elliptic curve. These curves have a number of unique properties, which make them useful for cryptography. For example:
- Addition of Points — Given two points, P and Q, on a given elliptic curve, their sum R = P + Q will also be a point on the curve. Preethi Kasireddy’s article A Painless Guide to Cryptography for Zero-Knowledge Proofs has a nice explanation of how to add points on an elliptic curve
- Scalar Multiplication — Given a point P on a given elliptic curve and an integer k, scalar multiplication is the process of adding the point P to itself k times. This will produce another point (i.e. kP) on the curve. This is used to generate public keys from private keys
- Discrete Logarithm Problem — The discrete logarithm problem on elliptic curves is much harder to solve than its integer equivalent. Given the points P and Q = kP, it is computationally infeasible to determine k if the curve parameters are chosen correctly. This means that elliptic curves provide the same security as traditional systems with a much smaller key size, making them more efficient
With our newly found knowledge of group theory, we can say that the points on certain elliptic curves satisfy the group axioms:
- Every two points can be added to give a third point
- It does not matter what order the two points are added
- If you have more than two points to add, it does not matter what order they are added in
- There is an identity element (i.e., zero added to any point on the curve results in the same point)
I highly recommend reading Elliptic Curve Cryptography by Georgie Bumpus to explore this group structure more thoroughly.
Elliptic curves offer the same level of security as other traditional cryptosystems, like RSA, but with much smaller key sizes. For example, a 256-bit key in ECC provides comparable security to a 3072-bit key in RSA. This is beneficial because:
- Smaller key sizes lead to faster encryption and decryption
- Keys and certificates require less space
- Smaller keys reduce the amount of data transmitted, which is beneficial in environments with limited bandwidth, such as a blockchain
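To make point addition and scalar multiplication concrete, here is a minimal sketch over a tiny field. The curve \(y^2 = x^3 + 2x + 2\) over \(\mathbb{F}_{17}\) and the base point (5, 1) are toy textbook parameters, not a curve anyone should use; real curves such as Curve25519 work over roughly 256-bit primes.

```python
# Toy elliptic curve arithmetic over F_17 on y^2 = x^3 + 2x + 2.
p, a, b = 17, 2, 2
INF = None                                     # the point at infinity (identity element)

def inv(x):
    return pow(x, p - 2, p)                    # modular inverse via Fermat's little theorem

def add(P, Q):
    if P is INF: return Q
    if Q is INF: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:        # P + (-P) = infinity
        return INF
    if P == Q:
        m = (3 * x1 * x1 + a) * inv(2 * y1) % p    # tangent slope (doubling)
    else:
        m = (y2 - y1) * inv(x2 - x1) % p           # chord slope (addition)
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mul(k, P):
    """Add P to itself k times using double-and-add."""
    result = INF
    while k > 0:
        if k & 1:
            result = add(result, P)
        P = add(P, P)
        k >>= 1
    return result

G = (5, 1)
for k in range(1, 6):
    print(k, scalar_mul(k, G))                 # every multiple of G is another point on the curve
```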
Montgomery Curves
Montgomery curves are elliptic curves defined by the equation \(By^2 = x^3 + Ax^2 + x\) over a finite field where A and B are constants, B does not equal zero, and A is not -2 or 2. These curves are special because elliptic curve scalar multiplication can be implemented more efficiently using a Montgomery ladder.
A Montgomery ladder takes a point P on a Montgomery curve and a scalar k, initializes two working points (the point at infinity and P), and processes each bit of the scalar k from the most significant bit to the least significant bit. The key idea is to maintain two points and update them with a constant sequence of operations regardless of the values of the scalar’s bits.
This is important for a few reasons:
- It is resistant to side-channel attacks, which are any attack based on the extra information that can be gathered due to the implementation or design of a given protocol or algorithm. This is a really, really nerdy rabbit hole I’d recommend going down. These types of attacks range from the varying power consumption by the hardware during computation to leaked electromagnetic radiation
- There is no need for the y-coordinate as scalar multiplication can be performed using the x-coordinates only
- It operates in constant time, meaning the time taken for a given computation is independent of the input value
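The sketch below shows the ladder's structure using modular exponentiation in place of curve arithmetic, an assumption made purely to keep the example self-contained: on a Montgomery curve, the "multiply" step becomes point addition and the "square" step becomes point doubling, but the bit-by-bit pattern, two running values updated by the same two operations on every bit, is identical.

```python
# Montgomery ladder over modular exponentiation: the same pair of operations is
# performed on every bit of the scalar, regardless of whether the bit is 0 or 1.
def montgomery_ladder_pow(g, k, p):
    r0, r1 = 1, g                              # two running values: identity and g
    for bit in bin(k)[2:]:                     # scan k from most to least significant bit
        if bit == "0":
            r1 = (r0 * r1) % p                 # "add" then "double" when the bit is 0
            r0 = (r0 * r0) % p
        else:
            r0 = (r0 * r1) % p                 # "add" then "double" when the bit is 1
            r1 = (r1 * r1) % p
    return r0

print(montgomery_ladder_pow(3, 218, 1000))     # matches Python's built-in pow(3, 218, 1000)
print(pow(3, 218, 1000))
```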
Montgomery curves are widely used in cryptographic protocols, such as the X25519 algorithm for key exchange, which uses the Montgomery form of the curve Curve25519. This algorithm is the foundation of modern secure communications, including implementations in popular protocols like TLS.
Edwards Curves
Edwards Curves are a type of elliptic curve defined by the equation \(x^2 + y^2 = 1 + d x^2 y^2\) where d is a non-zero constant different from 1.
These curves are important because:
- Efficiency in Point Operations — The addition of two points on an Edwards curve is more efficient compared to other forms of elliptic curves. The formulas for point addition and doubling are simpler and involve fewer field operations, meaning they are faster to compute
- Unified Addition Formula — Edwards curves use a unified addition formula, meaning the same formula can be used for point addition and point doubling. This reduces the potential for implementation errors and enhances security
- Completeness — For certain values of d, Edwards curves are complete. This means that the addition law covers all possible inputs without any exceptions.
- Resistance to Side-Channel Attacks — Like Montgomery curves, Edwards curves are resistant to side-channel attacks due to their uniform and predictable operation patterns.
A widely used Edwards curve is Edwards25519, which is defined by the equation \(x^2 + y^2 = 1 - \frac{121665}{121666} x^2 y^2\). This curve is known for its efficient arithmetic and its 256-bit key size. Its signature scheme is implemented in various security protocols and systems, including Solana, OpenSSH, and Tor. Monero uses Edwards25519 as a basis for its key pair generation.
Why Should I Care?
Elliptic curves are critical in zero-knowledge proofs due to their efficiency and security-enhancing properties. The use of elliptic curves allows for the creation of smaller and faster proofs, which is essential for any sort of practical implementation. We’ll analyze this further when talking about zero-knowledge-related developments, but the need for small and fast proofs is critical in a high-performance environment with various account and transaction constraints. This is what makes zero-knowledge proofs attractive for building rollups, as one could generate a succinct proof that all operations on an L2 are valid and validate it on the L1.
At the end of the day, think of elliptic curves as a replacement for modular arithmetic. With elliptic curves, it is significantly harder to recover the secret behind a given point. If we were to use traditional modular arithmetic with \(g^a \bmod n\), where g is a generator, n is a large prime number, and a is the secret key, then, as explored previously with the discrete logarithm problem, we'd need a very large prime number to keep the secret key secure. Elliptic curves offer a more efficient alternative with smaller key sizes, providing the same level of security with significantly improved performance.
Randomness
Nothing else in this article would matter without randomness, a fundamental aspect of cryptography. How can you expect a system to be secure if its values are predictable and biased? True randomness can be challenging to achieve, but it is essential for several reasons:
- Key Generation — Cryptographic keys must be generated randomly to ensure they are unpredictable and secure
- Nonces and Salts — Nonces (i.e., numbers used once) and salts (i.e., random values added to data before hashing) prevent replay attacks and protect against precomputed attacks, respectively
- Secure Protocols — Randomness is used to ensure fairness and security, preventing predictability and patterns that attackers could exploit
Most random number generators fail to produce a random number that can be cryptographically verified. This leaves them vulnerable to manipulation and limits their use cases. However, verifiable random functions fix this.
Verifiable Random Functions
A Verifiable Random Function (VRF) is a cryptographic primitive that produces a random output and a proof that the output was generated correctly from a given input. A VRF must be unpredictable, meaning its output is indistinguishable from a random value to anyone who does not know the secret input. Its security relies on the RSA assumption that it’s hard to compute \(y = h^d \bmod n\) without knowing the secret exponent d and the security of the hash function H.
The main steps of a given VRF are as follows:
- Key Generation — The user generates a pair of RSA keys (e, n) as the public key and (d, n) as the private key. The public key e is the exponent, and n is the modulus. The private key d is the secret exponent
- Computation — Given an input x, the user computes the VRF output y and the proof π. First, the hash h = H(x) is computed, where H is a cryptographic hash function. Then \(y = h^d \bmod n\) is computed, which is the RSA signature of the hash. Finally, the proof π = (h, y) is assembled
- Verification — Given the public key (e, n), the input x, the output y, and the proof π = (h, y), anyone can verify the correctness of the VRF output by checking the hash h equals \(H(x)\) and \(y^e \equiv h \pmod{n}\) to verify the RSA equation
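Here is a toy sketch of those three steps with textbook-sized RSA parameters (p = 61, q = 53, so n = 3233); a real VRF would use a 2048-bit modulus and a proper hash-to-group construction rather than a bare SHA-256 reduction.

```python
# Toy RSA-based VRF: prove produces (output, proof), verify checks both.
import hashlib

n = 61 * 53                      # public modulus (3233)
e = 17                           # public exponent
d = 413                          # private exponent (inverse of e modulo lcm(60, 52) = 780)

def H(x):
    return int.from_bytes(hashlib.sha256(x).digest(), "big") % n

def vrf_prove(x):
    h = H(x)
    y = pow(h, d, n)             # the VRF output: an RSA signature over the hash
    return y, (h, y)             # output and proof

def vrf_verify(x, y, proof):
    h, y_claimed = proof
    return h == H(x) and y_claimed == y and pow(y, e, n) == h

output, proof = vrf_prove(b"block 1234")
print(output, vrf_verify(b"block 1234", output, proof))
```

Anyone holding the public key (e, n) can re-run the verification, which is what makes the output both random-looking and publicly checkable.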
VRFs are often used in consensus protocols, where randomness needs to be unpredictable yet verifiable. L1s, including Algorand, Cardano, Internet Computer, and Polkadot, use VRFs in their consensus mechanisms to randomly select block producers. Chainlink offers Chainlink VRF as an abstraction layer between the user and the blockchain to generate provably fair and verifiable values. Pyth Entropy also offers a reliable and secure source of randomness.
Ceremonies and Trusted Setups
Cryptographic ceremonies are protocols or events where critical cryptographic computations are done in a secure, controlled environment. There are several types of cryptographic ceremonies, namely:
- Key Generation Ceremonies — These involve generating cryptographic keys to ensure that no single entity has control over the key generation process
- Parameter Generation Ceremonies — These involve creating cryptographic parameters that will be used by multiple parties
- Multi-Party Computation (MPC) Ceremonies — These involve multiple parties jointly performing a cryptographic computation to ensure that no single party can compromise the process
A trusted setup ceremony is a special event or process designed to generate a set of cryptographic parameters needed to run a cryptographic protocol. In our section on interactive and non-interactive zero-knowledge proofs, we identified that the first step of a proof is to have the prover and verifier agree on some value to be used. In a trusted setup ceremony, multiple participants contribute randomness to the setup to ensure no single participant controls the process. Each participant generates some random value that gets combined with the values provided by other participants. The combined output becomes a set of parameters that everyone can trust.
This process is crucial because if all participants colluded, they could break the system by generating a proof for an invalid claim. However, having just a single honest participant ensures the security of the parameters.
Zcash famously used a trusted ceremony to bootstrap the chain’s privacy features. Ethereum also had the KZG Ceremony, a coordinated public ritual to provide a cryptographic foundation for their scaling efforts (e.g., EIP-4844 / proto-danksharding).
Note that some zero-knowledge proof systems, such as zk-STARKs, do not require a trusted setup. We’ll explore this more in the second article.
Conclusion
With this article, we’ve explored the underlying theory, mathematics, and cryptography that powers zero-knowledge proofs. This is everything you need to know to start answering the question of what zero-knowledge proofs are. Now, we can begin to apply these learnings to networks such as Solana to contribute to the overall discussion and development of zero-knowledge proofs.
We continue this analysis in the second and final article in our two-part series on zero-knowledge proofs, aptly named Zero-Knowledge Proofs: Its Applications on Solana.
If you’ve read this far, thank you, anon! Be sure to enter your email address below so you’ll never miss an update about what’s new on Solana. Ready to dive deeper? Explore the latest articles on the Helius blog and continue your Solana journey, today.