Understanding zero knowledge proofs and zk-SNARK.

zk-SNARK (Zero-Knowledge Succinct Non-Interactive Argument of Knowledge) is the protocol of zero knowledge proof. ZCash cryptocurrency will be the one of first opened blockchain projects, which will utilize the implementation of this fascinating algorithm in the mechanics of blockchain. Zero knowledge proofs have huge potential and can be used or even revolutionize a lot of data exchange processes.

In this article I explain:

  • what is the zero knowledge proof;
  • what is, how works and what kind of math mechanics are hidden behind zk-SNARK.

What exactly is the zk-SNARK?

zk-SNARK is the protocol, manual, instructions of realization the algorithm which solves extremely interesting problem: how to prove that we know some value without necessity of revealing that value. In other words, how to prove that we know the value without showing (giving someone) the value. In the typical interaction of such process there are 2 participants: the prover and the verifier. The process described simply is basically: the prover has to prove the verifier that knows some value without revealing the value.

Algorithms which solve this kind of problem are accurately called zero knowledge proofs algorithms. The zero knowledge means that the prover delivers some kind of proof to the verifier which should be enough to verify that the prover has the expected value (information). zk-SNARK is non-interactive what means that the proof should be short (in practice consist of few values). The prover sends the value to the verifier and this is the end of the interaction. This information should be enough for the prover to decide if the proof is valid – if the prover has the correct value which he claim to have presenting the proof.

Math gears of zk-SNARK

enigma
zk-SNARK is the set of precise math operations.

Zero knowledge proofs require a lot of complex math operations (algebraic). Implementing the protocol which is non interactive, so doesn’t require the interoperable conversation between parties is the demanding problem.

zk-SNARK is the description of the math operations set which if made collectively, allows to create the protocol of proving some statements (with zero knowledge). In other words it is the instruction how to design and then utilize the functions allowing: creating the zero proof (for prover) and verification of the proof (for verifier). What it means is that there is no single implementation of zk-SNARK which can prove every statement. Single statement to prove requires concrete implementation of zk-SNARK based algorithm.

In this article I focus on explaining the really interesting math used in zk-SNARK, and I don’t focus on concrete usage of Zero Knowledge Proofs (ZKP). This is kind of simplified form explaining main concepts and elements of zk-SNARK puzzles. More detailed explanation can be found on ZCash blog where the in-depth articles series about ZKP can be found. There is also good Ethereum blog article. Here below you can find quite complex math presented in simply form. This knowledge will help you understand the math “magic” behind the zero knowledge proofs.

1. Homomorphic Hiding

Homomorphic Hiding (HH) is one of the main ingredients of zk-SNARK. Lets describe $E(x)$ informally as homomorphic function. The function has given characteristics:

  • knowing the value of $E(x)$, so evaluated function for the argument $x$, it’s very hard to find the value of $x$;
  • homorphic function should return different results for different arguments (so different for $x = 5$, different for $x = 6$, etc.);
  • if someone knows $E(x)$ e.g. $E(5)$ and knows $E(y)$ e.g. $E(8)$ then also can calculate $E(x+y)$, so $E(13)$;

Let’s imagine Alice wants to prove Bob that knows numbers $x$ i $y$ for which true is $x + y = 7$. Alice doesn’t want to share these values with Bob so the only thing Bob knows is that sum of these values is $7$. Alice and Bob choose together some HH function. Using the function, Alice an send to Bob the values of $F(x)$ and $F(y)$. Having these two values Bob calculates $F(x +y)$ and $F(7)$. Finally Bob believes Alice that she knows correct values of the formula if the Bob’s value of $F(x + y)$ is equal $F(7)$.

Above mechanism uses only some math properties of homomorphic function and few algebraic transformations. We designed kind of truth evaluation process where without disclosing the value, verifier can decide whether the value is correct or not. The concept of HH function is not yet zero proof. These functions are used in zk-SNARK however in more wider context. Mechanics of HH functions are not enough to propose the universal protocol of proving any statements (usually much more complex than given example). In order to design and implement different kind of zero proof protocols zk-SNARK must be more complex.

2. Blind evaluation

Let’s assume Alice knows some polynomial $P$ of degree $d$. Bob would like to know what will be the value of $E(P(s))$ – so the evaluation of some HH function in the value of evaluated polynomial $P$ in point $s$. How this scenario can be easily realized?

  1. Alice sends the polynomial $P$ to Bob so he can made calculations alone.
  2. Bob sends the value of $s$ to Alice and she returns the result expected by Bob.

Let’s think if Bob could know the value of $E(P(s))$ without knowing the polynomial $P$ and without necessity of sending value of $s$ to Alice. Using homomorphic functions and linear combinations which are used in operations used in polynomial calculations we can propose given steps:

  1. Bob sends to Alice the values: $E(s^0), E(s^1), …, E(s^d)$ (where $d$ is the degree of polynomial $P$)
  2. Alice using the values received from Bob calculates the $E(P(s))$. She can do it because homomorphic function $E(x)$ supports linear combinations and the value of $P(s)$ is the combination of values $s^0, s, …, s^d$

Given algorithm allows to find out by Bob the value of $E(P(s))$ in some confidential manner. Bob doesn’t know the function $P(s)$ and the Alice doesn’t know the value $s$. This process is described as blind evaluation and is another important puzzle in zk-SNARK protocol.

There is however some problem: how can Bob be sure that Alice indeed evaluated correct polynomial $P$? The fact that she can do it doesn’t unfortunately give the Bob certainty that she really did it. The above process allows Alice to deliver value expected by Bob – however doesn’t allow Bob to verify that she really provided expected value. In other words: Alice can provide any value and Bob will not be detect it.

3. Knowledge of Coefficient Test and Knowledge of Coefficient Assumption

Our goal is now to somehow force Alice to use the correct polynomial P when calculating $E(P(s))$. Let’s describe kind of tool which used in zk-SNARK can help to achieve the goal.

The KC test is given:

  1. Bob chooses the value of coefficient $α$ and some numbers pair (a,b) called $α$-pair, which fulfill the equation $b = α⋅a$
  2. Bob sends to Alice given pair saying that this is $α$-pair
  3. Now Alice must return another different $α$-pair to Bob – the pair $(a’,b’)$
  4. Bob accepts received pair only if it’s really $α$-pair. He can easily verify it because new pair should also fulfill the equation $b’ = α⋅a’$

Alice can easily deliver value expected by Bob if she knows $α$. However the point is that she doesn’t know the value $α$. In that case providing new $α$-pair would be made by Alice simply by returning the pair $(a’,b’) = (a⋅γ, b⋅γ)$. New pair calculated as a product of values $a$ and $b$ and some value $γ$. The knowledge of Alice about the coefficient $γ$ is described as KCA. The pair calculated in that way will be correct and will match the Bob’s equation.

In the scenario of extended KCA Bob sends to Alice $n$ $α$-pairs, but the $α$ is constant, only $a$ and $b$ are changing. For every received pair Alice generates another $α$-pair $(a’,b’)$. Again thanks to linear combinations, Alice can encapsulate many generated value to single $α$-pair which she will return to Bob. For example if Alice must generate 2 $α$-pairs for $(a_1,b_1)$ i $(a_2,b_2)$, and calculation of new pair is based on multiplication respectively by $c1$ i $c2$ so the truth is: $b’ = c_1\cdot b_1 + c_2 \cdot b_2 = c_1 (\alpha \cdot a_1) + c_2(\alpha\cdot a_2) = \alpha (c_1\cdot a_1 + c_2\cdot a_2) = \alpha\cdot a’.$

4. The Verifiable Blind Evaluation Protocol

Let’s combine all the things we concepts we already presented in order to define verifiable blind evaluation protocol which will “force” Alice to generate “correct” polynomial:

  1. Bob has some homomorphic function $E(x)=x⋅g$. He chooses some value of coefficient $α$. He sends to Alice the values: $E(x)$ : $E(s^0), E(s^1), …, E(s^d)$ (so the values $s^0⋅g, s^1⋅g,…,s^d⋅g$) and also the values of $E(x⋅α)$ – what means exactly: $α⋅(s^0⋅g), α⋅(s^1⋅g),…, α⋅(s^d⋅g)$
  2. Alice has some polynomial $P(x)$ degree of $d$. She calculates $a= E(P(s)) = P(s)⋅g$ and $b=α⋅E(P(s)) = α⋅P(s)⋅g$. She can do it because values calculated by her are linear combinations of Bob’s values of homomorphic function (algorithm from point 2 – blind evaluation).
  3. Bob verifies if true is the equation $b=α⋅a$ and accepts only if so.

Using the blind evaluation protocol and KCA we proposed the algorithm which allows to verify the correctness of Alice’s polynomial. In given way Alice can provide to Bob the value $E(P(s))$ not knowing the $s$ and Bob can verify if it’s correct. In that moment correct polynomial means: having expected degree.

The mechanisms we now learned already allows us to define the zk-SNARK protocol. The goal is allowing to prove some statements and then to verify them. The statement in that meaning is some kind of math formula. The prover side presenting the proof would like to indeed prove that can solve the formula however the verifier side has to be able to verify the proof without knowing the formula solution. Let’s present then the important zk-SNARK problem: how to present the formulas (statements) in some generic way – giving the possibility to define any statement as a formula?

5. Quadratic Arithmetic Program

QAP is general method for presenting the statement which should be proved with zero knowledge. The statement written in QAP gives the possibility to decompose it to polynomials for whose it’s easy to find the solution. In other words: we have possibility to define statements to prove in some QAP form for which we can easy verify if the solution is correct.

It’s worth to mention that the statement to prove with zk-SNARK algorithm should already be polynomial. QAP is not the method to translate the statement implemented in some programming language to the polynomial form (e.g. the program which implements the function $C(x,y)$, which proves that Alice knows the value $x$ for which the hash is $y$, so $C(x,y) == true$). QAP is the method to translate statement to prove which are already in form of polynomial (so in practice the program code already presented in form of polynomial) to QAP polynomials form. It’s generic method – algorithm to transform statements to prove to some required form, which thanks to algebraic transformations and homomorphic function will later be used to implement concrete zk-SNARK functions (for proving and verifying the proofs).

6. Pinocchio Protocol

Using QAP and taking all the previously described features we can propose given mechanism: Alice can prove to Bob that she knows the solution of $(c1⋅c2)⋅(c1+c3)=7$ (some example polynomial) what means that she knows c1, c2, c3 and she doesn’t need to discolose these values in order to prove it to Bob. Protocol (called Pinocchio Protocol) of proceeding given scenario is given:

  1. The statement polynomial $(c1⋅c2)⋅(c1+c3)=7$ presented in QAP form consists of polynomials $P, L, R, O$ which are created by Alice using the solution of the polynomial she knows. If she has the correct solution – she knows such $c1, c2, c3$ for whose $(c1⋅c2)⋅(c1+c3)=7$ and it means that defined polynomials by her also fulfill the equity: $P = L⋅R – O$. Moreover if she knows correct $c1, c2, c3$ she can also define such polynomial $H$ that $P = H⋅T$.
  2. Bob chooses some value of $s$ and sends to Alice the value of $E(T(s))$. The polynomial T is known for Bob because it’s also the result of QAP presented statement to prove.
  3. Alice sends back to Bob values: $E(L(s)),E(R(s)),E(O(s)),E(H(s))$
  4. Bob checks if $E(P(s)) = E(T(s)*H(s))$, exactly if $E(L(s)⋅R(s)−O(s))=E(T(s)⋅H(s))$

This protocol gives the possibility to present by Alice the proof of having solution without exposing the solution. Bob can verify the proof on the assumption that Alice returned correct value, expected by Bob.

Above scenario is already good starting point to define general protocol for defining easy to prove statements (using QAP) and verification the solutions of these statements (blind evaluation and QAP). Unfortunately given protocol is still vulnerable: Alice can provide the proof to Bob, however there still not the certainty that she already did it – theoretically she can return the values which Bob will evaluate as correct (in point 4), but these values will not be exactly the “correct” values possessed by Alice.

7. zk-RKS, almost zk-SNARK

Pinocchio Protcol is the base for creating complete zero proof knowledge algorithm. In order to get complete zk-SNARK protocol we need to solve few issues yet.

7.1 Problem: Bob must have correct homomorphic function E

Correct means the one which supports multiplication. Previously we didn’t introduce any concrete function. The solution of that problem is using the homomorphic function based on ecliptic curves.

7.2 Problem: Alice must generate correct polynomials – having correct degree and be based on the correct solution of the statement to prove

This is the problem we deal from the beginning. In the Pinocchio Protocol Alice generates the polynomials $P, L, R, O$ using own solution. The protocol allows Bob to verify if Alice has the correct QAP solution, so she has the solution for the statement to prove. However Bob has to sure  that Alice really provided correct QAP solution based on the statement solution.

Solving this problem in practice is possible by using the characteristics of QAP. Presenting the statement in QAP form allows to verify the solution. Using also homomorphic function adds the possibility to “hide” the real value. This is what was done in Pinocchio Protocol. Mixing above with mechanics of linear combinations allows also the real verification of statements in QAP. Let’s define the group of polynomials $Fi$ which are the result of combination polynomials $L, R, O$ with the QAP solution. Bob can ask Alice for proving him that Alice can write $F(s)$ – it’s the result of combining many $Fi$ values. This proof will allow Bob to verify that Alice indeed uses real solution of the QAP and the real solution of the statement. Now we have complete zero-proof-knowledge.

8. Complete zk-SNARK

The Pinocchio Protocol with the fixes included in point 7 so far is zk-RKS (Zero Knowledge Argument Of Knowledge). It means that we can already define zero knowledge proofs and verify them, however the process is not yet fast enough and still requires interaction between participants (too complex and too long interaction).

How to make it zk-SNARK (Zero Knowledge Succinct Non Interactive Argument of Knowledge)? How to reduce the required amount of interactions between participants, reduce it single message between verifier and prover? The solution of that is simply based on removing the necessity of sending by Bob values to Alice which are required by her to generate the proof.

Creating complete zk-SNARK is simple: required values can prepare and set in the protocol (CRS). In other words implementation of zk-SNARK for concrete statement already contains the values required by Alice in order to provide by her the complete zero knowledge proof. Thanks to that the verifier doesn’t have to send any values to the prover. Providing proof can be done only by prover so finally we have complete zk-SNARK protocol.

Single zk-SNARK proof consist of few values which all together allow to verify the proof, making sure that the solution represented by proof is correct.

9. The word about CRS (Common Reference String) and toxic waste

toxic

Describing complete zk-SNARK we introduced the element set in the protocol. This element, this data is later used for creating by zero knowledge proofs by the prover. The data which are set in the protocol (being encrypted) and which are necessary for creating proofs for every statement are called CRS (Common Reference String).

The value of CRS is the encrypted, secure data set which contains among the others: the calculated values of homomorphic function eveluation for random point $s$. These values already described in previous part of this article: $E(s^0), E(s^1), …, E(s^d)$. I also mentioned a lot about the values which had to be random. The mechanics of HH functions (including some combination on these functions) gives the possibility to hide the value of the point (argument) for which the function was evaluated. The parameter $s$ is such a value: it must be used by verifier for evaluation the HH function in order to get the values later used for the prover. However the parameter $s$ must be absolutely unknown for the prover. It must be the value which will be unknown in the whole zk-SNARK implementation for particular statement – it’s randomness and secrecy makes the zk-SNARK correct and safe.

If we define the protocol where some data must be used once and then removed – the values such as $s$ (and some others which were not mentioned here) used for generating CRS for given zk-SNARK must be generated confidentially, used and then removed. These data are described as toxic waste. Knowing these values can make whole zk-SNARK useless because of the possibility to create fake proofs. This is why on it’s blog – ZCash wrote about parameters generation ceremony which ensured that everything was extremely confidential and secure.

Summary

In this article I introduced the process of creation and definition the zk-SNARK protocol for proving statements with zero knowledge proof. The values representing CRS and also zk-SKARK proofs are in practice extremely optimized. The data set includes the required minimum of data represented in some optimal form – the goal is to reduce the data processing and network overhead. The process of creating proofs included consist of some additional combinations which prevent abusing the protocol and creating fake proofs.

In my opinion zero knowledge proofs can really cause some revolution in the digital data flow. This is extremely beautiful mathematical mechanism, kind of Holy Grail of privacy and confidentiality. The zk-SNARK protocol is the instruction of implementation the algorithm to prove the ownership of information without necessity to disclose the information. In the blockchain world this mechanism is already used in ZCash, but also other projects like Ethereum started to integrate it to the blockchain. Zero knowledge proofs have the potential to find real world applications, changing the approach to the information and proving the truth. In the same time this beautiful symbiosis of math and technology makes us sure that everything can work fine and we can trust it because… math doesn’t make mistakes.