# Yao's - Homework Solutions

# Crypto Notions

- What does each of zk, S, N, and ARG mean in zk-SNARG? What does K mean in zk-SNARK? Give a potential application of zk-SNARG/ZKSNARK in practice. (Try to come up with one that was not covered in class!)

zk stands for zero-knowledge, meaning that nothing is revealed to the verifier beyond the fact that the statement is true.

S stands for succinct, meaning that both the proof size and verification time are succinct. In particular, for circuit satisfiability, the proof size is polynomial in the security parameter and polylogarithmic in the circuit size; the verification time is polynomial in the security parameter and the input size, and polylogarithmic in the circuit size.

N stands for non-interactive, meaning that the proof is a single message sent from the prover to the verifier.

ARG stands for argument, meaning that it is a proof system that satisfies completeness and soundness against PPT provers.

K stands for proof of knowledge, meaning that the prover cannot generate a valid proof unless it has a valid witness.

- In one sentence, what is secure two-party/multi-party computation? Give a potential application of secure 2PC/MPC in practice. (Try to come up with one that was not covered in class!)

A secure two-party/multi-party computation is a protocol that allows two/multiple distrusting parties to jointly compute a function on their private inputs with the guarantee that only the output of the function is revealed and nothing beyond that.

# Commitment Schemes

Consider a commitment scheme \((\mathsf{Commit}, \mathsf{Open})\) for a finite message space \(\mathcal{M}\) (for instance, \(\mathcal{M}=\{0,1\}\)). Recall the hiding and binding properties:

**Hiding:** The hiding property states that for any \(m_0, m_1 \in \mathcal{M}\), \(m_0 \neq m_1\), \(\mathsf{Commit}(m_0)\) is indistinguishable from \(\mathsf{Commit}(m_1)\). More specifically, perfect hiding states that \(\mathsf{Commit}(m_0)\) and \(\mathsf{Commit}(m_1)\) have the same distribution and are indistinguishable to any receiver. Computational hiding states that \(\mathsf{Commit}(m_0)\) is computationally indistinguishable from \(\mathsf{Commit}(m_1)\) to a PPT receiver.

**Binding:** The binding property states that it is hard for the sender to find \(m_0, m_1 \in \mathcal{M}\), \(m_0 \neq m_1\) and randomness \(r, s\) such that \(\mathsf{Commit}(m_0;r)=\mathsf{Commit}(m_1;s)\). More specifically, perfect binding states that for any \(m_0, m_1 \in \mathcal{M}\), \(m_0 \neq m_1\) and any randomness \(r, s\), \(\mathsf{Commit}(m_0;r)\neq \mathsf{Commit}(m_1;s)\). Computational binding states that it is computationally hard for a PPT sender to find \(m_0, m_1 \in \mathcal{M}\), \(m_0 \neq m_1\) and randomness \(r, s\) such that \(\mathsf{Commit}(m_0;r)=\mathsf{Commit}(m_1;s)\).

- Why can't a commitment scheme be both perfectly hiding and perfectly binding?

If a commitment scheme is perfectly binding, then for any \(m_0, m_1 \in \mathcal{M}\), \(m_0 \neq m_1\) and any randomness \(r, s\), \(\mathsf{Commit}(m_0;r)\neq \mathsf{Commit}(m_1;s)\). In other words, \(\mathsf{Commit}(m_0)\) and \(\mathsf{Commit}(m_1)\) are disjoint, so they cannot be the same distribution and it would be easy to distinguish for a computationally unbounded receiver.

- Recall the Pedersen commitment scheme on a cyclic group \(\mathbb{G}\) of prime order \(q\) with generator \(g\). Let the receiver first sample a random group element \(h\in \mathbb{G}\). The message space is \(\mathcal{M} = \mathbb{Z}_q\). To commit to a message \(m \in \mathbb{Z}_q\), the sender randomly samples \(r \gets \mathbb{Z}_q\) and computes \(\mathsf{Commit}(m;r) = g^m \cdot h^r\). Why is this scheme perfectly hiding and computationally binding (under what computational assumption)?

Pedersen commitment is perfectly hiding because \(h^r\) is a random group element in \(\mathbb{G}\) and we can think of it as a one-time pad to mask \(g^m\), which information theoretically hides \(m\).

It is computationally binding under the DLOG assumption. In particular, if there exists a PPT sender who can find \(m_0, m_1 \in \mathbb{Z}_q\), \(m_0 \neq m_1\) and randomness \(r, s \in \mathbb{Z}_q\) such that \(\mathsf{Commit}(m_0;r)=\mathsf{Commit}(m_1;s)\), namely \(g^{m_0} \cdot h^r = g^{m_1}\cdot h^s\), then we can derive \(g^{m_0-m_1} = h^{s-r}\), and hence \(g^{(m_0-m_1)(s-r)^{-1} } = h\), solving the discrete log of \(h\).

- Recall the Merkle tree construction on a message space \(\mathcal{M} = \{0,1\}^{2^d}\). For any message \(m\in\{0,1\}^{2^d}\), we builds up a binary tree of depth \(d\) where the leaf nodes are the \(2^d\) bits of \(m\) and each non-leaf node is computed as \(H(\mathsf{lc} \| \mathsf{rc})\) where \(\mathsf{lc}\) and \(\mathsf{rc}\) are its left and right children. Finally, we use the root of the tree as the commitment. Why is this construction computationally binding (under what computational assumption)?

The Merkle tree construction is computationally binding assuming collision resistance of the hash function \(H\). In particular, suppose there exists a PPT sender who can find \(m_0, m_1 \in \{0,1\}^{2^d}\), \(m_0 \neq m_1\) such that \(\mathsf{Commit}(m_0)=\mathsf{Commit}(m_1)\) (note that there is no randomness involved in the scheme). Assume without loss of generality that the first bits of \(m_0\) and \(m_1\) are different, namely \(m_0[1] \neq m_1[1]\). Then \(H(m_0[1] \| m_1[2])\neq H(m_1[1] \| m_1[2])\) given collision resistance of \(H\). We can apply an induction from the leaf node to the root node and conclude that \(\mathsf{Commit}(m_0)\neq \mathsf{Commit}(m_1)\), contradiction.

- Why is the above Merkle tree construction
*not*hiding? How can we make it hiding?

For any \(m_0, m_1 \in \{0,1\}^{2^d}\), \(m_0 \neq m_1\), a receiver can simply compute \(\mathsf{Commit}(m_0)\) and \(\mathsf{Commit}(m_1)\) and easily distinguish.

To make it perfectly/computationally hiding, we can first apply a perfectly/computationally hiding commitment scheme to each bit of the message at the leaf nodes, and then build up the Merkle tree on these commitments.

# Oblivious Transfer

Our OT protocol only supports 1-out-of-2 OT, which is sufficient for garbled circuits. However, 1-out-of-\(n\) OT (for \(n>2\)) may also be useful in certain applications. In 1-out-of-\(n\) OT, the sender has \(n\) messages \(m_0, m_1, \dots, m_{n-1} \in \{0,1\}^{\ell}\) and the receiver has an input \(c \in \{0,1,\dots, n\}\). From the OT protocol, we would like the receiver to learn only \(m_c\) and the sender to learn nothing.

- Extend the Diffie-Hellman-based OT protocol to a semi-honest 1-out-of-\(n\) OT for an arbitrary \(n\). You don't have to prove security for the protocol.
**Hint:**1-out-of-2 OT should be a special case of the extended protocol.

The protocol proceeds as follows.

- Sender randomly samples \(a\gets \mathbb{Z}_q\) and sends \(A=g^a\) to the receiver.
- Receiver randomly samples \(b\gets \mathbb{Z}_q\) and sends \(B = g^b\cdot A^c\) to the sender.
- Sender computes \(k_i\) as \(H((B/A^i)^a)\) and computes \(e_i \gets \mathsf{Enc}_{k_i}(m_i)\) for each \(i \in \{0,1,\dots, n-1\}\). Sender sends \(\{e_0,e_1,\dots,e_{n-1}\}\) to the receiver.
- Receiver decrypts \(e_c\) using \(H(A^b)\).

- Given a semi-honest 1-out-of-2 OT protocol (as a black box), construct a semi-honest 1-out-of-\(n\) OT for an arbitrary \(n\). You don't have to prove security for the protocol.

The sender and receiver runs \(n\) 1-out-of-2 OTs. For the \(i\)-th OT (\(i\in\{0,1,\dots, n-1\}\)), the sender feeds in two messages \((m_0^i = 0^\ell; m_1^i = m_i)\), and the receiver feeds in \(1\) if \(c=i\) and \(0\) otherwise.

# Malicious Garbler

As is, our 2PC protocol doesn't protect against malicious adversaries. For instance, the garbler doesn't have to garble the circuit that the two parties agreed on.

- Consider the private dating example where the two parties want to jointly compute a single AND gate on their private inputs. Describe an attack that allows the garbler to learn the evaluator's input bit (assuming the garbler is malicious and the evaluator is honest).

Instead of garbling an AND gate, the garbler can garble an XOR gate, which allows the garbler to learn the evaluator's input from the output of the computation.

- Suppose we wish to boost security of this protocol by enforcing the garbler to generate each garbled gate honestly; for instance, an AND gate is actually an AND gate, and not an OR gate. This way, the evaluator can be sure that the function they're evaluating is the one they agreed on. What cryptographic primitive might be useful in this situation? A single-phrase response suffices.

Zero-knowledge proofs (or cut-and-choose).

# (Hard Bonus) KZG Commitments

In this problem, we will work towards constructing the **KZG polynomial commitment scheme**, or the **Kate commitment scheme**, one of the most popular and foundational polynomial commitment schemes in the world. This commitment scheme is used in practical zk-SNARGs/SNARKs because it allows constant time verification and constant sized proofs for polynomial evaluation. The drawbacks are that it relies on **bilinear pairings** and requires a **trusted setup**, which may be hard to achieve in real life without relying on MPC.

We start by partially presenting the scheme's setup. Consider a cyclic group \(\mathbb{G}\) of prime order \(Q\) with generator \(g\). A trusted party is going to run **trusted setup** by computing a random secret group element \(s \in \mathbb{G}\), then \(g^{s^i}\), for \(i = 0, 1, \ldots, n\). Critically, the trusted party will completely delete \(s\) - without doing this, the protocol is not secure. Finally, the trusted party will share the \(g^{s^i}\) as public parameters.

- How could we use MPC to conduct the trusted setup phase, so that no party knows the secret \(s\)? Assume we have two non-colluding semi-honest parties.

The two parties each sample a random \(s,t\gets \mathbb{Z}_Q\). Then we construct a circuit that, given two inputs \(s, t\), will output \(g^{(s+t)^i}\) for \(i = 0, 1, \ldots, n\). The two parties can then use any generic 2PC to jointly compute this circuit (based on garbled circuits or GMW).

Next, we wish to commit to a polynomial \(p(x)\) of degree at most \(n\). The commitment to the polynomial \(p(x)\) is computed as \(g^{p(s)}\). Since we don't know \(s\), we'll have to use the public parameters to compute it.

- How can we compute the commitment \(g^{p(s)}\) using the public parameters?

We can compute \(g^{p(s)} = g^{\sum_{i=0}^n p_i s^i} = \prod_{i=0}^n (g^{s^i})^{p_i}\) where \(p_i\) is the \(i\)-th coefficient of the polynomial.

- As an aside, it may be very useful to be able to homomorphically add commitments together. Show how we can homomorphically add up two commitments of \(p(x)\) and \(q(x)\) to retrieve a commitment of \(p(x) + q(x)\).

We can multiply the two commitments together, since \(g^{p(s)} \cdot g^{q(s)} = g^{p(s) + q(s)} = g^{(p+q)(s)}\).

Now that we've committed to a polynomial, we want to be able to generate proofs that \(p(z) = y\) for some \(z\) and \(y\), without revealing \(p(x)\). Notice that proving this statement is equivalent to proving that there exists some polynomial \(q(x)\) such that \(p(x) - y = q(x)\cdot (x - z)\).

- Why are the above two statements equivalent?

Since \(p(z) = y\), then \(p(x) - y\) will be equal to 0 at \(z\), which means that the polynomial \(p(x) - y\) has a root at \(z\). Thus the term \((x - z)\) can be factored out and the polynomial \(q(x)\) is the remainder.

Naturally, our proof that \(p(z) = y\) will be the commitment to the remainder polynomial \(q(x)\). To verify, we want to use the commitments to check that \(p(s) - y = q(s) \cdot (s-z)\), namely that the relationship above holds when evaluated at \(s\). We only have these elements in the exponents (namely, we have access to \(g^{p(s)}, g^{q(s)}, g^s\) and can easily compute \(g^y, g^z\)), and so we will verify them in the exponents. While we can compute \(g^{p(s) - y}\) quite easily, we can't compute \(g^{q(s) \cdot (s-z)}\). What should we do?

Thankfully, when we selected \(\mathbb{G}\), we chose \(\mathbb{G}\) to have special properties. In particular, for this group there exists an operation \(e: \mathbb{G} \times \mathbb{G} \rightarrow \mathbb{G}_T\) where \(\mathbb{G}_T\) is another group of order \(Q\) with a generator \(g_T\). For any \(a, b \in \mathbb{Z}_q\), it holds that \(e(g^a, g^b) = g_T^{ab}\).

Such an operation is called a **bilinear pairing** and such a group \(\mathbb{G}\) is called a **pairing-friendly group**. (So far, pairings only work in elliptic curve groups, so we have that \(\mathbb{G}\) is an elliptic curve.)

- Given a pairing-friendly group \(\mathbb{G}\) with a bilinear pairing \(e\), how can we verify the above relationship?

We can verify that \(e(g^{p(s) - y}, g) = e(g^{q(s)}, g^{s - z})\) holds. This is because, by pairings, \(e(g^{p(s) - y}, g) = g_T^{p(s) - y}\) and \(e(g^{q(s)}, g^{s - z}) = g_T^{q(s)\cdot(s-z)}\).

We should inspect some of the security properties of this scheme. To do so, we'll introduce a neat fact about polynomials. Consider a polynomial \(p(x)\) of degree \(n\). We know from high school that \(p(x)\) has at most \(n\) roots; that is, \(p(x) = 0\) in at most \(n\) places. Moreover, we know that if another degree-\(n\) polynomial \(q(x)\) is equal to \(p(x)\), then \(p(x) - q(x)\) should be equal to 0 everywhere since their difference is the zero polynomial. However, if the polynomials are not equal, then the resulting polynomial is of degree at most \(n\), and thus has at most \(n\) roots. Given that \(n\) is much smaller than the field size \(Q\) the polynomials are defined in. Therefore, at a random point \(s\), it is extremely unlikely that \(p(s) - q(s) = 0\).

- Why is it hard for the prover to find another degree-\(n\) polynomial \(q(x)\) with the same commitment as \(p(x)\) while \(q(x) \neq p(x)\)?

Doing so would mean finding \(q(x)\) such that \(q(s) = p(s)\). If \(q(x) \neq p(x)\), then as we discussed above, \(q(s) \neq p(s)\) with overwhelming probability for a random \(s\). Hence, without knowing \(s\), it is hard to find such a polynomial \(q(x)\).

Recall the trusted setup; we now explore why knowledge of \(s\) breaks security of this scheme.

- How could a malicious prover construct a degree-\(n\) polynomial \(q(x)\) with the same commitment as \(p(x)\) while \(q(x) \neq p(x)\), knowing \(s\)?

Knowing \(s\) and the polynomial \(p(x)\), we can easily construct a polynomial \(q(x)\) such that \(q(s) = p(s)\) by randomly sampling all but one coefficients of \(q(x)\) and computing the final coefficient to satisfy the equation \(q(s) = p(s)\).

- How could a malicious prover fake a proof, knowing \(s\)? Extrapolate why without knowing \(s\) it is hard to create a fake proof.

We can compute a fake proof by computing \(\pi_{\mathsf{fake} } = g^{(p(s) - y)(s - z)^{-1} }\). This will verify, since \(e(\pi_{\mathsf{fake} }, g^{s - z}) = g_T^{p(s) - y}\)

The rabbit-hole goes deeper: Kate proofs can actually be combined to prove multiple evaluations of a polynomial all at once, or they can be used as a vector commitment like a Merkle tree. Other forms of commitments require no trusted setup, or are post-quantum secure, or both. We hope this interlude into commitments has been exciting!