# Signal - Homework Solutions

# Encryption Schemes

Recall the RSA and ElGamal encryption schemes from class. Recall also the definition of chosen plaintext attack (CPA) security. In short, an encryption scheme is CPA-secure if for any polynomial-time adversary \(\mathcal{A}\) who can query for encryptions of arbitrary messages, \(\mathcal{A}\) cannot distinguish between an encryption of \(m_0\) and an encryption of \(m_1\). All the messages and \(m_0, m_1\) are chosen by \(\mathcal{A}\).

- Why isn't RSA encryption CPA-secure?

There are many ways to explain this. For example, an adversary \(\mathcal{A}\) can choose two arbitrary messages \(m_0, m_1\) and encrypt them by computing \(c_0 = m_0^e \mod N\) and \(c_1 = m_1^e \mod N\). Comparing \((c_0, c_1)\) with the challenge ciphertext \(c\), the adversary can easily tell if \(c\) is an encryption of \(m_0\) or \(m_1\).

The fundamental reason behind it is that RSA encryption is deterministic, namely encrypting the same message always results in the same ciphertext. For an encryption scheme to be CPA-secure, there must be randomness involved in the encryption scheme.

- What should we pay attention to, regarding the plaintext, when we use RSA encryption in practice? Why?

Intuitively speaking, the plaintext must be sampled from a sufficiently large domain; more formally it must have high min-entropy. Since RSA encryption is deterministic, if the plaintext is from a relatively small set, then an adversary could perform a brute force search on all the possible messages.

Another caveat is that the plaintext and \(N\) should be coprime, otherwise an adversary can learn the factoring of \(N\) (and break the scheme) by taking the gcd of \(N\) and the ciphertext. (This is a minor issue as it will only happen with negligible probability if the plaintext is sampled randomly from \([1,N-1]\).)

- Why is ElGamal encryption CPA-secure?

The CPA-security of ElGamal encryption is based on the Diffie-Hellman assumptions. Given the public key \(pk = (\mathbb{G}, q,g,g^x)\) and a ciphertext \(c=(g^y, g^{xy} \cdot m)\), note that from \(g^x\) and \(g^y\), it is computationally hard to find \(g^{xy}\) (based on the CDH assumption). Furthermore, \(g^{xy}\) is computationally indistinguishable from a random group element \(g^z\) (based on the DDH assumption), hence it hides \(m\) like a one-time pad.

- Given that public-key encryption can be used to construct a symmetric-key encryption, why do we ever want to use symmetric-key encryption over public key encryption, in practice?

There are two main reasons: 1) the practical constructions of symmetric-key encryption schemes (in particular, block cipher) are much more efficient than public-key encryption; b) public-key encryption schemes rely on stronger computational assumptions and most of the existing schemes are vulnerable to quantum attacks.

# Authenticated Encryption

Recall the definition of chosen message attack (CMA) security for a MAC scheme. In short, a MAC scheme is CMA-secure if for any polynomial-time adversary \(\mathcal{A}\) who can query for tags of arbitrary messages, \(\mathcal{A}\) cannot generate a new valid message-tag pair \((m,t)\) where \(m\) hasn't been queried before.

Given a CPA-secure symmetric-key encryption scheme \(\Pi_1=(\mathsf{Gen}_1,\mathsf{Enc}_1, \mathsf{Dec}_1)\) and a CMA-secure MAC scheme \(\Pi_2 = (\mathsf{Gen}_2,\mathsf{Mac}_2,\mathsf{Vrfy}_2)\), we construct an authenticated encryption scheme \(\Pi=(\mathsf{Gen},\mathsf{Enc},\mathsf{Dec})\).

- What is
**Encrypt-and-MAC**and why is it not necessarily secure?

- \(\mathsf{Gen}(1^\lambda):\) run \(k_1\gets \mathsf{Gen}_1(1^\lambda)\) and \(k_2\gets\mathsf{Gen}_2(1^\lambda)\), and output \(k = (k_1, k_2)\);
- \(\mathsf{Enc}_k(m):\) compute \(c_1 \gets\mathsf{Enc}_1(k_1,m)\) and \(t_2 \gets\mathsf{Mac}_2(k_2,m)\), and output \(c = (c_1, t_2)\);
- \(\mathsf{Dec}_k(m):\) compute \(m:=\mathsf{Dec}_1(k_1,c_1)\) and \(b:=\mathsf{Vrfy}_2(k_2, (m, t_2))\). If \(b=1\), then output \(m\); otherwise output \(\bot\).

This encryption scheme \(\Pi\) is not necessarily secure mainly because a CMA-secure MAC scheme does *not* necessarily hide the message. Imagine an extreme case where the MAC scheme includes the entire message in the tag. It could still be CMA-secure, but the resulting encryption scheme \(\Pi\) is certainly not CPA-secure.

- What is
**Encrypt-then-MAC**and why is it both CPA-secure and unforgeable?

- \(\mathsf{Gen}(1^\lambda):\) run \(k_1\gets \mathsf{Gen}_1(1^\lambda)\) and \(k_2\gets\mathsf{Gen}_2(1^\lambda)\), and output \(k = (k_1, k_2)\);
- \(\mathsf{Enc}_k(m):\) compute \(c_1 \gets\mathsf{Enc}_1(k_1,m)\) and \(t_2 \gets\mathsf{Mac}_2(k_2,c_1)\), and output \(c = (c_1, t_2)\);
- \(\mathsf{Dec}_k(m):\) compute \(m:=\mathsf{Dec}_1(k_1,c_1)\) and \(b:=\mathsf{Vrfy}_2(k_2, (c_1, t_2))\). If \(b=1\), then output \(m\); otherwise output \(\bot\).

Since the MAC is generated on the ciphertext, we know for certain that an adversary cannot generate another ciphertext with a valid tag (satisfying unforgeability).

Moreover, generating the MAC on the ciphertext means that it can only leak information about the ciphertext and not the underlying message. The CPA-security of \(\Pi\) follows directly from the CPA-security of \(\Pi_1\).

# Man-in-the-Middle (MitM) Attack

In the Signal project, our protocol so far ensures that two parties can establish shared secrets, but it does *not* ensure that they know exactly who they are talking to. Indeed, an adversary could pretend to be who they are talking to. Describe a man-in-the-middle attack that compromises the security of our application.

Suppose two parties Alice and Bob are running the protocol in our Signal project, and there is an adversary Eve who mediates all communication (that is, all messages from Alice to Bob and vice versa go through Eve; Eve can see, drop, or modify any message). Imagine Eve sits between Alice and Bob and talks to both of them. Eve pretends to be Bob when talking to Alice and pretends to be Alice when talking to Bob. In both interactions Eve would follow our Signal project. This way Eve can see all the interactions throughout the protocol while Alice and Bob would not notice they are talking to a man-in-the-middle attacker.

# Replay Attack

In the Signal project, our protocol is *not* entirely secure against replay attacks. In particular, once a secure channel is established, the same encrypted messages could be sent multiple times (before the parties switch to a new key). Consider an application that upon receiving a suitable message, will send a dollar to charity.

- Describe a replay attack that exploits this system.

An adversary Eve can simply re-send the same encrypted message multiple times (before Alice and Bob switch to a new key).

- Propose a mechanism for protecting against this attack.

We need a way for messages to be **uniquely** identified. A simple approach is to append a timestamp, a sequence number, or a randomly sampled nonce to the message before encryption, and Bob would ignore messages with the same identifier.

The Signal symmetric-key ratchet can also solve this problem.

# (Bonus) Chained CBC

Describe a concrete chosen plaintext attack against block cipher chained CBC mode. In your attack, the two challenge messages (\(M_0, M_1\)) should be of equal length, namely, the adversary should *not* differentiate the two possible ciphertexts purely based on the length.

The adversary \(\mathcal{A}\) first queries a message \(m=0^\lambda\) and gets a ciphertext \(c = IV \| c_1\). Then \(\mathcal{A}\) chooses two challenge messages \((M_0, M_1)\) each of length \(\lambda\), where \(M_0 = IV \oplus c_1\), and \(M_1\) is an arbitrary message (different from \(M_0\)). Once receiving the challenge ciphertext \(\hat c\) (which is an encryption of either \(M_0\) or \(M_1\)), \(\mathcal{A}\) compares it with \(c_1\). If \(\hat c = c_1\), then \(\mathcal{A}\) outputs \(0\); otherwise \(\mathcal{A}\) outputs \(1\).

# (Bonus) Diffie-Hellman Ratchet

An alternative version of the Diffie-Hellman ratchet it to send a new public value for every message sent, regardless of whether or not the flow of communication changed. In what situations does this change help? In what situations does it do nothing? Your answer should be phrased in terms of what information an attacker has managed to obtain.

Suppose Alice's last Diffie-Hellman public value was \(g^a\), Bob's last public value was \(g^b\), and the last message was sent from Bob to Alice. Their agreed secret key is \(k = g^{ab}\). Now Bob sends a new public value \(g^{b'}\). If an attacker already managed to learn \(a\), then sending a new \(g^{b'}\) doesn't do anything as the attacker can still figure out the new agreed key \(k'=g^{ab'}\). If the attacker only managed to learn \(b\) or \(k\), then \(k'\) would be computationally hard to figure out (based on the Diffie-Hellman assumptions).