Signal

Theme Song: Text Me

In this assignment, you'll compose the cryptographic primitives we've been exploring to build a secure communication framework. In particular, we build a system wherein an adversary cannot decrypt or modify messages without our knowledge. You'll become much more familiar with cryptographic libraries and be prepared to take up larger protocols later in the course.

NOTE: You should start early on this assignment; it is harder than the last (warmup) assignment!


Background Knowledge

In this assignment, you will build a secure communication platform. On startup, each client specifies whether it will listen or connect to another client; once two clients have connected, they will run a key exchange protocol to reach a shared secret. Once we have a shared secret, both parties run a key derivation function on it to derive a secure key for a block cipher, which they can then use to encrypt and decrypt messages in tandem. To ensure that messages aren't tampered with, each message is also tagged with a message authentication code, which are generated from another secure key. We go over each of these components in detail below.

Key Exchange

We've already explored key exchange in the previous assignment, but it's worth revisiting. Often, when two parties wish to communicate securely, it is useful to have some shared secret key that they can use to encrypt and decrypt their messages. Indeed, the block cipher that we use in this assignment requires such a secret key. Diffie-Hellman key exchange is one method for coming to a shared secret key.

Diffie-Hellman works in three steps. First, system parameters are generated and shared amongst the parties. These parameters include a group \(\mathbb{G}\) (e.g., \(\mathbb{Z}_p^*\)), a generator \(g \in \mathbb{G}\), and the order \(q\) of the group generated by \(g\). Once these parameters are shared, the two parties randomly sample secrets \(a, b \in \mathbb{Z}_q^*\) respectively, and share the public values \(g^a, g^b \in \mathbb{Z}_p^*\) with each other. Finally, the secret and public values are combined to create a shared secret \(g^{ab} \in \mathbb{Z}_p^*\).

Block Ciphers

Often, we wish to encrypt large amounts of data without having to generate new secret keys each time nor comprimising the security of our existing key. A block cipher is an encryption scheme that works on fixed-length blocks at a time. They are very useful for encrypting large or arbitrary-length data, such as messages or videos. AES (Advanced Encryption Standard) is one such block cipher. It was adopted by NIST in 2001 after winning a 5-year public competition to become the new standard secure block cipher. To use AES, simply provide it with a message and a suitable key; it will then output the ciphertext (the API for AES is quite simple despite its complexity). The inner workings of AES are extremely complicated and out of scope for this class.

There are many modes in which a block cipher can operate, but the one we'll be using is known as cipher-block chaining mode, or CBC mode. One nuance of this mode is that it requires each encryption to be initialized with an initialization vector, or IV for short, and each decryption should use the same IV when decrypting. The IV does not need to be kept secret, but it should be chosen at random.

Message Integrity

An important aspect of communication is message integrity; we want to be sure that our messages haven't been tampered with in transit. A MAC (Message Authentication Code) is one way of cryptographically ensuring message integrity. A MAC generator takes in a shared secret and a message and outputs a tag for the message. A MAC verifier takes in the shared secret, a message, a MAC tag, and outputs "Verified" (or some other positive value) if and only if the MAC is valid for the given message. Otherwise, it rejects the tag, indicating that the value has been tampered with or that the MAC was generated incorrectly. It must be difficult for those without the shared secret to generate valid MACs for any message (otherwise, this wouldn't be secure). The MAC that is computed on some value can be thought of as a signature but with symmetric keys. In this assignment we use HMAC (Hashed Message Authentication Code), a widely used MAC algorithm, to tag our messages.

We need to be careful about the order in which we apply our cryptographic primitives. In particular, do we compute MACs on the plaintext and then encrypt the message along with the tag, or do we encrypt our plaintext first and then compute MAC on the resulting ciphertext? It turns out that only latter approach is secure.

Key Derivation

The key we generate from the Diffie-Hellman key exchange may not be sufficient for AES or HMAC. For example, it may not be long enough, it may be of the wrong size, or may have a distribution that doesn't preserve the security guarantees of AES. Moreover, we want to use different keys for AES and HMAC to preserve security. To this end, we use a secure key derivation function to convert a shared secret into an acceptable key. In this assignment, we use HKDF (HMAC-based Key Derivation Function), a widely used key derivation function, to generate secure keys. To ensure that HKDF generates different keys for AES and HMAC, we salt the Diffie-Hellman shared secret in the HKDF calls. Salts have been provided for you already in the stencil code, and should be passed into the HKDF call as a parameter. We will use HKDF two times in this assignment, once to generate a key for AES and once to generate a key for HMAC.

Diffie-Hellman Ratchet

An undesirable property of using a single key throughout the whole session is that if the key is leaked or cracked at any point, an adversary could read all messages in the entire session. How can we remedy this? A solution can be found in the Double Ratchet Algorithm developed by the titular Signal company. Criticially, they implement what is known as a Diffie-Hellman Ratchet which has parties constantly switching their shared secret. Then, even if you break in at time \(T\), the keys will eventually be switched, limiting the number of messages the adversary can read before needing to break in again. Please read the Diffie-Hellman Ratchet section of the spec, linked above.

In short, this is how the ratchet works. Each time the direction of communication changes (i.e. Alice starts sending messages after the last thing she did was receive a message), the sender will generate a new Diffie-Hellman keypair and send their new public value. The sender will generate a new encryption key by combining their new private value with the receiver's old (i.e. last heard of) public value, and the receiver will generate a new decryption key by combining the sender's newly received public value with their old (i.e. last generated/current) private value. Both parties will use this shared key in AES and HMAC by first running it through HKDF, as before.

Architecture DH

Putting it all together

The following diagrams explain how the protocol works together.

First, the parties need to run the Diffie-Hellman key exchange to derive secret keys.

Architecture DH

Then, the parties are able to securely communicate. One secret key is used for encryption/decryption and the other is used for MAC generation/verification. The following diagram shows the sequence for the event where Alice sends a message to Bob. The event where Bob sends a message to Alice is symmetrical.

Architecture Msg

In short, we proceed in two steps: key exchange and then secure communication.

Key Exchange

When our client starts up, we'll first connect to the other client we're interested in communicating with, then we'll set up to exchange keys. One of the clients (the one that called "connect") will generate and send the Diffie-Hellman parameters (\(p, g, q\)) to the other (DH_generate_params), which will listen for them. Once the parameters are received, both clients will generate their secret and public values (DH_generate_shared_key) and send the public value to the other client. Lastly, each client will combine the value they received with their secret value to reach a shared secret.

Communication

Using the Diffie-Hellman shared secret, each client derives a key for each of AES (AES_generate_shared_key) and HMAC (HMAC_generate_shared_key) using HKDF. Each time a client wishes to send a message, they first encrypt it using AES (AES_encrypt) and tag the entire message (as in, the IV concatenated with the public value concatenated with the ciphertext) with an HMAC (HMAC_generate). Each time a client receives a message, they verify that the HMAC is valid (HMAC_verify) and decrypt it using AES (AES_decrypt).


Assignment Specification

Please note: you may NOT change any of the function headers defined in the stencil. Doing so will break the autograder; if you don't understand a function header, please ask us what it means and we'll be happy to clarify.

Functionality

You will primarily need to edit src/drivers/crypto_driver.cxx and src/pkg/client.cxx. The following is an overview of relevant files:

The following roadmap should help you compartmentalize the project's main goals:

Some tips:

Diffie-Hellman Key Exchange

Implement the Diffie-Hellman Key Exchange by editing the following functions. Once you do so, clients will be able to establish a shared key upon connection with each other.

Cryptographic functions:

Application functions:

AES Encryption

Implement AES encryption by editing the following functions. Once you do so, clients will be able to encrypt and decrypt messages.

Cryptographic functions:

Application functions:

HMAC Authentication

Implement HMAC authentication by editing the following functions. Once you do so, clients will be able to tag and verify messages.

Cryptographic functions:

Application functions:

Diffie-Hellman Ratchet

Implement the Diffie-Hellman Ratchet to allow cycling of keys.

Application functions:

Please note that in our protocol, we run full Diffie-Hellman key exchange first, then continue using the Diffie-Hellman Ratchet for future messages. You can think of the first messages being sent in HandleKeyExchange as dummy messages, then future ones being real messages. This allows you to build the project incrementally without having to remove key exchange later on.

Support Code

Read the support code header files before coding so you have a sense of what functionality we provide. This isn't a networking class, nor is it a software engineering class, so we try to abstract away as many of these details as we can so you can focus on the cryptography.

The following is an overview of the functionality that each support code file provides.

Messaging

The following example shows how to use our message library alongside our networking library; we'll be using this pattern for the entire course, so it's good to get it down now.

// Declare the message struct that we want to send and populate its fields
PublicValue_Message msg_s;
msg_s.value_1 = "bar";
msg_s.value_2 = "foo";

// Declare a data vector and serialize the message into the data vector
std::vector<unsigned char> msg_data;
msg_s.serialize(msg_data);

// Send the data
this->network_driver->send(msg_data);

// ----

// Read the data
std::vector<unsigned char> their_msg_data = this->network_driver->read();

// Declare a message struct and deserialize the data into the struct.
PublicValue_Message their_msg_s;
their_msg_s.deserialize(their_msg_data);

Libraries: CryptoPP

Most of the code you'll be writing in this assignment will be similar to that in the links provided below. We highly recommend perusing those wikis before starting. In particular, the Pipelining wiki page is useful for understanding how the CryptoPP library works.

You may find the following wiki pages useful during this assignment:

Note: A SecByteBlock is a variable where we can securely store keys in memory; many library functions that you will be calling require this type. When initializing SecByteBlocks, you should pass in the length of the value it will be holding; for example, dh.PrivateKeyLength() or dh.AgreedValueLength() or SHA256::DIGESTSIZE. We've included helper functions for you to print out the contents of a SecByteBlock.


Getting Started

To get started, get your stencil repository here and clone it into the devenv/home folder. From here you can access the code from both your computer and from the Docker container.

Running

To build the project, cd into the build folder and run cmake ... This will generate a set of Makefiles building the whole project. From here, you can run make to generate a binary you can run, and you can run make check to run any tests you write in the test folder.

To run the binary, run ./signal_app <listen|connect> [address] [port] (in the build folder) on two terminals with different ports, one using listen and the other connect (listen before connect). This should run the app and allow you to communicate.

Tip: Use localhost as the address, and choose any number from 1025 up to 65,535 for the port. (e.g. ./signal_app listen localhost 8080)

Testing

You may write tests in any of the test/**.cxx files in the Doctest format. We provide test/test_provided.cxx, feel free to write more tests in this file directly. If you want to add any new test files, make sure to add the file to the cmake variable, TESTFILES, on line 7 of test/CMakeLists.txt so that cmake can pick up on the new files. Examples have been included in the assignment stencil. To build and run the tests, run make check in the build directory. If you'd like to see if your code can interoperate with our code (which is what it will be tested against), feel free to download our binaries here - we try to keep these up to date, so if you're unsure about the functionality of our binaries, please ask us on Ed!


FAQ