This is the sixth crypto challenge in set 1 from cryptopals, which is the qualifying set.
The difficulty level is relatively easy, to solve this problem I have used the programming language Go.
Problem description
This challenge isn't conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.
There's a file to download at the cryptopals page. It's been base64'd after being encrypted with repeating-key XOR.
Decrypt it.
Here's how:
- Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
-
Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
this is a test
andwokka wokka!!!
is 37. Make sure your code agrees before you proceed.
For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
Solve each block as if it was single-character XOR. You already have code to do this.
For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.
Solution
The Hamming distance is a clear requirement to start the solution with.
This could be expressed with the following unit test.
func TestHammingDistance(t *testing.T) {
str1 := "test"
str2 := "test"
expectedDistance := 0
distance, err := basics.HammingDistance(str1, str2)
if err != nil {
t.Errorf("Unexpected error: %v", err)
}
if distance != expectedDistance {
t.Errorf("Expected distance: %d, got: %d", expectedDistance, distance)
}
str1 = "this is a test"
str2 = "wokka wokka!!!"
expectedDistance = 37
distance, err = basics.HammingDistance(str1, str2)
if err != nil {
t.Errorf("Unexpected error: %v", err)
}
if distance != expectedDistance {
t.Errorf("Expected distance: %d, got: %d", expectedDistance, distance)
}
}
The second test with the expected distance indicates that we want to calculate distance on the binary level of the strings.
To fulfill the unit test we could create this function to solve that.
func HammingDistance(str1, str2 string) (int, error) {
if len(str1) != len(str2) {
return 0, errors.New("strings have different lengths")
}
distance := 0
for i := range str1 {
b1 := []byte(str1[i : i+1])
b2 := []byte(str2[i : i+1])
// The ^ operators sets to 1 only the bits that are different
val := b1[0] ^ b2[0]
// We then count the bit set to 1
for v := val; v > 0; v >>= 1 {
distance += int(v & 1)
}
}
return distance, nil
}
Where we loop through each character as a byte and count the number of bits set to 1 in each byte.
Next up is to implement a function that finds the best key size/length to use.
The algorithm will be using our newly created HammingDistance
function. The algorithm is described by the points 1 to 4 in the problem description above. We start with writing a unit test for this algorithm.
func TestFindKeySize(t *testing.T) {
plaintext := []byte("Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal")
ciphertext := basics.RepeatingKeyXOR(plaintext, []byte("ICE"))
keySize, err := basics.FindKeySize(ciphertext)
if err != nil {
t.Errorf("Unexpected error: %v", err)
}
if keySize != 3 {
t.Errorf("Expected key size: %d, got: %d", 3, keySize)
}
}
Here we are using the, RepeatingKeyXOR
, function from a previous challenge to build a ciphertext with a known length of the key. Now we can verify that our algorithm is solving this in a correct way.
Here we loop through the key sizes from 2
until 40
. Then we loop divide the ciphertext in blocks and loop through them to calculate the hamming distance. For each key size we are checking if it's the best size (smallest distance) for the given ciphertext.
func FindKeySize(ciphertext []byte) (int, error) {
minKeySize := 2
maxKeySize := 40
bestKeySize := 0
bestDistance := float64(len(ciphertext))
for keySize := minKeySize; keySize <= maxKeySize; keySize++ {
doubleKeySize := keySize * 2
blocks := len(ciphertext)/doubleKeySize - 1
if blocks <= 2 {
// Not enough blocks to calculate a meaningful distance
continue
}
distance := 0
for block := 0; block < blocks; block++ {
block1 := ciphertext[block*doubleKeySize : block*doubleKeySize+keySize]
block2 := ciphertext[block*doubleKeySize+keySize : block*doubleKeySize+2*keySize]
d, err := HammingDistance(string(block1), string(block2))
if err != nil {
return 0, err
}
distance += d
}
normalizedDistance := (float64(distance) / float64(keySize)) / float64(blocks)
if normalizedDistance < bestDistance {
bestDistance = normalizedDistance
bestKeySize = keySize
}
}
return bestKeySize, nil
}
Next up is to write an algorithm to solve the points 5 to 8, which will produce the key for decrypting the content of the file.
Lets create a new test that is a variant of the previous test.
func TestFindKey(t *testing.T) {
actualKey := []byte("ICE")
plaintext := []byte("Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal")
ciphertext := basics.RepeatingKeyXOR(plaintext, actualKey)
key := basics.FindKey(3, ciphertext)
if string(key) != string(actualKey) {
t.Errorf("Expected key: %s, got: %s", string(actualKey), string(key))
}
}
The implementation to solve this looks like this.
func FindKey(keySize int, ciphertext []byte) []byte {
// Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
blocks := make([][]byte, keySize)
for i := 0; i < len(ciphertext); i += keySize {
block := make([]byte, 0)
for j := i; j < len(ciphertext); j++ {
block = append(block, ciphertext[j])
}
blocks = append(blocks, block)
}
// Now transpose the blocks: make a block that is the first byte of every block, and a
// block that is the second byte of every block, and so on.
transposedBlocks := make([][]byte, keySize)
for i := 0; i < keySize; i++ {
block := make([]byte, len(blocks))
for j := 0; j < len(blocks); j++ {
if i < len(blocks[j]) {
block[j] = blocks[j][i]
}
}
transposedBlocks[i] = block
}
// Solve each block as if it was single-character XOR. You already have code to do this.
// For each block, the single-byte XOR key that produces the best looking histogram is the
// repeating-key XOR key byte for that block. Put them together and you have the key.
theKey := make([]byte, 0)
for _, block := range transposedBlocks {
bestKeyByte := byte(0)
bestScore := 0.0
for k := range 256 {
key := byte(k)
decrypted := singleByteXOR(block, key)
score := calculateScore(decrypted)
if score > bestScore {
bestScore = score
bestKeyByte = key
}
}
theKey = append(theKey, bestKeyByte)
}
return theKey
}
The function is divided in three steps. First we break the ciphertext into blocks of the key length (see point 5). Second step is to transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on (see point 6). Then the last step is to solve each block as if it was single-character XOR (see point 7 and 8).
Now we have all building blocks needed to solve the complete challenge.
func BreakRepeatingKeyXOR(filename string) ([]byte, error) {
ciphertext, err := decodeBase64File(filename)
if err != nil {
return nil, err
}
keySize, err := FindKeySize(ciphertext)
if err != nil {
return nil, err
}
key := FindKey(keySize, ciphertext)
decrypted := RepeatingKeyXOR(ciphertext, key)
return decrypted, nil
}
The complete solution can be found on GitHub.
Conclusion
This challenge emphasized on a complete solution to decrypt a message with the help the Vigenère Cipher. As always it's good to divide the problem into smaller pieces, especially so that you can unit test the pieces.