Break repeating-key XOR

Stefan Alfbo - May 12 - - Dev Community

This is the sixth crypto challenge in set 1 from cryptopals, which is the qualifying set.

The difficulty level is relatively easy, to solve this problem I have used the programming language Go.

Problem description

This challenge isn't conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you're probably just fine up to Set 6.

There's a file to download at the cryptopals page. It's been base64'd after being encrypted with repeating-key XOR.

Decrypt it.

Here's how:

  1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
  2. Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:

    this is a test and wokka wokka!!!

    is 37. Make sure your code agrees before you proceed.

  3. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.

  4. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.

  5. Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.

  6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.

  7. Solve each block as if it was single-character XOR. You already have code to do this.

  8. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR ("Vigenere") statistically is obviously an academic exercise, a "Crypto 101" thing. But more people "know how" to break it than can actually break it, and a similar technique breaks something much more important.

Solution

The Hamming distance is a clear requirement to start the solution with.

This could be expressed with the following unit test.

func TestHammingDistance(t *testing.T) {
    str1 := "test"
    str2 := "test"
    expectedDistance := 0
    distance, err := basics.HammingDistance(str1, str2)
    if err != nil {
        t.Errorf("Unexpected error: %v", err)
    }
    if distance != expectedDistance {
        t.Errorf("Expected distance: %d, got: %d", expectedDistance, distance)
    }

    str1 = "this is a test"
    str2 = "wokka wokka!!!"
    expectedDistance = 37
    distance, err = basics.HammingDistance(str1, str2)
    if err != nil {
        t.Errorf("Unexpected error: %v", err)
    }
    if distance != expectedDistance {
        t.Errorf("Expected distance: %d, got: %d", expectedDistance, distance)
    }
}
Enter fullscreen mode Exit fullscreen mode

The second test with the expected distance indicates that we want to calculate distance on the binary level of the strings.

To fulfill the unit test we could create this function to solve that.

func HammingDistance(str1, str2 string) (int, error) {
    if len(str1) != len(str2) {
        return 0, errors.New("strings have different lengths")
    }

    distance := 0
    for i := range str1 {
        b1 := []byte(str1[i : i+1])
        b2 := []byte(str2[i : i+1])
        // The ^ operators sets to 1 only the bits that are different
        val := b1[0] ^ b2[0]

        // We then count the bit set to 1
        for v := val; v > 0; v >>= 1 {
            distance += int(v & 1)
        }
    }

    return distance, nil
}
Enter fullscreen mode Exit fullscreen mode

Where we loop through each character as a byte and count the number of bits set to 1 in each byte.

Next up is to implement a function that finds the best key size/length to use.

The algorithm will be using our newly created HammingDistance function. The algorithm is described by the points 1 to 4 in the problem description above. We start with writing a unit test for this algorithm.

func TestFindKeySize(t *testing.T) {
    plaintext := []byte("Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal")

    ciphertext := basics.RepeatingKeyXOR(plaintext, []byte("ICE"))

    keySize, err := basics.FindKeySize(ciphertext)
    if err != nil {
        t.Errorf("Unexpected error: %v", err)
    }
    if keySize != 3 {
        t.Errorf("Expected key size: %d, got: %d", 3, keySize)
    }
}
Enter fullscreen mode Exit fullscreen mode

Here we are using the, RepeatingKeyXOR, function from a previous challenge to build a ciphertext with a known length of the key. Now we can verify that our algorithm is solving this in a correct way.

Here we loop through the key sizes from 2 until 40. Then we loop divide the ciphertext in blocks and loop through them to calculate the hamming distance. For each key size we are checking if it's the best size (smallest distance) for the given ciphertext.

func FindKeySize(ciphertext []byte) (int, error) {
    minKeySize := 2
    maxKeySize := 40

    bestKeySize := 0
    bestDistance := float64(len(ciphertext))

    for keySize := minKeySize; keySize <= maxKeySize; keySize++ {
        doubleKeySize := keySize * 2
        blocks := len(ciphertext)/doubleKeySize - 1

        if blocks <= 2 {
            // Not enough blocks to calculate a meaningful distance
            continue
        }

        distance := 0

        for block := 0; block < blocks; block++ {
            block1 := ciphertext[block*doubleKeySize : block*doubleKeySize+keySize]
            block2 := ciphertext[block*doubleKeySize+keySize : block*doubleKeySize+2*keySize]

            d, err := HammingDistance(string(block1), string(block2))
            if err != nil {
                return 0, err
            }
            distance += d
        }

        normalizedDistance := (float64(distance) / float64(keySize)) / float64(blocks)

        if normalizedDistance < bestDistance {
            bestDistance = normalizedDistance
            bestKeySize = keySize
        }
    }

    return bestKeySize, nil
}
Enter fullscreen mode Exit fullscreen mode

Next up is to write an algorithm to solve the points 5 to 8, which will produce the key for decrypting the content of the file.

Lets create a new test that is a variant of the previous test.

func TestFindKey(t *testing.T) {
    actualKey := []byte("ICE")
    plaintext := []byte("Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal")

    ciphertext := basics.RepeatingKeyXOR(plaintext, actualKey)

    key := basics.FindKey(3, ciphertext)
    if string(key) != string(actualKey) {
        t.Errorf("Expected key: %s, got: %s", string(actualKey), string(key))
    }
}
Enter fullscreen mode Exit fullscreen mode

The implementation to solve this looks like this.

func FindKey(keySize int, ciphertext []byte) []byte {
    // Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
    blocks := make([][]byte, keySize)
    for i := 0; i < len(ciphertext); i += keySize {
        block := make([]byte, 0)
        for j := i; j < len(ciphertext); j++ {
            block = append(block, ciphertext[j])
        }
        blocks = append(blocks, block)
    }

    // Now transpose the blocks: make a block that is the first byte of every block, and a
    // block that is the second byte of every block, and so on.
    transposedBlocks := make([][]byte, keySize)
    for i := 0; i < keySize; i++ {
        block := make([]byte, len(blocks))
        for j := 0; j < len(blocks); j++ {
            if i < len(blocks[j]) {
                block[j] = blocks[j][i]
            }
        }
        transposedBlocks[i] = block
    }

    // Solve each block as if it was single-character XOR. You already have code to do this.
    // For each block, the single-byte XOR key that produces the best looking histogram is the
    // repeating-key XOR key byte for that block. Put them together and you have the key.
    theKey := make([]byte, 0)
    for _, block := range transposedBlocks {
        bestKeyByte := byte(0)
        bestScore := 0.0
        for k := range 256 {
            key := byte(k)
            decrypted := singleByteXOR(block, key)

            score := calculateScore(decrypted)

            if score > bestScore {
                bestScore = score
                bestKeyByte = key
            }
        }
        theKey = append(theKey, bestKeyByte)
    }

    return theKey
}
Enter fullscreen mode Exit fullscreen mode

The function is divided in three steps. First we break the ciphertext into blocks of the key length (see point 5). Second step is to transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on (see point 6). Then the last step is to solve each block as if it was single-character XOR (see point 7 and 8).

Now we have all building blocks needed to solve the complete challenge.

func BreakRepeatingKeyXOR(filename string) ([]byte, error) {
    ciphertext, err := decodeBase64File(filename)
    if err != nil {
        return nil, err
    }

    keySize, err := FindKeySize(ciphertext)
    if err != nil {
        return nil, err
    }

    key := FindKey(keySize, ciphertext)
    decrypted := RepeatingKeyXOR(ciphertext, key)

    return decrypted, nil
}
Enter fullscreen mode Exit fullscreen mode

The complete solution can be found on GitHub.

Conclusion

This challenge emphasized on a complete solution to decrypt a message with the help the Vigenère Cipher. As always it's good to divide the problem into smaller pieces, especially so that you can unit test the pieces.

References

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .