Merkle trees, sometimes referred to as hash trees, were invented in 1979 by Ralph Merkle. Merkle Trees are a data structure that expands upon the topics of binary trees and cryptographic hashes. Merkle trees have unique applications in distributed systems like Interplanetary File Storage Service(IPFS), Bitcoin, and Ethereum.

The Binary Tree

A key distinction of a binary tree is that it can only have two children per node, whereas a regular tree can have many children for each node. A child node with no children is called a leaf node. Here is a simple diagram of a tree of depth = 2. The depth is the amount of levels of parent nodes. The binary tree can only have two children per node.

Cryptographic Hashes

A cryptographic Hash is a mathematical function that is "one way" meaning once you have a hash of a value, you can't use the hash function to obtain the original input value. A unique hash value is produced by first breaking all data into a numerical representation that is then fed to a hash function. Good hash functions also have the following properties:

It is extremely unlikely that two values would produce the same hash (called a hash collision)
It is computationally difficult for an individual to obtain the data from a hash
The hash is fast to Compute
They are deterministic, meaning the same value will always produce the same hash
A small change in the input should produce a drastically different change in the output

Merkle Trees

Now that we have an overview of the foundational concepts that make up a merkle tree we are ready to dive in. To introduce a Merkle tree, first imagine a distributed system that needs to distribute data across a network. The data would first be broken down into files or chunks and then sent out to corresponding network nodes. But how would you know that an untrusted entity didn't tamper with any of the data? This is where the Merkle Trees come in. The Merkle Tree is a binary tree where the nodes are hashed. For example say you have four files noted 1-4 that contain pertinent information. Each file would be hashed and the hash values would make up the leaves of our tree, where node D = H(1), E = H(2), F = H(3), G = H(4). Then the parents of the nodes are hashes of the concatenation of their child nodes. For example node B = H(H(1), H(2)) and node C = H(H(3),H(4)). The root node is computed in the same way as a hash of the hash values of node B and C.

With this data structure, as long as the root node is from a trusted source, one can verify the integrity of any of the other files with information from an untrusted source.

What is a Merkle Tree?

The Binary Tree

Cryptographic Hashes

Merkle Trees