A Merkle tree is a data structure that repeatedly hashes pairs of items until a single hash - the Merkle root - remains. Each leaf node stores the hash of a data block, while every internal node stores the hash of its two children concatenated.
Because hash functions are deterministic and collision‑resistant, any change to a leaf ripples up to the root, making the root a compact commitment to the whole dataset.
When you need to prove that a particular leaf belongs to a known Merkle root, you build a Merkle proof - a short list of sibling hashes that, when combined with the leaf hash, reconstruct the root.
The resulting list typically contains log₂(N) entries, where N is the number of leaves. Even for a million‑item tree, the proof is under 20 hashes.
Verification mirrors construction but starts with the provided leaf hash and the proof list.
If any step fails, the proof is invalid - either the data is not part of the tree, or the proof was tampered with.
Blockchains are the most visible playground for Merkle proofs.
Merkle proofs are lightweight, but they rely heavily on the properties of the underlying hash function. A weak hash can lead to collisions, breaking the security guarantee.
Common choices today are SHA‑256 (used by Bitcoin) and Keccak‑256 (Ethereum’s base). Both provide ~128‑bit pre‑image resistance, which is sufficient for most public‑blockchain scenarios.
Two practical concerns:
Below is a concise Python snippet that builds a Merkle proof for a list of byte strings. It uses SHA‑256 from the standard library.
import hashlib
def hash_pair(a, b):
return hashlib.sha256(a + b).digest()
def merkle_root(leaves):
layer = [hashlib.sha256(l).digest() for l in leaves]
while len(layer) > 1:
if len(layer) % 2:
layer.append(layer[-1])
layer = [hash_pair(layer[i], layer[i+1]) for i in range(0, len(layer), 2)]
return layer[0]
def merkle_proof(leaves, index):
proof = []
layer = [hashlib.sha256(l).digest() for l in leaves]
while len(layer) > 1:
if len(layer) % 2:
layer.append(layer[-1])
sibling_index = index ^ 1
proof.append(layer[sibling_index])
index //= 2
layer = [hash_pair(layer[i], layer[i+1]) for i in range(0, len(layer), 2)]
return proof
# Example usage
items = [b'alpha', b'beta', b'gamma', b'delta']
root = merkle_root(items)
proof = merkle_proof(items, 2) # prove 'gamma'
print('Root:', root.hex())
print('Proof:', [p.hex() for p in proof])
When verifying, recompute hashes using the same order and compare the final value to the known root.
Aspect | Merkle Proof | Simple Hash Verification | Bloom Filter |
---|---|---|---|
Data Size Needed | Log₂(N) hashes | Full data block | Fixed‑size bit array |
Proof of Inclusion | Yes, cryptographically strong | No, only equality check | Probabilistic (false positives possible) |
Update Overhead | Recompute affected branches only | Rehash entire dataset | Add bits, no rehash needed |
Collision Resistance | Depends on hash function (high) | Depends on hash function (high) | Not applicable |
Security comes from the hash function’s collision resistance. Because each node’s hash depends on its children, any alteration to a leaf changes every hash up to the root. Without knowing the root, an attacker cannot forge a valid proof.
Absolutely. Any scenario where you need to prove inclusion in a large dataset works - for example, verifying logs, syncing distributed file systems, or proving that a user’s record exists in a privacy‑preserving database.
SHA‑256 is the de‑facto choice for Bitcoin and many public blockchains. Ethereum uses Keccak‑256. Both are widely supported and have no known practical collisions.
The client downloads the block header containing the Merkle root, then requests a Merkle proof for the specific transaction. By recomputing the root from the proof and comparing it to the header, the client confirms inclusion.
Most implementations duplicate the last leaf at that level before hashing. This keeps the tree balanced and ensures every node has two children.
If your proof verification fails, consider these checks:
Once you’ve ironed out these details, you can confidently integrate Merkle proofs into wallets, decentralized apps, or any system that needs cheap, trustworthy inclusion checks.
Written by Eldridge Fairweather
View all posts by: Eldridge Fairweather