Merkle Trees in Investment Data: Why One Hash Verifies Thousands
How Merkle trees compress thousands of investment performance records into a single 32-byte root, and why deterministic ordering is the part everyone gets wrong.
- A Merkle tree is a binary tree of SHA-256 hashes that compresses any number of records into a single 32-byte root, while still allowing O(log N) proof of inclusion for any individual record.
- Determinism is the entire game: NakedPnL sorts leaves lexicographically by entityId + chainType before hashing, so the same set of inputs always yields the same root regardless of database query order.
- Verifying a leaf is in the tree requires only the leaf, the root, and ⌈log₂ N⌉ sibling hashes — for a tree of 10,000 traders, that is 14 hashes (~448 bytes).
- NakedPnL builds one Merkle tree per day from every active entity's chain head and submits the root to OpenTimestamps for Bitcoin anchoring. The implementation is in lib/ots/merkle.ts.
Merkle trees are one of the rare cryptographic primitives that have escaped the academy and become load-bearing infrastructure for the public internet. Git uses them to identify commits. Bitcoin uses them to commit to every transaction in a block. Certificate Transparency uses them to detect rogue TLS certificates. They show up wherever a system needs to commit to a large set of items with a single fixed-size identifier — and where any party should later be able to prove that a specific item was in the set, without having to download the rest.
NakedPnL uses a Merkle tree to compress every active trader's chain head into a single 32-byte root each day. That root is what we send to OpenTimestamps to anchor in Bitcoin. This article explains the construction, the part most implementations get wrong (deterministic ordering), and how a verifier can prove a specific trader's chain was committed on a specific date in fewer than 500 bytes.
What a Merkle tree is
A Merkle tree is a binary tree built bottom-up by hashing. Given a list of N data items:
- Hash each item to produce N leaf digests.
- Pair adjacent leaves and hash each pair: hash(left || right). This produces ⌈N/2⌉ digests at level 1.
- Repeat at every level until a single digest remains. That digest is the Merkle root.
- If a level has an odd count, duplicate the last hash and pair it with itself before hashing.
The root has two structural properties that make it useful. First, it is a 32-byte fingerprint of the entire input set — modify any data item and the root changes (with the avalanche property of SHA-256). Second, you can prove that a specific item is in the tree by providing only the path from its leaf up to the root, which contains roughly log₂ N sibling hashes. The verifier recomputes the root using the candidate leaf and the proof path; if it matches the published root, the item was in the tree.
| Level | Index 0 | Index 1 | Index 2 | Index 3 |
|---|---|---|---|---|
| 0 (leaves) | L0 = H(d0) | L1 = H(d1) | L2 = H(d2) | L3 = H(d3) |
| 1 | N0 = H(L0 || L1) | N1 = H(L2 || L3) | — | — |
| 2 (root) | R = H(N0 || N1) | — | — | — |
To prove that data item d2 was in the tree, you provide L3 (the right sibling of L2) and N0 (the left sibling of N1). The verifier computes L2 = H(d2), then N1 = H(L2 || L3), then R' = H(N0 || N1), and compares R' against the published R.
The deterministic-ordering trap
The single biggest source of broken Merkle implementations is non-deterministic leaf ordering. If two parties build the same conceptual tree but feed the leaves in different orders, they get different roots — even when the input data is identical. SQL queries do not guarantee row order without an explicit ORDER BY, in-memory hash maps iterate in implementation-defined order, and concurrent code can interleave inserts. Any of these will produce a Merkle root that fails to match the one a counterparty independently computes.
NakedPnL solves this by sorting leaves on a deterministic key before hashing. The implementation in lib/ots/merkle.ts uses entityId + "|" + chainType, with localeCompare for stable lexicographic ordering:
function sortKey(leaf: MerkleLeaf): string {
return `${leaf.entityId}|${leaf.chainType}`;
}
function hashLeaf(leaf: MerkleLeaf): string {
// The leaf hash itself binds entityId, chainType, and the chain head.
return sha256Hex(`${leaf.entityId}|${leaf.chainType}|${leaf.chainHash}`);
}
export function buildMerkleRoot(leaves: MerkleLeaf[]): string {
if (leaves.length === 0) return EMPTY_REGISTRY_HASH;
const sorted = [...leaves].sort((a, b) =>
sortKey(a).localeCompare(sortKey(b)),
);
let level = sorted.map(hashLeaf);
while (level.length > 1) {
const next: string[] = [];
for (let i = 0; i < level.length; i += 2) {
const left = level[i];
const right = i + 1 < level.length ? level[i + 1] : level[i];
next.push(sha256Hex(left + right));
}
level = next;
}
return level[0];
}Two things to notice. First, the leaf hash itself includes entityId and chainType — so even if a malicious actor reordered the input array, the leaves would still be bound to their identities. Second, the spread copy [...leaves] guarantees the input array is not mutated; the sort runs on a fresh array.
Worked example with five chain heads
Suppose on a given day NakedPnL has five active traders with the following chain heads (abbreviated to 6 hex chars for readability):
| entityId | chainType | chainHash (head) | leaf hash |
|---|---|---|---|
| alice | NAV | ab12cd | L0 = H("alice|NAV|ab12cd") |
| bob | NAV | ef34gh | L1 = H("bob|NAV|ef34gh") |
| carol | NAV | ij56kl | L2 = H("carol|NAV|ij56kl") |
| dave | NAV | mn78op | L3 = H("dave|NAV|mn78op") |
| eve | NAV | qr90st | L4 = H("eve|NAV|qr90st") |
With 5 leaves, level 1 has ⌈5/2⌉ = 3 nodes (the last leaf is duplicated to pair with itself):
- N0 = H(L0 || L1)
- N1 = H(L2 || L3)
- N2 = H(L4 || L4) // last leaf paired with itself
Level 2 again has ⌈3/2⌉ = 2 nodes:
- M0 = H(N0 || N1)
- M1 = H(N2 || N2) // odd count again
Level 3 has the root: R = H(M0 || M1). To prove carol's chain head was committed in this Merkle tree, you provide:
- carol's leaf data: entityId="carol", chainType="NAV", chainHash="ij56kl". The verifier computes L2 = H("carol|NAV|ij56kl").
- Sibling at level 0: L3 (position right). Verifier computes N1' = H(L2 || L3).
- Sibling at level 1: N0 (position left). Verifier computes M0' = H(N0 || N1').
- Sibling at level 2: M1 (position right). Verifier computes R' = H(M0' || M1).
If R' equals the published root R, carol's chain head is proven to be in the tree. The proof is 3 hashes (96 bytes) plus the leaf data — and the verifier never had to know about alice, bob, dave, or eve.
Why proof of inclusion matters in practice
When NakedPnL anchors a daily Merkle root to Bitcoin via OpenTimestamps, the root is the only thing committed. Six months later, a third party — an allocator doing a forensic check, a journalist verifying a viral track record, a regulator running due diligence — wants to confirm that a specific trader's chain head on a specific date was in the anchored tree. Three options:
- Re-fetch every active chain head from that date and rebuild the entire Merkle tree, comparing the resulting root against the anchored root. This is the strongest check but the most expensive.
- Request a Merkle inclusion proof from NakedPnL: the leaf data plus ⌈log₂ N⌉ sibling hashes. Verify the proof against the anchored root with the verifyMerkleProof function from lib/ots/merkle.ts.
- Trust the published trader profile alone. (No cryptographic guarantee — included only to be explicit about the trust model.)
Option 2 is the design point. For 10,000 active entities, ⌈log₂ 10000⌉ = 14, so the proof is 14 × 32 = 448 bytes plus a small leaf record. The verifier can run entirely in the browser with the Web Crypto API, against an anchored root that they independently fetched from a Bitcoin block header. NakedPnL is not in the trust path.
async function sha256Hex(input: string): Promise<string> {
const bytes = new TextEncoder().encode(input);
const digest = await crypto.subtle.digest("SHA-256", bytes);
return Array.from(new Uint8Array(digest))
.map((b) => b.toString(16).padStart(2, "0"))
.join("");
}
interface ProofStep {
hash: string;
position: "left" | "right";
}
async function verifyMerkleProof(
leafHash: string,
proof: ProofStep[],
expectedRoot: string,
): Promise<boolean> {
let current = leafHash;
for (const step of proof) {
current = step.position === "left"
? await sha256Hex(step.hash + current)
: await sha256Hex(current + step.hash);
}
return current === expectedRoot;
}
// Example: prove carol's chain head was in the tree.
const carolLeafHash = await sha256Hex("carol|NAV|ij56kl");
const proof: ProofStep[] = [
{ hash: "<L3>", position: "right" },
{ hash: "<N0>", position: "left" },
{ hash: "<M1>", position: "right" },
];
const ok = await verifyMerkleProof(carolLeafHash, proof, "<published root>");Performance characteristics
The asymptotics are favourable in every direction that matters:
| Operation | Time | Space | Notes |
|---|---|---|---|
| Build full tree | O(N) | O(N) | One SHA-256 per node, ~2N nodes total |
| Update one leaf | O(log N) | O(log N) | Recompute the affected path only |
| Inclusion proof | O(log N) | O(log N) | Sibling hashes from leaf to root |
| Verify inclusion | O(log N) | O(1) | Replay the proof, compare to root |
For NakedPnL's daily build, even at 100,000 active entities, the full tree is ~200,000 SHA-256 operations — well under a second on any reasonable server. The /api/cron/ots-anchor cron runs in ~2–5 seconds end-to-end, dominated by the four parallel calendar HTTP submissions, not the tree build.
Why a Merkle tree and not a flat list of hashes
An apparently simpler design is to publish a daily list of all chain heads and anchor a single hash of the list. That works for whole-set verification but is strictly worse along three axes:
- Inclusion proofs require the entire list. To prove one chain head was committed, you must download all N chain heads. With a Merkle tree, you need O(log N) sibling hashes.
- Determinism still requires sorting. The flat-list approach has the same ordering problem and gains nothing from skipping the tree structure.
- Updates are O(N). If we ever needed to maintain a running tree, a flat list would require rehashing the entire concatenation on every change. A Merkle tree updates only the affected log-depth path.
Merkle tree vs Patricia trie
Ethereum and several other systems use a Merkle Patricia trie — a compressed radix trie where each node also carries a hash. The trie supports efficient key-value lookups (proves both inclusion and the value at a key) at the cost of significantly more complex construction and serialisation. NakedPnL does not need key-value lookups in the anchor: the leaves are unique by (entityId, chainType), and the per-trader chain hash is an opaque digest. A plain binary Merkle tree is the right tool — simpler implementation, identical security guarantees for our use case, and a much smaller proof format.
Composing with the per-trader hash chain
The Merkle tree commits to chain heads, not to individual NavSnapshots. To verify that a specific NavSnapshot from three months ago was part of the registry on its anchor date, the verifier walks two cryptographic structures:
- Within the trader's chain: walk forward from genesis, recomputing every chainHash, until reaching the chain head that was committed on the anchor date.
- Across the registry: provide a Merkle inclusion proof showing that chain head was a leaf in the daily Merkle tree, whose root was anchored to Bitcoin via OpenTimestamps.
Together these two proofs produce a single end-to-end claim: "this NavSnapshot was committed to a Bitcoin block at time T, and any retroactive modification would require breaking SHA-256 second-preimage resistance or rewriting Bitcoin history." The full chain is independently re-verifiable using only public data and standard cryptographic tools — no proprietary verifier, no privileged API access, no trust in NakedPnL's ongoing existence.