Verification guide

Merkle Trees in Investment Data: Why One Hash Verifies Thousands

How Merkle trees compress thousands of investment performance records into a single 32-byte root, and why deterministic ordering is the part everyone gets wrong.

By NakedPnL Research·May 7, 2026·13 min read

TL;DR

A Merkle tree is a binary tree of SHA-256 hashes that compresses any number of records into a single 32-byte root, while still allowing O(log N) proof of inclusion for any individual record.
Determinism is the entire game: NakedPnL sorts leaves lexicographically by entityId + chainType before hashing, so the same set of inputs always yields the same root regardless of database query order.
Verifying a leaf is in the tree requires only the leaf, the root, and ⌈log₂ N⌉ sibling hashes — for a tree of 10,000 traders, that is 14 hashes (~448 bytes).
NakedPnL builds one Merkle tree per day from every active entity's chain head and submits the root to OpenTimestamps for Bitcoin anchoring. The implementation is in lib/ots/merkle.ts.

Merkle trees are one of the rare cryptographic primitives that have escaped the academy and become load-bearing infrastructure for the public internet. Git uses them to identify commits. Bitcoin uses them to commit to every transaction in a block. Certificate Transparency uses them to detect rogue TLS certificates. They show up wherever a system needs to commit to a large set of items with a single fixed-size identifier — and where any party should later be able to prove that a specific item was in the set, without having to download the rest.

NakedPnL uses a Merkle tree to compress every active trader's chain head into a single 32-byte root each day. That root is what we send to OpenTimestamps to anchor in Bitcoin. This article explains the construction, the part most implementations get wrong (deterministic ordering), and how a verifier can prove a specific trader's chain was committed on a specific date in fewer than 500 bytes.

What a Merkle tree is

A Merkle tree is a binary tree built bottom-up by hashing. Given a list of N data items:

Hash each item to produce N leaf digests.
Pair adjacent leaves and hash each pair: hash(left || right). This produces ⌈N/2⌉ digests at level 1.
Repeat at every level until a single digest remains. That digest is the Merkle root.
If a level has an odd count, duplicate the last hash and pair it with itself before hashing.

The root has two structural properties that make it useful. First, it is a 32-byte fingerprint of the entire input set — modify any data item and the root changes (with the avalanche property of SHA-256). Second, you can prove that a specific item is in the tree by providing only the path from its leaf up to the root, which contains roughly log₂ N sibling hashes. The verifier recomputes the root using the candidate leaf and the proof path; if it matches the published root, the item was in the tree.

Level	Index 0	Index 1	Index 2	Index 3
0 (leaves)	L0 = H(d0)	L1 = H(d1)	L2 = H(d2)	L3 = H(d3)
1	N0 = H(L0 \|\| L1)	N1 = H(L2 \|\| L3)	—	—
2 (root)	R = H(N0 \|\| N1)	—	—	—

A 4-leaf Merkle tree (illustrative)

To prove that data item d2 was in the tree, you provide L3 (the right sibling of L2) and N0 (the left sibling of N1). The verifier computes L2 = H(d2), then N1 = H(L2 || L3), then R' = H(N0 || N1), and compares R' against the published R.

The deterministic-ordering trap

The single biggest source of broken Merkle implementations is non-deterministic leaf ordering. If two parties build the same conceptual tree but feed the leaves in different orders, they get different roots — even when the input data is identical. SQL queries do not guarantee row order without an explicit ORDER BY, in-memory hash maps iterate in implementation-defined order, and concurrent code can interleave inserts. Any of these will produce a Merkle root that fails to match the one a counterparty independently computes.

NakedPnL solves this by sorting leaves on a deterministic key before hashing. The implementation in lib/ots/merkle.ts uses entityId + "|" + chainType, with localeCompare for stable lexicographic ordering:

function sortKey(leaf: MerkleLeaf): string {
  return `${leaf.entityId}|${leaf.chainType}`;
}

function hashLeaf(leaf: MerkleLeaf): string {
  // The leaf hash itself binds entityId, chainType, and the chain head.
  return sha256Hex(`${leaf.entityId}|${leaf.chainType}|${leaf.chainHash}`);
}

export function buildMerkleRoot(leaves: MerkleLeaf[]): string {
  if (leaves.length === 0) return EMPTY_REGISTRY_HASH;

  const sorted = [...leaves].sort((a, b) =>
    sortKey(a).localeCompare(sortKey(b)),
  );

  let level = sorted.map(hashLeaf);
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256Hex(left + right));
    }
    level = next;
  }
  return level[0];
}

Deterministic leaf ordering — lib/ots/merkle.ts

Two things to notice. First, the leaf hash itself includes entityId and chainType — so even if a malicious actor reordered the input array, the leaves would still be bound to their identities. Second, the spread copy [...leaves] guarantees the input array is not mutated; the sort runs on a fresh array.

Worked example with five chain heads

Suppose on a given day NakedPnL has five active traders with the following chain heads (abbreviated to 6 hex chars for readability):

entityId	chainType	chainHash (head)	leaf hash
alice	NAV	ab12cd	L0 = H("alice\|NAV\|ab12cd")
bob	NAV	ef34gh	L1 = H("bob\|NAV\|ef34gh")
carol	NAV	ij56kl	L2 = H("carol\|NAV\|ij56kl")
dave	NAV	mn78op	L3 = H("dave\|NAV\|mn78op")
eve	NAV	qr90st	L4 = H("eve\|NAV\|qr90st")

Five active chain heads, sorted by sortKey

With 5 leaves, level 1 has ⌈5/2⌉ = 3 nodes (the last leaf is duplicated to pair with itself):

N0 = H(L0 || L1)
N1 = H(L2 || L3)
N2 = H(L4 || L4) // last leaf paired with itself

Level 2 again has ⌈3/2⌉ = 2 nodes:

M0 = H(N0 || N1)
M1 = H(N2 || N2) // odd count again

Level 3 has the root: R = H(M0 || M1). To prove carol's chain head was committed in this Merkle tree, you provide:

carol's leaf data: entityId="carol", chainType="NAV", chainHash="ij56kl". The verifier computes L2 = H("carol|NAV|ij56kl").
Sibling at level 0: L3 (position right). Verifier computes N1' = H(L2 || L3).
Sibling at level 1: N0 (position left). Verifier computes M0' = H(N0 || N1').
Sibling at level 2: M1 (position right). Verifier computes R' = H(M0' || M1).

If R' equals the published root R, carol's chain head is proven to be in the tree. The proof is 3 hashes (96 bytes) plus the leaf data — and the verifier never had to know about alice, bob, dave, or eve.

Why proof of inclusion matters in practice

When NakedPnL anchors a daily Merkle root to Bitcoin via OpenTimestamps, the root is the only thing committed. Six months later, a third party — an allocator doing a forensic check, a journalist verifying a viral track record, a regulator running due diligence — wants to confirm that a specific trader's chain head on a specific date was in the anchored tree. Three options:

Re-fetch every active chain head from that date and rebuild the entire Merkle tree, comparing the resulting root against the anchored root. This is the strongest check but the most expensive.
Request a Merkle inclusion proof from NakedPnL: the leaf data plus ⌈log₂ N⌉ sibling hashes. Verify the proof against the anchored root with the verifyMerkleProof function from lib/ots/merkle.ts.
Trust the published trader profile alone. (No cryptographic guarantee — included only to be explicit about the trust model.)

Option 2 is the design point. For 10,000 active entities, ⌈log₂ 10000⌉ = 14, so the proof is 14 × 32 = 448 bytes plus a small leaf record. The verifier can run entirely in the browser with the Web Crypto API, against an anchored root that they independently fetched from a Bitcoin block header. NakedPnL is not in the trust path.

async function sha256Hex(input: string): Promise<string> {
  const bytes = new TextEncoder().encode(input);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

interface ProofStep {
  hash: string;
  position: "left" | "right";
}

async function verifyMerkleProof(
  leafHash: string,
  proof: ProofStep[],
  expectedRoot: string,
): Promise<boolean> {
  let current = leafHash;
  for (const step of proof) {
    current = step.position === "left"
      ? await sha256Hex(step.hash + current)
      : await sha256Hex(current + step.hash);
  }
  return current === expectedRoot;
}

// Example: prove carol's chain head was in the tree.
const carolLeafHash = await sha256Hex("carol|NAV|ij56kl");
const proof: ProofStep[] = [
  { hash: "<L3>", position: "right" },
  { hash: "<N0>", position: "left" },
  { hash: "<M1>", position: "right" },
];
const ok = await verifyMerkleProof(carolLeafHash, proof, "<published root>");

Browser-side Merkle proof verification (port of lib/ots/merkle.ts)

Performance characteristics

The asymptotics are favourable in every direction that matters:

Operation	Time	Space	Notes
Build full tree	O(N)	O(N)	One SHA-256 per node, ~2N nodes total
Update one leaf	O(log N)	O(log N)	Recompute the affected path only
Inclusion proof	O(log N)	O(log N)	Sibling hashes from leaf to root
Verify inclusion	O(log N)	O(1)	Replay the proof, compare to root

Merkle tree complexity for N leaves

For NakedPnL's daily build, even at 100,000 active entities, the full tree is ~200,000 SHA-256 operations — well under a second on any reasonable server. The /api/cron/ots-anchor cron runs in ~2–5 seconds end-to-end, dominated by the four parallel calendar HTTP submissions, not the tree build.

Why a Merkle tree and not a flat list of hashes

An apparently simpler design is to publish a daily list of all chain heads and anchor a single hash of the list. That works for whole-set verification but is strictly worse along three axes:

Inclusion proofs require the entire list. To prove one chain head was committed, you must download all N chain heads. With a Merkle tree, you need O(log N) sibling hashes.
Determinism still requires sorting. The flat-list approach has the same ordering problem and gains nothing from skipping the tree structure.
Updates are O(N). If we ever needed to maintain a running tree, a flat list would require rehashing the entire concatenation on every change. A Merkle tree updates only the affected log-depth path.

Merkle tree vs Patricia trie

Ethereum and several other systems use a Merkle Patricia trie — a compressed radix trie where each node also carries a hash. The trie supports efficient key-value lookups (proves both inclusion and the value at a key) at the cost of significantly more complex construction and serialisation. NakedPnL does not need key-value lookups in the anchor: the leaves are unique by (entityId, chainType), and the per-trader chain hash is an opaque digest. A plain binary Merkle tree is the right tool — simpler implementation, identical security guarantees for our use case, and a much smaller proof format.

Composing with the per-trader hash chain

The Merkle tree commits to chain heads, not to individual NavSnapshots. To verify that a specific NavSnapshot from three months ago was part of the registry on its anchor date, the verifier walks two cryptographic structures:

Within the trader's chain: walk forward from genesis, recomputing every chainHash, until reaching the chain head that was committed on the anchor date.
Across the registry: provide a Merkle inclusion proof showing that chain head was a leaf in the daily Merkle tree, whose root was anchored to Bitcoin via OpenTimestamps.

Together these two proofs produce a single end-to-end claim: "this NavSnapshot was committed to a Bitcoin block at time T, and any retroactive modification would require breaking SHA-256 second-preimage resistance or rewriting Bitcoin history." The full chain is independently re-verifiable using only public data and standard cryptographic tools — no proprietary verifier, no privileged API access, no trust in NakedPnL's ongoing existence.

Frequently asked questions

How does a Merkle tree differ from a Merkle Patricia trie?

A plain Merkle tree is a binary tree of hashes built bottom-up over an ordered list of leaves. A Merkle Patricia trie is a radix-trie where each node carries a hash, used to prove key-value mappings (e.g., Ethereum state). Patricia tries support more operations (key lookup, range proofs), at the cost of more complex serialisation and larger proofs in many cases. NakedPnL only needs proof-of-inclusion over a fixed daily snapshot, so a binary Merkle tree is simpler and produces shorter proofs.

Why not just publish a flat list of all chain heads each day?

A flat list would force any verifier proving one trader's inclusion to download every other trader's chain head. With a Merkle tree, the verifier needs only ⌈log₂ N⌉ sibling hashes — about 14 hashes (~448 bytes) for 10,000 traders, regardless of total set size. The succinct proof property is the entire reason to choose a tree structure.

What if two traders had exactly the same chain head?

It cannot happen if both chains share genesis — SHA-256 second-preimage resistance prevents two distinct histories from producing the same chain head. But even in a hypothetical collision, the leaf hash includes entityId and chainType (see lib/ots/merkle.ts: hashLeaf). Two leaves with identical chainHashes but different entity IDs still produce distinct leaf hashes, so the tree structure remains well-defined.

What are the performance characteristics on a real-world dataset?

Building a Merkle tree over N leaves takes O(N) hashes and O(N) memory; with N = 100,000 that is ~200,000 SHA-256 operations and a few megabytes of intermediate state — well under a second on a modern server. Verifying a single inclusion proof takes O(log N) hashes, about 17 SHA-256 operations for 100,000 leaves, fast enough to run in any browser. The bottleneck in our daily anchor cron is HTTP latency to the OpenTimestamps calendars, not the tree build itself.

Can a malicious operator construct a fake Merkle tree?

Constructing any Merkle tree is trivial — anyone with a list of leaves can do it. What is hard is constructing a tree whose root matches a specific previously-anchored Bitcoin attestation while including chosen fake leaves. That requires a SHA-256 collision or second-preimage attack, which is computationally infeasible. The OpenTimestamps anchor is what binds the daily root to a moment in time — without it, the tree is just arithmetic.

References