Why Canonical JSON Matters for Cryptographic Hashing of API Responses
Two semantically identical JSON documents can produce different SHA-256 digests. RFC 8785 (JCS) is the discipline that closes the gap — and the reason NakedPnL chains hold up.
- Two JSON documents with the same logical content can serialise into different bytes, and SHA-256 will report them as different. Canonicalisation is the discipline of forcing one representation per logical document.
- RFC 8785 (JSON Canonicalization Scheme, JCS) specifies the rules: lexicographic sort of object keys, no insignificant whitespace, ECMA-262-compliant number formatting, and a fixed string-escape table.
- NakedPnL canonicalises every venue API response (Binance, Bybit, OKX, IBKR, Kalshi, Polymarket) before SHA-256 hashing, so a re-fetched response from the same exchange snapshot always produces the same contentHash.
- JSON.stringify is not enough. It preserves insertion order, produces runtime-dependent number formatting, and does not normalise Unicode — three independent sources of unstable hashes.
Hashing is sensitive to bytes, not meaning. Two JSON documents that any human (or any JSON parser) would call identical can serialise into completely different byte sequences, and SHA-256 will report them as different objects. For a system whose entire integrity story rests on "this is the same data we hashed before", that is a fatal property — unless every byte sequence that ever gets hashed has been pinned to a canonical form first.
This article walks through the canonicalisation problem, the relevant standard (RFC 8785, the JSON Canonicalization Scheme), and how NakedPnL applies it to raw venue API responses before computing contentHash. If you take only one thing away: never hash the result of vanilla JSON.stringify on data you did not produce yourself.
Why JSON is non-canonical by default
JSON has a permissive grammar. The spec (RFC 8259, ECMA-404) treats several presentational choices as equivalent. The serialiser is free to:
- Order object keys however it likes — JSON does not specify an order, and most implementations preserve insertion order, which depends on how the object was built.
- Add or omit insignificant whitespace between tokens. Two-space indent vs four-space indent vs single line are all valid.
- Format the same number multiple ways: 1, 1.0, 1e0, 0.1e1 are all valid representations of the integer 1.
- Choose Unicode escape forms. The character ö can be the two-byte UTF-8 sequence 0xC3 0xB6, escaped as \u00f6, or written as the decomposed pair \u006f\u0308 (o + combining diaeresis).
Each of these freedoms is irrelevant for parsing — JSON.parse handles all variants identically. But each is fatal for hashing. Two semantically equivalent documents that differ in any of these dimensions will produce different SHA-256 digests, and any verifier downstream will report the chain as broken.
A worked example
Consider three byte sequences that any human would call "the same JSON":
```jsonc
// A: pretty-printed, keys insertion-ordered.
{
  "asset": "BTC",
  "balance": 1.5,
  "trader": "alice"
}

// B: minified, keys reordered.
{"trader":"alice","balance":1.5,"asset":"BTC"}

// C: same as A, but balance written as 1.500.
{
  "asset": "BTC",
  "balance": 1.500,
  "trader": "alice"
}
```

```bash
# Hashing the byte sequences directly:
echo -n '{"asset":"BTC","balance":1.5,"trader":"alice"}' | sha256sum
# 6b7e... (some digest)
echo -n '{"trader":"alice","balance":1.5,"asset":"BTC"}' | sha256sum
# 9c2a... (different digest)
echo -n '{"asset":"BTC","balance":1.500,"trader":"alice"}' | sha256sum
# 3f81... (different digest)
```

Three different digests for one logical document. Now apply canonicalisation: sort keys lexicographically, strip insignificant whitespace, normalise the number 1.500 to 1.5. All three documents collapse to the same canonical form — and the same SHA-256 digest:
```bash
# Canonical form (RFC 8785 / JCS):
echo -n '{"asset":"BTC","balance":1.5,"trader":"alice"}' | sha256sum
# 8d6c... (one consistent digest)
# Inputs A, B, and C all canonicalise to the byte sequence above.
# Once canonicalised, contentHash is stable across SDK versions,
# language runtimes, and re-fetches from the same venue.
```

RFC 8785 — the JSON Canonicalization Scheme
RFC 8785, published in 2020, specifies the JSON Canonicalization Scheme (JCS). It is the most widely cited standard for this problem and the one NakedPnL targets. JCS makes four normative requirements:
- Object keys must be sorted in lexicographic order of their UTF-16 code units, applied recursively to nested objects.
- No insignificant whitespace anywhere in the output. The serialised form is a single line with no spaces between tokens.
- Number formatting must follow ECMA-262's Number.prototype.toString algorithm, ensuring 1, 1.0, and 1e0 all serialise to "1"; 1.5 and 1.500 both serialise to "1.5"; very small or very large numbers use the shortest round-trippable form.
- Strings are serialised with a fixed escape table — only U+0000 through U+001F, plus \\, \" and \b/\f/\n/\r/\t are escaped; all other characters are emitted as their UTF-8 byte sequences.
The output is bytewise stable: any two JCS-conformant serialisers, given the same in-memory JSON value, produce the exact same byte sequence. That is the property a hash chain needs.
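The number rule is the easiest to verify empirically. The sketch below (plain console checks, nothing NakedPnL-specific) shows the equivalences the ECMA-262 algorithm guarantees:

```ts
// Number.prototype.toString emits the shortest round-trippable decimal form.
console.log((1.0).toString());   // "1": the trailing .0 disappears
console.log((1.500).toString()); // "1.5": the source spelling is irrelevant
console.log((1e0).toString());   // "1": exponent form collapses
console.log((1e21).toString());  // "1e+21": very large values switch notation
```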
How NakedPnL canonicalises in practice
NakedPnL targets the JCS spec without taking a JCS dependency. The implementation in lib/calculation/audit-hash.ts uses Node's built-in crypto module and a stableStringify helper that recursively sorts object keys lexicographically. For the venue responses NakedPnL hashes (numeric strings from Binance/Bybit/OKX, ISO date strings from IBKR, integer cent values from Kalshi), this implementation is JCS-compatible because the venues emit ASCII-only field names and ship numbers as decimal strings rather than floats.
```ts
import { createHash } from "node:crypto";

// Serialise with object keys sorted lexicographically at every depth.
function stableStringify(value: unknown): string {
  return JSON.stringify(value, (_key, val) => {
    if (val !== null && typeof val === "object" && !Array.isArray(val)) {
      // Rebuild the object with sorted keys; the new insertion order
      // becomes the serialisation order.
      const sorted: Record<string, unknown> = {};
      for (const k of Object.keys(val).sort()) {
        sorted[k] = val[k];
      }
      return sorted;
    }
    return val; // Arrays and primitives pass through unchanged.
  });
}

// SHA-256 over the canonical byte sequence of a raw venue response.
export function contentHash(rawResponse: object): string {
  const canonical = stableStringify(rawResponse);
  return createHash("sha256").update(canonical).digest("hex");
}
```

The function works by leveraging the second argument of JSON.stringify (a replacer callback). For any plain object encountered during serialisation, the replacer returns a new object with the keys sorted via Object.keys(val).sort(). Arrays are passed through unchanged — JSON specifies array element order as semantically meaningful, and JCS preserves it. The result is a deterministic, single-line JSON byte sequence ready to feed into SHA-256.
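As a quick sanity check, the sketch below (hypothetical values, not a real venue payload) confirms that two insertion orders collapse to one digest; it assumes it runs in the same module as the functions above:

```ts
// Same logical content, different insertion order.
const a = { trader: "alice", balance: "1.5", asset: "BTC" };
const b = { asset: "BTC", balance: "1.5", trader: "alice" };

console.log(stableStringify(a)); // {"asset":"BTC","balance":"1.5","trader":"alice"}
console.log(contentHash(a) === contentHash(b)); // true: one canonical byte sequence
```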
Three implementation notes that matter for review:
- Recursion is implicit: the JSON.stringify replacer is invoked on every nested value during serialisation, so deeply nested objects also have their keys sorted at every level (see the sketch after this list).
- Arrays are not sorted: the replacer's check for !Array.isArray(val) intentionally skips arrays, because their element order carries meaning (a list of trades is in chronological order, not alphabetical).
- Numbers are emitted by V8's standard JSON formatter, which uses the shortest round-trippable form. For the integer-string fields the venues use, this is bit-for-bit equivalent to JCS. For native floats, V8 already converges with JCS for almost all values; the edge cases (subnormals, very large doubles) do not appear in venue payloads.
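To make the first two notes concrete, here is a small sketch with an invented nested payload, keys deliberately out of order:

```ts
const nested = {
  meta: { ts: 1700000000, source: "binance" },
  fills: [{ qty: "0.2", px: "43000.10" }, { qty: "0.1", px: "42990.00" }],
};

console.log(stableStringify(nested));
// {"fills":[{"px":"43000.10","qty":"0.2"},{"px":"42990.00","qty":"0.1"}],"meta":{"source":"binance","ts":1700000000}}
// Keys are sorted at every depth; the fills array keeps its original element order.
```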
A minimal JCS-style canonicalize function
If you want to verify a NakedPnL chain in your own environment without taking a dependency, the following ~30-line function is a JCS-compatible canonicaliser for the kinds of payloads the venues emit (no NaN, no Infinity, no exotic Unicode):
```js
function canonicalize(value) {
  if (value === null) return "null";
  if (typeof value === "boolean") return value ? "true" : "false";
  if (typeof value === "number") {
    if (!Number.isFinite(value)) {
      throw new Error("JCS does not encode NaN or Infinity");
    }
    // V8's default Number.prototype.toString is JCS-compatible
    // for the value range used by exchange APIs.
    return String(value);
  }
  if (typeof value === "string") {
    return JSON.stringify(value); // Handles required escapes.
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (typeof value === "object") {
    const keys = Object.keys(value).sort(); // Lexicographic key order.
    const parts = keys.map(
      (k) => JSON.stringify(k) + ":" + canonicalize(value[k]),
    );
    return "{" + parts.join(",") + "}";
  }
  throw new Error("Unsupported value type: " + typeof value);
}

// Hash any in-memory JSON value to a stable SHA-256 digest.
async function hashCanonical(value) {
  const canonical = canonicalize(value);
  const bytes = new TextEncoder().encode(canonical);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```

Drop this into a browser console, paste in a single rawResponse field from /api/chain/[handle], and you should produce the same contentHash that the chain stores. If you don't, the issue is upstream of the chain — most likely a JSON parser somewhere mutating the data before canonicalisation.
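As a usage sketch, with a stand-in object rather than a real venue response:

```ts
// Illustrative input only; paste a real rawResponse in its place.
hashCanonical({ trader: "alice", balance: "1.5", asset: "BTC" })
  .then((digest) => console.log(digest)); // 64 hex chars, stable across runs
```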
Why JSON.stringify alone is not enough
It is tempting to assume that a single call to JSON.stringify on a parsed object solves the canonicalisation problem. It does not — for three independent reasons:
- Insertion-order key ordering: JSON.stringify preserves object insertion order, which depends on how the object was constructed. Two parsers reading the same input may build objects with the same logical content but different key orderings, producing different stringify output (reproduced in the snippet after this list).
- Number formatting edge cases: while V8's stringify is mostly JCS-compatible, other runtimes diverge (Python before 2.7/3.1, for example, emitted non-shortest float representations such as 1.1000000000000001 for 1.1). For cross-language verification, you need an explicit format rule.
- Unicode handling: JSON.stringify does not perform Unicode normalisation. Two strings that compare equal after NFC normalisation can serialise to different byte sequences if one was composed and the other decomposed.
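The first failure mode takes a few lines to reproduce (illustrative objects, not venue data):

```ts
// One parse result and one hand-built object with identical content.
const fromVenue = JSON.parse('{"asset":"BTC","balance":"1.5"}');
const rebuilt = { balance: "1.5", asset: "BTC" };

console.log(JSON.stringify(fromVenue)); // {"asset":"BTC","balance":"1.5"}
console.log(JSON.stringify(rebuilt));   // {"balance":"1.5","asset":"BTC"}
// Deep-equal objects, different bytes, therefore different SHA-256 digests.
```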
| Property | JSON.stringify | JCS / canonicalize |
|---|---|---|
| Object key order | Insertion order | Lexicographic (deterministic) |
| Whitespace | Configurable, default none | None (mandatory) |
| Number format | V8 implementation default | ECMA-262 toString (mandatory) |
| Unicode normalisation | None | None (NFC input assumed) |
| NaN / Infinity | Silently serialised as null | Rejected with an error |
For NakedPnL's specific use case — hashing exchange API responses where the data is ASCII-clean numeric strings, ISO dates, and known field names — stringify with key sorting produces output byte-identical to JCS, so the hashes are correct and stable. Outside that envelope, full JCS conformance is the right target.
The Unicode pitfall in detail
Unicode normalisation is the canonicalisation problem most teams underestimate. Consider the string "München". The character ü has two valid Unicode encodings:
- Composed (NFC): the single code point U+00FC, encoded as 0xC3 0xBC in UTF-8.
- Decomposed (NFD): the pair U+0075 (Latin small letter u) + U+0308 (combining diaeresis), encoded as 0x75 0xCC 0x88 in UTF-8.
Both render identically. Both compare equal after Unicode normalisation. But they are different byte sequences, so they produce different SHA-256 digests. If a venue ever returns a trader's name in an unexpected normalisation form (different OS, different SDK, different encoding library), and the chain hashes the byte sequence directly, the hash diverges.
The mitigation is to apply Unicode NFC normalisation to all string values before serialisation. RFC 8785 itself does not normalise — it expects the application to supply consistently encoded input — so the step belongs in your pipeline: the canonicalize function above can be extended with String.prototype.normalize("NFC") on every string input. NakedPnL's current venue responses do not contain non-ASCII strings (field names are English; numeric values are ASCII digits), so the issue does not arise in practice — but the discipline is worth knowing if you ever extend the chain to user-supplied content.
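A sketch of both the failure and the fix, using the München example above:

```ts
const composed = "M\u00FCnchen";    // NFC: single code point U+00FC
const decomposed = "Mu\u0308nchen"; // NFD: u + U+0308 combining diaeresis

console.log(composed === decomposed);                  // false: different code points
console.log(composed === decomposed.normalize("NFC")); // true: collapsed to one form

// Applying normalize("NFC") to every string before canonicalisation
// pins one byte sequence per logical string.
```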
What this discipline buys you
Canonicalisation looks like a small detail. It is, in practice, the difference between a chain that holds up across SDK upgrades and one that fails the first time a venue rolls out a new serialiser. The properties NakedPnL gets from doing it right:
- Re-fetchability: re-pulling the same venue snapshot tomorrow on a different language runtime produces the same contentHash. Verifiers in any environment converge.
- Cross-language verification: a Python verifier and a JavaScript verifier hash to the same byte sequence. The /docs/verification page can publish parallel snippets that produce identical results.
- SDK independence: upgrading the Binance, Bybit, or OKX SDK does not break historical hashes. The canonical form is decoupled from the SDK's serialisation choices.
- Forensic stability: a regulator pulling a 3-year-old chain runs the same canonicaliser and gets the same digests, regardless of how the JSON parser ecosystem has evolved in the interim.