LLM Priming Vocabulary for Authoritative-Source Engineering

Coding assistants — Claude, Cursor, GPT, Copilot — often invent values when authoritative sources exist. Examples of inventions we want to prevent:

  1. Fabricated sequence counters — generating a local auto-increment when Binance already publishes a monotonic trade_id and aggTrade ID that any two parties can reconcile against byte-for-byte.
  2. Re-derived usage metrics — computing our own Claude subscription usage % when Anthropic’s own dashboard/API publishes the canonical number, even if coarser (e.g., 37% vs 37.09735% — coarseness is better than fabricated precision).
  3. Hardcoded IPs / hostnames — inventing aliases when Tailscale MagicDNS or service registries are the authority.
  4. Magic numbers — inline literals for timeouts, windows, thresholds, session boundaries, with no provenance or named constant.
  5. Local re-stamping of exchange events — overwriting the venue’s TransactTime (FIX tag 60) with our broker’s receive time, destroying cross-venue event ordering.
  6. Seasonality by clock-calendar guessing — assuming Asian-session offsets from UTC hours instead of consulting exchange session metadata or DST-aware zoneinfo lookups.

The pattern across all of these: an LLM skips the authoritative source because it’s “easier” to synthesize a plausible-looking value inline.

LLMs have strong priors on idiomatic computer-science and finance vocabulary. Dropping a handful of these terms into CLAUDE.md nudges the model toward the correct pattern without paragraphs of explanation. One concentrated prime sentence can exercise ~40 distinct concepts the model already recognizes from training data.

This page is the canonical reference for which terms to use, what they mean, and one-line primes you can paste into CLAUDE.md files, agent system prompts, or skill docs.


| Term | One-line prime |
| --- | --- |
| Magic number anti-pattern | Never inline literals; every meaningful value has a symbolic name |
| Named constants / symbolic constants | Extract all numeric/string literals into named constants with provenance comments |
| Single source of truth (SSoT) | Every data element is mastered in exactly one place; all other places reference it |
| System of record (SoR) | The authoritative origin where data is created/maintained for a given entity |
| Golden record | The consolidated, cleansed, authoritative version; what other systems subscribe to |
| Source of reference | The designated canonical lookup; a logical role, not necessarily a single system |
| Master data management (MDM) | Discipline of enforcing match-merge, survivorship rules, and golden records across sources |
| Upstream authority | Always defer to the upstream owner; do not re-derive values the authority already publishes |

Prime sentence: “No magic numbers. Every value either comes from a named constant, an externalized config, or an upstream system of record — never invented in-place.”
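
A minimal sketch of what "named constant with provenance" might look like in Python. The `Constant` wrapper, the constant name, and its value are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constant:
    """A value bundled with its unit and where it came from."""
    value: float
    unit: str
    provenance: str  # who decided this, or which upstream document defines it


# Hypothetical value and provenance, for illustration only.
WS_RECONNECT_TIMEOUT = Constant(
    value=5.0,
    unit="s",
    provenance="hypothetical: ops runbook, websocket reconnect policy",
)


def should_reconnect(silence_seconds: float) -> bool:
    # Compare against the named constant, never an inline literal such as 5.0.
    return silence_seconds > WS_RECONNECT_TIMEOUT.value
```

A plain module-level constant with a provenance comment would serve the same purpose; the wrapper only makes the provenance inspectable at runtime.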


2. Sequence & Identity — Prefer Authoritative IDs

| Term | One-line prime |
| --- | --- |
| Monotonic sequence | IDs strictly increase; gaps are acceptable, reversals are not |
| Monotonically increasing identifier | Use the venue’s own sequential ID (e.g., Binance trade ID) as the scan key |
| Aggregated trade ID (aggTrade) | Use exchange-aggregated IDs when individual trades aren’t serializable across feeds |
| Log Sequence Number (LSN) | Per-stream monotonic offset; think Kafka offsets, Postgres WAL LSN, exchange sequence number |
| Idempotency key | Client-generated, unique per logical operation; dedupes retries at the boundary |
| Deterministic identifier | Derive the ID from content, not from time or randomness, so replays reproduce it |
| Natural key vs surrogate key | Prefer the venue’s natural key (trade_id) over a locally generated surrogate |
| CDC primary key + LSN | Change-data-capture pattern: deterministic primary key plus monotonic log sequence |

Prime sentence: “Reference authoritative sequential IDs (trade_id, aggTrade ID, LSN, sequence number) for reconciliation and gap detection; never invent our own counters when the authority publishes one.”
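
Gap detection on a venue-published ID can be sketched as below. This assumes the venue's IDs are dense (consecutive integers), so any jump means missed messages; for sequences where gaps are legitimate, only the reversal check applies. The function name `find_gaps` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def find_gaps(ids: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (first_missing, last_missing) for each gap in an ascending ID stream.

    Raises ValueError on a duplicate or reversal, since a monotonic
    sequence must strictly increase.
    """
    prev = None
    for cur in ids:
        if prev is not None:
            if cur <= prev:
                raise ValueError(f"non-monotonic ID: {cur} after {prev}")
            if cur > prev + 1:
                yield (prev + 1, cur - 1)
        prev = cur
```

For example, `list(find_gaps([100, 101, 104, 105, 109]))` returns `[(102, 103), (106, 108)]`, which are the ranges to re-request from the authority.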


3. Temporal Integrity — Point-in-Time Discipline

| Term | One-line prime |
| --- | --- |
| Point-in-time (PIT) data | Every value is what was known at that historical moment, not what’s known now |
| Bitemporal modeling | Track both valid_time (when the event happened) and transaction_time (when we recorded it) |
| As-of query | Every temporal query specifies the as-of time explicitly |
| Effective dating | Records carry effective_from / effective_to ranges; never a single timestamp |
| Look-ahead bias | Using data unavailable at the decision time; the silent killer of backtests |
| Future leak / peeking | Synonyms for look-ahead bias |
| Survivorship bias | Only looking at entities that exist today; missing delistings, failures, bankruptcies |
| Point-in-time integrity | Revisions are recorded alongside originals, not overwriting them |

Prime sentence: “All historical data is point-in-time with bitemporal recording; as-of queries are mandatory; no look-ahead, no survivorship bias, no silent revisions.”
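
A minimal sketch of a bitemporal as-of lookup, assuming an in-memory list of records. The `Fact` type, the `as_of` function, and the key used in the usage note are hypothetical; a real store would push this filter into the query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fact:
    key: str
    value: float
    valid_time: datetime        # when the event happened
    transaction_time: datetime  # when we recorded it


def as_of(facts: Sequence[Fact], key: str,
          valid_at: datetime, known_at: datetime) -> Optional[Fact]:
    """Return the latest fact for `key` valid at `valid_at`,
    using only what had been recorded by `known_at`."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.valid_time <= valid_at
        and f.transaction_time <= known_at  # blocks look-ahead
    ]
    if not candidates:
        return None
    # Revisions are stored alongside originals; the newest recorded one wins.
    return max(candidates, key=lambda f: (f.valid_time, f.transaction_time))
```

If a figure for a period ending 2023-12-31 is recorded on 2024-01-10 and revised on 2024-02-01, a query with `known_at` of 2024-01-15 returns the original value, while the same query with `known_at` of 2024-02-02 returns the revision.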


| Term | One-line prime |
| --- | --- |
| PTP / IEEE 1588 | Precision Time Protocol: sub-microsecond clock-synchronization standard, common in trading infrastructure |
| Grandmaster clock | The LAN-level canonical time source all clients discipline against |
| White Rabbit | Sub-nanosecond PTP extension, used at tier-1 trading infra |
| UTC canonicalization | All internal timestamps in UTC; zone conversion only at edges |
| TAI vs UTC | TAI is continuous (no leap seconds); UTC is adjusted by leap seconds. Pick one and document it |
| Leap second handling | Explicit policy for leap seconds (smear, step, or ignore) |
| Monotonic clock vs wall clock | Monotonic for intervals, wall for events; never mix |
| ISO 8601 / RFC 3339 | Text timestamps always in ISO 8601 with explicit zone offset |
| Epoch time / Unix time | Acknowledge as a representation; document precision (seconds vs ms vs µs vs ns) |
| Microsecond precision | Internal clock resolution is microseconds; only downsample for display |
| Nanosecond precision | When the venue publishes ns, preserve ns; do not truncate to µs |

Prime sentence: “All internal timestamps are UTC, ISO 8601, microsecond resolution at minimum (preserve nanoseconds if the source publishes them), with PTP discipline where low-latency matters.”
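
One way to honor "preserve nanoseconds" in Python, sketched under the assumption that the source publishes integer nanoseconds since the Unix epoch. Python's `datetime` stores only microseconds, so the integer stays the canonical value and the text form is rendered from it. The function name is hypothetical.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def epoch_ns_to_rfc3339(epoch_ns: int) -> str:
    """Render integer epoch nanoseconds as a UTC RFC 3339 string.

    The fractional part is formatted from the integer directly, so no
    nanosecond digits are lost to datetime's microsecond resolution.
    """
    seconds, nanos = divmod(epoch_ns, NANOS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{base:%Y-%m-%dT%H:%M:%S}.{nanos:09d}Z"
```

For example, `epoch_ns_to_rfc3339(1_700_000_000_123_456_789)` yields `2023-11-14T22:13:20.123456789Z`.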


5. Exchange / Venue Time — Don’t Invent

| Term | One-line prime |
| --- | --- |
| Exchange timestamp | The primary timestamp from the trading venue; never re-stamp with local time |
| Venue time | The exchange-authoritative time for event ordering |
| Tag 60 TransactTime (FIX) | The FIX protocol canonical field for the venue’s transaction time |
| Tag 52 SendingTime (FIX) | FIX field for when the message was sent; use only as a fallback when tag 60 is absent |
| Tick-aligned timestamp | Events aligned to the venue’s tick clock, not the broker’s receive clock |
| Event-time vs processing-time | Stream-processing distinction: when it happened vs when we saw it |
Event-time vs processing-timeStream-processing distinction: when it happened vs when we saw it

Prime sentence: “Exchange / venue timestamps are authoritative for event ordering; broker and local receive times are observability metadata, never the primary key.”
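
The event-time versus processing-time split can be sketched as a record that carries both timestamps but sorts only on the venue's. The `TradeEvent` type and its field names are hypothetical, and the timestamps are assumed to be integer epoch nanoseconds.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TradeEvent:
    venue: str
    trade_id: int
    transact_time_ns: int  # venue timestamp (e.g., FIX tag 60); authoritative
    received_time_ns: int  # local receive time; observability metadata only


def order_events(events: Iterable[TradeEvent]) -> List[TradeEvent]:
    # Order by venue time; venue and trade_id only break ties deterministically.
    return sorted(events, key=lambda e: (e.transact_time_ns, e.venue, e.trade_id))


def receive_latency_ns(event: TradeEvent) -> int:
    # A monitoring metric, not an ordering key. It mixes two clocks,
    # so it includes any skew between the venue and the local host.
    return event.received_time_ns - event.transact_time_ns
```

The receive time is kept rather than discarded because it remains useful for latency dashboards; it just never participates in ordering or keys.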


6. Reconciliation, Idempotency, Exactly-Once

| Term | One-line prime |
| --- | --- |
| Idempotent consumer | Receiver dedupes by idempotency key; safe to replay any message |
| At-least-once + idempotency = effectively-once | The only production-viable “exactly-once” pattern |
| Exactly-once illusion | True exactly-once semantics (EoS) is a myth in distributed systems; name it to avoid it |
| Reconciliation (recon) | Periodic diff between system state and authoritative source; surface gaps |
| Sequence gap detection | Monitor for missing sequence IDs as first-class alerting |
| Replay safety | Any pipeline stage can be rerun on the same input without divergent output |
| Audit trail / audit log | Immutable append-only record of state transitions |
| Tombstone record | Explicit deletion marker, preserving history in event-sourced stores |

Prime sentence: “At-least-once delivery with idempotent consumers, gap-detection on monotonic IDs, reconciliation against the authoritative source — never ‘exactly-once’, that’s a myth.”
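
A minimal sketch of an idempotent consumer. The class and its in-memory key set are assumptions for illustration; a production consumer would persist the seen keys atomically with the state change.

```python
class IdempotentConsumer:
    """Applies each logical operation once, however often it is delivered."""

    def __init__(self) -> None:
        self._seen_keys = set()
        self.balance = 0

    def handle(self, idempotency_key: str, amount: int) -> bool:
        """Apply the message; return False if it was a duplicate delivery."""
        if idempotency_key in self._seen_keys:
            return False  # retry or replay: already applied, safely ignored
        self._seen_keys.add(idempotency_key)
        self.balance += amount
        return True
```

Delivering the same key twice changes the balance once, which is what makes at-least-once delivery effectively-once at the receiver.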


7. Identity / Network / Infra — Don’t Self-Identify

| Term | One-line prime |
| --- | --- |
| Authoritative DNS | Don’t parse hostnames; ask the DNS resolver |
| Tailscale MagicDNS | The authoritative source for tailnet hostnames; do not invent or hardcode IPs |
| Canonical hostname | One hostname per machine across the org; zero aliases in code |
| Inventory / service registry | Consult the registry for service addresses; never hardcode IPs or ports |
| Source-attested identity | Identities are attested by an issuer (OIDC, SPIFFE, etc.), not asserted locally |

Prime sentence: “Network identity (IPs, hostnames, service endpoints) comes from the authoritative registry (Tailscale, DNS, service mesh) — we do not invent or hardcode.”
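
"Ask the resolver" can be sketched with the standard library's `socket.getaddrinfo`, which defers to whatever resolver the host is configured with. The `resolve_host` wrapper and its injectable `getaddrinfo` parameter are hypothetical conveniences so the lookup can be stubbed in tests.

```python
import socket
from typing import Callable, List


def resolve_host(hostname: str, port: int,
                 getaddrinfo: Callable = socket.getaddrinfo) -> List[str]:
    """Return the addresses the resolver reports for `hostname`.

    No IP is hardcoded here; the configured resolver is the authority.
    """
    infos = getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Callers pass the canonical hostname and let resolution happen at connect time, so an address change upstream needs no code change.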


| Term | One-line prime |
| --- | --- |
| Externalized configuration | Values that can change at deploy time live in config, not source |
| 12-factor config | Config via environment; strict separation from code |
| Configuration as code | Config is version-controlled, reviewable, and typed; not free-form |
| Feature flag / kill switch | Runtime-toggleable behavior; never a hardcoded `if DEBUG:` |
| Secrets as env / vault refs | Secrets are never in code, config files, or logs |

Prime sentence: “Every runtime-variable knob lives in externalized config (env, vault ref, SSoT config file) — never baked into source, never hardcoded.”
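
A minimal sketch of typed, environment-backed configuration that fails fast instead of silently inventing a default. The `Settings` fields and the variable names `RECON_INTERVAL_SECONDS` and `FEED_URL` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    recon_interval_seconds: int
    feed_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment; raise if a knob is missing."""
    try:
        return Settings(
            recon_interval_seconds=int(env["RECON_INTERVAL_SECONDS"]),
            feed_url=env["FEED_URL"],
        )
    except KeyError as missing:
        # Refuse to start rather than fall back to a made-up value.
        raise RuntimeError(f"missing required config: {missing.args[0]}") from None
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader testable without touching the process environment.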


9. Usage / Telemetry — Trust the Authority

| Term | One-line prime |
| --- | --- |
| Canonical metric | If the vendor publishes the metric, ingest their number; do not re-derive |
| Upstream telemetry | Token counts, usage percentages, rate-limit headroom come from the source API |
| Best-effort observability | If the vendor offers coarse telemetry (37% instead of 37.09735%), live with the coarseness; reconstructing finer detail is fabrication |
| Don’t derive what you can query | The cheapest/most correct metric is the one the authority exposes; the cost of re-derivation is divergence |

Prime sentence: “When an upstream publishes a metric (Claude usage %, exchange fees, venue latency), we ingest theirs verbatim — even at coarser precision — rather than reconstructing our own estimate.”
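
"Verbatim" ingestion can be sketched as storing the published value as received, with its source and fetch time, rather than as a recomputed float. The `UpstreamMetric` type, the `ingest_metric` function, and the metric and source names in the usage note are hypothetical; no real vendor API is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class UpstreamMetric:
    name: str
    raw_value: str        # exactly as the upstream published it, e.g. "37"
    source: str           # which authority the value came from
    fetched_at: datetime  # when we fetched it (UTC)

    def as_decimal(self) -> Decimal:
        # Decimal keeps the published precision: "37" stays 37, not 37.0.
        return Decimal(self.raw_value)


def ingest_metric(name: str, raw_value: str, source: str) -> UpstreamMetric:
    """Record the upstream's number without rounding or re-deriving it."""
    return UpstreamMetric(name, raw_value, source, datetime.now(timezone.utc))
```

A call such as `ingest_metric("usage_percent", "37", "vendor usage endpoint")` keeps `"37"` as published; nothing downstream can mistake it for a higher-precision figure.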


A single CLAUDE.md line that concentrates the most LLM-native signal:

“We practice: SSoT / SoR / golden-record authority; no magic numbers (extract to named constants or config); monotonic sequential IDs (trade_id, aggTrade, LSN) for reconciliation and gap detection; point-in-time bitemporal data (no look-ahead bias, no survivorship bias); UTC ISO-8601 timestamps at microsecond precision minimum (nanosecond where the venue publishes it); PTP/IEEE-1588 clock discipline where latency matters; exchange/venue timestamps are authoritative (FIX tag 60), broker receive time is observability only; at-least-once + idempotency = effectively-once; authoritative registries (Tailscale MagicDNS, service mesh) for identity; externalized configuration for every runtime-variable knob; trust upstream telemetry verbatim rather than re-deriving.”

That single sentence packs in roughly 40 distinct terms the model already recognizes from training data, which costs far fewer tokens than paragraphs of explanation.

  • Repo-level CLAUDE.md — drop the single-sentence prime near the top of “Core Principles” so every session inherits it.
  • Skill/agent system prompts — cite the section that matters for the skill (e.g., a reconciliation skill cites §2 and §6).
  • Code review checklists — the section headers themselves work as inspection headings.
  • PR templates — require contributors to tick boxes from the relevant sections for data-handling changes.

Not every project needs all 9 sections. Minimum viable subsets:

| Project type | Sections to prime |
| --- | --- |
| Market-data ingestion | §1, §2, §3, §4, §5, §6 |
| Backtesting / research | §1, §3, §4 (survivorship/look-ahead focus) |
| Internal tooling / devops | §1, §7, §8 |
| LLM app / AI tooling | §1, §9 (usage telemetry trust) |
| Cross-cutting CLAUDE.md | All 9 (one-sentence prime) |