LLM Priming Vocabulary for Authoritative-Source Engineering
Motivation
Coding assistants (Claude, Cursor, GPT, Copilot) often invent values even when an authoritative source exists. Examples of inventions we want to prevent:
- Fabricated sequence counters: generating a local auto-increment when Binance already publishes a monotonic `trade_id` and `aggTrade` ID that any two parties can reconcile against byte-for-byte.
- Re-derived usage metrics: computing our own Claude subscription usage % when Anthropic’s own dashboard/API publishes the canonical number, even if coarser (e.g., 37% vs 37.09735%; coarseness is better than fabricated precision).
- Hardcoded IPs / hostnames: inventing aliases when Tailscale MagicDNS or service registries are the authority.
- Magic numbers: inline literals for timeouts, windows, thresholds, session boundaries, with no provenance or named constant.
- Local re-stamping of exchange events: overwriting the venue’s `TransactTime` (FIX tag 60) with our broker’s receive time, destroying cross-venue event ordering.
- Seasonality by clock-calendar guessing: assuming Asian-session offsets from UTC hours instead of consulting exchange session metadata or DST-aware `zoneinfo` lookups.
The pattern across all of these: an LLM skips the authoritative source because it’s “easier” to synthesize a plausible-looking value inline.
Why a vocabulary primer works
LLMs have strong priors on idiomatic computer-science and finance vocabulary. Dropping a handful of these terms into CLAUDE.md nudges the model toward the correct pattern without paragraphs of explanation. One concentrated prime sentence can exercise ~40 distinct concepts the model already recognizes from training data.
This page is the canonical reference for which terms to use, what they mean, and one-line primes you can paste into CLAUDE.md files, agent system prompts, or skill docs.
1. Authority & No-Magic-Numbers
| Term | One-line prime |
|---|---|
| Magic number anti-pattern | Never inline literals; every meaningful value has a symbolic name |
| Named constants / symbolic constants | Extract all numeric/string literals into named constants with provenance comments |
| Single source of truth (SSoT) | Every data element is mastered in exactly one place; all other places reference it |
| System of record (SoR) | The authoritative origin where data is created/maintained for a given entity |
| Golden record | The consolidated, cleansed, authoritative version — what other systems subscribe to |
| Source of reference | The designated lookup canonical — a logical role, not necessarily a single system |
| Master data management (MDM) | Discipline of enforcing match-merge, survivorship rules, and golden records across sources |
| Upstream authority | Always defer to the upstream owner; do not re-derive values the authority already publishes |
Prime sentence: “No magic numbers. Every value either comes from a named constant, an externalized config, or an upstream system of record — never invented in-place.”
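
As a minimal sketch of what "named constant with provenance" could look like in Python: the `Constant` wrapper, the `ORDER_ACK_TIMEOUT` name, and its value are all hypothetical, chosen only to show the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constant:
    """A named value that records its unit and where it came from."""
    value: float
    unit: str
    provenance: str  # who decided this value, or which document publishes it


# Hypothetical value and provenance, for illustration only.
ORDER_ACK_TIMEOUT = Constant(
    value=2.5,
    unit="seconds",
    provenance="hypothetical: agreed in an ops runbook, not a venue-published figure",
)


def is_ack_late(elapsed_seconds: float, timeout: Constant = ORDER_ACK_TIMEOUT) -> bool:
    """Compare against the named constant instead of an inline literal."""
    return elapsed_seconds > timeout.value
```

A reviewer can then ask "where does 2.5 come from?" and find the answer next to the value rather than in someone's memory.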
2. Sequence & Identity — Prefer Authoritative IDs
| Term | One-line prime |
|---|---|
| Monotonic sequence | IDs strictly increase; gaps are acceptable, reversals are not |
| Monotonically increasing identifier | Use the venue’s own sequential ID (e.g., Binance trade ID) as the scan key |
| Aggregated trade ID (aggTrade) | Use exchange-aggregated IDs when individual trades aren’t serializable across feeds |
| Log Sequence Number (LSN) | Per-stream monotonic offset; think Kafka offsets, Postgres WAL LSN, exchange seq num |
| Idempotency key | Client-generated unique per logical operation; dedupes retries at the boundary |
| Deterministic identifier | Derive ID from content, not from time or randomness, so replays reproduce |
| Natural key vs surrogate key | Prefer the venue’s natural key (`trade_id`) over a locally-generated surrogate |
| CDC primary key + LSN | Change-data-capture pattern: deterministic primary key plus monotonic log sequence |
Prime sentence: “Reference authoritative sequential IDs (trade_id, aggTrade ID, LSN, sequence number) for reconciliation and gap detection; never invent our own counters when the authority publishes one.”
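
The sketch below illustrates two of the ideas in this section: keying records on the venue's own ID, and scanning those IDs for gaps. The function names are hypothetical, and the gap check assumes the venue's IDs are contiguous for a given symbol; where a venue only guarantees monotonicity, a gap is a prompt to reconcile rather than proof of loss.

```python
import hashlib


def natural_key(venue: str, symbol: str, trade_id: int) -> str:
    """Build a key from the venue's own trade ID rather than a local counter."""
    return f"{venue}:{symbol}:{trade_id}"


def deterministic_id(venue: str, symbol: str, trade_id: int) -> str:
    """Derive a stable ID from content only, so a replay yields the same ID."""
    key = natural_key(venue, symbol, trade_id)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def find_gaps(ids: list[int]) -> list[tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between observed IDs."""
    ordered = sorted(set(ids))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps
```

For example, `find_gaps([100, 101, 104, 105, 107])` reports the ranges 102 to 103 and 106 to 106 as missing.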
3. Temporal Integrity — Point-in-Time Discipline
| Term | One-line prime |
|---|---|
| Point-in-time (PIT) data | Every value is what was known at that historical moment, not what’s known now |
| Bitemporal modeling | Track both `valid_time` (when the event happened) and `transaction_time` (when we recorded it) |
| As-of query | Every temporal query specifies the as-of time explicitly |
| Effective dating | Records carry `effective_from` / `effective_to` ranges; never a single timestamp |
| Look-ahead bias | Using data unavailable at the decision time — the silent killer of backtests |
| Future leak / peeking | Synonyms for look-ahead bias |
| Survivorship bias | Only looking at entities that exist today; missing delistings, failures, bankruptcies |
| Point-in-time integrity | Revisions are recorded alongside originals, not overwriting them |
Prime sentence: “All historical data is point-in-time with bitemporal recording; as-of queries are mandatory; no look-ahead, no survivorship bias, no silent revisions.”
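
A bitemporal as-of query can be sketched in a few lines. The `Revision` record, the `as_of` helper, and the `ACME` figures are hypothetical; the point is only that a revision is appended next to the original and that the query names both time axes explicitly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Revision:
    entity: str
    valid_time: datetime        # when the fact applies in the real world
    transaction_time: datetime  # when we recorded this version
    value: float


def as_of(revisions: list[Revision], entity: str,
          valid_at: datetime, known_at: datetime) -> Optional[Revision]:
    """Latest version valid at `valid_at`, using only rows recorded by `known_at`."""
    visible = [
        r for r in revisions
        if r.entity == entity
        and r.valid_time <= valid_at
        and r.transaction_time <= known_at
    ]
    if not visible:
        return None
    return max(visible, key=lambda r: (r.valid_time, r.transaction_time))


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical history: the revision is appended; the original row is kept.
HISTORY = [
    Revision("ACME", utc(2023, 12, 31), utc(2024, 1, 10), 1.00),  # first report
    Revision("ACME", utc(2023, 12, 31), utc(2024, 2, 15), 1.20),  # later revision
]
```

A backtest deciding on 2024-01-20 passes that date as `known_at` and sees 1.00; passing 2024-03-01 returns the revised 1.20. Reading the revised figure for the January decision would be look-ahead bias.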
4. Clock & Timestamp Precision
| Term | One-line prime |
|---|---|
| PTP / IEEE 1588 | Precision Time Protocol — sub-microsecond sync standard for trading |
| Grandmaster clock | The LAN-level canonical time source all clients discipline against |
| White Rabbit | Sub-nanosecond PTP extension, used at tier-1 trading infra |
| UTC canonicalization | All internal timestamps in UTC; zone conversion only at edges |
| TAI vs UTC | TAI is continuous (no leap seconds); UTC is adjusted by leap seconds, so pick one and document the choice |
| Leap second handling | Explicit policy for leap seconds (smear, step, or ignore) |
| Monotonic clock vs wall clock | Monotonic for intervals, wall for events — never mix |
| ISO 8601 / RFC 3339 | Text timestamps always in ISO 8601 with explicit zone offset |
| Epoch time / Unix time | Acknowledge as a representation; document precision (seconds vs ms vs µs vs ns) |
| Microsecond precision | Internal clock resolution is microseconds; only downsample for display |
| Nanosecond precision | When the venue publishes ns, preserve ns — do not truncate to µs |
Prime sentence: “All internal timestamps are UTC, ISO 8601, microsecond resolution at minimum (preserve nanoseconds if the source publishes them), with PTP discipline where low-latency matters.”
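
The following sketch shows one way to apply these rules with the Python standard library. The helper names are hypothetical. It assumes the source delivers timestamps as integer epoch nanoseconds; keeping them as integers avoids the precision loss of `datetime`, which stores microseconds only.

```python
import time
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def utc_now_iso() -> str:
    """Wall-clock event time as UTC text with an explicit offset and microseconds."""
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def ns_to_iso(epoch_ns: int) -> str:
    """Render integer epoch nanoseconds as RFC 3339 text without dropping digits."""
    seconds, nanos = divmod(epoch_ns, NANOS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return f"{base}.{nanos:09d}+00:00"


def elapsed_ns(fn) -> int:
    """Measure an interval with the monotonic clock, never the wall clock."""
    start = time.monotonic_ns()
    fn()
    return time.monotonic_ns() - start
```

With this helper, `ns_to_iso(1_700_000_000_123_456_789)` yields `2023-11-14T22:13:20.123456789+00:00`, with all nine fractional digits intact.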
5. Exchange / Venue Time — Don’t Invent
| Term | One-line prime |
|---|---|
| Exchange timestamp | The primary timestamp from the trading venue; never re-stamp with local time |
| Venue time | The exchange-authoritative time for event ordering |
| Tag 60 TransactTime (FIX) | The FIX protocol canonical field for the venue’s transaction time |
| Tag 52 SendingTime (FIX) | FIX field for when the message was sent; use it only as a fallback when tag 60 is absent |
| Tick-aligned timestamp | Events aligned to the venue’s tick clock, not the broker’s receive clock |
| Event-time vs processing-time | Stream-processing distinction: when it happened vs when we saw it |
Prime sentence: “Exchange / venue timestamps are authoritative for event ordering; broker and local receive times are observability metadata, never the primary key.”
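
One way to keep the two clocks apart is to carry both on the event and sort only on the venue's. The `TradeEvent` shape and its field names below are hypothetical, not a venue or FIX schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeEvent:
    venue: str
    trade_id: int
    venue_time_ns: int    # as published by the venue; never overwritten
    receive_time_ns: int  # local receive clock; observability metadata only


def order_by_venue_time(events: list[TradeEvent]) -> list[TradeEvent]:
    """Order events by the venue's clock, with venue and trade ID as tie-breakers."""
    return sorted(events, key=lambda e: (e.venue_time_ns, e.venue, e.trade_id))


def receive_latency_ns(event: TradeEvent) -> int:
    """Observability only; may be negative if the two clocks are not synchronized."""
    return event.receive_time_ns - event.venue_time_ns
```

Because the dataclass is frozen, downstream code cannot re-stamp `venue_time_ns` by accident; the receive time stays available for latency dashboards.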
6. Reconciliation, Idempotency, Exactly-Once
| Term | One-line prime |
|---|---|
| Idempotent consumer | Receiver dedupes by idempotency key; safe to replay any message |
| At-least-once + idempotency = effectively-once | The only production-viable “exactly-once” pattern |
| Exactly-once illusion | True exactly-once semantics (EOS) is a myth in distributed systems; name it to avoid it |
| Reconciliation (recon) | Periodic diff between system state and authoritative source; surface gaps |
| Sequence gap detection | Monitor for missing sequence IDs as first-class alerting |
| Replay safety | Any pipeline stage can be rerun on the same input without divergent output |
| Audit trail / audit log | Immutable append-only record of state transitions |
| Tombstone record | Explicit deletion marker, preserving history in event-sourced stores |
Prime sentence: “At-least-once delivery with idempotent consumers, gap-detection on monotonic IDs, reconciliation against the authoritative source — never ‘exactly-once’, that’s a myth.”
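
A toy idempotent consumer shows why at-least-once delivery becomes effectively-once at the boundary. The class and its in-memory `set` are hypothetical simplifications: a real consumer would need a durable dedupe store updated atomically with the side effect.

```python
class IdempotentConsumer:
    """Applies each logical operation once, however many times it is delivered."""

    def __init__(self) -> None:
        self._seen: set[str] = set()  # in-memory only; assumed durable in practice
        self.balance = 0

    def handle(self, idempotency_key: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if idempotency_key in self._seen:
            return False
        self._seen.add(idempotency_key)
        self.balance += amount
        return True
```

Delivering the same key twice changes the balance once, so a retry or a full replay of the input produces the same final state.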
7. Identity / Network / Infra — Don’t Self-Identify
| Term | One-line prime |
|---|---|
| Authoritative DNS | Don’t parse hostnames; ask the DNS resolver |
| Tailscale MagicDNS | The authoritative source for tailnet hostnames — do not invent or hardcode IPs |
| Canonical hostname | One hostname per machine across the org; zero aliases in code |
| Inventory / service registry | Consult the registry for service addresses; never hardcode IPs or ports |
| Source-attested identity | Identities are attested by an issuer (OIDC, SPIFFE, etc.) — not asserted locally |
Prime sentence: “Network identity (IPs, hostnames, service endpoints) comes from the authoritative registry (Tailscale, DNS, service mesh) — we do not invent or hardcode.”
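
In code, "ask the authority" usually means resolving a name at runtime instead of embedding an address. This is a minimal sketch using the standard-library resolver call; it assumes the machine's resolver is already configured by whichever registry is authoritative (MagicDNS, internal DNS, or similar), and the `resolve` helper name is hypothetical.

```python
import socket


def resolve(hostname: str, port: int) -> list[str]:
    """Ask the system resolver for addresses instead of hardcoding them."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Callers pass the canonical hostname from config and never store the returned IPs in source.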
8. Configuration & Constants
| Term | One-line prime |
|---|---|
| Externalized configuration | Values that can change at deploy time live in config, not source |
| 12-factor config | Config via environment; strict separation from code |
| Configuration as code | Config is version-controlled, reviewable, and typed — not free-form |
| Feature flag / kill switch | Runtime-toggleable behavior; never a hardcoded `if DEBUG:` |
| Secrets as env / vault refs | Secrets are never in code, config files, or logs |
Prime sentence: “Every runtime-variable knob lives in externalized config (env, vault ref, SSoT config file) — never baked into source, never hardcoded.”
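
A small loader illustrates the 12-factor pattern of reading config from the environment and failing loudly when a value is missing. The `Settings` fields and the variable names `FEED_URL` and `RECONNECT_SECONDS` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    feed_url: str
    reconnect_seconds: float


REQUIRED = ("FEED_URL", "RECONNECT_SECONDS")  # hypothetical variable names


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read typed settings from the environment; no silent in-code defaults."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise KeyError(f"missing required config: {', '.join(missing)}")
    return Settings(
        feed_url=env["FEED_URL"],
        reconnect_seconds=float(env["RECONNECT_SECONDS"]),
    )
```

Raising on a missing variable is a deliberate choice here: a fallback literal inside the loader would be a magic number in disguise.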
9. Usage / Telemetry — Trust the Authority
| Term | One-line prime |
|---|---|
| Canonical metric | If the vendor publishes the metric, ingest their number — do not re-derive |
| Upstream telemetry | Token counts, usage percentages, rate-limit headroom come from the source API |
| Best-effort observability | If the vendor offers coarse telemetry (37% instead of 37.09735%), live with the coarseness — reconstructing finer detail is fabrication |
| Don’t derive what you can query | The cheapest/most correct metric is the one the authority exposes; the cost of re-derivation is divergence |
Prime sentence: “When an upstream publishes a metric (Claude usage %, exchange fees, venue latency), we ingest theirs verbatim — even at coarser precision — rather than reconstructing our own estimate.”
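
Ingesting a metric verbatim can be as simple as storing the reported text and its source, then parsing without adding digits. The payload shape and the `usage_percent` field below are hypothetical and do not describe any real vendor API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UpstreamMetric:
    name: str
    reported: str  # exactly as the upstream sent it
    source: str    # which authority published it

    @property
    def value(self) -> Decimal:
        """Parse at the reported precision; Decimal does not invent extra digits."""
        return Decimal(self.reported)


def ingest(payload: dict, source: str) -> UpstreamMetric:
    """Record the upstream number as published instead of re-deriving it."""
    return UpstreamMetric(
        name="usage_percent",
        reported=str(payload["usage_percent"]),
        source=source,
    )
```

If the upstream says `37`, the stored value is `37`, and any locally computed estimate lives in a separate, clearly labeled field if it is kept at all.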
How To Use This As Priming Material
The one-sentence prime
A single CLAUDE.md line that concentrates the most LLM-native signal:
“We practice: SSoT / SoR / golden-record authority; no magic numbers (extract to named constants or config); monotonic sequential IDs (trade_id, aggTrade, LSN) for reconciliation and gap detection; point-in-time bitemporal data (no look-ahead bias, no survivorship bias); UTC ISO-8601 timestamps at microsecond precision minimum (nanosecond where the venue publishes it); PTP/IEEE-1588 clock discipline where latency matters; exchange/venue timestamps are authoritative (FIX tag 60), broker receive time is observability only; at-least-once + idempotency = effectively-once; authoritative registries (Tailscale MagicDNS, service mesh) for identity; externalized configuration for every runtime-variable knob; trust upstream telemetry verbatim rather than re-deriving.”
That single sentence exercises ~40 distinct terms the model already recognizes from training data, which makes it far more compact than paragraphs of explanation.
Placement recommendations
- Repo-level `CLAUDE.md`: drop the single-sentence prime near the top of “Core Principles” so every session inherits it.
- Skill/agent system prompts: cite the section that matters for the skill (e.g., a reconciliation skill cites §2 and §6).
- Code review checklists: the section headers themselves work as inspection headings.
- PR templates: require contributors to tick boxes from the relevant sections for data-handling changes.
Per-context subsetting
Not every project needs all 9 sections. Minimum viable subsets:
| Project type | Sections to prime |
|---|---|
| Market-data ingestion | §1, §2, §3, §4, §5, §6 |
| Backtesting / research | §1, §3, §4 (survivorship/look-ahead focus) |
| Internal tooling / devops | §1, §7, §8 |
| LLM app / AI tooling | §1, §9 (usage telemetry trust) |
| Cross-cutting CLAUDE.md | All 9 (one-sentence prime) |