LLM Priming Vocabulary for Authoritative-Source Engineering

Motivation

Coding assistants — Claude, Cursor, GPT, Copilot — often invent values when authoritative sources exist. Examples of inventions we want to prevent:

Fabricated sequence counters — generating a local auto-increment when Binance already publishes a monotonic trade_id and aggTrade ID that any two parties can reconcile against byte-for-byte.
Re-derived usage metrics — computing our own Claude subscription usage % when Anthropic’s own dashboard/API publishes the canonical number, even if coarser (e.g., 37% vs 37.09735% — coarseness is better than fabricated precision).
Hardcoded IPs / hostnames — inventing aliases when Tailscale MagicDNS or service registries are the authority.
Magic numbers — inline literals for timeouts, windows, thresholds, session boundaries, with no provenance or named constant.
Local re-stamping of exchange events — overwriting the venue’s TransactTime (FIX tag 60) with our broker’s receive time, destroying cross-venue event ordering.
Seasonality by clock-calendar guessing — assuming Asian-session offsets from UTC hours instead of consulting exchange session metadata or DST-aware zoneinfo lookups.

The pattern across all of these: an LLM skips the authoritative source because it’s “easier” to synthesize a plausible-looking value inline.

Why a vocabulary primer works

LLMs have strong priors on idiomatic computer-science and finance vocabulary. Dropping a handful of these terms into CLAUDE.md nudges the model toward the correct pattern without paragraphs of explanation. A single dense prime sentence can invoke roughly 40 distinct concepts the model already recognizes from its training data.

This page is the canonical reference for which terms to use, what they mean, and one-line primes you can paste into CLAUDE.md files, agent system prompts, or skill docs.

1. Authority & No-Magic-Numbers

Term	One-line prime
Magic number anti-pattern	Never inline literals; every meaningful value has a symbolic name
Named constants / symbolic constants	Extract all numeric/string literals into named constants with provenance comments
Single source of truth (SSoT)	Every data element is mastered in exactly one place; all other places reference it
System of record (SoR)	The authoritative origin where data is created/maintained for a given entity
Golden record	The consolidated, cleansed, authoritative version — what other systems subscribe to
Source of reference	The designated canonical lookup: a logical role, not necessarily a single system
Master data management (MDM)	Discipline of enforcing match-merge, survivorship rules, and golden records across sources
Upstream authority	Always defer to the upstream owner; do not re-derive values the authority already publishes

Prime sentence: “No magic numbers. Every value either comes from a named constant, an externalized config, or an upstream system of record — never invented in-place.”
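Here is a minimal Python sketch of that rule, assuming a hypothetical market-data client. Each threshold is a named constant that carries its unit and provenance, so a reviewer can trace the number back to whoever owns it. The `Constant` type, the constant names, their values and their cited sources are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constant:
    """A named value that records its unit and where it came from."""
    value: float
    unit: str
    source: str  # doc URL, runbook section, or config key that justifies the value


# Hypothetical values for illustration; real code cites the owning document or config key.
STALE_QUOTE_THRESHOLD = Constant(2.0, "s", "hypothetical: market-data runbook, stale-quote policy")
MAX_RECONNECT_ATTEMPTS = Constant(5, "attempts", "hypothetical: config key feed.max_reconnects")


def is_stale(quote_age_s: float) -> bool:
    # Compare against the named constant instead of an inline `2.0`.
    return quote_age_s > STALE_QUOTE_THRESHOLD.value


def should_retry(attempt: int) -> bool:
    # Same rule: the limit lives in one named place, not in this expression.
    return attempt < MAX_RECONNECT_ATTEMPTS.value
```

The same shape works when the value is loaded from config. The name and the provenance string stay the same, and only the origin of `value` changes.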

2. Sequence & Identity — Prefer Authoritative IDs

Term	One-line prime
Monotonic sequence	IDs strictly increase; gaps are acceptable, reversals are not
Monotonically increasing identifier	Use the venue’s own sequential ID (e.g., Binance trade ID) as the scan key
Aggregated trade ID (aggTrade)	Use the venue's aggregate-trade ID for aggregated feeds; Binance aggTrades group fills from one taker order at one price and carry first/last trade IDs (`f`, `l`) back to individual trades
Log Sequence Number (LSN)	Per-stream (per-partition) monotonic offset; think Kafka partition offsets, Postgres WAL LSN, exchange sequence numbers
Idempotency key	A client-generated key, unique per logical operation; dedupes retries at the boundary
Deterministic identifier	Derive the ID from content, not from time or randomness, so replays reproduce the same IDs
Natural key vs surrogate key	Prefer the venue’s natural key (trade_id) over a locally-generated surrogate
CDC primary key + LSN	Change-data-capture pattern: deterministic primary key plus monotonic log sequence

Prime sentence: “Reference authoritative sequential IDs (trade_id, aggTrade ID, LSN, sequence number) for reconciliation and gap detection; never invent our own counters when the authority publishes one.”
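Gap detection on an authoritative ID stream can be sketched as below. The sketch assumes the venue advances its IDs by exactly 1 per event. Confirm this in the venue's documentation before relying on it; Binance's per-symbol trade IDs are the motivating case. `find_gaps` is a hypothetical helper and is not part of any exchange SDK.

```python
from typing import Iterable, List, Optional, Tuple


def find_gaps(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges in a monotonic ID stream.

    Assumes the authority advances IDs by exactly 1 per event. Duplicates
    (for example, redelivered messages) are tolerated. A reversal raises,
    because a monotonic sequence must never go backwards.
    """
    gaps: List[Tuple[int, int]] = []
    prev: Optional[int] = None
    for current in ids:
        if prev is not None:
            if current < prev:
                raise ValueError(f"sequence reversal: {current} after {prev}")
            if current > prev + 1:
                gaps.append((prev + 1, current - 1))
        prev = current
    return gaps
```

A non-empty result means you should request a backfill from the venue for that ID range. It is never a reason to start a local counter.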

3. Temporal Integrity — Point-in-Time Discipline

Term	One-line prime
Point-in-time (PIT) data	Every value is what was known at that historical moment, not what’s known now
Bitemporal modeling	Track both `valid_time` (when the event happened) and `transaction_time` (when we recorded it)
As-of query	Every temporal query specifies the as-of time explicitly
Effective dating	Records carry `effective_from` / `effective_to` ranges; never a single timestamp
Look-ahead bias	Using data unavailable at the decision time — the silent killer of backtests
Future leak / peeking	Synonyms for look-ahead bias
Survivorship bias	Only looking at entities that exist today; missing delistings, failures, bankruptcies
Point-in-time integrity	Revisions are recorded alongside originals, not overwriting them

Prime sentence: “All historical data is point-in-time with bitemporal recording; as-of queries are mandatory; no look-ahead, no survivorship bias, no silent revisions.”
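The as-of rule can be sketched as a lookup over bitemporal facts. The sketch assumes a simplified model: each fact has a `valid_from` (valid time) and a `recorded_at` (transaction time), and a later `valid_from` supersedes an earlier one instead of carrying an explicit `effective_to`. The `Fact` type and `as_of` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: float
    valid_from: datetime   # valid time: when the value became true in the world
    recorded_at: datetime  # transaction time: when we learned and stored it


def as_of(facts: List[Fact], key: str, valid_at: datetime, known_at: datetime) -> Optional[float]:
    """Value of `key` in effect at `valid_at`, using only facts recorded by `known_at`.

    Revisions are new rows with a later `recorded_at`, never in-place updates,
    so any past state of knowledge can be replayed exactly.
    """
    visible = [
        f for f in facts
        if f.key == key and f.valid_from <= valid_at and f.recorded_at <= known_at
    ]
    if not visible:
        return None  # nothing was known yet; do not backfill from the future
    # Latest effective value wins; among revisions of it, the latest recording wins.
    return max(visible, key=lambda f: (f.valid_from, f.recorded_at)).value
```

Take a value first recorded on 10 January and revised on 15 February. An as-of query with `known_at` in late January returns the original, and one in March returns the revision. A backtest that always passes its simulation clock as `known_at` therefore cannot see the revision early.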

4. Clock & Timestamp Precision

Term	One-line prime
PTP / IEEE 1588	Precision Time Protocol: a sub-microsecond network clock-sync standard, widely used in trading
Grandmaster clock	The LAN-level canonical time source all clients discipline against
White Rabbit	Sub-nanosecond PTP extension developed at CERN; found in some top-tier trading infrastructure
UTC canonicalization	All internal timestamps in UTC; zone conversion only at edges
TAI vs UTC	TAI is continuous (no leap seconds); UTC is adjusted by leap seconds, so it is not. Pick one and document it
Leap second handling	Explicit policy for leap seconds (smear, step, or ignore)
Monotonic clock vs wall clock	Monotonic for intervals, wall for events — never mix
ISO 8601 / RFC 3339	Text timestamps always in ISO 8601 with explicit zone offset
Epoch time / Unix time	Acknowledge as a representation; document precision (seconds vs ms vs µs vs ns)
Microsecond precision	Internal timestamp resolution is at least microseconds; downsample only for display
Nanosecond precision	When the venue publishes ns, preserve ns — do not truncate to µs

Prime sentence: “All internal timestamps are UTC, ISO 8601, microsecond resolution at minimum (preserve nanoseconds if the source publishes them), with PTP discipline where low-latency matters.”
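Here is a minimal sketch of the edge conversions, assuming Python 3.9+ with `zoneinfo` and system tz data available. `to_wire` emits UTC text with an explicit offset and a fixed microsecond width. `session_open_utc` replaces clock-calendar guessing with a tz-database lookup. Both function names are hypothetical. The session times passed in should come from the exchange's published session metadata; none are hardcoded here.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def to_wire(ts: datetime) -> str:
    """Render an aware timestamp as ISO 8601 / RFC 3339 in UTC, always with microseconds."""
    if ts.tzinfo is None:
        # A naive datetime has no zone; refuse to guess one.
        raise ValueError("naive datetime: attach the source zone at the edge first")
    return ts.astimezone(timezone.utc).isoformat(timespec="microseconds")


def session_open_utc(day: date, local_open: time, tz_name: str) -> datetime:
    """Map a local session open to UTC via the tz database, so DST shifts are applied."""
    local = datetime.combine(day, local_open, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)
```

For example, an 08:00 open in `Europe/London` maps to 08:00 UTC in January but 07:00 UTC in July. A fixed UTC-hour rule gets that offset wrong for half the year. Python's `datetime` stops at microseconds, so nanosecond venue times need to be carried as integer epoch nanoseconds instead.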

5. Exchange / Venue Time — Don’t Invent

Term	One-line prime
Exchange timestamp	The primary timestamp from the trading venue; never re-stamp with local time
Venue time	The exchange-authoritative time for event ordering
Tag 60 TransactTime (FIX)	The FIX protocol canonical field for the venue’s transaction time
Tag 52 SendingTime (FIX)	FIX header field for when the message was sent; use only as a fallback when tag 60 is absent
Tick-aligned timestamp	Events aligned to the venue’s tick clock, not the broker’s receive clock
Event-time vs processing-time	Stream-processing distinction: when it happened vs when we saw it

Prime sentence: “Exchange / venue timestamps are authoritative for event ordering; broker and local receive times are observability metadata, never the primary key.”
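Event-time selection for FIX messages can be sketched as follows. The sketch assumes fields have already been parsed into a mapping from integer tag to string value. This is a hypothetical representation, not any particular FIX engine's API. It also assumes timestamps use the FIX UTCTimestamp text form `YYYYMMDD-HH:MM:SS[.fraction]`. Times are kept as integer epoch nanoseconds so that nanosecond venue precision survives.

```python
import calendar
from datetime import datetime
from typing import Dict, Mapping

TRANSACT_TIME = 60  # FIX tag 60, TransactTime: when the venue says the event happened
SENDING_TIME = 52   # FIX tag 52, SendingTime: when the message was sent (fallback only)


def fix_utc_to_ns(value: str) -> int:
    """Parse a FIX UTCTimestamp into Unix epoch nanoseconds without dropping digits."""
    whole, _, frac = value.partition(".")
    seconds = calendar.timegm(datetime.strptime(whole, "%Y%m%d-%H:%M:%S").timetuple())
    # Right-pad the fraction to 9 digits; anything finer than a nanosecond is truncated.
    frac_ns = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds * 1_000_000_000 + frac_ns


def stamp_event(fields: Mapping[int, str], received_ns: int) -> Dict[str, int]:
    """Order by venue time; keep our receive time only as observability metadata."""
    if TRANSACT_TIME in fields:
        tag = TRANSACT_TIME
    elif SENDING_TIME in fields:
        tag = SENDING_TIME
    else:
        # Refuse to promote local receive time to event time.
        raise ValueError("message carries no venue timestamp")
    return {
        "event_ns": fix_utc_to_ns(fields[tag]),
        "event_time_tag": tag,
        "received_ns": received_ns,
    }
```

When neither tag is present, the sketch raises an error instead of falling back to `received_ns`. It does this on purpose: a quiet fallback is exactly the re-stamping this section rules out.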

6. Reconciliation, Idempotency, Exactly-Once

Term	One-line prime
Idempotent consumer	Receiver dedupes by idempotency key; safe to replay any message
At-least-once + idempotency = effectively-once	The only production-viable “exactly-once” pattern
Exactly-once illusion	True end-to-end exactly-once delivery is a myth in distributed systems; name it so you don't design around it
Reconciliation (recon)	Periodic diff between system state and authoritative source; surface gaps
Sequence gap detection	Monitor for missing sequence IDs as first-class alerting
Replay safety	Any pipeline stage can be rerun on the same input without divergent output
Audit trail / audit log	Immutable append-only record of state transitions
Tombstone record	Explicit deletion marker, preserving history in event-sourced stores

Prime sentence: “At-least-once delivery with idempotent consumers, gap detection on monotonic IDs, and reconciliation against the authoritative source; never claim ‘exactly-once’, which is a myth.”
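Here is a minimal sketch of both halves of this pattern. It assumes messages arrive as dicts with a hypothetical `idempotency_key` field and that positions are keyed by the venue's natural trade ID. `IdempotentConsumer` and `reconcile` are illustrative names, not a library API.

```python
from typing import Callable, Dict, List, Set


class IdempotentConsumer:
    """Applies each logical message once, keyed by its idempotency key.

    With at-least-once delivery upstream, this gives effectively-once processing.
    The in-memory set is a stand-in: a real consumer needs a durable key store
    updated atomically with the side effect, or a crash between the two can
    still apply a message twice.
    """

    def __init__(self, apply: Callable[[dict], None]) -> None:
        self._apply = apply
        self._seen: Set[str] = set()

    def handle(self, message: dict) -> bool:
        key = message["idempotency_key"]
        if key in self._seen:
            return False  # redelivery: acknowledge and skip
        self._apply(message)  # if this raises, the key stays unseen and a retry is safe
        self._seen.add(key)
        return True


def reconcile(local: Dict[int, float], authority: Dict[int, float]) -> Dict[str, List[int]]:
    """Diff local state against the authoritative source, keyed by the venue's natural ID."""
    return {
        "missing_locally": sorted(authority.keys() - local.keys()),
        "unknown_to_authority": sorted(local.keys() - authority.keys()),
        "mismatched": sorted(k for k in local.keys() & authority.keys() if local[k] != authority[k]),
    }
```

Every non-empty bucket from `reconcile` is a gap to surface as an alert. Do not silently overwrite it.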

7. Identity / Network / Infra — Don’t Self-Identify

Term	One-line prime
Authoritative DNS	Don’t parse hostnames; ask the DNS resolver
Tailscale MagicDNS	The authoritative source for tailnet hostnames — do not invent or hardcode IPs
Canonical hostname	One hostname per machine across the org; zero aliases in code
Inventory / service registry	Consult the registry for service addresses; never hardcode IPs or ports
Source-attested identity	Identities are attested by an issuer (OIDC, SPIFFE, etc.) — not asserted locally

Prime sentence: “Network identity (IPs, hostnames, service endpoints) comes from the authoritative registry (Tailscale, DNS, service mesh) — we do not invent or hardcode.”
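Here is a minimal sketch, assuming the host's resolver is configured by Tailscale MagicDNS or the organization's DNS. Code names the service and asks the resolver instead of embedding an address. The `resolve` helper is hypothetical; it wraps the standard `socket.getaddrinfo`.

```python
import socket
from typing import List


def resolve(hostname: str, port: int) -> List[str]:
    """Return the addresses the system resolver currently reports for `hostname`.

    Whatever configures the resolver (MagicDNS, corporate DNS, /etc/hosts) stays
    the authority; this function keeps no cache or address list of its own.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Consider a hypothetical tailnet machine named `db-primary`. Calling `resolve("db-primary", 5432)` at connect time keeps a rename or re-address a DNS change rather than a code change. Passing the name straight to the client library has the same effect.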

8. Configuration & Constants

Term	One-line prime
Externalized configuration	Values that can change at deploy time live in config, not source
12-factor config	Config via environment; strict separation from code
Configuration as code	Config is version-controlled, reviewable, and typed — not free-form
Feature flag / kill switch	Runtime-toggleable behavior; never hardcoded `if DEBUG:`
Secrets as env / vault refs	Secrets are never in code, config files, or logs

Prime sentence: “Every runtime-variable knob lives in externalized config (env, vault ref, SSoT config file) — never baked into source, never hardcoded.”
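A 12-factor style loader might look like this, assuming settings arrive as environment variables. The variable names (`FEED_URL`, `REQUEST_TIMEOUT_S`, `KILL_SWITCH`) and the `Settings` fields are hypothetical. Required values fail fast instead of falling back to a literal in source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    feed_url: str
    request_timeout_s: float
    kill_switch: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build typed settings from environment variables (12-factor style)."""
    env = os.environ if env is None else env
    try:
        return Settings(
            feed_url=env["FEED_URL"],
            request_timeout_s=float(env["REQUEST_TIMEOUT_S"]),
            # Optional flag: absent means off; accepted truthy spellings are an assumption.
            kill_switch=env.get("KILL_SWITCH", "off").lower() in ("1", "true", "on"),
        )
    except KeyError as missing:
        # Fail fast: no silent default baked into source for required values.
        raise RuntimeError(f"missing required config: {missing}") from None
```

This loader reads the kill switch once. To make it toggleable at runtime, re-run the loader or read from a flag service. Secrets would arrive the same way, as env values or vault references, and are never logged.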

9. Usage / Telemetry — Trust the Authority

Term	One-line prime
Canonical metric	If the vendor publishes the metric, ingest their number — do not re-derive
Upstream telemetry	Token counts, usage percentages, rate-limit headroom come from the source API
Best-effort observability	If the vendor offers coarse telemetry (37% instead of 37.09735%), live with the coarseness — reconstructing finer detail is fabrication
Don’t derive what you can query	The cheapest/most correct metric is the one the authority exposes; the cost of re-derivation is divergence

Prime sentence: “When an upstream publishes a metric (Claude usage %, exchange fees, venue latency), we ingest theirs verbatim — even at coarser precision — rather than reconstructing our own estimate.”
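Verbatim ingestion can be sketched as follows, assuming the upstream exposes the metric as a text field or header. `Decimal` keeps exactly the digits the authority published, with no float artifacts and no invented decimals. `UpstreamMetric` and `ingest_metric` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class UpstreamMetric:
    name: str
    value: Decimal  # exactly the digits the authority published
    source: str     # which upstream said so, for audit


def ingest_metric(fields: Mapping[str, str], name: str, source: str) -> Optional[UpstreamMetric]:
    """Take the upstream's number verbatim; return None rather than estimate when it is absent."""
    raw = fields.get(name)
    if raw is None:
        return None  # report "unknown"; reconstructing it locally would be fabrication
    return UpstreamMetric(name=name, value=Decimal(raw.strip()), source=source)
```

Suppose a vendor field uses the hypothetical name `x-usage-percent`; it is not a documented Anthropic or exchange header. An input of `"37"` is stored as `Decimal('37')`, and it stays at that precision downstream.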

How To Use This As Priming Material

The one-sentence prime

A single CLAUDE.md line that concentrates the most LLM-native signal:

“We practice: SSoT / SoR / golden-record authority; no magic numbers (extract to named constants or config); monotonic sequential IDs (trade_id, aggTrade, LSN) for reconciliation and gap detection; point-in-time bitemporal data (no look-ahead bias, no survivorship bias); UTC ISO-8601 timestamps at microsecond precision minimum (nanosecond where the venue publishes it); PTP/IEEE-1588 clock discipline where latency matters; exchange/venue timestamps are authoritative (FIX tag 60), broker receive time is observability only; at-least-once + idempotency = effectively-once; authoritative registries (Tailscale MagicDNS, service mesh) for identity; externalized configuration for every runtime-variable knob; trust upstream telemetry verbatim rather than re-deriving.”

That single sentence packs roughly 40 distinct terms that Claude models already recognize. This is far more compact than paragraphs of explanation.

Placement recommendations

Repo-level CLAUDE.md — drop the single-sentence prime near the top of “Core Principles” so every session inherits it.
Skill/agent system prompts — cite the section that matters for the skill (e.g., a reconciliation skill cites §2 and §6).
Code review checklists — the section headers themselves work as inspection headings.
PR templates — require contributors to tick boxes from the relevant sections for data-handling changes.

Per-context subsetting

Not every project needs all 9 sections. Minimum viable subsets:

Project type	Sections to prime
Market-data ingestion	§1, §2, §3, §4, §5, §6
Backtesting / research	§1, §3, §4 (survivorship/look-ahead focus)
Internal tooling / devops	§1, §7, §8
LLM app / AI tooling	§1, §9 (usage telemetry trust)
Cross-cutting CLAUDE.md	All 9 (one-sentence prime)

Sources

Binance trade ID / aggTrade schema: Binance API reference
FIX protocol field definitions: FIXimate tag dictionary
Tailscale MagicDNS: Tailscale docs — MagicDNS
ISO 8601 / RFC 3339: RFC 3339 (IETF)
12-factor configuration: 12factor.net — Config