Execution Layer

Source: Notion | Last edited: 2025-12-03 | ID: 29b2d2dc-3ef...

Execution Layer / Engine — Independent Architecture Design

1) Goals & Design Principles

Unified execution control — translate “Intents” (from Agents/Runner) into real-world actions on exchanges, brokers, or simulators.
Multi-asset compatibility — support crypto, equities, futures, options, FX.
Policy-governed safety — strong pre-trade risk control, compliance, and audit.
Deterministic and auditable — every intent and fill traceable across time, venues, and policies.
Modular & async-first — components can run as independently deployed, distributed services; no dependency on global locks.
Separation of intent, risk, routing, and gateway logic — allows for incremental evolution.

2) High-Level Architecture

flowchart LR
  subgraph Up["Upstream Inputs"]
    SIG["Signals / Intents<br/>(from Runner / AI Agents)"]
    POL["Policies<br/>(risk limits • compliance • sizing rules)"]
  end

  subgraph EXE["Execution Layer / Engine"]
    INT["Intent Normalizer<br/>(schema • validation)"]
    RSK["Pre-Trade Risk Engine<br/>(limits • exposure • sanity checks)"]
    OMS["Order Management System<br/>(state • idempotency • retry)"]
    RTE["Smart Router<br/>(venue selection • order type • latency routing)"]
    GWS["Execution Gateways<br/>(Binance • OKX • IBKR • FIX)"]
    SIM["Simulation Gateway<br/>(paper / shadow trading)"]
    CMP["Compliance Filter<br/>(region • whitelist • throttle)"]
    AUD["Audit & Journal<br/>(immutable order/fill logs)"]
  end

  subgraph ST["State & Reporting"]
    POS["Positions / Holdings"]
    RPT["PnL / Risk Reports"]
  end

  SIG --> INT --> RSK --> OMS --> RTE --> GWS
  RTE --> SIM
  RSK --> CMP
  OMS --> AUD
  GWS --> AUD
  GWS --> POS
  POS --> RPT
  POL --> RSK
  POL --> CMP

3) Core Concepts & Entities

3.1 Intent

A normalized, exchange-agnostic action derived from a strategy or agent.

{
  "intent_id": "uuid-2025-10-29-01",
  "ts": "2025-10-29T18:45:00Z",
  "instrument_id": "BINANCE:BTCUSDT",
  "side": "BUY",
  "size": {"type": "notional", "value": 10000, "ccy": "USDT"},
  "constraints": {"tif": "IOC", "max_slippage_bps": 5},
  "labels": {"strategy": "xs_bilstm_v1", "run": "20251029a"},
  "env": "staging"
}

3.2 Order

Concrete order derived from intent after routing and transformation.

{
  "order_id": "binance-123456",
  "intent_id": "uuid-2025-10-29-01",
  "venue": "BINANCE",
  "symbol": "BTCUSDT",
  "side": "BUY",
  "price": 68000.5,
  "qty": 0.147,
  "status": "NEW",
  "ts_sent": "2025-10-29T18:45:01Z"
}
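
The sample intent asks for 10000 USDT of notional, while the sample order carries a quantity of 0.147. One way to get from one to the other is to divide by the price and round down to the venue's lot step. The sketch below assumes a lot step of 0.001 (not stated in this document) and a hypothetical helper name `notional_to_qty`.

```python
from decimal import Decimal, ROUND_DOWN


def notional_to_qty(notional: Decimal, price: Decimal, lot_step: Decimal) -> Decimal:
    """Convert a notional amount into a quantity, rounded down to the lot step."""
    if price <= 0 or lot_step <= 0:
        raise ValueError("price and lot_step must be positive")
    # Count whole lot steps so the order never exceeds the requested notional.
    steps = (notional / price / lot_step).to_integral_value(rounding=ROUND_DOWN)
    return steps * lot_step


# lot_step=0.001 is an assumption made for illustration.
qty = notional_to_qty(Decimal("10000"), Decimal("68000.5"), Decimal("0.001"))
print(qty)  # 0.147
```

Rounding down rather than to nearest keeps the resulting order at or below the requested notional, which is the safer default ahead of pre-trade limit checks.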

3.3 Execution Report

Returned by gateways for order lifecycle and fills.

{
  "order_id": "binance-123456",
  "status": "PARTIALLY_FILLED",
  "avg_px": 68001.0,
  "filled_qty": 0.09,
  "remain_qty": 0.057,
  "fees": {"amount": 1.2, "ccy": "USDT"},
  "ts_last": "2025-10-29T18:45:02Z"
}
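
The report above is consistent with the order in 3.2: 0.147 ordered minus 0.09 filled leaves 0.057. A minimal sketch of that bookkeeping follows, assuming `filled_qty` is cumulative and using only the statuses that appear in this document plus a hypothetical `FILLED` terminal status.

```python
from decimal import Decimal


def apply_report(order_qty: Decimal, filled_qty: Decimal) -> tuple[Decimal, str]:
    """Return (remaining quantity, derived status) for a cumulative fill."""
    if filled_qty < 0 or filled_qty > order_qty:
        raise ValueError("filled_qty must lie between 0 and order_qty")
    remain = order_qty - filled_qty
    if filled_qty == 0:
        status = "NEW"
    elif remain == 0:
        status = "FILLED"  # assumed name for the fully filled state
    else:
        status = "PARTIALLY_FILLED"
    return remain, status


remain, status = apply_report(Decimal("0.147"), Decimal("0.09"))
print(remain, status)  # 0.057 PARTIALLY_FILLED
```

Using `Decimal` avoids the binary floating-point drift that would otherwise make `0.147 - 0.09` differ slightly from `0.057`.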

4) Subsystem Breakdown

4.1 Intent Normalizer

Validates and normalizes upstream signals into the unified Intent schema (section 3.1).
Enforces required fields and strategy whitelists.
Publishes validated intents to exec.intents Kafka topic.
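
The validation steps above could look roughly like the following. This is a sketch only: the `Intent` dataclass, the `normalize_intent` function, the set of required fields, and the in-code whitelist are all assumptions for illustration, and publishing to the topic is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical whitelist; a real deployment would load this from policy config.
ALLOWED_STRATEGIES = {"xs_bilstm_v1"}
REQUIRED_FIELDS = ("intent_id", "ts", "instrument_id", "side", "size", "env")


@dataclass(frozen=True)
class Intent:
    intent_id: str
    ts: datetime
    venue: str
    symbol: str
    side: str
    size_type: str
    size_value: Decimal
    strategy: str
    env: str


def normalize_intent(raw: dict) -> Intent:
    """Validate a raw signal and return a normalized Intent; raise ValueError otherwise."""
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    side = str(raw["side"]).upper()
    if side not in ("BUY", "SELL"):
        raise ValueError(f"invalid side: {raw['side']!r}")

    strategy = raw.get("labels", {}).get("strategy")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"strategy not whitelisted: {strategy!r}")

    venue, sep, symbol = str(raw["instrument_id"]).partition(":")
    if not (venue and sep and symbol):
        raise ValueError("instrument_id must look like VENUE:SYMBOL")

    size = raw["size"]
    if not isinstance(size, dict) or "type" not in size or "value" not in size:
        raise ValueError("size must contain 'type' and 'value'")
    size_value = Decimal(str(size["value"]))
    if size_value <= 0:
        raise ValueError("size value must be positive")

    # fromisoformat() on older Pythons rejects a trailing "Z", so map it to +00:00.
    ts = datetime.fromisoformat(str(raw["ts"]).replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError("ts must carry a timezone")

    return Intent(
        intent_id=str(raw["intent_id"]),
        ts=ts.astimezone(timezone.utc),
        venue=venue,
        symbol=symbol,
        side=side,
        size_type=str(size["type"]),
        size_value=size_value,
        strategy=strategy,
        env=str(raw["env"]),
    )
```

Rejecting with an exception keeps invalid signals out of the downstream stages; a production normalizer would more likely emit a rejection event so the failure is also journaled.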

4.2 Risk Engine

Pre-trade checks:
- Notional/position limits, leverage caps, drawdown guards.
- Exposure by asset, account, counterparty.
Dynamic limits:
- Real-time updates from risk policies or orchestration layer.
In-memory cache:
- Redis or Aerospike (sub-ms access).
Outcome:
- allow / reject / revise(size) / route(sim_only) decisions.
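
The four outcomes listed above can be illustrated with a small decision function. The sketch assumes a buy-side intent measured in notional terms, and the `RiskLimits` and `Decision` types are hypothetical names; drawdown guards, leverage caps, and per-counterparty exposure are omitted.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RiskLimits:
    max_order_notional: Decimal
    max_position_notional: Decimal
    sim_only: bool = False  # assumed policy flag forcing simulation routing


@dataclass(frozen=True)
class Decision:
    action: str  # "allow" | "reject" | "revise" | "sim_only"
    notional: Decimal
    reason: str = ""


def check_intent(notional: Decimal, current_exposure: Decimal, limits: RiskLimits) -> Decision:
    """Pre-trade check for a buy intent expressed as notional."""
    if notional <= 0:
        return Decision("reject", Decimal("0"), "non-positive notional")
    if limits.sim_only:
        return Decision("sim_only", notional, "policy routes this intent to simulation")
    headroom = limits.max_position_notional - current_exposure
    if headroom <= 0:
        return Decision("reject", Decimal("0"), "position limit reached")
    # Shrink the order to the tightest applicable limit instead of rejecting it.
    allowed = min(notional, headroom, limits.max_order_notional)
    if allowed < notional:
        return Decision("revise", allowed, "size reduced to fit limits")
    return Decision("allow", notional)
```

For example, with `max_order_notional=5000` and no existing exposure, a 10000 USDT intent comes back as `revise` with a notional of 5000 rather than being dropped outright.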

4.3 OMS (Order Management System)

Tracks intent → order mapping and lifecycle.
Ensures idempotency (safe re-submissions).
Handles cancel/retry/replace logic.
Stores state in Redis (active) and ClickHouse (history).
Emits audit events (Kafka exec.audit).
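
One common way to make re-submissions safe is to derive the client order ID deterministically from the intent, so a retry maps to the same key. The sketch below assumes that approach and uses an in-memory dict in place of Redis; the class name, the ID format, and the `FILLED`, `CANCELED`, and `REJECTED` statuses are illustrative, not taken from this design.

```python
import hashlib

# Allowed lifecycle transitions; states without an entry are terminal.
ALLOWED = {
    "NEW": {"PARTIALLY_FILLED", "FILLED", "CANCELED", "REJECTED"},
    "PARTIALLY_FILLED": {"PARTIALLY_FILLED", "FILLED", "CANCELED"},
}


def client_order_id(intent_id: str, leg: int = 0) -> str:
    """Same intent and leg always yield the same ID, so retries are idempotent."""
    digest = hashlib.sha256(f"{intent_id}:{leg}".encode("utf-8")).hexdigest()
    return f"oms-{digest[:24]}"


class InMemoryOMS:
    """Toy order store; a dict stands in for the active-state cache."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}

    def submit(self, intent_id: str, leg: int = 0) -> tuple[str, bool]:
        """Return (client order id, created). created is False on a re-submission."""
        coid = client_order_id(intent_id, leg)
        if coid in self._status:
            return coid, False
        self._status[coid] = "NEW"
        return coid, True

    def transition(self, coid: str, new_status: str) -> None:
        current = self._status[coid]
        if new_status not in ALLOWED.get(current, set()):
            raise ValueError(f"illegal transition {current} -> {new_status}")
        self._status[coid] = new_status

    def status(self, coid: str) -> str:
        return self._status[coid]
```

With Redis as the backing store, the existence check and write in `submit` would need to be a single atomic operation (for instance a set-if-absent command) so two OMS replicas cannot both create the order.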

4.4 Smart Router

Chooses optimal venue, order type, execution mode:
- Weighted cost model: expected_fill_price + fee + latency_penalty
- Optional ML router using market microstructure signals.
Supports split routing for multi-venue execution.
Publishes orders to gateway-specific queues.
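
As a rough illustration of the weighted cost model, the sketch below scores each venue for a BUY order and picks the cheapest. The fee expressed in basis points, the linear per-millisecond latency penalty, and the sample quotes are assumptions; the document does not specify units or weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VenueQuote:
    venue: str
    expected_fill_price: float
    fee_bps: float     # taker fee in basis points (assumed unit)
    latency_ms: float  # recent round-trip latency to the venue


def venue_cost(q: VenueQuote, latency_penalty_per_ms: float = 0.01) -> float:
    """expected_fill_price + fee + latency_penalty, all in quote currency per unit."""
    fee = q.expected_fill_price * q.fee_bps / 10_000
    latency_penalty = q.latency_ms * latency_penalty_per_ms
    return q.expected_fill_price + fee + latency_penalty


def pick_venue(quotes: list[VenueQuote]) -> VenueQuote:
    """Lowest all-in cost wins for a BUY; a SELL would maximize net proceeds instead."""
    return min(quotes, key=venue_cost)


# Illustrative numbers only.
quotes = [
    VenueQuote("BINANCE", 68000.5, fee_bps=10, latency_ms=20),
    VenueQuote("OKX", 68001.0, fee_bps=8, latency_ms=35),
]
best = pick_venue(quotes)
```

In this example OKX wins despite a slightly worse price and higher latency, because the fee difference dominates the other two terms.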

4.5 Gateways (Execution Adapters)

Crypto: WebSocket / REST (Binance, OKX, Bybit).
Equities/Futures: FIX; Interactive Brokers (IBKR) native API or REST API.
Stateless microservices, each handling:
- API auth
- Rate limiting
- Heartbeat + connectivity monitoring
Abstracted through standard API:

submit_order(order) → order_id
cancel(order_id)
stream_fills()
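
In Python, the three-call interface above could be captured as a `typing.Protocol`, which lets the router depend on the interface rather than on any concrete adapter. The sketch is synchronous for brevity (an async-first implementation would use coroutines), and `PaperGateway` is a hypothetical toy adapter that fills every order immediately at its limit price.

```python
from typing import Iterator, Protocol


class Gateway(Protocol):
    def submit_order(self, order: dict) -> str: ...
    def cancel(self, order_id: str) -> None: ...
    def stream_fills(self) -> Iterator[dict]: ...


class PaperGateway:
    """Toy adapter: every order fills instantly and in full."""

    def __init__(self, venue: str = "SIM") -> None:
        self._venue = venue
        self._seq = 0
        self._fills: list[dict] = []

    def submit_order(self, order: dict) -> str:
        self._seq += 1
        order_id = f"{self._venue.lower()}-{self._seq}"
        self._fills.append({
            "order_id": order_id,
            "status": "FILLED",
            "avg_px": order["price"],
            "filled_qty": order["qty"],
            "remain_qty": 0,
        })
        return order_id

    def cancel(self, order_id: str) -> None:
        # Nothing to cancel: orders are already filled by the time this is called.
        return None

    def stream_fills(self) -> Iterator[dict]:
        # Drain queued execution reports in submission order.
        while self._fills:
            yield self._fills.pop(0)


def send(gateway: Gateway, order: dict) -> str:
    """Caller code is written against the Protocol, not a concrete adapter."""
    return gateway.submit_order(order)
```

Because `Protocol` uses structural typing, `PaperGateway` satisfies `Gateway` without inheriting from it, so exchange adapters and the simulation adapter can live in separate packages.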

4.6 Simulation Gateway

Mirrors the production flow but connects to a mock market or a historical replay.
Uses same gateway schema to ensure deterministic testing.
Optionally connected to orchestration jobs for stress-test pipelines.

4.7 Compliance & Throttle

Region-based blocking, restricted instrument sets.
Velocity controls (orders/sec per strategy).
Configurable throttles per exchange adapter.
Governed via OPA (Open Policy Agent) rules.
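
A velocity control of the orders/sec-per-strategy kind can be sketched as a sliding-window counter. This is a local, in-process illustration only; the design above delegates the actual rules to OPA, and the `VelocityLimiter` name and its parameters are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class VelocityLimiter:
    """Allow at most max_orders per strategy within a sliding time window."""

    def __init__(self, max_orders: int, window_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_orders = max_orders
        self._window_s = window_s
        self._clock = clock  # injectable so tests can control time
        self._events: dict[str, deque] = defaultdict(deque)

    def allow(self, strategy: str) -> bool:
        now = self._clock()
        events = self._events[strategy]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self._window_s:
            events.popleft()
        if len(events) >= self._max_orders:
            return False
        events.append(now)
        return True
```

Each strategy gets its own window, so one noisy strategy cannot consume another's allowance. Rejected attempts are not recorded, which means a throttled strategy recovers as soon as its oldest accepted order ages out.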

4.8 Audit Journal

Immutable ClickHouse tables with schema:

intent_id, order_id, event_type, venue, ts_event, payload(JSON)

Indexed by intent_id and ts_event.
Lineage links to upstream DAG submission + Orchestration logs.
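
To make the journal schema and the daily checksum mentioned under Security & Compliance more concrete, the sketch below builds rows with the six columns above and hashes a batch in a canonical order. The helper names, the SHA-256 choice, and the sort key are assumptions; the document does not specify how the checksum is computed.

```python
import hashlib
import json

COLUMNS = ("intent_id", "order_id", "event_type", "venue", "ts_event", "payload")


def audit_row(intent_id: str, order_id: str, event_type: str, venue: str,
              ts_event: str, payload: dict) -> dict:
    """Build one journal row; the payload is stored as canonical JSON text."""
    return {
        "intent_id": intent_id,
        "order_id": order_id,
        "event_type": event_type,
        "venue": venue,
        "ts_event": ts_event,
        "payload": json.dumps(payload, sort_keys=True, separators=(",", ":")),
    }


def batch_checksum(rows: list[dict]) -> str:
    """SHA-256 over rows in a canonical order, so insertion order does not matter."""
    ordered = sorted(rows, key=lambda r: (r["intent_id"], r["ts_event"], r["event_type"]))
    h = hashlib.sha256()
    for row in ordered:
        line = json.dumps([row[c] for c in COLUMNS], separators=(",", ":"))
        h.update(line.encode("utf-8") + b"\n")
    return h.hexdigest()
```

Recomputing the checksum over the same day's rows and comparing it with the stored value detects any in-place modification or deletion, which is the property an append-only journal needs to demonstrate.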

5) State & Reporting

PnL and exposure metrics are pushed periodically to:

Prometheus (for dashboards)
ClickHouse materialized views (for analytical queries)

6) Data Flow

Runner / Agent emits intent → exec.intents topic.
Intent Normalizer validates & standardizes → forwards to Risk Engine.
Risk Engine checks exposure → passes to OMS.
OMS creates a new order → hands it to the Router.
Router selects venue & order type → sends to Gateway queue.
Gateway transmits to exchange or broker API.
Execution reports flow back → OMS → update order/position.
Audit events streamed to ClickHouse.
PnL snapshots pushed to metrics and reporting pipelines.
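
The first few hops of this flow can be sketched as independent async stages connected by queues. Here `asyncio.Queue` stands in for the message bus topics, the handlers are deliberately trivial, and every name is hypothetical; the point is only that each stage consumes, transforms, and forwards without sharing a lock.

```python
import asyncio
from typing import Callable, Optional

STOP = None  # sentinel that tells each stage to shut down


async def stage(inbox: asyncio.Queue, outbox: Optional[asyncio.Queue],
                handler: Callable[[dict], Optional[dict]]) -> None:
    """Consume items from inbox, apply handler, forward non-None results."""
    while True:
        item = await inbox.get()
        if item is STOP:
            if outbox is not None:
                await outbox.put(STOP)  # propagate shutdown downstream
            return
        result = handler(item)
        if result is not None and outbox is not None:
            await outbox.put(result)


async def run_pipeline(intents: list[dict], max_notional: float = 10_000) -> list[dict]:
    q_norm, q_risk, q_oms = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    orders: list[dict] = []

    def normalize(i: dict) -> dict:
        return {**i, "side": i["side"].upper()}

    def risk(i: dict) -> Optional[dict]:
        return i if i["notional"] <= max_notional else None  # None means rejected

    def oms(i: dict) -> None:
        orders.append({"intent_id": i["intent_id"], "side": i["side"], "status": "NEW"})

    tasks = [
        asyncio.create_task(stage(q_norm, q_risk, normalize)),
        asyncio.create_task(stage(q_risk, q_oms, risk)),
        asyncio.create_task(stage(q_oms, None, oms)),
    ]
    for intent in intents:
        await q_norm.put(intent)
    await q_norm.put(STOP)
    await asyncio.gather(*tasks)
    return orders
```

Running `asyncio.run(run_pipeline([...]))` with one intent under the limit and one over it yields a single `NEW` order; the oversized intent is dropped at the risk stage and never reaches the OMS.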

7) Technology Choices

8) Security & Compliance

API Key Management: Vault handles all exchange/broker credentials; no hard-coded secrets.
Segregation of environments: staging / sim / prod separated by namespace and Vault policy.
Pre-trade controls: enforced by the Risk Engine and Compliance Filter before an order reaches the OMS, rejecting intents that violate margin or velocity rules.
Audit immutability: ClickHouse tables are append-only; a daily checksum is written to S3 for verification.
Policy enforcement: OPA checks for:
- Authorized strategy ID
- Position/asset caps
- Venue availability window
- Geolocation restrictions

9) Observability

Metrics: Prometheus exporters (intent throughput, latency, fill ratio).
Logs: Loki centralized logging.
Tracing: OpenTelemetry; parent span = intent_id, children = risk, order, router, gateway.
Dashboards: Grafana panels (PnL, exposure, latency histograms).
Alerting: Alertmanager for:
- Venue disconnects
- Risk rejections > threshold
- Latency spikes > SLO
- Fill ratio below baseline

10) Terraform-Style Resource Layout

infra/
├─ modules/
│  ├─ exec-intent-bus/           # Redpanda topics + ACL
│  ├─ exec-risk-engine/          # Helm chart for risk service
│  ├─ exec-oms/                  # Order manager deployment
│  ├─ exec-router/               # Smart router microservice
│  ├─ exec-gateways/             # Exchange adapters (Binance, OKX, IBKR, FIX)
│  ├─ exec-sim-gateway/          # Simulation adapter
│  ├─ vault/                     # Secrets mgmt
│  ├─ opa/                       # Policy enforcement
│  ├─ clickhouse-audit/          # Audit log storage
│  ├─ redis-cache/               # Risk + order state cache
│  ├─ observability/             # Prom + Graf + Loki + Tempo
│  └─ policy-pipelines/          # Conftest jobs for config validation
└─ envs/{dev,prod}/main.tf

Example: OMS Deployment

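apiVersion: apps/v1
kind: Deployment
metadata: { name: exec-oms, namespace: exec }
spec:
  replicas: 2
  # apps/v1 requires a selector matching the pod template labels.
  # The label app: exec-oms is an illustrative choice.
  selector:
    matchLabels: { app: exec-oms }
  template:
    metadata:
      labels: { app: exec-oms }
    spec:
      containers:
        - name: oms
          # Pin a version tag or digest outside dev; :latest is not reproducible.
          image: ghcr.io/archetype/exec-oms:latest
          ports: [{ containerPort: 8080 }]
          env:
            - name: REDIS_URL
              value: redis://redis.exec.svc:6379
            - name: CLICKHOUSE_DSN
              valueFrom: { secretKeyRef: { name: ch-secrets, key: dsn } }
            - name: VAULT_ADDR
              value: https://vault.svc.cluster.local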

11) Integration with Other Layers

12) Recommended MVP Configuration

Intent Bus: Redpanda
Risk + OMS: Python microservices + Redis
Audit Store: ClickHouse
Router: Python asyncio service (configurable venue weights)
Gateways: Binance + OKX (crypto), Simulation (paper)
Compliance: OPA (policy service)
Secrets: Vault + External Secrets
Monitoring: Prometheus + Loki + Tempo

Why optimal:
Matches existing Data/Orchestration stack; minimal new dependencies.
Fully open-source; ready for hybrid cloud deployment.
Agent-friendly: clear Intent → Report APIs.
Easy to evolve toward high-frequency or multi-venue scenarios.