AlphaForge – Architecture

Source: Notion | Last edited: 2025-11-21 | ID: 2b22d2dc-3ef...

1. Vision and Problem Statement

Subsections:

What is AlphaForge?
- Short description of AlphaForge / QuantOS as an AI-agent-centric, DSL-driven quantitative research and execution OS.
- Initial scope: crypto futures and perpetual swaps (perps), mid-frequency, cross-sectional research.
Why DSL + Compiler + Orchestrator?
- Motivation for a declarative DSL instead of ad-hoc scripts.
- Benefits of a compiler pipeline (canonicalization, fingerprinting, vectorization).
- Role of the orchestrator in turning plans into reproducible runs.
Target Users and Use Cases
- Personas:
  - Quant Researcher
  - Quant Engineer / System Architect
  - Portfolio Manager / Risk Manager
  - AI Research Agents
- Focus use cases:
  - Crypto mid-frequency strategies
  - Cross-sectional factor research
  - Multi-venue, multi-asset support in the long run

Subsections:

Narrative for Overall Context
- AlphaForge as a platform between humans/agents and external venues/data.
- Input side: DSL specs, experiment configs, deployment requests, data feeds.
- Output side: orders to venues, metrics and artifacts to humans and agents.
System Context Diagram (Mermaid C4)
- Mermaid C4 context diagram (Person, System, System_Boundary, System_Ext).
- Describes: researcher, PM, AI agent, DevOps; AlphaForge Core; data providers; venues; storage; analytics.
Personas
- Quant Researcher
- Portfolio Manager
- AI Research Agent
- DevOps / SRE
External Systems
- Exchanges & trading venues
- Data vendors & on-chain sources
- Object storage and data warehouse
- Analytics / BI tools and notebooks

Subsections:

High-Level Container List
- Each container as its own bullet/toggle:
  - API and Gateway
  - DSL and Compiler Service
  - Orchestrator Service
  - Data and Feature Service
  - Execution Gateway
  - Plugin and Capability Pack Registry
  - Experiment and Metric Store
  - Monitoring and Observability Stack
  - Identity and Access Management (IAM)
- For each: 2–4 lines on scope and responsibilities.
Container Diagram (Mermaid)
- Mermaid flowchart diagram showing:
  - Client side (Researcher, AI Agent, PM)
  - AlphaForge containers
  - External systems (market data, venues, storage)
- Edges: how requests/data flow between containers and externals.

Subsections:

DSL and Compiler Internals
- DSL Parser, IR Builder
- Canonicalizer, Fingerprinter, Vectorizer
- Plugin Resolver
- Internal flow diagram (Mermaid) from DSL spec → canonical IR + embeddings + plugins.
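
The canonicalization and fingerprinting stages in the flow above can be illustrated with a minimal sketch. It assumes the IR can be represented as a JSON-serializable dict and uses SHA-256 over a sorted-key serialization; the function names and the example spec fields (`universe`, `signal`, `lookback`) are hypothetical and not taken from the AlphaForge codebase.

```python
import hashlib
import json


def canonicalize(spec: dict) -> str:
    """Serialize a spec deterministically: sorted keys, compact separators."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def fingerprint(spec: dict) -> str:
    """Return the SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonicalize(spec).encode("utf-8")).hexdigest()


# Two hypothetical specs that differ only in key order.
spec_a = {"universe": "perps", "signal": {"name": "momentum", "lookback": 24}}
spec_b = {"signal": {"lookback": 24, "name": "momentum"}, "universe": "perps"}

# Key order does not affect the fingerprint once the spec is canonicalized.
same = fingerprint(spec_a) == fingerprint(spec_b)
```

A real canonicalizer would likely also normalize semantically equivalent expressions (for example, default parameters or commutative operands), which plain key sorting does not cover.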
Orchestrator Internals
- Run Scheduler
- DAG Executor
- Run State Manager
- Event Bus / Queue Adapter
- Policy Engine
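
As a rough illustration of what the DAG Executor might do, the sketch below runs tasks in dependency order using the standard-library `graphlib` module. The task names (`load`, `features`, `backtest`) and the `run_dag` signature are assumptions made for this example; scheduling, retries, and state persistence are deliberately left out.

```python
from graphlib import TopologicalSorter


def run_dag(tasks: dict, deps: dict) -> dict:
    """Run each task after its upstream tasks and collect the results.

    tasks: task name -> callable taking the dict of results so far
    deps:  task name -> set of upstream task names
    """
    results = {}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(deps).static_order():
        results[name] = tasks[name](results)
    return results


# Hypothetical three-step run: load data, build features, backtest.
tasks = {
    "load": lambda r: [1.0, 2.0, 3.0],
    "features": lambda r: [x * 2 for x in r["load"]],
    "backtest": lambda r: sum(r["features"]),
}
deps = {"features": {"load"}, "backtest": {"features"}}
results = run_dag(tasks, deps)
```

Because the execution order is derived only from the declared dependencies, the same plan produces the same sequence of steps on every run, which is one ingredient of reproducibility.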
Data and Feature Service Internals
- Ingestion Pipelines
- Storage Layout Manager
- Query Planner
- Feature Generator
- Caching Layer
Execution Gateway Internals
- Order Router
- Venue Adapters
- Risk and Pre-Trade Checks
- State Synchronizer
Experiment and Metric Store Internals
- Experiment Registry
- Run Log Store
- Metrics Engine
- Similarity and Novelty Engine

Subsections:

Storage Choices
- ClickHouse for structured time-series / factor panels.
- Object storage for raw, large, or unstructured data.
- Caches (e.g. Redis / in-memory) for hot datasets and panels.
Core Schemas
- OHLCV schema (per venue / instrument).
- Trades and order book schemas (if relevant).
- Factor panels (cross-sectional and time-series).
- Experiment logs and metrics storage design.
Data Lifecycle
- Ingest → Normalize → Serve → Archive:
  - Ingestion from exchanges, vendors, on-chain.
  - Normalization / schema enforcement.
  - Serving to backtests and live runs via the Data and Feature Service.
  - Archival and retention policies.
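
The Normalize step can be sketched as a function that maps a venue-specific payload onto one common OHLCV record and rejects inconsistent bars. The raw field names (`symbol`, `t`, `o`, `h`, `l`, `c`, `v`) and the `Bar` class are hypothetical; actual venue payloads and the AlphaForge schema may look different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Bar:
    """Common OHLCV record, keyed by venue and instrument (assumed layout)."""
    venue: str
    instrument: str
    ts: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float


def normalize_bar(venue: str, raw: dict) -> Bar:
    """Convert a raw payload (millisecond timestamp, string prices) to a Bar."""
    bar = Bar(
        venue=venue,
        instrument=raw["symbol"].upper(),
        ts=datetime.fromtimestamp(raw["t"] / 1000, tz=timezone.utc),
        open=float(raw["o"]),
        high=float(raw["h"]),
        low=float(raw["l"]),
        close=float(raw["c"]),
        volume=float(raw["v"]),
    )
    # Schema enforcement: low and high must bracket open and close.
    if not (bar.low <= min(bar.open, bar.close) and max(bar.open, bar.close) <= bar.high):
        raise ValueError(f"inconsistent OHLC values: {raw}")
    if bar.volume < 0:
        raise ValueError(f"negative volume: {raw}")
    return bar


raw = {"symbol": "btc-usdt", "t": 1700000000000,
       "o": "100", "h": "110", "l": "95", "c": "105", "v": "12.5"}
bar = normalize_bar("example-venue", raw)
```

Rejecting malformed bars at this stage keeps downstream consumers (backtests and live runs) free of per-venue validation logic.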

Subsections:

Execution Gateway Design
- Abstract order and portfolio model.
- Separation between strategy logic and venue-specific details.
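
One possible way to keep strategy logic free of venue-specific details is an adapter interface over an abstract order type, sketched below. `Order`, `VenueAdapter`, `ExampleVenueAdapter`, and the payload format are all hypothetical; they do not describe any real exchange API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    """Venue-neutral order as a strategy would emit it (assumed fields)."""
    instrument: str   # e.g. "BTC-USDT"
    side: str         # "buy" or "sell"
    quantity: float


class VenueAdapter(Protocol):
    """Anything that can translate an abstract order into a venue payload."""
    def to_venue_payload(self, order: Order) -> dict: ...


class ExampleVenueAdapter:
    """Adapter for a made-up venue that wants compact symbols and string sizes."""
    def to_venue_payload(self, order: Order) -> dict:
        return {
            "symbol": order.instrument.replace("-", ""),
            "side": order.side.upper(),
            "qty": str(order.quantity),
        }


def route(order: Order, adapter: VenueAdapter) -> dict:
    """The router depends only on the adapter interface, not on a venue."""
    return adapter.to_venue_payload(order)


payload = route(Order("BTC-USDT", "buy", 0.5), ExampleVenueAdapter())
```

Adding a venue then means adding an adapter; the strategy and the router stay unchanged.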
Integration with External Engines
- Integration pattern with engines like Nautilus.
- Signal-based vs. “code runs inside engine” models.
- Handling of latency and synchronization.
Risk and Safety Controls
- Pre-trade checks and limits.
- Kill switches / circuit breakers.
- Guardrails for agent-controlled and automated runs.
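
A minimal sketch of how pre-trade limits and a kill switch could be combined in a single gate is shown below. The limit names, the use of signed notionals, and the `RiskGate` class are assumptions for illustration, not the actual AlphaForge risk model.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_order_notional: float      # largest single order allowed
    max_position_notional: float   # largest absolute position allowed


class RiskGate:
    """Checks every order before it reaches a venue adapter."""

    def __init__(self, limits: RiskLimits):
        self.limits = limits
        self.killed = False

    def trip_kill_switch(self) -> None:
        """Block all further orders until the gate is replaced or reset."""
        self.killed = True

    def check(self, order_notional: float, position_notional: float):
        """Return (allowed, reason). Notionals are signed: buys > 0, sells < 0."""
        if self.killed:
            return False, "kill switch active"
        if abs(order_notional) > self.limits.max_order_notional:
            return False, "order notional exceeds limit"
        if abs(position_notional + order_notional) > self.limits.max_position_notional:
            return False, "position limit exceeded"
        return True, "ok"


gate = RiskGate(RiskLimits(max_order_notional=10_000, max_position_notional=25_000))
first = gate.check(5_000, 0)          # within both limits
too_big = gate.check(20_000, 0)       # single order too large
over_pos = gate.check(8_000, 20_000)  # would push position past the cap
```

Returning a reason alongside the decision makes rejected orders easy to log and audit, which matters most for agent-controlled runs.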

Subsections:

Concept and Motivation
- Why capabilities live in separate packs.
- Separation of core vs. domain-specific logic.
Examples of Capability Packs
- Mid-frequency crypto pack.
- HFT pack (order-book-level, low-latency).
- On-chain pack (events, DeFi, protocols).
Plugin Registration Flow
- How packs register their plugins (data, features, signals, execution adapters).
- How the registry exposes these plugins to:
  - DSL and Compiler
  - Orchestrator
- Versioning and compatibility considerations.
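
The registration flow and one possible versioning rule can be sketched as an in-memory registry. This example assumes plugins are identified by `(kind, name)` and versioned with `(major, minor, patch)` tuples, and that consumers ask for the newest version within a major line; none of these choices are confirmed by the source.

```python
class PluginRegistry:
    """In-memory registry keyed by (kind, name), with multiple versions each."""

    def __init__(self):
        self._plugins = {}  # (kind, name) -> {version tuple: factory}

    def register(self, kind: str, name: str, version: tuple, factory) -> None:
        versions = self._plugins.setdefault((kind, name), {})
        if version in versions:
            raise ValueError(f"{kind}/{name} {version} is already registered")
        versions[version] = factory

    def resolve(self, kind: str, name: str, major: int):
        """Return (version, factory) for the newest version with this major."""
        versions = self._plugins.get((kind, name), {})
        candidates = [v for v in versions if v[0] == major]
        if not candidates:
            raise LookupError(f"no {kind}/{name} plugin with major version {major}")
        best = max(candidates)  # tuples compare element by element
        return best, versions[best]


# A hypothetical capability pack registering three versions of one feature.
registry = PluginRegistry()
registry.register("feature", "zscore", (1, 0, 0), lambda: "zscore-1.0.0")
registry.register("feature", "zscore", (1, 2, 0), lambda: "zscore-1.2.0")
registry.register("feature", "zscore", (2, 0, 0), lambda: "zscore-2.0.0")

version, factory = registry.resolve("feature", "zscore", major=1)
```

Pinning on the major version lets packs ship compatible updates without changing existing DSL specs, while breaking changes require an explicit opt-in.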

Subsections:

How AI Agents Interact with DSL and APIs
- Agents as first-class users of the DSL and compiler APIs.
- Patterns for agents proposing and modifying experiments.
Searching and Ranking Experiments
- Using fingerprints and embeddings to search prior runs.
- Ranking experiments by performance, novelty, diversity.
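
Embedding-based search and a simple novelty score can be sketched with cosine similarity. The run IDs and two-dimensional vectors below are toy values, and defining novelty as one minus the highest similarity to any prior run is an assumption made for this example.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query, prior: dict, k: int = 3):
    """Return up to k (run_id, similarity) pairs, most similar first."""
    scored = sorted(
        ((cosine(query, emb), run_id) for run_id, emb in prior.items()),
        reverse=True,
    )
    return [(run_id, score) for score, run_id in scored[:k]]


def novelty(query, prior: dict) -> float:
    """1 minus the best similarity to any prior run; 1.0 if there are none."""
    if not prior:
        return 1.0
    return 1.0 - max(cosine(query, emb) for emb in prior.values())


# Toy embeddings for three earlier runs.
prior = {"run-a": [1.0, 0.0], "run-b": [0.0, 1.0], "run-c": [1.0, 1.0]}
top = nearest([1.0, 0.0], prior, k=2)
score = novelty([1.0, 0.0], prior)
```

In practice the novelty score would be one input to ranking alongside performance and diversity, so that near-duplicates of earlier experiments are not rewarded twice.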
Guardrails and Policies for Agent-Driven Changes
- Approval flows (human-in-the-loop).
- Policy Engine integration (limits, whitelists/blacklists).
- Logging and auditability for agent actions.
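
The interplay of allowlists, human approval, and audit logging could look roughly like the sketch below. The action names, the three decision values, and the `PolicyEngine` class are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    allowed_actions: set    # actions an agent may request at all
    approval_required: set  # allowed actions that still need a human sign-off


@dataclass
class PolicyEngine:
    policy: Policy
    audit_log: list = field(default_factory=list)

    def evaluate(self, actor: str, action: str) -> str:
        """Return "deny", "needs_approval", or "allow", and record the decision."""
        if action not in self.policy.allowed_actions:
            decision = "deny"
        elif action in self.policy.approval_required:
            decision = "needs_approval"
        else:
            decision = "allow"
        # Every evaluation is logged, including denials.
        self.audit_log.append({"actor": actor, "action": action, "decision": decision})
        return decision


engine = PolicyEngine(Policy(
    allowed_actions={"create_experiment", "deploy_live"},
    approval_required={"deploy_live"},
))
d1 = engine.evaluate("agent-1", "create_experiment")
d2 = engine.evaluate("agent-1", "deploy_live")
d3 = engine.evaluate("agent-1", "change_risk_limits")
```

Logging denied requests as well as approved ones gives reviewers a complete picture of what an agent attempted, not just what it was permitted to do.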

Subsections:

MVP Scope and Milestones
- What is included in the first working version.
- Milestone list: DSL v1, backtest pipeline, minimal execution integration, etc.
Near-Term Technical Priorities
- Stabilizing DSL and IR.
- Data ingestion and normalization robustness.
- First execution engine integration.
- Observability and basic IAM.
Open Questions for External Architects
- Topics where feedback is explicitly requested:
  - Better IR / DAG design patterns.
  - Data layout and partitioning trade-offs.
  - Multi-tenant scaling and security model.
  - Agent orchestration and safety.

Subsections:

Glossary
- IR (Intermediate Representation)
- DAG (Directed Acyclic Graph)
- Fingerprint
- Capability pack
- Orchestrator, Gateway, Registry, etc.
Naming Conventions
- Repo and package naming (e.g. alphaforge-core, alphaforge-cap-midfreq).
- DSL file naming and directory layout.
- Config file conventions (YAML/JSON structures).
Directory Layout and Configuration Patterns
- Suggested repo structure: core vs capability packs.
- Where DSL specs, configs, and experiment definitions live.
- Patterns for environment-specific configuration (dev / staging / prod).
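
One common pattern for environment-specific configuration is a shared base config with a small per-environment overlay merged on top. The sketch below assumes both are already loaded into dicts (for example from YAML or JSON files); the keys and host names are placeholders.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins, merging nested dicts recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder base config and a prod overlay that changes only what differs.
base = {"clickhouse": {"host": "localhost", "port": 9000}, "log_level": "INFO"}
prod = {"clickhouse": {"host": "ch.prod.internal"}, "log_level": "WARNING"}

config = deep_merge(base, prod)
```

Keeping overlays small makes the differences between dev, staging, and prod easy to review, since everything not listed in the overlay is inherited from the base.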