AlphaForge – Architecture

Source: Notion | Last edited: 2025-11-21 | ID: 2b22d2dc-3ef...


Subsections:

  • What is AlphaForge?
    • Short description of AlphaForge / QuantOS as an AI-agent-centric, DSL-driven quantitative research and execution OS.
    • Initial scope: crypto futures / perps, mid-frequency, cross-sectional research.
  • Why DSL + Compiler + Orchestrator?
    • Motivation for a declarative DSL instead of ad-hoc scripts.
    • Benefits of a compiler pipeline (canonicalization, fingerprinting, vectorization).
    • Role of the orchestrator in turning plans into reproducible runs.
  • Target Users and Use Cases
    • Personas:
      • Quant Researcher
      • Quant Engineer / System Architect
      • Portfolio Manager / Risk Manager
      • AI Research Agents
    • Focus use cases:
      • Crypto mid-frequency strategies
      • Cross-sectional factor research
      • Multi-venue, multi-asset coverage in the long run

Subsections:

  • Narrative for Overall Context
    • AlphaForge as a platform between humans/agents and external venues/data.
    • Input side: DSL specs, experiment configs, deployment requests, data feeds.
    • Output side: orders to venues, metrics and artifacts to humans and agents.
  • System Context Diagram (Mermaid C4)
    • Mermaid C4 context diagram (Person, System, System_Boundary, System_Ext).
    • Elements shown: researcher, PM, AI agent, and DevOps personas; AlphaForge Core; data providers; venues; storage; analytics.
  • Personas
    • Quant Researcher
    • Portfolio Manager
    • AI Research Agent
    • DevOps / SRE
  • External Systems
    • Exchanges & trading venues
    • Data vendors & on-chain sources
    • Object storage and data warehouse
    • Analytics / BI tools and notebooks

Subsections:

  • High-Level Container List
    • Each container as its own bullet/toggle:
      • API and Gateway
      • DSL and Compiler Service
      • Orchestrator Service
      • Data and Feature Service
      • Execution Gateway
      • Plugin and Capability Pack Registry
      • Experiment and Metric Store
      • Monitoring and Observability Stack
      • Identity and Access Management (IAM)
    • For each: 2–4 lines on scope and responsibilities.
  • Container Diagram (Mermaid)
    • Mermaid flowchart diagram showing:
      • Client side (Researcher, AI Agent, PM)
      • AlphaForge containers
      • External systems (market data, venues, storage)
    • Edges: how requests/data flow between containers and externals.

Subsections:

  • DSL and Compiler Internals
    • DSL Parser, IR Builder
    • Canonicalizer, Fingerprinter, Vectorizer
    • Plugin Resolver
    • Internal flow diagram (Mermaid) from DSL spec → canonical IR + embeddings + plugins.
  • Orchestrator Internals
    • Run Scheduler
    • DAG Executor
    • Run State Manager
    • Event Bus / Queue Adapter
    • Policy Engine
  • Data and Feature Service Internals
    • Ingestion Pipelines
    • Storage Layout Manager
    • Query Planner
    • Feature Generator
    • Caching Layer
  • Execution Gateway Internals
    • Order Router
    • Venue Adapters
    • Risk and Pre-Trade Checks
    • State Synchronizer
  • Experiment and Metric Store Internals
    • Experiment Registry
    • Run Log Store
    • Metrics Engine
    • Similarity and Novelty Engine
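
The Canonicalizer and Fingerprinter steps listed under DSL and Compiler Internals could look roughly like the sketch below. It is only an illustration under stated assumptions: the spec is modelled as a plain nested dict, the keys in `NON_SEMANTIC_KEYS` are hypothetical, and SHA-256 over compact JSON is one possible fingerprint, not necessarily the one AlphaForge uses.

```python
import hashlib
import json

# Hypothetical: keys assumed to carry no meaning for what a run computes.
NON_SEMANTIC_KEYS = {"name", "notes"}


def canonicalize(node):
    """Recursively drop non-semantic keys and sort dict keys.

    List order is preserved, since it may be meaningful (e.g. pipeline steps).
    """
    if isinstance(node, dict):
        return {
            key: canonicalize(node[key])
            for key in sorted(node)
            if key not in NON_SEMANTIC_KEYS
        }
    if isinstance(node, list):
        return [canonicalize(item) for item in node]
    return node


def fingerprint(spec):
    """SHA-256 hex digest of the compact JSON encoding of the canonical spec."""
    encoded = json.dumps(canonicalize(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# Two specs that differ only in key order and labels, and one real variant.
spec_a = {"name": "momentum-v1", "universe": "perps",
          "signal": {"op": "zscore", "window": 24}}
spec_b = {"signal": {"window": 24, "op": "zscore"},
          "universe": "perps", "notes": "same idea"}
spec_c = {"signal": {"window": 48, "op": "zscore"}, "universe": "perps"}

same = fingerprint(spec_a) == fingerprint(spec_b)
different = fingerprint(spec_a) != fingerprint(spec_c)
```

With a scheme like this, `spec_a` and `spec_b` map to the same fingerprint while `spec_c` does not, which is the property that makes fingerprints usable for deduplicating and looking up prior runs.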

Subsections:

  • Storage Choices
    • ClickHouse for structured time-series / factor panels.
    • Object storage for raw, large, or unstructured data.
    • Caches (e.g. Redis / in-memory) for hot datasets and panels.
  • Core Schemas
    • OHLCV schema (per venue / instrument).
    • Trades and order book schemas (if relevant).
    • Factor panels (cross-sectional and time-series).
    • Experiment logs and metrics storage design.
  • Data Lifecycle
    • Ingest → Normalize → Serve → Archive:
      • Ingestion from exchanges, vendors, on-chain.
      • Normalization / schema enforcement.
      • Serving to backtests and live runs via the Data and Feature Service.
      • Archival and retention policies.
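
As a rough illustration of the OHLCV schema and the normalization / schema-enforcement step above, the sketch below converts a raw vendor row into a typed record and rejects rows that violate basic invariants. The field names, the raw keys (`t`, `o`, `h`, `l`, `c`, `v`), and the epoch-millisecond timestamp are all assumptions made for the example, not the actual AlphaForge schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OhlcvBar:
    """Hypothetical normalized OHLCV record, keyed per venue and instrument."""
    venue: str
    instrument: str
    ts: datetime  # bar open time, always UTC
    open: float
    high: float
    low: float
    close: float
    volume: float


def normalize_bar(venue, instrument, raw):
    """Convert a raw row into an OhlcvBar and enforce simple invariants.

    Assumes raw["t"] is epoch milliseconds and prices/volume may be strings.
    """
    bar = OhlcvBar(
        venue=venue.lower(),
        instrument=instrument.upper(),
        ts=datetime.fromtimestamp(raw["t"] / 1000, tz=timezone.utc),
        open=float(raw["o"]),
        high=float(raw["h"]),
        low=float(raw["l"]),
        close=float(raw["c"]),
        volume=float(raw["v"]),
    )
    # The high/low range must contain both the open and the close.
    if not (bar.low <= min(bar.open, bar.close)
            and max(bar.open, bar.close) <= bar.high):
        raise ValueError(f"inconsistent OHLC values: {raw!r}")
    if bar.volume < 0:
        raise ValueError(f"negative volume: {raw!r}")
    return bar


raw_row = {"t": 1700000000000, "o": "100", "h": "110",
           "l": "95", "c": "105", "v": "12.5"}
bar = normalize_bar("VenueX", "btcusdt", raw_row)
```

Rejecting bad rows at normalization time keeps malformed data out of the serving and archive stages that follow.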

Subsections:

  • Execution Gateway Design
    • Abstract order and portfolio model.
    • Separation between strategy logic and venue-specific details.
  • Integration with External Engines
    • Integration pattern with engines like Nautilus.
    • Signal-based vs. “code runs inside engine” models.
    • Handling of latency and synchronization.
  • Risk and Safety Controls
    • Pre-trade checks and limits.
    • Kill switches / circuit breakers.
    • Guardrails for agent-controlled and automated runs.
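
A minimal sketch of how the pre-trade checks and kill switch listed above might fit together. The class names, the two notional limits, and the rejection messages are hypothetical; a real gateway would need many more checks (price bands, rate limits, per-venue rules).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    instrument: str
    side: str  # "buy" or "sell"
    quantity: float
    price: float


@dataclass(frozen=True)
class RiskLimits:
    max_order_notional: float
    max_position_notional: float


class PreTradeGate:
    """Hypothetical gate that every order passes through before routing."""

    def __init__(self, limits):
        self.limits = limits
        self.kill_switch = False
        self.positions = {}  # instrument -> signed quantity

    def check(self, order):
        """Return (accepted, reason) without changing any state."""
        if self.kill_switch:
            return False, "kill switch engaged"
        if order.side not in ("buy", "sell") or order.quantity <= 0 or order.price <= 0:
            return False, "malformed order"
        if order.quantity * order.price > self.limits.max_order_notional:
            return False, "order notional limit exceeded"
        signed = order.quantity if order.side == "buy" else -order.quantity
        projected = self.positions.get(order.instrument, 0.0) + signed
        if abs(projected) * order.price > self.limits.max_position_notional:
            return False, "position notional limit exceeded"
        return True, "ok"

    def record_fill(self, order):
        """Update the tracked position after an accepted order fills."""
        signed = order.quantity if order.side == "buy" else -order.quantity
        self.positions[order.instrument] = self.positions.get(order.instrument, 0.0) + signed


gate = PreTradeGate(RiskLimits(max_order_notional=10_000, max_position_notional=15_000))
first = Order("BTC-PERP", "buy", 1.0, 8_000.0)
first_result = gate.check(first)
gate.record_fill(first)
second_result = gate.check(Order("BTC-PERP", "buy", 1.0, 8_000.0))
```

Keeping `check` free of side effects means the same gate can be applied to human, automated, and agent-submitted orders, with the kill switch overriding every other rule.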

Subsections:

  • Concept and Motivation
    • Why capabilities live in separate packs.
    • Separation of core vs. domain-specific logic.
  • Examples of Capability Packs
    • Mid-frequency crypto pack.
    • HFT pack (order-book-level, low-latency).
    • On-chain pack (events, DeFi, protocols).
  • Plugin Registration Flow
    • How packs register their plugins (data, features, signals, execution adapters).
    • How the registry exposes these plugins to:
      • DSL and Compiler
      • Orchestrator
    • Versioning and compatibility considerations.
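
The registration flow above could be sketched as a small in-process registry. Everything here is an assumption for illustration: the `PluginRegistry` API, the four plugin kinds (taken from the list above), and a semver-like compatibility rule where a plugin must target the same major version and a minor version no newer than the core.

```python
class PluginRegistry:
    """Hypothetical registry that capability packs register plugins with."""

    KINDS = ("data", "feature", "signal", "execution")

    def __init__(self, core_api_version):
        self.core_api_version = core_api_version  # (major, minor)
        self._plugins = {}  # (kind, name) -> {"pack": ..., "factory": ...}

    def _is_compatible(self, requires_api):
        core_major, core_minor = self.core_api_version
        req_major, req_minor = requires_api
        return req_major == core_major and req_minor <= core_minor

    def register(self, pack, kind, name, factory, requires_api):
        if kind not in self.KINDS:
            raise ValueError(f"unknown plugin kind: {kind}")
        if not self._is_compatible(requires_api):
            raise ValueError(f"{pack}:{name} requires API {requires_api}, "
                             f"core provides {self.core_api_version}")
        key = (kind, name)
        if key in self._plugins:
            raise ValueError(f"{kind} plugin already registered: {name}")
        self._plugins[key] = {"pack": pack, "factory": factory}

    def resolve(self, kind, name):
        """Look up a plugin factory, as the compiler or orchestrator would."""
        return self._plugins[(kind, name)]["factory"]

    def names(self, kind):
        return sorted(name for (k, name) in self._plugins if k == kind)


def make_zscore(window):
    # Stand-in factory; a real plugin would return a feature operator.
    return {"op": "zscore", "window": window}


registry = PluginRegistry(core_api_version=(1, 2))
registry.register("alphaforge-cap-midfreq", "feature", "zscore",
                  make_zscore, requires_api=(1, 0))
operator = registry.resolve("feature", "zscore")(window=24)
```

Failing at registration time, rather than at run time, is one way to surface version mismatches before a plan reaches the orchestrator.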

Subsections:

  • How AI Agents Interact with DSL and APIs
    • Agents as first-class users of the DSL and compiler APIs.
    • Patterns for agents proposing and modifying experiments.
  • Searching and Ranking Experiments
    • Using fingerprints and embeddings to search prior runs.
    • Ranking experiments by performance, novelty, diversity.
  • Guardrails and Policies for Agent-Driven Changes
    • Approval flows (human-in-the-loop).
    • Policy Engine integration (limits, whitelists/blacklists).
    • Logging and auditability for agent actions.
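
One way to picture ranking experiments by performance and novelty using embeddings is shown below. This is a hedged sketch: novelty is defined here as one minus the highest cosine similarity to any prior run, the score is a simple weighted sum, and the `performance` values are assumed to be pre-normalized to a comparable scale. Diversity across the selected set is not modelled.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def novelty(embedding, prior_embeddings):
    """1 minus the similarity to the closest prior run (1.0 if there are none)."""
    if not prior_embeddings:
        return 1.0
    return 1.0 - max(cosine_similarity(embedding, p) for p in prior_embeddings)


def rank_candidates(candidates, prior_embeddings, novelty_weight=0.3):
    """Return candidate ids sorted by a weighted performance/novelty score."""
    scored = []
    for cand in candidates:
        score = ((1.0 - novelty_weight) * cand["performance"]
                 + novelty_weight * novelty(cand["embedding"], prior_embeddings))
        scored.append((score, cand["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand_id for _, cand_id in scored]


prior = [[1.0, 0.0]]
candidates = [
    {"id": "A", "performance": 0.6, "embedding": [1.0, 0.0]},  # duplicates a prior run
    {"id": "B", "performance": 0.5, "embedding": [0.0, 1.0]},  # orthogonal to prior runs
]
with_novelty = rank_candidates(candidates, prior)
performance_only = rank_candidates(candidates, prior, novelty_weight=0.0)
```

In this toy example candidate B outranks A once novelty is weighted in, even though A has the higher raw performance, because A is indistinguishable from a run that already exists.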

Subsections:

  • MVP Scope and Milestones
    • What is included in the first working version.
    • Milestone list: DSL v1, backtest pipeline, minimal execution integration, etc.
  • Near-Term Technical Priorities
    • Stabilizing DSL and IR.
    • Data ingestion and normalization robustness.
    • First execution engine integration.
    • Observability and basic IAM.
  • Open Questions for External Architects
    • Topics where feedback is explicitly requested:
      • Better IR / DAG design patterns.
      • Data layout and partitioning trade-offs.
      • Multi-tenant scaling and security model.
      • Agent orchestration and safety.

Subsections:

  • Glossary
    • IR (Intermediate Representation)
    • DAG (Directed Acyclic Graph)
    • Fingerprint
    • Capability pack
    • Orchestrator, Gateway, Registry, etc.
  • Naming Conventions
    • Repo and package naming (e.g. alphaforge-core, alphaforge-cap-midfreq).
    • DSL file naming and directory layout.
    • Config file conventions (YAML/JSON structures).
  • Directory Layout and Configuration Patterns
    • Suggested repo structure: core vs capability packs.
    • Where DSL specs, configs, and experiment definitions live.
    • Patterns for environment-specific configuration (dev / staging / prod).
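
For the environment-specific configuration pattern (dev / staging / prod), one common approach is a shared base config with per-environment overrides merged on top. The sketch below uses plain dicts in place of parsed YAML/JSON files; all keys and hostnames are hypothetical placeholders.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override values win, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base settings shared by every environment.
BASE = {
    "storage": {"clickhouse_host": "localhost", "cache": "memory"},
    "execution": {"live_trading": False},
}

# Hypothetical per-environment overrides; only differences are listed.
OVERRIDES = {
    "dev": {},
    "staging": {"storage": {"clickhouse_host": "clickhouse.staging.example"}},
    "prod": {
        "storage": {"clickhouse_host": "clickhouse.prod.example", "cache": "redis"},
        "execution": {"live_trading": True},
    },
}


def load_config(env):
    if env not in OVERRIDES:
        raise KeyError(f"unknown environment: {env}")
    return deep_merge(BASE, OVERRIDES[env])


prod_config = load_config("prod")
staging_config = load_config("staging")
```

Listing only the differences per environment keeps the override files short and makes safety-critical defaults, such as live trading being off, explicit in one place.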