AlphaForge – Architecture
Source: Notion | Last edited: 2025-11-21 | ID: 2b22d2dc-3ef...
1. Vision and Problem Statement
Subsections:
- What is AlphaForge?
- Short description of AlphaForge / QuantOS as an AI-agent-centric, DSL-driven quantitative research and execution OS.
- Initial scope: crypto futures / perps, mid-frequency, cross-sectional research.
- Why DSL + Compiler + Orchestrator?
- Motivation for a declarative DSL instead of ad-hoc scripts.
- Benefits of a compiler pipeline (canonicalization, fingerprinting, vectorization).
- Role of the orchestrator in turning plans into reproducible runs.
- Target Users and Use Cases
- Personas:
- Quant Researcher
- Quant Engineer / System Architect
- Portfolio Manager / Risk Manager
- AI Research Agents
- Focus use cases:
- Crypto mid-frequency strategies
- Cross-sectional factor research
- Multi-venue, multi-asset in the long run
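To make the canonicalization and fingerprinting benefit concrete, here is a minimal sketch. It assumes a DSL spec has already been parsed into a plain dictionary; the function names `canonicalize` and `fingerprint` and the spec fields (`universe`, `signal`, `lookback`) are hypothetical and not taken from the AlphaForge codebase. The choice of sorted-key JSON plus SHA-256 is likewise an assumption, one of several reasonable ways to get an order-independent identifier.

```python
import hashlib
import json


def canonicalize(spec: dict) -> str:
    """Serialize a parsed spec to a canonical JSON string.

    Sorting keys and fixing separators makes the output independent of
    the order in which fields were written in the original spec.
    """
    return json.dumps(spec, sort_keys=True, separators=(",", ":"))


def fingerprint(spec: dict) -> str:
    """Return the SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonicalize(spec).encode("utf-8")).hexdigest()


# Hypothetical spec fields, for illustration only.
spec_a = {"universe": "perps", "signal": {"name": "momentum", "lookback": 24}}
spec_b = {"signal": {"lookback": 24, "name": "momentum"}, "universe": "perps"}

# Two specs that differ only in field order share one fingerprint.
same = fingerprint(spec_a) == fingerprint(spec_b)
```

With a scheme like this, two experiments that are textually different but semantically identical can be detected before a run is scheduled, which is what makes reproducibility and deduplication tractable.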
2. System Context (C1)
Subsections:
- Narrative for Overall Context
- AlphaForge as a platform between humans/agents and external venues/data.
- Input side: DSL specs, experiment configs, deployment requests, data feeds.
- Output side: orders to venues, metrics and artifacts to humans and agents.
- System Context Diagram (Mermaid C4)
- Mermaid C4 context diagram (Person, System, System_Boundary, System_Ext).
- Describes: researcher, PM, AI agent, DevOps; AlphaForge Core; data providers; venues; storage; analytics.
- Personas
- Quant Researcher
- Portfolio Manager
- AI Research Agent
- DevOps / SRE
- External Systems
- Exchanges & trading venues
- Data vendors & on-chain sources
- Object storage and data warehouse
- Analytics / BI tools and notebooks
3. Container View (C2)
Subsections:
- High-Level Container List
- Each container as its own bullet/toggle:
- API and Gateway
- DSL and Compiler Service
- Orchestrator Service
- Data and Feature Service
- Execution Gateway
- Plugin and Capability Pack Registry
- Experiment and Metric Store
- Monitoring and Observability Stack
- Identity and Access Management (IAM)
- For each: 2–4 lines on scope and responsibilities.
- Container Diagram (Mermaid)
- Mermaid flowchart diagram showing:
- Client side (Researcher, AI Agent, PM)
- AlphaForge containers
- External systems (market data, venues, storage)
- Edges: how requests/data flow between containers and externals.
4. Core Components (C3)
Subsections:
- DSL and Compiler Internals
- DSL Parser, IR Builder
- Canonicalizer, Fingerprinter, Vectorizer
- Plugin Resolver
- Internal flow diagram (Mermaid) from DSL spec → canonical IR + embeddings + plugins.
- Orchestrator Internals
- Run Scheduler
- DAG Executor
- Run State Manager
- Event Bus / Queue Adapter
- Policy Engine
- Data and Feature Service Internals
- Ingestion Pipelines
- Storage Layout Manager
- Query Planner
- Feature Generator
- Caching Layer
- Execution Gateway Internals
- Order Router
- Venue Adapters
- Risk and Pre-Trade Checks
- State Synchronizer
- Experiment and Metric Store Internals
- Experiment Registry
- Run Log Store
- Metrics Engine
- Similarity and Novelty Engine
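As an illustration of the DAG Executor listed under Orchestrator Internals, the following is a minimal sketch of dependency-ordered execution using the standard-library `graphlib` module. The function `execute_dag`, the node names, and the task callables are hypothetical; a real executor would also need retries, state persistence, and parallelism, none of which are shown here.

```python
from graphlib import TopologicalSorter


def execute_dag(dependencies, tasks):
    """Run tasks in dependency order.

    dependencies maps each node to the set of nodes it depends on.
    tasks maps each node to a callable that receives the results so far.
    Returns the execution order and the per-node results.
    """
    results = {}
    order = []
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for node in TopologicalSorter(dependencies).static_order():
        results[node] = tasks[node](results)
        order.append(node)
    return order, results


# Hypothetical three-step run: load data, build features, backtest.
dependencies = {"load": set(), "features": {"load"}, "backtest": {"features"}}
tasks = {
    "load": lambda r: [1.0, 2.0, 3.0],
    "features": lambda r: [x * 2 for x in r["load"]],
    "backtest": lambda r: sum(r["features"]),
}

order, results = execute_dag(dependencies, tasks)
```

Because this graph is a simple chain, only one execution order is valid; graphs with independent branches admit several valid orders.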
5. Data Architecture
Subsections:
- Storage Choices
- ClickHouse for structured time-series / factor panels.
- Object storage for raw, large, or unstructured data.
- Caches (e.g. Redis / in-memory) for hot datasets and panels.
- Core Schemas
- OHLCV schema (per venue / instrument).
- Trades and order book schemas (if relevant).
- Factor panels (cross-sectional and time-series).
- Experiment logs and metrics storage design.
- Data Lifecycle
- Ingest → Normalize → Serve → Archive:
- Ingestion from exchanges, vendors, on-chain.
- Normalization / schema enforcement.
- Serving to backtests and live runs via Data & Feature Service.
- Archival and retention policies.
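The normalization and schema-enforcement step could look roughly like the sketch below. It assumes a raw vendor record arrives as a dictionary with the hypothetical keys `venue`, `symbol`, `ts_ms`, `o`, `h`, `l`, `c`, and `v`; the `OhlcvBar` type and `normalize_bar` function are illustrative names, and the actual OHLCV schema is still to be defined in this section.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OhlcvBar:
    """One normalized OHLCV bar for a single venue and instrument."""
    venue: str
    instrument: str
    ts: datetime  # bar open time, always UTC
    open: float
    high: float
    low: float
    close: float
    volume: float


def normalize_bar(raw: dict) -> OhlcvBar:
    """Convert a raw record (hypothetical keys) and enforce basic invariants."""
    bar = OhlcvBar(
        venue=raw["venue"].lower(),
        instrument=raw["symbol"].upper(),
        ts=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        open=float(raw["o"]),
        high=float(raw["h"]),
        low=float(raw["l"]),
        close=float(raw["c"]),
        volume=float(raw["v"]),
    )
    # The high/low range must contain both the open and the close.
    if not (bar.low <= min(bar.open, bar.close)
            and max(bar.open, bar.close) <= bar.high):
        raise ValueError(f"inconsistent OHLC values: {raw}")
    if bar.volume < 0:
        raise ValueError(f"negative volume: {raw}")
    return bar


raw = {"venue": "ExampleEx", "symbol": "btc-perp", "ts_ms": 1700000000000,
       "o": "100", "h": "105", "l": "99", "c": "104", "v": "12.5"}
bar = normalize_bar(raw)
```

Rejecting malformed records at this boundary keeps downstream consumers (backtests and live runs) from having to re-validate the same invariants.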
6. Execution Architecture
Subsections:
- Execution Gateway Design
- Abstract order and portfolio model.
- Separation between strategy logic and venue-specific details.
- Integration with External Engines
- Integration pattern with engines like Nautilus.
- Signal-based vs. “code runs inside engine” models.
- Handling of latency and synchronization.
- Risk and Safety Controls
- Pre-trade checks and limits.
- Kill switches / circuit breakers.
- Guardrails for agent-controlled and automated runs.
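The pre-trade checks and kill switch named above can be sketched as a single gate that every order passes through before reaching a venue adapter. Everything here is an assumption made for illustration: the `Order`, `RiskLimits`, and `PreTradeGate` names, the two limit types, and the rejection messages are hypothetical, and a production gate would cover many more checks.

```python
from dataclasses import dataclass


@dataclass
class Order:
    instrument: str
    side: str  # "buy" or "sell"
    quantity: float
    price: float


@dataclass
class RiskLimits:
    max_order_notional: float  # cap on quantity * price per order
    max_position: float        # cap on absolute position per instrument


class PreTradeGate:
    """Checks each order against limits; a kill switch rejects everything."""

    def __init__(self, limits: RiskLimits):
        self.limits = limits
        self.positions = {}
        self.killed = False

    def kill(self) -> None:
        self.killed = True

    def check(self, order: Order) -> tuple[bool, str]:
        if self.killed:
            return False, "kill switch engaged"
        if order.quantity <= 0 or order.price <= 0:
            return False, "invalid quantity or price"
        if order.quantity * order.price > self.limits.max_order_notional:
            return False, "order notional limit exceeded"
        signed = order.quantity if order.side == "buy" else -order.quantity
        projected = self.positions.get(order.instrument, 0.0) + signed
        if abs(projected) > self.limits.max_position:
            return False, "position limit exceeded"
        return True, "ok"

    def on_fill(self, order: Order) -> None:
        """Update the tracked position after an order is filled."""
        signed = order.quantity if order.side == "buy" else -order.quantity
        self.positions[order.instrument] = (
            self.positions.get(order.instrument, 0.0) + signed
        )


gate = PreTradeGate(RiskLimits(max_order_notional=10_000.0, max_position=0.15))
small = Order("BTC-PERP", "buy", 0.1, 50_000.0)
first_check = gate.check(small)
```

Keeping the gate independent of any particular venue is one way to apply the same guardrails to human-initiated, automated, and agent-controlled runs.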
7. Plugin and Capability Packs
Subsections:
- Concept and Motivation
- Why capabilities live in separate packs.
- Separation of core vs. domain-specific logic.
- Examples of Capability Packs
- Mid-frequency crypto pack.
- HFT pack (order-book-level, low-latency).
- On-chain pack (events, DeFi, protocols).
- Plugin Registration Flow
- How packs register their plugins (data, features, signals, execution adapters).
- How the registry exposes these plugins to:
- DSL and Compiler
- Orchestrator
- Versioning and compatibility considerations.
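One possible shape for the registration flow is a decorator-based registry keyed by plugin kind, name, and version. This is a sketch under stated assumptions: the `PluginRegistry` class, its method names, and the exact-version lookup are hypothetical, and the real registry may well use entry points, manifests, or semantic-version ranges instead.

```python
class PluginRegistry:
    """In-memory registry keyed by (kind, name, version)."""

    # The four plugin kinds named in the registration flow.
    KINDS = ("data", "feature", "signal", "execution")

    def __init__(self):
        self._plugins = {}

    def register(self, kind: str, name: str, version: str):
        """Return a decorator that registers the decorated object."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown plugin kind: {kind}")

        def decorator(obj):
            key = (kind, name, version)
            if key in self._plugins:
                raise ValueError(f"duplicate plugin: {key}")
            self._plugins[key] = obj
            return obj

        return decorator

    def resolve(self, kind: str, name: str, version: str):
        """Look up a plugin by exact kind, name, and version."""
        return self._plugins[(kind, name, version)]

    def names(self, kind: str):
        """List the (name, version) pairs registered for one kind."""
        return sorted((n, v) for k, n, v in self._plugins if k == kind)


registry = PluginRegistry()


# A hypothetical feature plugin shipped by a capability pack.
@registry.register("feature", "returns", "1.0.0")
def simple_returns(prices):
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


plugin = registry.resolve("feature", "returns", "1.0.0")
```

In this sketch the compiler's Plugin Resolver would call `resolve` while compiling a spec, and the orchestrator would call it again at run time, so both see the same pinned version.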
8. Agent-Centric Workflows
Subsections:
- How AI Agents Interact with DSL and APIs
- Agents as first-class users of the DSL and compiler APIs.
- Patterns for agents proposing and modifying experiments.
- Searching and Ranking Experiments
- Using fingerprints and embeddings to search prior runs.
- Ranking experiments by performance, novelty, diversity.
- Guardrails and Policies for Agent-Driven Changes
- Approval flows (human-in-the-loop).
- Policy Engine integration (limits, whitelists/blacklists).
- Logging and auditability for agent actions.
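The idea of ranking experiments by performance and novelty can be illustrated with a small sketch. The scoring formula (performance plus a weighted novelty term), the `novelty_weight` parameter, and the record fields `id`, `score`, and `embedding` are all assumptions made for this example; the source only names the ranking criteria, not how they are combined.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def novelty(candidate, prior_embeddings):
    """1 minus the highest similarity to any prior run; 1.0 with no priors."""
    if not prior_embeddings:
        return 1.0
    return 1.0 - max(cosine_similarity(candidate, p) for p in prior_embeddings)


def rank(experiments, prior_embeddings, novelty_weight=0.5):
    """Order experiment ids by performance plus weighted novelty, best first."""
    scored = [
        (e["score"] + novelty_weight * novelty(e["embedding"], prior_embeddings),
         e["id"])
        for e in experiments
    ]
    return [exp_id for _, exp_id in sorted(scored, reverse=True)]


# Hypothetical data: one prior run and two candidate experiments.
prior = [[1.0, 0.0]]
experiments = [
    {"id": "A", "score": 1.0, "embedding": [1.0, 0.0]},  # duplicates the prior
    {"id": "B", "score": 0.8, "embedding": [0.0, 1.0]},  # orthogonal to it
]
ranking = rank(experiments, prior)
```

Here candidate B outranks A despite a lower raw score, because A is indistinguishable from a run that already exists.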
9. Roadmap and Open Questions
Subsections:
- MVP Scope and Milestones
- What is included in the first working version.
- Milestone list: DSL v1, backtest pipeline, minimal execution integration, etc.
- Near-Term Technical Priorities
- Stabilizing DSL and IR.
- Data ingestion and normalization robustness.
- First execution engine integration.
- Observability and basic IAM.
- Open Questions for External Architects
- Topics where feedback is explicitly requested:
- Better IR / DAG design patterns.
- Data layout and partitioning trade-offs.
- Multi-tenant scaling and security model.
- Agent orchestration and safety.
10. Glossary and Conventions
Subsections:
- Glossary
- IR (Intermediate Representation)
- DAG (Directed Acyclic Graph)
- Fingerprint
- Capability pack
- Orchestrator, Gateway, Registry, etc.
- Naming Conventions
- Repo and package naming (e.g. alphaforge-core, alphaforge-cap-midfreq).
- DSL file naming and directory layout.
- Config file conventions (YAML/JSON structures).
- Directory Layout and Configuration Patterns
- Suggested repo structure: core vs capability packs.
- Where DSL specs, configs, and experiment definitions live.
- Patterns for environment-specific configuration (dev / staging / prod).
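One common pattern for environment-specific configuration is a shared base config with a per-environment overlay. The sketch below assumes configs have already been loaded from YAML or JSON into dictionaries; the `merge_config` function and the keys shown (`clickhouse`, `execution`, `dry_run`) are hypothetical examples rather than established AlphaForge conventions.

```python
import copy


def merge_config(base: dict, overlay: dict) -> dict:
    """Return base with overlay applied; nested dicts are merged recursively.

    Neither input is modified. Non-dict values in the overlay replace
    the corresponding base values outright.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base config and prod overlay.
base = {
    "clickhouse": {"host": "localhost", "port": 9000},
    "execution": {"dry_run": True},
}
prod_overlay = {
    "clickhouse": {"host": "ch.prod.internal"},
    "execution": {"dry_run": False},
}

prod = merge_config(base, prod_overlay)
```

Keeping overlays small makes the differences between dev, staging, and prod easy to review.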