ADR: Data Warehouse Plugin Integration
Source: Notion | Last edited: 2025-12-12 | ID: 2c32d2dc-3ef...
Status: Proposal
Date: 2025-12-08
Deciders: @Chen Li
Related ADRs: ADR-018 (Orchestrator-Level Data Caching)
Context
We need to integrate a Data Warehouse (e.g., ClickHouse) as a new data source in Alpha Forge.
Following the Plugin-First Design principle:
All functionality (data loading, feature engineering, strategy logic, backtesting) is delivered through standardized plugins with well-defined interfaces. This creates clear boundaries for AI-generated code and enables rapid composition of complex strategies from simple, reusable building blocks.
This should be implemented as a standalone data plugin - not replacing or restructuring existing plugins.
Current Data Plugins (Remain Unchanged)
Add ONE new plugin: data.warehouse, which connects to the centralized Data Warehouse.
Option 1: Simple Plugin in Middlefreq Package
Approach: Add warehouse.py directly in alpha-forge-middlefreq/plugins/data/, following existing plugin patterns exactly.
File Location:

    packages/alpha-forge-middlefreq/
    └── alpha_forge_caps/middlefreq/
        └── plugins/data/
            ├── binance_futures.py        # Existing
            ├── binance_funding_rate.py   # Existing
            └── warehouse.py              # ⭐ NEW

Implementation (sketch):

    # packages/alpha-forge-middlefreq/alpha_forge_caps/middlefreq/plugins/data/warehouse.py
    from typing import Any

    import pandas as pd

    from alpha_forge.core.plugin_registry import register_plugin


    @register_plugin(
        namespace="capabilities.middlefreq.data.warehouse",
        plugin_type="data",
        version="1.0.0",
        tags=["data_source", "warehouse", "ohlcv", "production"],
        parameters={
            # "dataset" is required by fetch_warehouse and passed in the DSL,
            # so it is declared here alongside the other parameters.
            "dataset": {"type": "string", "required": True},
            "snapshot_id": {"type": "string", "required": False},
            "universe": {"type": "list[string]", "required": True},
            "date_range": {"type": "dict[str, str]", "required": True},
            "columns": {"type": "list[string]", "required": False},
        },
        outputs={
            "format": "panel_df",
            "columns": ["ts", "symbol", "open", "high", "low", "close", "volume"],
        },
    )
    def fetch_warehouse(
        *,
        dataset: str,
        universe: list[str],
        date_range: dict[str, str],
        columns: list[str] | None = None,
        **_,  # absorbs undeclared or unused parameters such as snapshot_id
    ) -> pd.DataFrame:
        # The three helpers below are not defined in this sketch.
        client = _get_warehouse_client()
        query = _build_query(dataset, universe, date_range, columns)
        df = client.query_dataframe(query)
        return _standardize_output(df, dataset)

DSL Usage:

    pipeline:
      - data:
          frame: ohlcv
          using: data.warehouse
          params:
            dataset: "crypto_ohlcv_1h"
            universe: ["BTCUSDT", "ETHUSDT"]
            date_range: { start: "2023-01-01", end: "2023-12-31" }

Evaluation (Pros and Cons)
Option 2: Plugin in Shared Package (If Used Across Capabilities)
Approach: If the warehouse plugin will be used by multiple capability packages (shared, middlefreq, future highfreq), place it in alpha-forge-shared.
File Location:

    packages/alpha-forge-shared/
    └── alpha_forge_caps/shared/
        └── plugins/data/
            ├── sample_csv.py    # Existing
            ├── eonlabs_s3.py    # Existing
            └── warehouse.py     # ⭐ NEW

Implementation: Same as Option 1, just different location.
DSL Usage:

    pipeline:
      - data:
          frame: ohlcv
          using: data.warehouse  # Resolved from shared capability
          params:
            dataset: "crypto_ohlcv_1h"
            universe: ["BTCUSDT", "ETHUSDT"]
            date_range: { start: "2023-01-01", end: "2023-12-31" }

Evaluation (Pros and Cons)
Decision
Selected Option: Option 2 - Plugin in Shared Package
Rationale: Implementing the warehouse plugin in the shared package enables reuse across all capability packages and establishes a centralized data source pattern consistent with existing shared plugins like data.sample_csv and data.eonlabs_s3.
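The reuse argument can be illustrated with a toy name resolver. This ADR does not describe how Alpha Forge actually maps a DSL name such as `data.warehouse` to a registered plugin, so everything below is an assumption made for illustration: the `resolve` function, the search-order idea, and the full namespaces, which simply follow the `capabilities.<package>.data.<name>` pattern seen in the Option 1 decorator.

```python
# Hypothetical registry contents; the namespaces follow the
# "capabilities.<package>.data.<name>" pattern from the Option 1 decorator.
REGISTERED = {
    "capabilities.shared.data.sample_csv",
    "capabilities.shared.data.eonlabs_s3",
    "capabilities.shared.data.warehouse",
    "capabilities.middlefreq.data.binance_futures",
}


def resolve(short_name: str, search_order: list[str]) -> str:
    """Map a DSL name like 'data.warehouse' to the first matching full namespace."""
    for package in search_order:
        candidate = f"capabilities.{package}.{short_name}"
        if candidate in REGISTERED:
            return candidate
    raise LookupError(f"no plugin named {short_name!r} in packages {search_order}")


# A middlefreq pipeline and a future highfreq pipeline both fall back to shared.
from_middlefreq = resolve("data.warehouse", ["middlefreq", "shared"])
from_highfreq = resolve("data.warehouse", ["highfreq", "shared"])
```

Under these assumptions, both pipelines resolve to the single shared plugin, whereas a plugin registered only under middlefreq would be invisible to a highfreq pipeline.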