ADR: Data Warehouse Plugin Integration

Source: Notion | Last edited: 2025-12-12 | ID: 2c32d2dc-3ef...


Status: Proposal

Date: 2025-12-08

Deciders: @Chen Li

Related ADRs: ADR-018 (Orchestrator-Level Data Caching)

We need to integrate a Data Warehouse (e.g., ClickHouse) as a new data source in Alpha Forge.

Following the Plugin-First Design principle:

All functionality (data loading, feature engineering, strategy logic, backtesting) is delivered through standardized plugins with well-defined interfaces. This creates clear boundaries for AI-generated code and enables rapid composition of complex strategies from simple, reusable building blocks.

This should be implemented as a standalone data plugin that neither replaces nor restructures existing plugins.

Add ONE new plugin, data.warehouse, which connects to the centralized Data Warehouse.


Option 1: Simple Plugin in Middlefreq Package

Approach: Add warehouse.py directly in alpha-forge-middlefreq/plugins/data/, following existing plugin patterns exactly.

File Location:

packages/alpha-forge-middlefreq/
└── alpha_forge_caps/middlefreq/
    └── plugins/data/
        ├── binance_futures.py        # Existing
        ├── binance_funding_rate.py   # Existing
        └── warehouse.py              # ⭐ NEW

Implementation (sketch):

# packages/alpha-forge-middlefreq/alpha_forge_caps/middlefreq/plugins/data/warehouse.py
from typing import Any

import pandas as pd

from alpha_forge.core.plugin_registry import register_plugin


@register_plugin(
    namespace="capabilities.middlefreq.data.warehouse",
    plugin_type="data",
    version="1.0.0",
    tags=["data_source", "warehouse", "ohlcv", "production"],
    parameters={
        # `dataset` is required by the function signature and passed in the DSL.
        "dataset": {"type": "string", "required": True},
        # `snapshot_id` is declared but currently swallowed by **_ below.
        "snapshot_id": {"type": "string", "required": False},
        "universe": {"type": "list[string]", "required": True},
        "date_range": {"type": "dict[str, str]", "required": True},
        "columns": {"type": "list[string]", "required": False},
    },
    outputs={
        "format": "panel_df",
        "columns": ["ts", "symbol", "open", "high", "low", "close", "volume"],
    },
)
def fetch_warehouse(
    *,
    dataset: str,
    universe: list[str],
    date_range: dict[str, str],
    columns: list[str] | None = None,
    **_: Any,
) -> pd.DataFrame:
    """Load a panel DataFrame for `universe` over `date_range` from the warehouse."""
    # _get_warehouse_client, _build_query, and _standardize_output are
    # module-level helpers not shown in this sketch.
    client = _get_warehouse_client()
    query = _build_query(dataset, universe, date_range, columns)
    df = client.query_dataframe(query)
    return _standardize_output(df, dataset)
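
The sketch above leaves the query-building helper undefined. One possible shape for it is shown below, as a minimal illustration rather than the intended implementation. Everything here is an assumption: the function name `build_query`, the use of the dataset name as the table name, the `ts` and `symbol` filter columns, the inclusive end bound, and the pyformat-style `%(name)s` placeholders (the actual warehouse client may use a different parameter syntax). Unlike the `_build_query` call in the sketch, this version returns the SQL text together with a parameter dict so that values are bound by the client instead of being interpolated into the string.

```python
from __future__ import annotations

import re
from typing import Any

# Assumed default projection, copied from the plugin's declared output columns.
DEFAULT_COLUMNS = ["ts", "symbol", "open", "high", "low", "close", "volume"]

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain SQL identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_query(
    dataset: str,
    universe: list[str],
    date_range: dict[str, str],
    columns: list[str] | None = None,
) -> tuple[str, dict[str, Any]]:
    """Return (sql, params). Identifiers are validated, values are bound."""
    if not universe:
        raise ValueError("universe must contain at least one symbol")
    # Hypothetical mapping: the dataset name doubles as the table name.
    table = _check_identifier(dataset)
    selected = [_check_identifier(c) for c in (columns or DEFAULT_COLUMNS)]
    sql = (
        f"SELECT {', '.join(selected)} FROM {table} "
        "WHERE symbol IN %(universe)s "
        "AND ts >= %(start)s AND ts <= %(end)s "
        "ORDER BY symbol, ts"
    )
    params = {
        "universe": tuple(universe),
        "start": date_range["start"],
        "end": date_range["end"],
    }
    return sql, params
```

Table and column names cannot be passed as bound parameters, which is why they go through an identifier check before being placed in the SQL text.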

DSL Usage:

pipeline:
  - data:
      frame: ohlcv
      using: data.warehouse
      params:
        dataset: "crypto_ohlcv_1h"
        universe: ["BTCUSDT", "ETHUSDT"]
        date_range: { start: "2023-01-01", end: "2023-12-31" }
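
To illustrate how the `params` block of such a DSL step could be checked before the plugin is called, here is a small validation sketch. It is not the framework's validator. The schema mirrors the parameter names from the plugin sketch, plus `dataset`, which the function signature and the DSL both use; the `validate_params` function, the ISO date format requirement, and the error messages are all hypothetical.

```python
from __future__ import annotations

from datetime import date
from typing import Any

# Assumed schema: parameter names from the plugin sketch, plus `dataset`.
PARAMETER_SCHEMA: dict[str, dict[str, Any]] = {
    "dataset": {"type": "string", "required": True},
    "snapshot_id": {"type": "string", "required": False},
    "universe": {"type": "list[string]", "required": True},
    "date_range": {"type": "dict[str, str]", "required": True},
    "columns": {"type": "list[string]", "required": False},
}


def validate_params(
    params: dict[str, Any],
    schema: dict[str, dict[str, Any]] = PARAMETER_SCHEMA,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: list[str] = []
    # Required parameters must be present.
    for name, spec in schema.items():
        if spec["required"] and name not in params:
            errors.append(f"missing required parameter: {name}")
    # Parameters not declared in the schema are flagged.
    for name in params:
        if name not in schema:
            errors.append(f"unknown parameter: {name}")
    # date_range must hold ISO dates with start <= end (assumed convention).
    date_range = params.get("date_range")
    if isinstance(date_range, dict) and {"start", "end"} <= date_range.keys():
        try:
            start = date.fromisoformat(date_range["start"])
            end = date.fromisoformat(date_range["end"])
        except (TypeError, ValueError):
            errors.append("date_range values must be ISO dates (YYYY-MM-DD)")
        else:
            if start > end:
                errors.append("date_range.start is after date_range.end")
    return errors
```

Returning a list of problems instead of raising on the first one lets a pipeline loader report every issue in a step at once.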

Option 2: Plugin in Shared Package (If Used Across Capabilities)

Approach: If the warehouse plugin will be used by multiple capability packages (shared, middlefreq, future highfreq), place it in alpha-forge-shared.

File Location:

packages/alpha-forge-shared/
└── alpha_forge_caps/shared/
    └── plugins/data/
        ├── sample_csv.py     # Existing
        ├── eonlabs_s3.py     # Existing
        └── warehouse.py      # ⭐ NEW

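Implementation: Same as Option 1; only the file location changes. The registered namespace would presumably change to match the shared package (e.g., capabilities.shared.data.warehouse).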

DSL Usage:

pipeline:
  - data:
      frame: ohlcv
      using: data.warehouse # Resolved from shared capability
      params:
        dataset: "crypto_ohlcv_1h"
        universe: ["BTCUSDT", "ETHUSDT"]
        date_range: { start: "2023-01-01", end: "2023-12-31" }
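
The DSL refers to the plugin by the short name data.warehouse, while the plugin sketch registers a fully qualified namespace. How Alpha Forge maps one to the other is not described here, so the following is only a toy model of one way it could work. The `REGISTRY` dict, the `register` decorator, the `resolve` function, the `capabilities.<capability>.<short name>` key layout, and the search order are all hypothetical and are not the `alpha_forge.core.plugin_registry` API.

```python
from __future__ import annotations

from typing import Callable

# Toy registry: fully qualified namespace -> plugin callable.
REGISTRY: dict[str, Callable[..., object]] = {}

# Hypothetical lookup order: shared plugins first, then capability-specific ones.
SEARCH_ORDER = ["shared", "middlefreq"]


def register(namespace: str) -> Callable[[Callable[..., object]], Callable[..., object]]:
    """Decorator that stores a plugin under its fully qualified namespace."""
    def decorator(func: Callable[..., object]) -> Callable[..., object]:
        if namespace in REGISTRY:
            raise ValueError(f"duplicate plugin namespace: {namespace}")
        REGISTRY[namespace] = func
        return func
    return decorator


def resolve(short_name: str, search_order: list[str] = SEARCH_ORDER) -> str:
    """Map a DSL short name such as 'data.warehouse' to a registered namespace."""
    for capability in search_order:
        candidate = f"capabilities.{capability}.{short_name}"
        if candidate in REGISTRY:
            return candidate
    raise KeyError(f"no plugin registered for {short_name!r}")


@register("capabilities.shared.data.warehouse")
def fetch_warehouse_stub(**params: object) -> dict[str, object]:
    """Stand-in for the real plugin; it just echoes its parameters."""
    return dict(params)
```

Under this model, a pipeline step with `using: data.warehouse` would resolve to the shared plugin regardless of which capability package runs the pipeline, which is the reuse that Option 2 is aiming for.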


Selected Option: Option 2 - Plugin in Shared Package

Rationale: Implementing the warehouse plugin in the shared package enables reuse across all capability packages and establishes a centralized data source pattern consistent with existing shared plugins like data.sample_csv and data.eonlabs_s3.