System Architecture Diagram

Source: Notion | Last edited: 2025-12-03 | ID: 29b2d2dc-3ef...


1.1 Mermaid (paste into Notion / GitHub / Markdown renderers)

flowchart LR
subgraph Sources["Data Sources"]
M1["Markets<br/>(trades / quotes / orderbooks / bars)"]
R1["Reference<br/>(fundamentals / corp actions / calendars)"]
A1["Alt Data<br/>(news / social / RSS)"]
C1["On-chain<br/>(RPC / indexers)"]
end
subgraph Stream["Streaming Layer"]
K1["Redpanda / Kafka"]
SR1["Schema Registry<br/>(Apicurio / Confluent)"]
SP1["Stream Processing<br/>(Materialize / Flink / Bytewax)"]
end
subgraph Lakehouse["Lakehouse Storage"]
S3["S3 / MinIO"]
IC["Iceberg Catalog<br/>(Glue / Nessie)"]
end
subgraph OLAP["OLAP / Timeseries"]
CH["ClickHouse"]
SQL["Trino / Athena / DuckDB"]
end
subgraph Vector["Unstructured / Vector"]
OBJ["Raw Objects<br/>S3 URIs"]
VDB["pgvector / Qdrant"]
end
subgraph Feature["Feature Layer"]
OFF["Offline Feature Store<br/>(Iceberg Tables)"]
ONL["Online Feature Store<br/>(Feast → Redis / ClickHouse)"]
end
subgraph Access["Access & Serving"]
AF["Arrow Flight / Flight SQL"]
GRPC["gRPC / REST"]
META["OpenMetadata + OpenLineage<br/>+ Great Expectations"]
end
subgraph QuantOS["Consumers"]
AG["AI Agents"]
RUN["QuantOS Runner"]
RE["Research Notebooks"]
end
M1 --> K1
R1 --> K1
A1 --> K1
C1 --> K1
K1 --> SR1
K1 --> SP1
SP1 --> S3
K1 --> S3
S3 --> IC
S3 --> CH
CH --> OFF
S3 --> OFF
OFF --> ONL
OBJ --> VDB
A1 --> OBJ
S3 --> SQL
CH --> AF
OFF --> AF
ONL --> AF
CH --> GRPC
OFF --> GRPC
ONL --> GRPC
S3 --> META
K1 --> META
SP1 --> META
AF --> AG
GRPC --> AG
AF --> RUN
GRPC --> RUN
SQL --> RE
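The flowchart's edges can also serve as a checkable spec. This is a minimal Python sketch that assumes the edge list below is kept in sync with the Mermaid block by hand. It builds an adjacency list and confirms that every consumer (`AG`, `RUN`, `RE`) is reachable from at least one source. It then lists non-consumer nodes with no outgoing edge, each of which is either an intentional sink or a missing wire.

```python
from collections import deque

# Edges copied by hand from the Mermaid flowchart ("SRC DST" pairs).
EDGES = """
M1 K1; R1 K1; A1 K1; C1 K1; K1 SR1; K1 SP1; SP1 S3; K1 S3; S3 IC; S3 CH;
CH OFF; S3 OFF; OFF ONL; OBJ VDB; A1 OBJ; S3 SQL; CH AF; OFF AF; ONL AF;
CH GRPC; OFF GRPC; ONL GRPC; S3 META; K1 META; SP1 META; AF AG; GRPC AG;
AF RUN; GRPC RUN; SQL RE
"""
SOURCES = ["M1", "R1", "A1", "C1"]
CONSUMERS = ["AG", "RUN", "RE"]


def parse_edges(text: str) -> dict[str, list[str]]:
    """Build an adjacency list; every node appears as a key, even without out-edges."""
    graph: dict[str, list[str]] = {}
    for pair in text.split(";"):
        parts = pair.split()
        if len(parts) == 2:
            src, dst = parts
            graph.setdefault(src, []).append(dst)
            graph.setdefault(dst, [])
    return graph


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search returning every node reachable from `start`, inclusive."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = parse_edges(EDGES)
fed = set().union(*(reachable(graph, s) for s in SOURCES))
unfed = [c for c in CONSUMERS if c not in fed]
sinks = sorted(n for n, out in graph.items() if not out and n not in CONSUMERS)
print("consumers without a source path:", unfed)
print("non-consumer sinks:", sinks)
```

Against the current edges, every consumer is reachable and the sinks are `IC`, `META`, `SR1`, and `VDB`. The first three read as catalog or metadata endpoints. `VDB` (pgvector / Qdrant), however, feeds no consumer, even though the ASCII view routes "Unstructured + RAG" onward, so an outgoing edge from it may be missing.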

1.2 Quick ASCII View (for docs that don’t render Mermaid)

[SOURCES]
Markets | Reference | Alt/News | On-chain
      ↓ (Avro/Protobuf + Schema Registry)
[STREAM BUS] Redpanda/Kafka → [STREAM PROC] Materialize/Flink
      ↓                             ↘ alerts/quality
[LAKEHOUSE] S3/MinIO + Iceberg Catalog (Glue/Nessie)
      ↓                  ↓                        ↓
[Trino/Athena]      [ClickHouse]        [Objects + Vector DB]
      ↓                  ↓                        ↓
 Ad-hoc SQL     Rollups/low-latency      Unstructured + RAG
      \_________________  ________________________/
                        \/
                  [FEATURE LAYER]
    Offline (Iceberg) + Online (Feast→Redis/CH)
                        \/
                  [ACCESS LAYER]
Arrow Flight / gRPC / REST / Metadata(Lineage/Quality)
                        \/
     [QuantOS Runner | AI Agents | Notebooks]

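The infrastructure layout below is opinionated for AWS + EKS but stays cloud-neutral where possible.

You can later swap managed services for self-managed ones without changing the public interfaces that consumers use (Arrow Flight, gRPC, REST).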

infra/
├─ envs/
│  ├─ dev/
│  │  ├─ main.tf
│  │  ├─ variables.tf
│  │  └─ terraform.tfvars
│  └─ prod/
│     ├─ main.tf
│     ├─ variables.tf
│     └─ terraform.tfvars
├─ modules/
│  ├─ vpc/
│  ├─ eks/
│  ├─ redpanda/               # Helm on EKS (or MSK alternative)
│  ├─ schema-registry/        # Apicurio/Confluent Helm
│  ├─ s3-iceberg/             # S3 buckets + IAM + Glue/Nessie
│  ├─ clickhouse/             # ClickHouse operator + cluster
│  ├─ trino/                  # Trino Helm (optional early)
│  ├─ materialize/            # Materialize Helm (or Flink)
│  ├─ feast/                  # Feast on EKS
│  ├─ redis/                  # Low-latency online feature-store backend
│  ├─ pgvector/               # RDS Postgres with pgvector (or Qdrant Helm)
│  ├─ openmetadata/           # Metadata catalog
│  ├─ openlineage/            # Marquez/OpenLineage
│  ├─ great-expectations/     # Data quality runner job
│  ├─ arrow-flight-gateway/   # Custom k8s service for Flight/gRPC
│  └─ observability/          # Prometheus/Grafana/Loki/Tempo
└─ README.md
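To bootstrap the layout above, a small Python sketch can create the empty skeleton: the directories plus placeholder files for each environment. It only mirrors the tree. The `scaffold` helper is a hypothetical name, and module contents are left for you to fill in.

```python
import tempfile
from pathlib import Path

ENVS = ["dev", "prod"]
ENV_FILES = ["main.tf", "variables.tf", "terraform.tfvars"]
MODULES = [
    "vpc", "eks", "redpanda", "schema-registry", "s3-iceberg", "clickhouse",
    "trino", "materialize", "feast", "redis", "pgvector", "openmetadata",
    "openlineage", "great-expectations", "arrow-flight-gateway", "observability",
]


def scaffold(root: Path) -> list[Path]:
    """Create the infra/ skeleton under `root` and return the files it created.

    Existing files are never overwritten, so re-running is safe.
    """
    infra = root / "infra"
    files = [infra / "envs" / env / name for env in ENVS for name in ENV_FILES]
    files.append(infra / "README.md")
    for module in MODULES:
        (infra / "modules" / module).mkdir(parents=True, exist_ok=True)
    created = []
    for path in files:
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    first_run = scaffold(Path(tmp))
    second_run = scaffold(Path(tmp))  # idempotent: nothing new to create
    module_dirs = sorted(p.name for p in (Path(tmp) / "infra" / "modules").iterdir())

print(len(first_run), "files created;", len(second_run), "on re-run")
print(len(module_dirs), "module directories")
```

Git does not track empty directories, so you would typically add at least one file to each module directory before committing.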

2.1 envs/dev/main.tf (root wiring example)

terraform {
  required_version = ">= 1.8.0"
  required_providers {
    aws        = { source = "hashicorp/aws", version = "~> 5.0" }
    kubernetes = { source = "hashicorp/kubernetes", version = "~> 2.30" }
    helm       = { source = "hashicorp/helm", version = "~> 2.13" }
  }
}

provider "aws" {
  region = var.aws_region
}

module "vpc" {
  source = "../../modules/vpc"
  name   = "${var.prefix}-vpc"
  cidr   = var.vpc_cidr
}

module "eks" {
  source          = "../../modules/eks"
  cluster_name    = "${var.prefix}-eks"
  vpc_id          = module.vpc.id
  private_subnets = module.vpc.private_subnets
  public_subnets  = module.vpc.public_subnets
}

# Lakehouse foundation: S3 + Glue Iceberg catalog
module "s3_iceberg" {
  source              = "../../modules/s3-iceberg"
  prefix              = var.prefix
  bucket_lake         = "${var.prefix}-lake"
  enable_glue_catalog = true
}

# Streaming: Redpanda (or swap to MSK)
module "redpanda" {
  source         = "../../modules/redpanda"
  cluster_name   = "${var.prefix}-redpanda"
  eks_cluster    = module.eks.name
  eks_kubeconfig = module.eks.kubeconfig
}

module "schema_registry" {
  source         = "../../modules/schema-registry"
  eks_kubeconfig = module.eks.kubeconfig
  provider_type  = "apicurio" # or "confluent"
}

# OLAP: ClickHouse
module "clickhouse" {
  source         = "../../modules/clickhouse"
  eks_kubeconfig = module.eks.kubeconfig
  storage_size   = "2Ti"
}

# SQL access (optional early): Trino
module "trino" {
  source         = "../../modules/trino"
  eks_kubeconfig = module.eks.kubeconfig
}

# Stream processing (start with Materialize)
module "materialize" {
  source         = "../../modules/materialize"
  eks_kubeconfig = module.eks.kubeconfig
}

# Feature store: Feast + Redis (or ClickHouse as the online store)
module "redis" {
  source         = "../../modules/redis"
  eks_kubeconfig = module.eks.kubeconfig
  # "cache.m6g.large" is an ElastiCache node type; if Redis instead runs
  # in-cluster via Helm, size it with Kubernetes resource requests.
  size           = "cache.m6g.large"
}

module "feast" {
  source                = "../../modules/feast"
  eks_kubeconfig        = module.eks.kubeconfig
  offline_store_catalog = module.s3_iceberg.glue_catalog_name
  online_store_endpoint = module.redis.endpoint
}

# Vector: RDS Postgres + pgvector (swap to Qdrant later if needed)
module "pgvector" {
  source        = "../../modules/pgvector"
  db_name       = "vectors"
  instance_type = "db.r6g.large"
  vpc_id        = module.vpc.id
  subnets       = module.vpc.private_subnets
}

# Metadata, lineage, quality
module "openmetadata" {
  source         = "../../modules/openmetadata"
  eks_kubeconfig = module.eks.kubeconfig
}

module "openlineage" {
  source         = "../../modules/openlineage"
  eks_kubeconfig = module.eks.kubeconfig
}

module "great_expectations" {
  source                 = "../../modules/great-expectations"
  eks_kubeconfig         = module.eks.kubeconfig
  lake_bucket            = module.s3_iceberg.bucket_lake
  expectations_s3_prefix = "dq/expectations/"
}

# Arrow Flight / gRPC gateway (your public API to QuantOS).
# Terraform only allows depends_on here if this module declares no provider
# blocks of its own (unlike the redpanda module below); pass providers in from the root.
module "arrow_flight_gateway" {
  source         = "../../modules/arrow-flight-gateway"
  eks_kubeconfig = module.eks.kubeconfig
  depends_on     = [module.clickhouse, module.feast]
}
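
# Root variables (normally kept in envs/dev/variables.tf).
# HCL single-line blocks may hold only one argument, so multi-argument
# variables need the full block form.
variable "prefix" {
  type = string
}

variable "aws_region" {
  type    = string
  default = "us-west-2"
}

variable "vpc_cidr" {
  type    = string
  default = "10.30.0.0/16"
}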
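The `vpc` module's internals are not shown here. To illustrate how the default `10.30.0.0/16` could be split into private and public subnets across availability zones, here is a Python sketch using the standard `ipaddress` module. The `/20` subnet size, the three AZs, and the private-first ordering are assumptions, not part of the module.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: int = 3, new_prefix: int = 20):
    """Split a VPC CIDR into equal blocks: the first `azs` private, the next `azs` public.

    Illustrative allocation only; the real vpc module may carve subnets differently.
    """
    blocks = list(ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix))
    if len(blocks) < 2 * azs:
        raise ValueError(f"{vpc_cidr} cannot hold {2 * azs} /{new_prefix} subnets")
    private = [str(b) for b in blocks[:azs]]
    public = [str(b) for b in blocks[azs:2 * azs]]
    return private, public


private, public = plan_subnets("10.30.0.0/16")
print("private:", private)
print("public: ", public)
```

A `/20` gives each subnet 4,096 addresses, of which AWS reserves 5. Subnet size matters on EKS because the default VPC CNI assigns pod IPs from the subnet itself.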

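modules/s3-iceberg (module sketch)

variable "prefix" { type = string }
variable "bucket_lake" { type = string }
variable "enable_glue_catalog" {
  type    = bool
  default = true
}
resource "aws_s3_bucket" "lake" {
  bucket        = var.bucket_lake
  force_destroy = true # convenient in dev; set to false in prod to protect lake data
}
resource "aws_s3_bucket_versioning" "lake" {
  bucket = aws_s3_bucket.lake.id
  versioning_configuration {
    status = "Enabled"
  }
}
# Optional: AWS Glue Data Catalog as the Iceberg catalog.
# Athena expects lowercase database names with underscores rather than hyphens.
resource "aws_glue_catalog_database" "iceberg_db" {
  count = var.enable_glue_catalog ? 1 : 0
  name  = "${replace(lower(var.prefix), "-", "_")}_iceberg"
}
output "bucket_lake" { value = aws_s3_bucket.lake.bucket }
output "glue_catalog_name" { value = try(aws_glue_catalog_database.iceberg_db[0].name, null) }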
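The lake bucket name is derived from `var.prefix`, so a bad prefix only surfaces at apply time. A minimal Python pre-check could run in CI before `terraform plan`. It covers a subset of the S3 general-purpose bucket naming rules: 3-63 characters, only lowercase letters, digits, dots, and hyphens, a letter or digit at each end, no adjacent dots, and no IP-address format. The function name is hypothetical, and AWS documents further rules that are not checked here.

```python
import ipaddress
import re

# Length 3-63, charset [a-z0-9.-], first and last character a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_name_problems(name: str) -> list[str]:
    """Return the naming-rule violations found; an empty list means the checks passed."""
    problems = []
    if not _BUCKET_RE.fullmatch(name):
        problems.append("must be 3-63 chars of [a-z0-9.-], starting and ending with a letter or digit")
    if ".." in name:
        problems.append("must not contain adjacent periods")
    try:
        ipaddress.IPv4Address(name)
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass  # not an IP address, which is what we want
    return problems


for prefix in ["quantos-dev", "QuantOS_Dev"]:
    print(f"{prefix}-lake", bucket_name_problems(f"{prefix}-lake"))
```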

modules/redpanda (module sketch)

# Assumes var.eks_kubeconfig is a filesystem path to a kubeconfig for the EKS cluster.
provider "helm" {
  kubernetes {
    config_path = var.eks_kubeconfig
  }
}
resource "helm_release" "redpanda" {
  name             = "redpanda"
  repository       = "https://charts.redpanda.com"
  chart            = "redpanda"
  namespace        = "streaming"
  create_namespace = true
  values = [yamlencode({
    statefulset = { replicas = 3 }
    storage     = { persistentVolume = { size = "500Gi" } }
    external    = { enabled = true }
  })]
}
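Three brokers with 500Gi volumes hold only about one volume's worth of unique data if topics use a replication factor of 3, because every byte is then stored on all three brokers. That replication factor is an assumption, since the chart values above do not set it. The back-of-the-envelope Python sketch below estimates how many hours of log that capacity covers. The 10 MiB/s ingest rate and 20% free-space headroom are illustrative inputs.

```python
def retention_hours(brokers: int, disk_gib: float, replication: int,
                    ingest_mib_s: float, headroom: float = 0.2) -> float:
    """Approximate hours of log data the cluster can hold before disks fill.

    usable = brokers * disk / replication, less a free-space headroom fraction.
    Ignores compression, compaction, and per-topic retention overrides.
    """
    if not 1 <= replication <= brokers:
        raise ValueError("replication factor must be between 1 and the broker count")
    usable_mib = brokers * disk_gib * 1024 / replication * (1 - headroom)
    return usable_mib / ingest_mib_s / 3600


# Illustrative inputs: 3 brokers x 500 GiB (as in the values above), RF=3, 10 MiB/s ingest.
hours = retention_hours(brokers=3, disk_gib=500, replication=3, ingest_mib_s=10)
print(f"~{hours:.1f} hours of retention")
```

If that window is shorter than replay needs, the stream-to-lakehouse path (Kafka into S3/Iceberg) is where longer history would live.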
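
modules/clickhouse (module sketch)

# Chart repository URL and values keys are illustrative; verify them against
# the Altinity charts' documentation before applying.
resource "helm_release" "clickhouse_operator" {
  name             = "clickhouse-operator"
  repository       = "https://charts.altinity.com"
  chart            = "altinity-clickhouse-operator"
  namespace        = "olap"
  create_namespace = true
}
resource "helm_release" "clickhouse" {
  name       = "clickhouse"
  repository = "https://charts.altinity.com"
  chart      = "clickhouse"
  namespace  = "olap"
  # Install the operator (and its CRDs and the namespace) before the cluster chart.
  depends_on = [helm_release.clickhouse_operator]
  values = [yamlencode({
    replicas    = 3
    persistence = { size = var.storage_size }
    resources = {
      requests = { cpu = "4", memory = "16Gi" }
      limits   = { cpu = "8", memory = "32Gi" }
    }
  })]
}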
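Before choosing EKS node types for this release, it helps to total what the three replicas request. The Python sketch below parses Kubernetes quantity strings and multiplies them by the replica count. It handles binary suffixes `Ki`/`Mi`/`Gi`/`Ti`, decimal `k`/`M`/`G`/`T`, and millicores `m`, but not every form the Kubernetes API accepts. It also assumes the `2Ti` storage size applies per replica.

```python
import re

# Scale factors for the Kubernetes quantity suffixes handled here (a subset;
# exponent forms such as "1e3" are not supported).
_SCALE = {
    "": 1, "m": 1e-3, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_QTY = re.compile(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|m|k|M|G|T)?")


def parse_quantity(q: str) -> float:
    """Convert a quantity such as '16Gi', '4' or '500m' to base units (bytes or cores)."""
    match = _QTY.fullmatch(q)
    if match is None:
        raise ValueError(f"unsupported quantity: {q!r}")
    return float(match.group(1)) * _SCALE[match.group(2) or ""]


def totals(replicas: int, per_replica: dict[str, str]) -> dict[str, float]:
    """Multiply each per-replica quantity by the replica count."""
    return {key: replicas * parse_quantity(val) for key, val in per_replica.items()}


GiB, TiB = 2**30, 2**40
requests = totals(3, {"cpu": "4", "memory": "16Gi"})
limits = totals(3, {"cpu": "8", "memory": "32Gi"})
storage = totals(3, {"volume": "2Ti"})  # assumes the 2Ti size applies per replica
print(f"requests: {requests['cpu']:g} CPU, {requests['memory'] / GiB:g} GiB")
print(f"limits:   {limits['cpu']:g} CPU, {limits['memory'] / GiB:g} GiB")
print(f"storage:  {storage['volume'] / TiB:g} TiB")
```

The totals (12 CPU / 48 GiB requested, 24 CPU / 96 GiB at the limits) matter for overall cluster capacity. Scheduling, however, happens per pod, so each node also needs room for one replica's full request.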
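
modules/feast (store config, conceptual)

# Key names are illustrative: Feast's own feature_store.yaml uses snake_case
# (offline_store / online_store). Check that your Feast version or a plugin
# provides an Iceberg offline store before relying on one.
offlineStore:
  type: iceberg
  catalog: glue
  warehouse: s3://<lake-bucket>/
onlineStore:
  type: redis
  host: <redis-host>
  port: 6379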
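When building training sets, the offline store's key job is a point-in-time (as-of) join. For each labeled event, it takes the latest feature value whose timestamp is at or before the event, so no future data leaks into training. Feast performs this join inside its offline store. The Python sketch below only illustrates the semantics with in-memory lists and `bisect`. The entity and feature values are made up, and the optional `ttl` mirrors the idea of a feature view's time-to-live.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical feature rows: (entity_id, event_time, value), e.g. a rolling volatility.
feature_rows = [
    ("AAPL", 100, 0.21), ("AAPL", 200, 0.25), ("AAPL", 300, 0.30),
    ("MSFT", 150, 0.18),
]


def build_index(rows):
    """Group feature rows by entity and sort each group by timestamp."""
    index = defaultdict(list)
    for entity, ts, value in rows:
        index[entity].append((ts, value))
    for series in index.values():
        series.sort()
    return index


def as_of(index, entity, ts, ttl=None):
    """Latest value with timestamp <= ts (and no older than ttl, if given), else None."""
    series = index.get(entity, [])
    times = [t for t, _ in series]
    pos = bisect_right(times, ts)  # first position strictly after ts
    if pos == 0:
        return None  # no observation at or before the event
    t, value = series[pos - 1]
    if ttl is not None and ts - t > ttl:
        return None  # value exists but is too stale
    return value


index = build_index(feature_rows)
labels = [("AAPL", 250), ("AAPL", 50), ("MSFT", 400)]
print([as_of(index, e, ts) for e, ts in labels])
```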

modules/arrow-flight-gateway/deployment.yaml (conceptual)

apiVersion: apps/v1
kind: Deployment
metadata: { name: arrow-flight-gateway, namespace: serving }
spec:
  replicas: 2
  selector: { matchLabels: { app: flight-gw } }
  template:
    metadata: { labels: { app: flight-gw } }
    spec:
      containers:
        - name: flight
          # Pin a version tag or digest outside dev instead of :latest.
          image: ghcr.io/yourorg/arrow-flight-gateway:latest
          ports: [{ containerPort: 31337 }]
          env:
            - { name: CLICKHOUSE_DSN, valueFrom: { secretKeyRef: { name: ch-secrets, key: dsn } } }
            - { name: FEAST_ONLINE_ADDR, value: "redis:6379" }
            - { name: ICEBERG_CATALOG, value: "glue://..." }
---
apiVersion: v1
kind: Service
metadata: { name: arrow-flight-gateway, namespace: serving }
spec:
  type: LoadBalancer
  ports: [{ port: 31337, targetPort: 31337, name: flight }]
  selector: { app: flight-gw }
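The gateway container receives its backends through three environment variables: `CLICKHOUSE_DSN`, `FEAST_ONLINE_ADDR`, and `ICEBERG_CATALOG`. The gateway's code is not part of this document. As an illustration, here is a Python sketch of fail-fast startup config parsing. With it, a missing Secret key or a malformed `host:port` stops the pod at start instead of at the first request. `GatewayConfig` and `load_config` are hypothetical names, and the DSN below is a placeholder.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("CLICKHOUSE_DSN", "FEAST_ONLINE_ADDR", "ICEBERG_CATALOG")


@dataclass(frozen=True)
class GatewayConfig:  # hypothetical name
    clickhouse_dsn: str
    feast_host: str
    feast_port: int
    iceberg_catalog: str


def load_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Read the gateway's settings, raising RuntimeError if any are missing or malformed."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    addr = env["FEAST_ONLINE_ADDR"]
    host, sep, port = addr.rpartition(":")
    if not (sep and host and port.isdigit()):
        raise RuntimeError(f"FEAST_ONLINE_ADDR must look like host:port, got {addr!r}")
    return GatewayConfig(env["CLICKHOUSE_DSN"], host, int(port), env["ICEBERG_CATALOG"])


# Values mirroring the Deployment above; the DSN is a made-up placeholder.
example_env = {
    "CLICKHOUSE_DSN": "clickhouse://<user>:<password>@<host>:9000/default",
    "FEAST_ONLINE_ADDR": "redis:6379",
    "ICEBERG_CATALOG": "glue://...",
}
print(load_config(example_env))
```

Raising at startup makes a misconfigured pod fail its rollout visibly (the container exits and restarts), which is easier to diagnose than errors on the first Flight request.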