DSM - Market Data Retrieval Workflow

Source: Notion | Last edited: 2025-05-15 | ID: 1c22d2dc-3ef...


graph TB
%% Define four columns to maximize horizontal space
subgraph "Initial Request"
A["Start: Data Request<br/>symbol, time range, interval"] --> B["**Check Cache (Daily)?**<br/>use_cache=True<br/><br/><sup>User preference & config</sup>"]
end
subgraph "Cache Check"
B -- Yes --> C["**Cache Hit (Daily)?**<br/>Valid & Recent Data for Day?<br/><br/><sup>Metadata & checksum validation</sup><br/><sup>Data freshness threshold</sup>"]
C -- Yes --> E["**Load Data from Cache**<br/>UnifiedCacheManager.load_from_cache<br/><br/><sup>Fast daily retrieval</sup><br/><sup>REST API boundary aligned</sup>"]
E --> F["Return Data<br/>DataFrame from Cache"]
end
subgraph "API Strategy"
B -- No --> D["**Data Source Selection**<br/>_should_use_vision_api<br/><br/><sup>Estimate data points</sup><br/><sup>Vision API for large requests</sup>"]
C -- No --> D
D --> G1["**Vision API (Primary)**<br/>VisionDataClient.fetch<br/><br/><sup>Download-First Approach</sup><br/><sup>No pre-checking - faster retrieval</sup>"]
G1 --> G{"**Vision API Fetch**<br/>VisionDataClient._download_data<br/><br/><sup>Direct download with dynamic concurrency</sup><br/><sup>Aligned boundaries via ApiBoundaryValidator</sup>"}
end
subgraph "Results & Caching"
G -- Success --> I{"**Save to Cache (Daily)?**<br/>UnifiedCacheManager.save_to_cache<br/><br/><sup>Saves with REST API-aligned boundaries</sup><br/><sup>using TimeRangeManager.align_vision_api_to_rest</sup>"}
G -- Fail --> H["**Automatic Fallback**<br/>RestDataClient.fetch<br/><br/><sup>Transparent fallback for the user</sup><br/><sup>Same consistent interface</sup>"]
H -- Success --> K{"**Save to Cache (Daily)?**<br/>UnifiedCacheManager.save_to_cache<br/><br/><sup>Caches successful REST API data</sup><br/><sup>Same format as Vision API data</sup>"}
H -- Fail --> M["**Error Handling**<br/>raise Exception<br/><br/><sup>Retrieval failure</sup><br/><sup>Logged error details</sup>"]
I --> J["Return Data<br/>DataFrame from Vision API<br/><br/><sup>Aligned with REST API boundaries</sup>"]
K --> L["Return Data<br/>DataFrame from REST API"]
end
%% Connect across subgraphs
F --> N["End: Data Retrieval<br/>Returns DataFrame"]
J --> N
L --> N
M --> N
%% Styling
style I fill:#f9f,stroke:#333,stroke-width:2px,color:#000
style K fill:#f9f,stroke:#333,stroke-width:2px,color:#000
style B fill:#ccf,stroke:#333,stroke-width:2px,color:#000,shape:rect
style C fill:#ccf,stroke:#333,stroke-width:2px,color:#000,shape:rect
style D fill:#ccf,stroke:#333,stroke-width:2px,color:#000,shape:rect
style G1 fill:#cfc,stroke:#333,stroke-width:2px,color:#000
style H fill:#cfc,stroke:#333,stroke-width:2px,color:#000,stroke-dasharray: 5, 5
style E fill:#cfc,stroke:#333,stroke-width:2px,color:#000
style G fill:#eee,stroke:#333,stroke-width:2px,color:#000
style M fill:#fee,stroke:#333,stroke-width:2px,color:#000
%% Define larger font class
classDef largeText font-size:18px;
%% Apply large text to all nodes
class A,B,C,D,E,F,G,G1,H,I,J,K,L,M,N largeText;

This diagram illustrates the improved market data retrieval workflow with two key optimizations:

  1. Download-First Approach: The Vision API client now downloads files directly without pre-checking that they exist, which eliminates the HEAD requests previously spent on those checks.
  2. Automatic Fallback: If Vision API fails to retrieve data, the system automatically and transparently falls back to REST API.

The workflow retains the existing advantages while adding these performance and reliability improvements.
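
The difference between pre-checking and download-first can be sketched as follows. This is a minimal illustration, not the actual `VisionDataClient` code: the `Transport` callable, `CountingTransport`, and both fetch functions are hypothetical names introduced here, and the transport is an in-memory stand-in rather than a real HTTP client.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transport signature: (method, url) -> (status_code, body).
Transport = Callable[[str, str], Tuple[int, bytes]]


def fetch_with_precheck(url: str, send: Transport) -> Optional[bytes]:
    """Pre-check pattern: HEAD to confirm the file exists, then GET it."""
    status, _ = send("HEAD", url)
    if status != 200:
        return None
    status, body = send("GET", url)
    return body if status == 200 else None


def fetch_download_first(url: str, send: Transport) -> Optional[bytes]:
    """Download-first pattern: GET directly and treat a non-200 as 'no file'."""
    status, body = send("GET", url)
    return body if status == 200 else None


class CountingTransport:
    """In-memory stand-in for an HTTP client that records each request method."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files
        self.calls: List[str] = []

    def __call__(self, method: str, url: str) -> Tuple[int, bytes]:
        self.calls.append(method)
        if url not in self.files:
            return 404, b""
        # A HEAD response carries no body; a GET returns the file content.
        return 200, (b"" if method == "HEAD" else self.files[url])
```

With an existing file, `fetch_with_precheck` issues two requests and `fetch_download_first` issues one. A missing file costs one request either way, so the saving comes from files that do exist.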

The data retrieval process begins with a user request for market data. The system first checks for valid REST API-aligned cached data. If found, it’s immediately returned.

Otherwise, the data source selection process is triggered:

  • Primary Path (Vision API with Download-First):
    • The system tries Vision API first for most requests, especially larger historical ones
    • Uses download-first approach (no pre-checking) for optimal performance
    • Applies dynamic concurrency optimization based on batch size
    • Downloads data by day, combines results, and caches with REST API-aligned boundaries
  • Automatic Fallback Path (REST API):
    • If Vision API fails or returns no data, the system automatically falls back to REST API
    • This fallback is transparent to the user - same interface and data format
    • REST API data is also cached for future retrieval

All data sources (Vision API, REST API, and cache) deliver consistent results with identical time boundaries, ensuring a seamless experience regardless of which source ultimately provides the data.

Key benefits:

  1. Improved Performance: The download-first approach eliminates unnecessary HEAD requests
  2. Higher Reliability: Automatic fallback ensures data retrieval even when Vision API is unavailable
  3. Optimized Resource Usage: Dynamic concurrency adjustment based on batch size
  4. Consistent Data Format: All sources return identical data structure
  5. Transparent Experience: Users don’t need to worry about which source provides the data
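
The primary path (download by day, size the concurrency to the batch, combine the results) might look roughly like the sketch below. The function names, the worker ceiling of 8, and the use of plain tuples instead of a DataFrame are all assumptions made to keep the example self-contained; the actual sizing rule is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List


def pick_max_workers(batch_size: int, ceiling: int = 8) -> int:
    """Scale the worker count with the number of daily files, within [1, ceiling].

    The ceiling of 8 is an illustrative value, not the project's setting.
    """
    return max(1, min(batch_size, ceiling))


def days_in_range(start: date, end: date) -> List[date]:
    """List every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def download_days(
    days: List[date],
    download_day: Callable[[date], List[tuple]],
) -> List[tuple]:
    """Download each day concurrently and combine the rows in day order."""
    if not days:
        return []
    with ThreadPoolExecutor(max_workers=pick_max_workers(len(days))) as pool:
        # Executor.map yields results in input order, so days stay sorted.
        per_day = list(pool.map(download_day, days))
    return [row for rows in per_day for row in rows]
```

A small request of three days therefore runs with three workers, while a request spanning months is capped at the ceiling so it does not open an unbounded number of connections.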

Update

flowchart TD
A[Application] --> B[DataSourceManager.get_data]
B --> C[Check Cache]
C -->|Hit| D[Return Cached Data]
C -->|Miss| E[Time Range Analysis]
E --> F[Split into Sub-Ranges if Needed]
F --> G[For Each Sub-Range]
G --> H[Try Vision API]
H -->|Success| I[Format Data]
H -->|Failure| J[Try REST API]
J -->|Success| I
J -->|Failure| K[Error]
I --> L[Merge Data]
L --> M[Cache Result]
M --> N[Return Data]
D --> N
%% Place improvements as a vertical list
subgraph Improvements [Key Improvements]
direction TB
R1[1s intervals in SPOT]
R2[Smart Chunking]
R3[Single Cache Entry]
R4[Better Error Handling]
end
%% Place priority as a vertical list
subgraph Priority [Source Priority]
direction TB
P1[1. Cache]
P2[2. Vision API]
P3[3. REST API]
end
%% Position subgraphs
Improvements -.- Priority
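
The updated flow can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the real `DataSourceManager.get_data`: the signature, the dict-based cache, the one-day default chunk, and the `vision` and `rest` callables are all hypothetical, and rows are plain `(timestamp, value)` tuples instead of a DataFrame.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Row = Tuple[datetime, float]
Fetcher = Callable[[datetime, datetime], List[Row]]
CacheKey = Tuple[datetime, datetime]


def split_range(
    start: datetime, end: datetime, chunk: timedelta
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive sub-ranges no longer than chunk."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be positive")
    ranges = []
    cursor = start
    while cursor < end:
        upper = min(cursor + chunk, end)
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


def get_data(
    start: datetime,
    end: datetime,
    cache: Dict[CacheKey, List[Row]],
    vision: Fetcher,
    rest: Fetcher,
    chunk: timedelta = timedelta(days=1),
) -> List[Row]:
    """Follow the source priority: cache, then Vision API, then REST API."""
    key = (start, end)
    if key in cache:
        return cache[key]
    merged: List[Row] = []
    for sub_start, sub_end in split_range(start, end, chunk):
        try:
            rows = vision(sub_start, sub_end)
            if not rows:
                raise LookupError("Vision API returned no data")
        except Exception:
            # Fallback for this sub-range; a REST failure propagates as the error.
            rows = rest(sub_start, sub_end)
        merged.extend(rows)
    merged.sort(key=lambda row: row[0])
    cache[key] = merged  # one cache entry for the whole request
    return merged
```

The fallback is decided per sub-range, so one failed Vision API chunk does not force the whole request onto REST, and the merged result is stored once under the original request key.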