Terry’s Project Details

Source: Notion | Last edited: 2025-03-19 | ID: 1bb2d2dc-3ef...


SR&ED Claim – Terry Li’s Work on Novel Feature Engineering for Financial Time Series Forecasting

As Director of Operations at Eon Labs Ltd., Terry Li is responsible for developing novel financial time series features that improve the firm’s proprietary trading models. While Eon Labs’ primary objective is to develop and deploy trading models for live market execution, the effectiveness of these models relies heavily on the predictive power of the features they utilize. Given the industry’s lack of publicly available references and the diminishing effectiveness of well-known techniques, Terry’s work focuses on discovering and engineering new features that enhance market predictability while mitigating the risks of degradation over time.

Existing Technologies and Their Limitations

Conventional financial forecasting tools, such as TA-Lib, are widely used but fail to maintain predictive strength in high-frequency trading because they rely on fixed-lookback calculations. These standard industry tools:

  • Cannot adapt to changing market regimes
  • Rely on static historical windows for calculations
  • Produce diminishing returns as market participants adopt similar strategies
  • Lack flexibility to incorporate real-time regime detection

To overcome these limitations, Terry developed a dynamic lookback framework that adapts to changing market regimes rather than using a fixed historical window. This represents a technological advancement because:

  • It dynamically adjusts feature calculation windows based on detected market conditions
  • It requires sophisticated regime detection mechanisms not available in standard libraries
  • It integrates multiple data sources to determine optimal parameter settings in real-time
  • It pushes beyond conventional industry solutions by incorporating adaptive mechanisms

The challenge lies in determining the correct conditions under which these adaptive mechanisms should adjust, requiring ongoing research, hypothesis testing, and iteration.
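
The adaptive-window idea can be sketched in a few lines. This is an illustrative toy, not Terry's actual implementation: the function name, the volatility-based scaling rule, and all default parameters are assumptions.

```python
import numpy as np
import pandas as pd

def dynamic_lookback_mean(close: pd.Series,
                          base_window: int = 20,
                          min_window: int = 5,
                          max_window: int = 60,
                          vol_window: int = 20) -> pd.Series:
    """Rolling mean whose lookback shrinks in volatile regimes (to react
    faster) and widens in calm ones, instead of using a fixed window."""
    vol = close.pct_change().rolling(vol_window).std()
    ref_vol = vol.median()
    out = pd.Series(index=close.index, dtype=float)
    for i in range(len(close)):
        v = vol.iloc[i]
        if np.isnan(v) or v == 0 or np.isnan(ref_vol):
            w = base_window  # fall back to the base window during warm-up
        else:
            # Higher current volatility relative to the median -> shorter
            # effective window; clipped to a sane [min, max] range.
            w = int(np.clip(base_window * ref_vol / v, min_window, max_window))
        out.iloc[i] = close.iloc[max(0, i - w + 1):i + 1].mean()
    return out
```

The key contrast with a fixed-lookback tool is that the window length `w` is itself a function of detected market conditions rather than a constant.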

Systematic Investigation & Experimental Development

Terry’s approach to feature engineering follows a structured methodology that includes:

  1. Identifying Market Inefficiencies – Evaluating weaknesses in existing forecasting models.
  2. Formulating Hypotheses – Proposing novel feature transformations tailored for financial time series data.
  3. Backtesting with Historical Data – Testing predictive validity using high-quality data.
  4. Performance Evaluation & Refinement – Integrating features into machine learning models and assessing performance against Sharpe and Calmar ratios, among other metrics.
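
As a concrete instance of step 4, the two headline metrics can be computed from a backtest's period returns roughly as follows. This is a sketch; annualization conventions vary (365 periods per year is assumed here for 24/7 crypto markets, and the risk-free rate is taken as zero).

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year: int = 365) -> float:
    """Annualized mean return divided by annualized volatility."""
    r = np.asarray(returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def calmar_ratio(returns, periods_per_year: int = 365) -> float:
    """Annualized return divided by the maximum drawdown of the equity curve."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1.0 + r)
    peak = np.maximum.accumulate(equity)          # running high-water mark
    max_dd = float(((peak - equity) / peak).max())  # worst peak-to-trough loss
    ann_ret = float(equity[-1] ** (periods_per_year / len(r)) - 1.0)
    return ann_ret / max_dd if max_dd > 0 else float("inf")
```

Sharpe penalizes all volatility symmetrically, while Calmar penalizes only drawdowns, which is why evaluating against both (plus other metrics) gives a more complete picture of a feature's contribution.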

Since financial markets differ fundamentally from the natural sciences, experimentation often involves trial and error, and some engineered features fail to add value. With no publicly available solutions, issues such as look-ahead bias and data-alignment challenges must be addressed through proprietary methods, requiring iterative refinement and problem-solving.
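
One common guard against look-ahead bias is strict feature/target alignment, so that each row's feature is built only from information available strictly before the interval being predicted. The helper below is a simplified sketch (the name and lag parameters are hypothetical, not part of the proprietary method described above).

```python
import pandas as pd

def make_supervised(prices: pd.Series, feature_lag: int = 1,
                    horizon: int = 1) -> pd.DataFrame:
    """Align a feature and target so no future data leaks into the feature.

    feature at row i: return over (i - feature_lag - 1, i - feature_lag],
    known at decision time.
    target at row i:  return over (i, i + horizon], the quantity to predict.
    """
    feature = prices.pct_change().shift(feature_lag)      # strictly past info
    target = prices.pct_change(horizon).shift(-horizon)   # strictly future info
    return pd.DataFrame({"feature": feature, "target": target}).dropna()
```

Because the feature window ends before the target window begins, a model trained on these rows cannot trivially "see" the outcome it is predicting, which is the failure mode that inflates backtest results.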

This experimentation proceeded through successive iterations:

  • Approach: Initial implementation of regime-based window adjustment using volatility as the primary indicator
  • Technical Details: Employed rolling volatility calculations with standard-deviation measures over multiple timeframes
  • Risks: High sensitivity to outlier events, leading to false regime detection

  • Approach: Enhanced regime detection with multiple market indicators beyond volatility, including Balance of Power (BOP)
  • Technical Details: Implemented a weighted multi-indicator system to provide more robust regime detection than single-indicator approaches
  • Risks: Indicators with divergent performance could receive conflicting weights, potentially weakening the system's descriptive and predictive power
  • Results: Achieved improved uncorrelatedness and orthogonality to existing models, creating more independent predictive signals

  • Approach: Higher-granularity data collection and processing system to address limitations in data resolution
  • Technical Details: Developed a custom framework for collecting data from Binance Exchange using the Vision API (for historical data in SIP format with MD5 integrity verification) and the real-time REST API (for high-frequency data retrieval)
  • Risks: Data downtime and API throttling required the development of advanced caching using the memory mapping (mmap) offered by Apache Arrow, integrated into a custom Data Source Manager (DSM) system
  • Results: Benchmarking with profiling tools showed the Apache Arrow mmap implementation to be at least twice as efficient in system-resource usage and roughly ten times faster in read operations than traditional approaches (CSV, Parquet, etc.)

  • Quantifiable feedback is provided through the MLFlow web UI, which displays metrics including:
      • Sharpe and Calmar ratios
      • Absolute percentage equity growth
      • Model orthogonality measurements (requiring at least a 15% improvement to be shortlisted)

Challenges in Transitioning to Live Trading

A major challenge in Terry’s research is ensuring that features validated in backtests remain effective when applied to live market data. This requires solving:

  • Data Alignment Issues – Ensuring real-time data is synchronized and accurately structured for model input.
  • Look-Ahead Bias Prevention – Developing custom techniques to avoid leakage of future data into past predictions, as no standardized open-source solution currently exists.
  • Computational Efficiency – Performance bottlenecks were encountered when implementing the Data Source Manager, particularly in handling large amounts of data and integrating with legacy systems. A novel approach for simultaneous data retrieval was developed to overcome API throttling limitations (Binance limits each retrieval to 1000 data points and has endpoint throttling). This optimization substantially improved throughput, enabling successful retrieval of high-granularity data across multiple cryptocurrency instruments without API blocking.
  • Real-time Adaptation – Developed methodology for real-time regime detection by translating discretionary trading concepts (market structures, fair value gap) from manual traders into quantifiable time series features. This approach differs from standard industry practices by bridging the gap between discretionary traders’ expertise and quantitative frameworks, whereas traditional approaches typically focus on mathematical concepts without incorporating frontline trading insights. Technical challenges in implementing this in live trading are anticipated to include look-ahead bias and engineering bugs not captured in test cases.
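
The throttling workaround can be illustrated with a generic pagination loop over an endpoint that caps each response at 1000 rows. The `fetch_page` callback is injected, so the sketch is exchange-agnostic; the function and parameter names are assumptions, not the actual retrieval system.

```python
import time

def fetch_all_klines(fetch_page, start_ms, end_ms, interval_ms,
                     limit=1000, pause=0.0):
    """Page through a klines-style endpoint that returns at most `limit`
    rows per call. `fetch_page(start, end, limit)` returns rows whose
    first element is the open time in ms."""
    out = []
    cursor = start_ms
    while cursor < end_ms:
        page = fetch_page(cursor, end_ms, limit)
        if not page:
            break
        out.extend(page)
        cursor = page[-1][0] + interval_ms  # advance past the last open time
        if pause:
            time.sleep(pause)  # simple spacing to stay under rate limits
    return out
```

In a real deployment several such loops would run concurrently across instruments, with the pause tuned to the exchange's request-weight limits.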

Several approaches were attempted that ultimately did not meet the project’s technical requirements:

  • Traditional Data Caching Methods – Testing of classic approaches including CSV files, Parquet format, and Python pickle files revealed they were not efficient enough to meet business requirements for high-frequency data handling.
  • Technical Insights – These failures pushed the investigation toward more advanced time series data management solutions.
  • Methodology Development – The failures guided the team to seek a balance between functionality and efficiency, leading to the selection of memory mapping with Apache Arrow. The system still faces challenges with handling multiple concurrent model sessions accessing the Data Source Manager, which may require additional development of subsidiary systems in 2025.
  • Model Degradation Quantification – Development of methodologies to measure model degradation over time (identified as a next big challenge for the evolving system)