Quantiphi Questionnaire
Source: Notion | Last edited: 2022-12-31 | ID: 6b0a7b97-f82...
- Which technique is used for forecasting of the base model?
- LSTM (Long Short-Term Memory)
- What is the current accuracy observed with this technique?
- We use a few simple objective functions, such as the Calmar ratio and maximum drawdown.
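For reference, a minimal sketch of the two objective functions mentioned, computed on a hypothetical equity curve (the array values below are made up for illustration):

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peaks = np.maximum.accumulate(equity)       # running maximum so far
    drawdowns = (peaks - equity) / peaks        # fractional drop from each peak
    return float(drawdowns.max())

def calmar_ratio(equity: np.ndarray, periods_per_year: int = 365) -> float:
    """Annualized return divided by maximum drawdown."""
    n = len(equity) - 1
    annual_return = (equity[-1] / equity[0]) ** (periods_per_year / n) - 1
    return annual_return / max_drawdown(equity)

equity = np.array([100.0, 110.0, 105.0, 120.0, 90.0, 130.0])
print(round(max_drawdown(equity), 3))  # 0.25  (the 120 -> 90 drop)
```

The Calmar ratio rewards return per unit of worst-case loss, which fits a trading context better than raw prediction accuracy.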
- How many variables are considered in this technique and what more do we need to consider with this revised model?
- Open High Low Close Volume (OHLCV) are currently used. Some sentiment indicators are being investigated.
- We believe OHLCV are the main inputs. Perhaps OHLCV of related instruments can help somehow but we haven’t tried.
- How frequently are models trained?
- New models are trained monthly, but we do not switch to new ones lightly.
- What is the source of the data for model training and inference?
- Binance minute-based OHLC data, e.g. 15-minute and 30-minute bars
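As an illustration of how 1-minute bars can be aggregated into 15-minute bars (synthetic data, not the actual pipeline):

```python
import pandas as pd

# Hypothetical hour of constant 1-minute OHLCV bars.
idx = pd.date_range("2022-01-01", periods=60, freq="min")
minute = pd.DataFrame(
    {"open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5, "volume": 10.0},
    index=idx,
)

# Standard OHLCV aggregation: first open, max high, min low, last close, summed volume.
bars = minute.resample("15min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
)
print(len(bars))  # 4  (60 minutes -> four 15-minute bars)
```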
- What is the type of data being used - images/text/tabular?
- Time-series price data
- Is the data pre-processed? Any EDA, Feature Engineering is to be done?
- Basic technical-analysis indicators, such as moving averages, are used as features.
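A minimal sketch of this kind of feature engineering, assuming a pandas OHLCV frame with a `close` column (the column and function names are illustrative, not the actual code):

```python
import pandas as pd

def add_ma_features(ohlcv: pd.DataFrame, windows=(5, 20)) -> pd.DataFrame:
    """Append simple and exponential moving averages of the close as features."""
    out = ohlcv.copy()
    for w in windows:
        out[f"sma_{w}"] = out["close"].rolling(w).mean()            # simple MA
        out[f"ema_{w}"] = out["close"].ewm(span=w, adjust=False).mean()  # exponential MA
    return out

df = pd.DataFrame({"close": [float(i) for i in range(1, 31)]})
feats = add_ma_features(df)
print(feats["sma_5"].iloc[-1])  # 28.0  (mean of 26..30)
```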
- How much data do we need for model inference per model?
- This varies; the most recent setup uses 5-8 years for training, 6 months for verification, and 6 months for testing.
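The split described above can be sketched as a chronological cut on a datetime-indexed frame (a sketch under assumed column names, not the actual code):

```python
import pandas as pd

def time_split(df: pd.DataFrame, val_months: int = 6, test_months: int = 6):
    """Chronological split: the last `test_months` for testing, the preceding
    `val_months` for verification, and everything earlier for training."""
    end = df.index.max()
    test_start = end - pd.DateOffset(months=test_months)
    val_start = test_start - pd.DateOffset(months=val_months)
    train = df[df.index < val_start]
    val = df[(df.index >= val_start) & (df.index < test_start)]
    test = df[df.index >= test_start]
    return train, val, test

# Hypothetical 7 years of daily closes.
idx = pd.date_range("2016-01-01", "2022-12-31", freq="D")
prices = pd.DataFrame({"close": range(len(idx))}, index=idx)
train, val, test = time_split(prices)
```

Splitting by time rather than at random avoids look-ahead leakage, which matters for price forecasting.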
- How are model objects currently saved?
- As model files in AWS S3
- What is the destination of the output of model training and inference?
- Database
- How much historical data do we need for model training per model?
- 5-8 years
- Where does model training currently take place? Is there any need to provide scope for training / retraining models in the required MLOps platform?
- Local servers.
- Not at the moment.
- Are there any dependencies on any system like hadoop?
- Not much; mainly AWS services such as DynamoDB, Parameter Store, and S3.
Others:
- What typical use cases are solved with the existing / upcoming models (Classification, Regression, NLP, etc)?
- Regression
- Which consumption layer components do we have on top of this, or are required (CLI, UI, Dashboards)?
- N/A
- What are the expectations regarding orchestration and lifecycle management?
- We don’t have much orchestration or lifecycle management; here is an overview of the current process. Today, models are trained on our local machines. Once trained, a model is evaluated against a test dataset; the evaluation results are saved to the database and the model files to AWS S3. The evaluation results are then reviewed by the team. When we decide to deploy a model, we manually start an ever-running script on a prediction server that downloads the model from S3, continuously fetches new data, and makes a prediction every minute. The prediction servers are AWS EC2 instances, and each model on a server is controlled through its own tmux session.
- When we want to deprecate a model, we kill the corresponding tmux session. Since we don’t train many models, we leave the database records and the model files in S3 as they are.
- That is an overview of the orchestration and lifecycle management we have today. Please let us know if you have any suggestions.
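The ever-running prediction script described above can be sketched as a simple fetch-predict-publish loop. The function and parameter names below are hypothetical; the real data fetching, model, and database write are stubbed out:

```python
import time

def prediction_loop(fetch_data, model_predict, publish, interval_sec=60, max_iters=None):
    """Sketch of the per-model prediction script: fetch the latest data,
    predict, publish the result, then wait for the next cycle.
    `max_iters` exists only so the loop can be exercised in tests;
    production would run with max_iters=None inside a tmux session."""
    i = 0
    while max_iters is None or i < max_iters:
        data = fetch_data()           # e.g. pull the latest bar from the exchange
        publish(model_predict(data))  # e.g. write the prediction to the database
        i += 1
        time.sleep(interval_sec)

# Exercise the loop with stand-in functions.
results = []
prediction_loop(
    fetch_data=lambda: {"close": 100.0},
    model_predict=lambda bar: bar["close"] * 2,  # stand-in for the real model
    publish=results.append,
    interval_sec=0,
    max_iters=3,
)
print(results)  # [200.0, 200.0, 200.0]
```

A process supervisor (e.g. systemd) could replace the tmux sessions for more robust restarts, if that is within scope.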