Enigma Refactor
Source: Notion | Last edited: 2022-10-27 | ID: d0cefb0c-bd8...
Fetch data
- fetch raw data
- split into test and train+validation data
- resample (head + tail)
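Below is a minimal sketch of the time-based selection and resampling steps. It assumes raw bars are dicts with the `open, high, low, close, volume, timestamp` fields listed further down, that timestamps are Unix seconds, and that "head + tail" means keeping the earliest or latest `n` bars. The names `select_bars` and `resample_ohlcv` are hypothetical. With pandas, `DataFrame.resample(...).agg(...)` would be the more usual route.

```python
from typing import Dict, List

Bar = Dict[str, float]


def select_bars(bars: List[Bar], n: int, side: str = "tail") -> List[Bar]:
    """Time-based selection: the earliest ("head") or latest ("tail") n bars."""
    ordered = sorted(bars, key=lambda b: b["timestamp"])
    if side == "head":
        return ordered[:n]
    if side == "tail":
        # ordered[-0:] would return everything, so guard n == 0
        return ordered[-n:] if n > 0 else []
    raise ValueError("side must be 'head' or 'tail'")


def resample_ohlcv(bars: List[Bar], interval_s: int) -> List[Bar]:
    """Aggregate bars into fixed buckets of interval_s seconds.

    open = first open, high = max high, low = min low,
    close = last close, volume = sum of volumes.
    Each output bar is stamped with its bucket's start time.
    """
    buckets: Dict[int, List[Bar]] = {}
    for bar in sorted(bars, key=lambda b: b["timestamp"]):
        start = int(bar["timestamp"]) // interval_s * interval_s
        buckets.setdefault(start, []).append(bar)

    resampled = []
    for start in sorted(buckets):
        group = buckets[start]
        resampled.append({
            "timestamp": start,
            "open": group[0]["open"],
            "high": max(b["high"] for b in group),
            "low": min(b["low"] for b in group),
            "close": group[-1]["close"],
            "volume": sum(b["volume"] for b in group),
        })
    return resampled
```

For example, `select_bars(resample_ohlcv(hourly_bars, 86400), 30, "tail")` would keep the latest 30 daily bars, where `hourly_bars` is an illustrative input.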
Data
- Fetch raw data
- Avoid using the specific date in the saved CSV path
- Avoid train/test split
- Data preprocessing
- Data cleaning/backfill
- Data selection (time-based, i.e. head() or tail())
- Resample + save
- Feature engineering
  1. Read from the resampled data
  2. Generate features (use functions as input). In the future, when a feature analyst tries to add a feature, put the function into a queue and evaluate it; if it is selected, move it into the private repo, otherwise remove it from the queue.
  3. Try to save and load feature data before * 30
Model
- Load features and train model (save stats and upload to S3)
- Load model
- Add support for a remote MLflow service
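One way to structure the train, save stats, upload, and load cycle is sketched below using only the standard library. Everything here is an assumption: `MeanModel` is a hypothetical stand-in that predicts the training mean, artifacts are JSON files in a local directory, and the S3 upload is an injected callable so the sketch does not depend on AWS. Remote MLflow tracking is not shown.

```python
import json
import statistics
from pathlib import Path
from typing import Callable, List, Optional


class MeanModel:
    """Placeholder model (hypothetical): always predicts the training mean."""

    def __init__(self, mean: float = 0.0) -> None:
        self.mean = mean

    def fit(self, targets: List[float]) -> "MeanModel":
        self.mean = float(statistics.mean(targets))
        return self

    def predict(self, n: int) -> List[float]:
        return [self.mean] * n


def train_and_save(
    targets: List[float],
    out_dir: Path,
    upload: Optional[Callable[[Path], None]] = None,
) -> dict:
    """Train the model, write its params and stats as JSON, then upload both files."""
    model = MeanModel().fit(targets)
    stats = {"n_samples": len(targets), "train_mean": model.mean}

    out_dir.mkdir(parents=True, exist_ok=True)
    model_path = out_dir / "model.json"
    stats_path = out_dir / "stats.json"
    model_path.write_text(json.dumps({"mean": model.mean}))
    stats_path.write_text(json.dumps(stats))

    # `upload` could wrap an S3 client; when None, artifacts stay local only.
    if upload is not None:
        for path in (model_path, stats_path):
            upload(path)
    return stats


def load_model(out_dir: Path) -> MeanModel:
    """Rebuild the model from the parameters written by train_and_save."""
    params = json.loads((out_dir / "model.json").read_text())
    return MeanModel(mean=params["mean"])
```

With boto3, the uploader could be something like `lambda p: s3.upload_file(str(p), "my-bucket", f"models/{p.name}")`, where `s3` is a boto3 S3 client and the bucket and key layout are hypothetical. Saving parameters as JSON rather than pickling the object keeps the artifact readable and safe to load.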
[{open, high, low, close, volume, timestamp}], [google trends], [coinbase_ohlcv]
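def generate_feature_1(ohlcv_list, x, y, z):
    # calculate …
    # returns records shaped like [{"rsi_7": ..., "rsi_14": ..., "timestamp": ...}]
    ...

def generate_features(feature_functions, ohlcv_list, x, y, z):
    # feature_functions holds the functions themselves (e.g. generate_feature_1), not their names
    features_list = []
    for function in feature_functions:
        features_list.append(function(ohlcv_list, x, y, z))
    # …save_feature_list
    return features_list

https://www.lullabot.com/articles/fixing-docker-and-vpn-ip-address-conflicts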
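A runnable sketch of the "functions as input" idea together with the candidate-feature queue from the feature engineering notes. The assumptions, all hypothetical: feature functions take only the OHLCV list (the `x, y, z` parameters are dropped), `rsi_feature` uses the simple-average RSI rather than Wilder's smoothing, and `FeatureQueue` selects a function when a caller-supplied `evaluate` score reaches a fixed threshold. A plain dict stands in for the private repo.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Record = Dict[str, float]
FeatureFn = Callable[[List[Record]], List[Record]]


def rsi(window: List[float]) -> float:
    """Simple-average RSI over the price changes inside `window`.

    RSI = 100 - 100 / (1 + avg_gain / avg_loss); 100 when there are no losses.
    """
    changes = [later - earlier for earlier, later in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def rsi_feature(ohlcv_list: List[Record]) -> List[Record]:
    """Hypothetical counterpart of generate_feature_1.

    Returns {"timestamp", "rsi_7", "rsi_14"} records, skipping the first
    14 bars because they lack a full 14-change window.
    """
    closes = [bar["close"] for bar in ohlcv_list]
    return [
        {
            "timestamp": ohlcv_list[i]["timestamp"],
            "rsi_7": rsi(closes[i - 7:i + 1]),
            "rsi_14": rsi(closes[i - 14:i + 1]),
        }
        for i in range(14, len(closes))
    ]


def generate_features(feature_fns: List[FeatureFn], ohlcv_list: List[Record]) -> List[List[Record]]:
    """Run every feature function (passed in as a function, not a name)."""
    return [fn(ohlcv_list) for fn in feature_fns]


class FeatureQueue:
    """Candidate feature functions wait here until they are evaluated."""

    def __init__(self, evaluate: Callable[[FeatureFn], float], threshold: float) -> None:
        self.evaluate = evaluate    # scoring function supplied by the caller
        self.threshold = threshold  # minimum score needed to be selected
        self.pending: Deque[FeatureFn] = deque()
        self.selected: Dict[str, FeatureFn] = {}  # stand-in for the private repo

    def submit(self, fn: FeatureFn) -> None:
        self.pending.append(fn)

    def process(self) -> None:
        """Evaluate each queued function once: promote it or drop it."""
        while self.pending:
            fn = self.pending.popleft()
            if self.evaluate(fn) >= self.threshold:
                self.selected[fn.__name__] = fn
            # functions below the threshold are removed from the queue and not kept
```

In practice, `evaluate` would probably train or score a model with the candidate feature added. The threshold rule is only a placeholder for whatever selection criterion the feature analysis uses.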
curl \
  -X POST \
  -H "Content-Type: application/graphql" \
  -H "Authorization: Apikey 2cnztlftbfol6spm_lcaz6zl7kybfsy7k" \
  --data '{ getMetric(metric: "social_volume_total") { timeseriesData( slug: "ethereum" from: "2018-01-01T00:00:00Z" to: "2018-03-01T00:00:00Z" interval: "1d" aggregation: SUM ) { datetime value } }}' \
  https://api.santiment.net/graphql
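The same Santiment request can also be built from Python with only the standard library. This is a sketch. It reuses the endpoint, headers and GraphQL query from the curl call above, but `build_santiment_request`, `fetch` and their parameters are hypothetical names. Actually sending the request needs network access and a valid API key, and the response shape is not checked here.

```python
import json
import urllib.request

SANTIMENT_URL = "https://api.santiment.net/graphql"

# Doubled braces are literal GraphQL braces; single braces are format fields.
QUERY_TEMPLATE = (
    '{{ getMetric(metric: "{metric}") {{ timeseriesData('
    ' slug: "{slug}" from: "{start}" to: "{end}"'
    ' interval: "{interval}" aggregation: SUM ) {{ datetime value }} }}}}'
)


def build_santiment_request(api_key, metric="social_volume_total", slug="ethereum",
                            start="2018-01-01T00:00:00Z", end="2018-03-01T00:00:00Z",
                            interval="1d"):
    """Build (but do not send) the POST request used in the curl example."""
    query = QUERY_TEMPLATE.format(metric=metric, slug=slug, start=start,
                                  end=end, interval=interval)
    return urllib.request.Request(
        SANTIMENT_URL,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/graphql",
            "Authorization": f"Apikey {api_key}",
        },
        method="POST",
    )


def fetch(api_key):
    """Send the request and decode the JSON response."""
    req = build_santiment_request(api_key)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch(os.environ["SANTIMENT_API_KEY"])` would read the key from an assumed environment variable instead of hard-coding it. With `Content-Type: application/graphql`, the body is the raw query text, as in the curl call, not a JSON `{"query": ...}` envelope.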