
Enigma Refactor

Source: Notion | Last edited: 2022-10-27 | ID: d0cefb0c-bd8...


Fetch data

  • fetch raw data
  • split test, train+validation data
  • Resample (head + tail)
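
One possible reading of the split step above is a purely time-based cut: the head of the series becomes train+validation data and the tail becomes test data. The sketch below assumes the rows are already sorted by timestamp; `split_head_tail` and `test_fraction` are hypothetical names, not part of the existing code.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_head_tail(rows: Sequence[T], test_fraction: float = 0.2) -> tuple[list[T], list[T]]:
    """Split time-ordered rows: head -> train+validation, tail -> test."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # Keep at least one row in the test set (assumption made for this sketch).
    n_test = max(1, int(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])


# Toy data: ten rows with increasing timestamps.
rows = [{"timestamp": t, "close": 100.0 + t} for t in range(10)]
train_val, test = split_head_tail(rows, test_fraction=0.2)
```

Cutting by position rather than sampling at random keeps every test row later in time than every training row.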

Data

  1. Fetch raw data

  2. Avoid using the specific date in the saved csv path

  3. Avoid train test split

  4. Data preprocessing

  5. data cleaning/backfill

  6. data selection (time based, namely - head() or tail())

  7. resample + save

  8. feature engineering
     1. read from resampled data
     2. generate features (use functions as input). In the future, when a feature analyst tries to add a feature, put the function in a queue and then evaluate it; if it is selected, put it in the private repo; if it is not selected, delete it from the queue
     3. Try to save and load features data before * 30 Model

  9. Load features and train model (save stats and upload to s3)

  10. Load model

  11. Add support for remote ML Flow Service
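
A minimal sketch of steps 2 and 7 (resample, then save to a path that carries no specific date). It assumes timestamps are integer epoch seconds and that the input rows are time-ordered; `resample_ohlcv`, `save_resampled` and the `<symbol>_<interval>.csv` naming scheme are hypothetical choices, not the project's actual layout.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["timestamp", "open", "high", "low", "close", "volume"]


def resample_ohlcv(rows, bucket_seconds):
    """Aggregate time-ordered OHLCV rows into fixed-width time buckets."""
    buckets = {}
    for row in rows:
        start = row["timestamp"] - row["timestamp"] % bucket_seconds
        bar = buckets.get(start)
        if bar is None:
            # First row of the bucket supplies the open price.
            bar = {field: row[field] for field in FIELDS}
            bar["timestamp"] = start
            buckets[start] = bar
        else:
            bar["high"] = max(bar["high"], row["high"])
            bar["low"] = min(bar["low"], row["low"])
            bar["close"] = row["close"]  # last row of the bucket wins
            bar["volume"] += row["volume"]
    return [buckets[start] for start in sorted(buckets)]


def save_resampled(rows, data_dir, symbol, interval):
    """Write rows to <data_dir>/<symbol>_<interval>.csv (no date in the name)."""
    path = Path(data_dir) / f"{symbol}_{interval}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


raw = [
    {"timestamp": 0, "open": 1.0, "high": 2.0, "low": 0.5, "close": 1.5, "volume": 10.0},
    {"timestamp": 60, "open": 1.5, "high": 3.0, "low": 1.0, "close": 2.5, "volume": 5.0},
    {"timestamp": 120, "open": 2.5, "high": 2.6, "low": 2.0, "close": 2.2, "volume": 1.0},
]
bars = resample_ohlcv(raw, bucket_seconds=120)
path = save_resampled(bars, tempfile.mkdtemp(), "ethereum", "2m")
```

Because the file name depends only on the symbol and interval, a later run overwrites the same file, and downstream steps can load it without knowing when the data was fetched.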

[{open, high, low, close, volume, timestamp}], [google trends], [coinbase_ohlcv]

def generate_feature_1(ohlcv_list, x, y, z):
    # calculate …
    return [{"rsi_7": ..., "rsi_14": ..., "timestamp": ...}]


def generate_features(functions, ohlcv_list, x, y, z):
    features_list = []
    for function in functions:
        # each function takes the same inputs and returns a list of feature rows
        features_list.append(function(ohlcv_list, x, y, z))
    # … save features_list
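
The queue workflow described in the feature engineering step could look roughly like the sketch below: candidate feature functions wait in a queue, each one is evaluated, and only the selected ones are kept. Everything here is an assumption made for illustration: `review_queue`, the two toy feature functions, and the use of variance as the score stand in for whatever evaluation the team settles on, and "accepted" stands in for promotion to the private repo.

```python
from collections import deque
from statistics import pvariance


def close_change(ohlcv_list):
    """Toy feature: difference between consecutive closes."""
    closes = [row["close"] for row in ohlcv_list]
    return [later - earlier for earlier, later in zip(closes, closes[1:])]


def constant_feature(ohlcv_list):
    """Toy feature carrying no information; expected to be rejected."""
    return [1.0 for _ in ohlcv_list]


def review_queue(queue, score, threshold):
    """Drain the queue, keeping only candidates that score above the threshold."""
    accepted = []
    while queue:
        candidate = queue.popleft()  # leaves the queue whether selected or not
        if score(candidate) > threshold:
            accepted.append(candidate)
    return accepted


ohlcv_list = [{"timestamp": t, "close": c} for t, c in enumerate([10.0, 11.0, 9.0, 12.0])]
queue = deque([close_change, constant_feature])

# Placeholder score: variance of the feature values on the sample data.
accepted = review_queue(queue, score=lambda fn: pvariance(fn(ohlcv_list)), threshold=0.0)
```

Passing functions rather than names keeps the evaluation step independent of where a feature is defined.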

https://www.lullabot.com/articles/fixing-docker-and-vpn-ip-address-conflicts

curl \
-X POST \
-H "Content-Type: application/graphql" \
-H "Authorization: Apikey 2cnztlftbfol6spm_lcaz6zl7kybfsy7k" \
--data '{
getMetric(metric: "social_volume_total") {
timeseriesData(
slug: "ethereum"
from: "2018-01-01T00:00:00Z"
to: "2018-03-01T00:00:00Z"
interval: "1d"
aggregation: SUM
) { datetime value }
}
}' \
https://api.santiment.net/graphql
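
For use inside the pipeline, the same request can be built in Python. This is a sketch using only the standard library: it mirrors the URL, headers and query of the curl call above, but reads the key from an environment variable, `SANTIMENT_API_KEY`, which is a hypothetical name. The request is only constructed here, not sent.

```python
import os
import urllib.request

SANTIMENT_URL = "https://api.santiment.net/graphql"

# Same GraphQL query as in the curl command.
QUERY = """{
  getMetric(metric: "social_volume_total") {
    timeseriesData(
      slug: "ethereum"
      from: "2018-01-01T00:00:00Z"
      to: "2018-03-01T00:00:00Z"
      interval: "1d"
      aggregation: SUM
    ) { datetime value }
  }
}"""


def build_request(api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        SANTIMENT_URL,
        data=QUERY.encode("utf-8"),
        headers={
            "Content-Type": "application/graphql",
            "Authorization": f"Apikey {api_key}",
        },
        method="POST",
    )


# SANTIMENT_API_KEY is an assumed variable name; the fallback is a placeholder.
request = build_request(os.environ.get("SANTIMENT_API_KEY", "<your-api-key>"))

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     payload = response.read()
```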