Enigma Refactor

Source: Notion | Last edited: 2022-10-27 | ID: d0cefb0c-bd8...

Fetch data

fetch raw data
split test, train+validation data
Resample (head + tail)

Data

Fetch raw data
Avoid using the specific date in the saved csv path
Avoid train test split
Data preprocessing
data cleaning/backfill
data selection (time based, namely - head() or tail())
resample + save
feature engineering
1. read from resampled data
2. generate features (use functions as input). In the future, when a feature analyst tries to add a feature, put the function in a queue and evaluate it; if it is selected, move it to the private repo; if it is not selected, delete it from the queue.
3. Try to save and load features data before * 30
Model
Load features and train model (save stats and upload to s3)
Load model
Add support for a remote MLflow service
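
A minimal sketch of the "data selection" and "resample + save" steps, using only the standard library. It assumes each row is a dict shaped like the OHLCV record in these notes, with `timestamp` as integer epoch seconds. The function names and the `data/resampled/` layout are hypothetical; the only point carried over from the notes is that the saved path contains no date.

```python
def resample_ohlcv(rows, interval_seconds):
    """Aggregate OHLCV rows into fixed-width bars keyed by bucket start time."""
    buckets = {}
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        # Assumption: timestamps are integer epoch seconds.
        start = row["timestamp"] - row["timestamp"] % interval_seconds
        bar = buckets.get(start)
        if bar is None:
            # First row in the bucket sets the open and seeds the other fields.
            buckets[start] = {
                "timestamp": start,
                "open": row["open"],
                "high": row["high"],
                "low": row["low"],
                "close": row["close"],
                "volume": row["volume"],
            }
        else:
            bar["high"] = max(bar["high"], row["high"])
            bar["low"] = min(bar["low"], row["low"])
            bar["close"] = row["close"]  # rows are sorted, so the last row wins
            bar["volume"] += row["volume"]
    return [buckets[start] for start in sorted(buckets)]


def select_head(rows, n):
    """Time-based selection: the first n rows."""
    return rows[:n]


def select_tail(rows, n):
    """Time-based selection: the last n rows (rows[-0:] would return everything)."""
    return rows[-n:] if n > 0 else []


def resampled_csv_path(symbol, interval):
    """Hypothetical path scheme with no date in the file name."""
    return f"data/resampled/{symbol}_{interval}.csv"
```

Keeping the date out of the path means a re-run overwrites the same file, so downstream steps can load it without knowing when the fetch happened.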

[{open, high, low, close, volume, timestamp}], [google trends], [coinbase_ohlcv]

def generate_feature_1(ohlcv_list, x, y, z):
    # calculate …
    return [{"rsi_7": rsi_7, "rsi_14": rsi_14, "timestamp": timestamp}]

def generate_features(functions, ohlcv_list, x, y, z):
    # functions: the feature functions themselves, passed in as input
    features_list = []
    for function in functions:
        features_list.append(function(ohlcv_list, x, y, z))
    # … save features_list
    return features_list
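
The queue idea from the feature engineering notes (candidate functions wait in a queue, get evaluated, and are either kept or dropped) could be sketched as below. This is an illustration under assumptions, not the intended implementation: `close_change`, `review_queue`, and the `evaluate` callback are hypothetical names, and "move to the private repo" is reduced to returning the accepted functions.

```python
from collections import deque


def close_change(ohlcv_list, lag=1):
    """Hypothetical candidate feature: close-to-close change over `lag` bars."""
    return [
        {"timestamp": cur["timestamp"], f"close_change_{lag}": cur["close"] - prev["close"]}
        for prev, cur in zip(ohlcv_list, ohlcv_list[lag:])
    ]


def review_queue(queue, evaluate, ohlcv_list):
    """Drain the queue; keep functions whose output passes `evaluate`.

    Every function leaves the queue. Accepted ones are returned (the caller
    would then move them to the private repo); rejected ones are dropped.
    """
    accepted = []
    while queue:
        function = queue.popleft()
        features = function(ohlcv_list)
        if evaluate(features):
            accepted.append(function)
    return accepted
```

A usage example: `review_queue(deque([close_change]), lambda feats: len(feats) > 0, bars)` returns `[close_change]` when `bars` has at least two rows. A real `evaluate` would score the feature against the model rather than check that it is non-empty.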

https://www.lullabot.com/articles/fixing-docker-and-vpn-ip-address-conflicts

curl \
  -X POST \
  -H "Content-Type: application/graphql" \
  -H "Authorization: Apikey 2cnztlftbfol6spm_lcaz6zl7kybfsy7k"\
  --data '{
  getMetric(metric: "social_volume_total") {
    timeseriesData(
      slug: "ethereum"
      from: "2018-01-01T00:00:00Z"
      to: "2018-03-01T00:00:00Z"
      interval: "1d"
      aggregation: SUM
    ) { datetime value }
 }
}' \
  https://api.santiment.net/graphql
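
The same request could be built from Python with only the standard library, which also makes it easier to keep the API key out of notes and source files. This is a sketch under assumptions: the helper names are hypothetical, and the query text, headers, and endpoint are copied from the curl command above.

```python
import json
import urllib.request

SANTIMENT_URL = "https://api.santiment.net/graphql"


def build_query(metric, slug, start, end, interval="1d", aggregation="SUM"):
    """Return the GraphQL query text used in the curl command."""
    return f'''{{
  getMetric(metric: "{metric}") {{
    timeseriesData(
      slug: "{slug}"
      from: "{start}"
      to: "{end}"
      interval: "{interval}"
      aggregation: {aggregation}
    ) {{ datetime value }}
  }}
}}'''


def build_request(query, api_key):
    """Build the POST request without sending it."""
    return urllib.request.Request(
        SANTIMENT_URL,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/graphql",
            "Authorization": f"Apikey {api_key}",
        },
        method="POST",
    )


def fetch(query, api_key, timeout=30):
    """Send the request and return the parsed JSON body."""
    with urllib.request.urlopen(build_request(query, api_key), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

One way to call it is `fetch(build_query("social_volume_total", "ethereum", "2018-01-01T00:00:00Z", "2018-03-01T00:00:00Z"), os.environ["SANTIMENT_API_KEY"])`, where the environment variable name is an assumption.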