Crypto 15-Minute Direction Classifier

A time-series classification model that predicts whether the Bitcoin (BTC/USDT) price will move up or down over the next 15-minute interval, using multivariate historical market data.

Model Overview

| Attribute | Value |
|---|---|
| Task | Binary time-series classification |
| Target | BTC price direction in next 15 minutes (up=1, down=0) |
| Input | 60 minutes of multivariate OHLCV + technical indicators |
| Assets | BTC/USDT + ETH/USDT (cross-asset features) |
| Best Model | Logistic Regression on flattened windows |
| Dataset | 300K rows of 1-minute candles from WinkingFace CryptoLM datasets |

Performance

| Metric | Value |
|---|---|
| Test Accuracy | 53.1% |
| Test F1 | 0.574 |
| Test AUC | 0.540 |

Note: Predicting 15-minute crypto price direction is extremely hard because markets are close to efficient at short timeframes. The model edges above random chance on the test set (53.1% accuracy and 0.540 AUC, against 50% and 0.5 for a coin flip), indicating a small but non-zero signal. The pipeline's main value is as a complete data engineering and feature extraction system for further research.
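For reference, the three metrics in the table can be computed from true labels and predicted probabilities roughly as in the sketch below. It uses plain Python on toy data (not this model's outputs). The helper names are illustrative, and the 0.5 decision threshold is an assumption, since the card does not state which threshold was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (up=1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true, scores):
    """AUC as the chance that a random 'up' sample outscores a random 'down' sample."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.62, 0.48, 0.51, 0.45, 0.55, 0.40]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, probs)
```

F1 depends on class balance and on how often the model predicts "up": a constant "up" predictor on balanced labels scores an F1 of about 0.667 at 50% accuracy. Accuracy and AUC are therefore the more informative numbers when comparing against chance.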

Data Sources

Features (49 per timestep)

BTC & ETH (separately)

  • Price: open, high, low, close
  • Volume: volume
  • Moving Averages: MA_20, MA_50, MA_200
  • Momentum: RSI, %K, %D, ADX, ATR
  • Trend: MACD, Signal, Histogram, Trendline
  • Volatility: BL_Upper, BL_Lower, MN_Upper, MN_Lower

Cross-Asset Engineered

  • eth_btc_ratio - ETH/BTC price ratio
  • btc_ret_1m, eth_ret_1m - 1-minute returns
  • btc_vol_ma20, eth_vol_ma20 - 20-period volume MA
  • btc_range, eth_range - Normalized price range
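The cross-asset features listed above could be derived from timestamp-aligned BTC and ETH arrays along the lines of this sketch. The card does not give exact formulas, so several choices here are assumptions: simple (not log) returns, a trailing simple moving average with NaN during warm-up, and a range normalized as (high - low) / close. The function names are hypothetical.

```python
import numpy as np


def simple_return(close):
    """1-step simple return; the first element has no predecessor, so it is NaN."""
    ret = np.full(close.shape, np.nan)
    ret[1:] = close[1:] / close[:-1] - 1.0
    return ret


def rolling_mean(x, window):
    """Trailing moving average; the first window-1 entries are NaN (warm-up)."""
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def cross_asset_features(btc, eth, vol_window=20):
    """btc/eth: dicts of equal-length float arrays keyed by high, low, close, volume."""
    return {
        "eth_btc_ratio": eth["close"] / btc["close"],
        "btc_ret_1m": simple_return(btc["close"]),
        "eth_ret_1m": simple_return(eth["close"]),
        "btc_vol_ma20": rolling_mean(btc["volume"], vol_window),
        "eth_vol_ma20": rolling_mean(eth["volume"], vol_window),
        # Assumed normalization: high-low spread divided by close
        "btc_range": (btc["high"] - btc["low"]) / btc["close"],
        "eth_range": (eth["high"] - eth["low"]) / eth["close"],
    }
```

The NaN values produced during warm-up are the kind of rows that the pipeline's cleaning step would need to drop.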

Pipeline

  1. Load & Merge - Join BTC and ETH 1-minute datasets on timestamp
  2. Engineer Features - Add returns, ratios, ranges, volume MAs
  3. Create Windows - 60-minute lookback → predict next 15-minute direction
  4. Clean - Drop NaN/Inf, standardize per-feature
  5. Split - 70/15/15 temporal train/val/test (no data leakage)
  6. Train - Logistic Regression + Random Forest baselines
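Steps 3 to 5 can be sketched as follows. This is a minimal illustration, not the repository's training code. It assumes that the label compares the BTC close 15 minutes after the window's last row with the close at that last row, that windows advance one minute at a time, and that standardization statistics come from the training split only. The function names are hypothetical.

```python
import numpy as np


def make_windows(features, close, lookback=60, horizon=15):
    """Build (samples, lookback, n_features) windows and up/down labels.

    features: (T, F) array; close: (T,) BTC close prices.
    Label is 1 if close `horizon` steps after the window's last row is higher.
    """
    n = len(close) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("not enough rows for one window")
    X = np.stack([features[i:i + lookback] for i in range(n)])
    last = np.arange(n) + lookback - 1  # index of each window's last row
    y = (close[last + horizon] > close[last]).astype(int)
    return X, y


def temporal_split(X, y, train=0.70, val=0.15):
    """Chronological split: earliest samples train, later ones validate and test."""
    n = len(X)
    i, j = int(n * train), int(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])


def fit_standardizer(X_train_flat):
    """Mean/std from the training split only; unusable columns are masked out."""
    mean = X_train_flat.mean(axis=0)
    std = X_train_flat.std(axis=0)
    valid = np.isfinite(mean) & np.isfinite(std) & (std > 0)
    return mean[valid], std[valid], valid
```

Because adjacent windows overlap, samples on either side of a split boundary share rows, and labels near the end of one split look ahead into the next. A stricter variant would drop `lookback + horizon` samples at each boundary. Materializing every window with `np.stack` is also memory-hungry at 300K rows, so a real pipeline would likely build windows lazily or in batches.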

Usage

import pickle
import numpy as np

# Load model (unpickling requires scikit-learn to be importable)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Load preprocessing artifacts
mean = np.load("feature_mean.npy")
std = np.load("feature_std.npy")
valid = np.load("valid_cols.npy")

# X is supplied by the caller, shape: (samples, 60 minutes, 49 features)
X_flat = X.reshape(X.shape[0], -1)         # flatten to 60 * 49 = 2940 features
X_flat = X_flat[:, valid]                  # keep valid columns
X_norm = (X_flat - mean) / std             # standardize

# Predict
preds = model.predict(X_norm)              # 0=down, 1=up
probs = model.predict_proba(X_norm)[:, 1]  # probability of up

Files

| File | Description |
|---|---|
| model.pkl | Trained LogisticRegression classifier |
| feature_mean.npy | Per-feature means for standardization |
| feature_std.npy | Per-feature standard deviations |
| valid_cols.npy | Boolean mask of valid (finite) feature columns |
| metrics.json | Evaluation results |

Limitations

  • Market Efficiency: At a 15-minute horizon, prices behave close to a random walk, so the edge is small
  • No Costs: Evaluation ignores fees, slippage, spread
  • Historical Data: Trained on 2017-2020 data; may not generalize to current regimes
  • Simple Models: Deep learning (Conv-LSTM, TCN, Transformer) may improve results
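To make the first two limitations concrete, the sketch below estimates expected per-trade return under strong simplifying assumptions: a trade is placed on every prediction, wins and losses have the same absolute size, and the hit rate equals the reported test accuracy. The move size and cost figures are hypothetical values chosen for illustration, not measurements from this dataset.

```python
def expected_net_return(hit_rate, avg_abs_move, round_trip_cost):
    """Expected per-trade return when wins and losses have the same size."""
    gross = (2.0 * hit_rate - 1.0) * avg_abs_move
    return gross - round_trip_cost


def break_even_hit_rate(avg_abs_move, round_trip_cost):
    """Hit rate at which expected net return is zero under the same assumption."""
    return 0.5 * (1.0 + round_trip_cost / avg_abs_move)


# Hypothetical numbers for illustration, not measured from this dataset
hit_rate = 0.531         # test accuracy from the Performance table
avg_abs_move = 0.002     # assumed 0.2% typical 15-minute move
round_trip_cost = 0.001  # assumed 0.1% fees + slippage per round trip

net = expected_net_return(hit_rate, avg_abs_move, round_trip_cost)
needed = break_even_hit_rate(avg_abs_move, round_trip_cost)
```

Under these assumed numbers the gross edge is 0.062 times the average move, which is smaller than the cost per trade, so the expected net return is negative and the break-even hit rate sits well above the reported accuracy.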

Future Work

  1. Deep Learning: Conv-LSTM, Temporal CNN, or Transformer architectures
  2. More Data: Order book, funding rates, on-chain metrics, sentiment
  3. Multi-Scale: Combine 1-min, 5-min, 15-min, 1-hour features
  4. Regime Detection: Train separate models for bull/bear/sideways markets
  5. Cost-Aware Evaluation: Incorporate transaction costs into the evaluation metrics
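As a starting point for the multi-scale idea in item 3, coarser candles can be derived from the existing 1-minute data. The sketch below is one possible approach and is not part of the current pipeline. It assumes the input arrays are contiguous in time with no missing minutes, and the function name is hypothetical.

```python
import numpy as np


def resample_ohlcv(open_, high, low, close, volume, factor=5):
    """Aggregate 1-minute candles into `factor`-minute candles.

    Trailing rows that do not fill a complete candle are dropped.
    """
    n = (len(close) // factor) * factor
    shape = (-1, factor)
    return {
        "open": open_[:n].reshape(shape)[:, 0],          # first open in each group
        "high": high[:n].reshape(shape).max(axis=1),     # highest high
        "low": low[:n].reshape(shape).min(axis=1),       # lowest low
        "close": close[:n].reshape(shape)[:, -1],        # last close in each group
        "volume": volume[:n].reshape(shape).sum(axis=1), # total volume
    }
```

Calling it with `factor=5`, `factor=15`, and `factor=60` would give the 5-minute, 15-minute, and 1-hour views. Each coarse candle should only be joined to 1-minute rows at or after its close time to avoid look-ahead.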

License

MIT License

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
