---
language: en
license: mit
tags:
- finance
- trading
- cryptocurrency
- lightgbm
- tabular
- time-series
- quantitative-finance
---

# LGBM Crypto Expected-Value Entry Classifier

## Overview

This model is a **LightGBM-based binary classifier** trained to identify **high-probability long entry points** in cryptocurrency markets from engineered OHLCV features.

The model outputs the probability that a long trade opened under current market conditions has **positive expected value** over a fixed future horizon.

It is designed as an **entry signal component**, not a full trading system.

---

## Intended Use

- Identifying high-confidence trade entry points
- Research into ML-driven alpha signals
- Use as a signal input for rule-based or reinforcement-learning trading systems
- Educational and experimental quantitative finance projects

**Not intended for:**
- Direct execution without risk management
- Standalone portfolio management
- Live trading without additional validation

---
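
As a rough illustration of the "signal input for rule-based systems" use listed under Intended Use, the sketch below gates model probabilities with a threshold and a cooldown. The function name `entry_signals` and the `threshold` and `cooldown` values are assumptions made for this example, not part of the released model. Exits, position sizing, and risk limits are left to the surrounding system.

```python
from typing import Iterable, List


def entry_signals(
    probs: Iterable[float], threshold: float = 0.6, cooldown: int = 3
) -> List[bool]:
    """Flag bars where a rule-based system might open a long.

    A bar is flagged when its probability clears `threshold` and at least
    `cooldown` non-entry bars have elapsed since the previous flagged bar.
    Both parameters are illustrative defaults, not tuned values.
    """
    signals: List[bool] = []
    bars_since_entry = cooldown  # allow an entry on the very first bar
    for p in probs:
        enter = p >= threshold and bars_since_entry >= cooldown
        signals.append(enter)
        bars_since_entry = 0 if enter else bars_since_entry + 1
    return signals


# Made-up probabilities, purely to show the gating behaviour.
example = entry_signals([0.7, 0.8, 0.5, 0.9, 0.65, 0.4])
```

The cooldown keeps a run of consecutive high-probability bars from being treated as several independent entries.

---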

## Data

- **Assets:** BTC_USDT, ETH_USDT (Binance spot)
- **Frequency:** 1-minute OHLCV bars
- **Time period:** Historical Binance data (multi-year)
- **Source:** Public Binance data via CryptoDataDownload

---

## Features (high-level)

The model uses engineered, asset-agnostic features including:

- Log returns over multiple horizons
- Rolling volatility estimates
- Moving averages and trend slopes
- ATR-based volatility
- Volume and trade-count z-scores

All features are computed using **only past information** (no leakage).

---
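
The feature families listed under Features can be sketched with pandas as follows. This is a minimal illustration, not the model's actual feature pipeline: the column names (`open`, `high`, `low`, `close`, `volume`), the window lengths, and the output names are all assumed. ATR here uses a simple rolling mean, and a trade-count z-score would follow the same pattern as the volume z-score. Every window is trailing, so each row depends only on bars at or before it.

```python
import numpy as np
import pandas as pd


def add_features(bars: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `bars` with illustrative past-only features.

    Expects columns: open, high, low, close, volume (assumed names).
    """
    out = bars.copy()
    log_close = np.log(out["close"])

    # Log returns over several look-back horizons (in bars).
    for h in (1, 5, 15):
        out[f"logret_{h}"] = log_close.diff(h)

    # Rolling volatility: std of 1-bar log returns over a trailing window.
    out["vol_30"] = out["logret_1"].rolling(30).std()

    # Moving average and its relative slope over the last 5 bars.
    out["ma_30"] = out["close"].rolling(30).mean()
    out["ma_30_slope"] = out["ma_30"].diff(5) / out["ma_30"].shift(5)

    # ATR (simple-average variant), divided by price to stay asset-agnostic.
    prev_close = out["close"].shift(1)
    true_range = pd.concat(
        [
            out["high"] - out["low"],
            (out["high"] - prev_close).abs(),
            (out["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    out["atr_14_pct"] = true_range.rolling(14).mean() / out["close"]

    # Volume z-score against a trailing window.
    vol_mean = out["volume"].rolling(60).mean()
    vol_std = out["volume"].rolling(60).std()
    out["volume_z_60"] = (out["volume"] - vol_mean) / vol_std

    return out
```

A simple leakage check is to compute the features on the full history and on a truncated history, then confirm the overlapping rows match.

---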

## Labels

The target label indicates whether a hypothetical long trade has **positive expected value** over a fixed future horizon after transaction costs.

This is **not** a directional price prediction: a price rise too small to cover costs does not count as a positive example.

---
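
One way to build a label of the kind described under Labels is sketched below. The card does not state the horizon, the cost level, or the exact labelling rule, so `horizon`, `round_trip_cost`, and the close-to-close entry and exit convention are assumptions chosen for illustration.

```python
import pandas as pd


def make_labels(
    close: pd.Series, horizon: int = 30, round_trip_cost: float = 0.001
) -> pd.Series:
    """Label 1.0 where a long held for `horizon` bars nets a gain after costs.

    Assumes entry at the close of bar t and exit at the close of bar
    t + horizon, with `round_trip_cost` expressed as a fraction of price.
    """
    # The forward return deliberately looks ahead: labels may use future
    # prices, features may not.
    forward_return = close.shift(-horizon) / close - 1.0
    net_return = forward_return - round_trip_cost

    # Rows without a known outcome (the last `horizon` bars) become NaN so
    # they can be dropped before training.
    return (net_return > 0).astype("float64").where(net_return.notna())
```

With these assumptions, a bar followed by a 0.05% rise and a 0.1% round-trip cost is labelled 0 even though the price went up, which is the sense in which the target is not a directional prediction.

---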

## Model Details

- **Model type:** LightGBM Gradient Boosted Trees
- **Objective:** Binary classification (expected value > 0)
- **Loss:** Binary log loss
- **Training style:** Time-based train/validation split
- **Evaluation:** AUC, log loss, walk-forward backtests

---
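
The time-based split and walk-forward evaluation listed under Model Details can be sketched as a generator of index windows. This is an assumed scheme, not the one used to train the model: the window sizes and the `gap` parameter are hypothetical, and rows are taken to be sorted by time.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_rows: int, train_size: int, valid_size: int, gap: int = 0
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, valid_idx) ranges that roll forward through time.

    `gap` rows are skipped between train and validation so that labels built
    from a forward-looking horizon cannot reach into the validation window.
    """
    start = 0
    while start + train_size + gap + valid_size <= n_rows:
        train_end = start + train_size
        valid_start = train_end + gap
        yield range(start, train_end), range(valid_start, valid_start + valid_size)
        start += valid_size  # slide forward by one validation window


for train_idx, valid_idx in walk_forward_splits(100, train_size=50, valid_size=20, gap=5):
    # Training rows always precede validation rows; nothing is shuffled.
    print(f"train {train_idx.start}-{train_idx.stop - 1}, "
          f"valid {valid_idx.start}-{valid_idx.stop - 1}")
```

Setting `gap` to at least the label horizon is a common precaution when labels depend on future bars.

---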

## Performance Summary

Typical validation metrics (these vary by window):

- AUC: ~0.55
- Log loss: ~0.68

Despite the modest AUC, the model shows **positive expectancy when thresholded**, that is, when trades are taken only above a chosen probability cutoff. This is consistent with real-world trading signals, where edges tend to be small.

---
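
To put the figures under Performance Summary in context, the sketch below computes binary log loss and a thresholded expectancy in plain Python. The helper names and the sample numbers are made up for illustration and are not model results. A constant prediction of 0.5 scores ln(2), about 0.693, so a log loss near 0.68 is a small improvement over that uninformed baseline. If the positive-label rate is not 50%, a constant base-rate prediction scores below ln(2), which makes the baseline harder to beat.

```python
import math
from typing import Sequence, Tuple


def binary_log_loss(y_true: Sequence[int], prob: Sequence[float]) -> float:
    """Mean binary cross-entropy between labels and predicted probabilities."""
    eps = 1e-15  # clip to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def thresholded_expectancy(
    prob: Sequence[float], net_return: Sequence[float], threshold: float
) -> Tuple[float, int]:
    """Average net return and count of trades whose probability clears `threshold`."""
    taken = [r for p, r in zip(prob, net_return) if p >= threshold]
    if not taken:
        return float("nan"), 0
    return sum(taken) / len(taken), len(taken)


# Uninformed baseline: always predicting 0.5 gives ln(2).
coin_flip = binary_log_loss([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

# Toy inputs (not model output): only the two trades at or above 0.6 are taken.
toy_mean, toy_count = thresholded_expectancy(
    prob=[0.40, 0.60, 0.70, 0.55],
    net_return=[-0.002, 0.003, 0.001, -0.001],
    threshold=0.6,
)
```

Reporting the trade count next to the mean matters because a high threshold can leave very few trades, which makes the average unreliable.

---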
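
## Usage Example

```python
import joblib
import pandas as pd

# Load the saved bundle: the trained model and the feature columns it expects
bundle = joblib.load("lgbm_ev_classifier.joblib")
model = bundle["model"]
feature_cols = bundle["feature_cols"]

# df must be a pandas DataFrame that already contains the engineered features
X = df[feature_cols]

# A native lightgbm.Booster returns probabilities from predict(); the scikit-learn
# wrapper (LGBMClassifier) returns class labels there, so prefer predict_proba.
if hasattr(model, "predict_proba"):
    df["prob"] = model.predict_proba(X)[:, 1]
else:
    df["prob"] = model.predict(X)
```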