---
language: en
license: mit
tags:
- finance
- trading
- cryptocurrency
- lightgbm
- tabular
- time-series
- quantitative-finance
---

# LGBM Crypto Expected-Value Entry Classifier

## Overview

This model is a **LightGBM-based binary classifier** trained to identify **high-probability long entry points** in cryptocurrency markets from engineered OHLCV features.

Given current market conditions, the model outputs the estimated probability that a long trade opened at the current bar has **positive expected value** over a fixed future horizon.

It is designed as an **entry signal component**, not a full trading system.

---

## Intended Use

- Identifying high-confidence trade entry points
- Research into ML-driven alpha signals
- Use as a signal input for rule-based or reinforcement-learning trading systems
- Educational and experimental quantitative finance projects

**Not intended for:**

- Direct execution without risk management
- Standalone portfolio management
- Live trading without additional validation

---

## Data

- **Assets:** BTC_USDT, ETH_USDT (Binance spot)
- **Frequency:** 1-minute OHLCV bars
- **Time period:** Historical Binance data (multi-year)
- **Source:** Public Binance data via CryptoDataDownload

---

## Features (high-level)

The model uses engineered, asset-agnostic features, including:

- Log returns over multiple horizons
- Rolling volatility estimates
- Moving averages and trend slopes
- ATR-based volatility
- Volume and trade-count z-scores

All features are computed using **only past information** (no look-ahead leakage).

---

## Labels

The target label indicates whether a hypothetical long trade achieves **positive expected value** over a fixed future horizon, after accounting for transaction costs.

This is **not** a directional price prediction.
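This card does not publish the exact feature windows, label horizon, or cost model. The sketch below is a rough illustration only. It shows one way to build past-only features of the kinds listed above, plus a cost-aware forward label. Everything specific in it is a hypothetical assumption:

- the column names (`open`, `high`, `low`, `close`, `volume`);
- the window lengths;
- the 30-bar horizon;
- the 0.2% round-trip cost.

The label uses a single realized net return as a stand-in for "positive expected value". It is not necessarily the definition used to train this model. A trade-count z-score would follow the same pattern as the volume one.

```python
import numpy as np
import pandas as pd


def make_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative past-only features from 1-minute OHLCV bars.

    Assumes hypothetical columns: open, high, low, close, volume.
    Every value at row t is computed from bars at or before t only.
    """
    out = pd.DataFrame(index=df.index)
    log_close = np.log(df["close"])
    r1 = log_close.diff()

    # Log returns over several look-back horizons (in bars)
    for h in (1, 5, 15, 60):
        out[f"logret_{h}"] = log_close.diff(h)

    # Rolling volatility of 1-bar log returns
    out["vol_60"] = r1.rolling(60).std()

    # Distance from a 60-bar moving average and that average's 15-bar slope
    ma = df["close"].rolling(60).mean()
    out["close_over_ma_60"] = df["close"] / ma - 1.0
    out["ma_60_slope_15"] = ma / ma.shift(15) - 1.0

    # ATR as a simple rolling mean of the true range, normalised by price
    prev_close = df["close"].shift(1)
    true_range = pd.concat(
        [
            df["high"] - df["low"],
            (df["high"] - prev_close).abs(),
            (df["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)  # NaN-skipping max, so row 0 falls back to high - low
    out["atr_14_pct"] = true_range.rolling(14).mean() / df["close"]

    # Volume z-score against a trailing 60-bar window
    vol_mean = df["volume"].rolling(60).mean()
    vol_std = df["volume"].rolling(60).std()
    out["volume_z_60"] = (df["volume"] - vol_mean) / vol_std
    return out


def make_labels(close: pd.Series, horizon: int = 30, cost: float = 0.002) -> pd.Series:
    """Hypothetical label: 1.0 if the forward log return minus a round-trip cost is > 0.

    Rows whose horizon runs past the end of the data are NaN and should be dropped.
    """
    fwd = np.log(close.shift(-horizon) / close)
    label = (fwd - cost > 0).astype(float)
    return label.where(fwd.notna())
```

Join features and labels on the index and drop rows containing NaN before training. With a time-based split, also drop the last `horizon` bars of the training window, because their labels depend on prices inside the validation window.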
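---

## Model Details

- **Model type:** LightGBM gradient-boosted trees
- **Objective:** Binary classification (expected value > 0)
- **Loss:** Binary log loss
- **Training style:** Time-based train/validation split
- **Evaluation:** AUC, log loss, walk-forward backtests

---

## Performance Summary

Typical validation metrics (these vary by window):

- AUC: ~0.55
- Log loss: ~0.68

Despite the modest AUC, the model shows **positive expectancy when its output is thresholded**, meaning only signals above a probability cut-off are traded. Weak but usable edges like this are typical of real-world trading signals.

---

## Usage Example

```python
import joblib
import pandas as pd

# The bundle is a dict holding the fitted model and its feature column order
bundle = joblib.load("lgbm_ev_classifier.joblib")
model = bundle["model"]
feature_cols = bundle["feature_cols"]

# df must already contain the engineered features named in feature_cols, e.g.
# df = pd.read_parquet("features.parquet")  # hypothetical path
X = df[feature_cols]

# lightgbm.Booster.predict returns P(label = 1) for a binary objective;
# the scikit-learn wrapper (LGBMClassifier) needs predict_proba(...)[:, 1]
if hasattr(model, "predict_proba"):
    df["prob"] = model.predict_proba(X)[:, 1]
else:
    df["prob"] = model.predict(X)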