# ML-3m-trader: XAUUSDc 3-Minute Timeframe ML Trading System
End-to-end machine learning pipeline for trading XAUUSDc (Gold) on the 3-minute timeframe. Uses MetaTrader 5 for data acquisition, LightGBM for classification, and a vectorized backtesting engine with realistic execution modeling.
## User Review Required
> [!IMPORTANT]
> **ML Framework Choice: LightGBM.** LightGBM is the best-suited framework for this task because:
> - Tabular classification (Buy/Sell/Hold/DoNothing) is LightGBM's strongest domain
> - Extremely fast training, even on CPU (i5-7200U will handle it fine)
> - Low memory footprint (well within 12 GB RAM)
> - No GPU required (your MX110 is not needed)
> - Gradient-boosted trees generally match or outperform deep learning on structured/tabular data in published benchmarks
>
> **This will run entirely on your local machine. No Google Colab needed.**

> [!WARNING]
> **MetaTrader 5 Python API** only works on Windows (which you have). MT5 must be open and logged in when running the data fetch script. The `MetaTrader5` pip package handles communication.

> [!NOTE]
> **VIX Feature**: Since the CBOE VIX index is not directly available from MT5, the system will compute a **synthetic VIX proxy** using a rolling standard deviation of returns (realized volatility). This is a common substitute when no implied-volatility index is available, but it measures past (realized) rather than market-expected (implied) volatility. If you want the actual VIX, a separate data source is needed.
---
## Proposed Changes
### Project Structure
```
ML-3m-trader/
├── config.py          # All configuration constants
├── data_fetcher.py    # MT5 data acquisition
├── features.py        # Technical indicator computation
├── labeler.py         # Trade label generation (Buy/Sell/Hold/DoNothing)
├── model.py           # LightGBM training, prediction, persistence
├── backtester.py      # Vectorized backtesting engine
├── metrics.py         # Performance evaluation
├── main.py            # CLI entry point
├── requirements.txt
├── LICENSE
├── README.md
├── GUIDE.md           # Step-by-step usage guide with tables
└── .gitignore
```
---
### Configuration
#### [NEW] [config.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/config.py)
Central configuration file containing all tunable parameters:
- `SYMBOL = "XAUUSDc"`, `TIMEFRAME = mt5.TIMEFRAME_M3`
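- Feature list, lookback periods for SMA (14, 50), VROC (14), ADX (14), Momentum Strength Index (10), synthetic VIX (20)
- Risk/reward ratio = 1.0, default bet percentage (fraction of balance risked per trade)
- Slippage range (0–2 price units), spread filter (`stoploss_size >= spread * 10`, with `spread` converted from MT5 points to price units)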
- Train/test split ratio, model hyperparameters
- Starting equity/balance
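
One way `config.py` could look, as a sketch: values stated in the plan are used as-is, while `BET_PCT` and `STARTING_BALANCE` are hypothetical placeholders and `subsample_freq` is added because LightGBM ignores `subsample` without it. The `MetaTrader5` import is guarded so the module still imports on machines without the Windows-only package.

```python
"""Sketch of config.py. Values marked hypothetical are placeholders, not taken from the plan."""

try:
    import MetaTrader5 as mt5
    TIMEFRAME = mt5.TIMEFRAME_M3
except ImportError:  # MetaTrader5 is Windows-only; keep config importable elsewhere
    mt5 = None
    TIMEFRAME = None

SYMBOL = "XAUUSDc"

# Indicator lookbacks from the plan
SMA_FAST, SMA_SLOW = 14, 50
VROC_PERIOD = 14
ADX_PERIOD = 14
MSI_PERIOD = 10
VIX_PROXY_WINDOW = 20

# Risk and execution
RISK_REWARD = 1.0
BET_PCT = 0.01               # hypothetical: fraction of balance risked per trade
SLIPPAGE_RANGE = (0.0, 2.0)  # price units, sampled uniformly
SPREAD_MULTIPLIER = 10       # trade only if sl_distance >= spread * 10

# Data split and model
TRAIN_RATIO = 0.8
LGBM_PARAMS = {
    "num_leaves": 63,
    "max_depth": 8,
    "learning_rate": 0.05,
    "n_estimators": 500,
    "subsample": 0.8,
    "subsample_freq": 1,     # required for subsample to take effect in LightGBM
    "colsample_bytree": 0.8,
    "min_child_samples": 20,
    "class_weight": "balanced",
}

STARTING_BALANCE = 10_000.0  # hypothetical starting equity


def passes_spread_filter(sl_distance: float, spread: float) -> bool:
    """Spread filter from the plan; both arguments must be in price units."""
    return sl_distance >= spread * SPREAD_MULTIPLIER
```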
---
### Data Acquisition
#### [NEW] [data_fetcher.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/data_fetcher.py)
- Connects to MT5 terminal via `MetaTrader5` Python package
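- Fetches one year of 3-minute OHLCV bars for XAUUSDc
- Returns a `pandas.DataFrame` with columns: `time, open, high, low, close, volume, spread` (MT5 bars provide `tick_volume` and `real_volume` rather than `volume`, so `tick_volume` is used as `volume`; MT5 reports `spread` in points)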
- Saves raw data to `data/raw_xauusdc_3m.csv` for reproducibility
- Handles MT5 connection errors gracefully
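
A sketch of the fetch step using the `MetaTrader5` package's `initialize`, `symbol_select`, `symbol_info`, `copy_rates_range`, `last_error`, and `shutdown` calls. The helper name `rates_to_frame` is hypothetical, and converting `spread` from points to price units (via `symbol_info().point`) at fetch time is a design choice of this sketch, made so that the later `sl_distance < spread * 10` filter compares like with like.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

import numpy as np
import pandas as pd

COLUMNS = ["time", "open", "high", "low", "close", "volume", "spread"]


def rates_to_frame(rates: np.ndarray, point: float) -> pd.DataFrame:
    """Convert an MT5 rates structured array into the plan's column layout."""
    df = pd.DataFrame(rates)
    df["time"] = pd.to_datetime(df["time"], unit="s")  # MT5 bar times are epoch seconds
    # real_volume is often zero for CFD symbols, so tick volume stands in for volume
    df = df.rename(columns={"tick_volume": "volume"})
    df["spread"] = df["spread"] * point  # points -> price units
    return df[COLUMNS]


def fetch_rates(symbol: str = "XAUUSDc", days: int = 365,
                out_path: str = "data/raw_xauusdc_3m.csv") -> pd.DataFrame:
    """Download `days` of M3 bars from a running, logged-in MT5 terminal and save them as CSV."""
    import MetaTrader5 as mt5  # Windows-only; imported lazily so rates_to_frame works anywhere

    if not mt5.initialize():
        raise RuntimeError(f"MT5 initialize() failed: {mt5.last_error()}")
    try:
        if not mt5.symbol_select(symbol, True):
            raise RuntimeError(f"Symbol {symbol!r} not available: {mt5.last_error()}")
        point = mt5.symbol_info(symbol).point
        date_to = datetime.now(timezone.utc)
        date_from = date_to - timedelta(days=days)
        rates = mt5.copy_rates_range(symbol, mt5.TIMEFRAME_M3, date_from, date_to)
        if rates is None or len(rates) == 0:
            raise RuntimeError(f"No bars returned for {symbol!r}: {mt5.last_error()}")
    finally:
        mt5.shutdown()  # always release the terminal connection

    df = rates_to_frame(rates, point)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    return df
```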
---
### Feature Engineering
#### [NEW] [features.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/features.py)
Computes all required technical indicators using pure NumPy/pandas (no TA-Lib dependency):

| Feature | Method |
|---------|--------|
| SMA | Simple Moving Average (14-period) |
| Double Moving Average | SMA(14) and SMA(50), plus crossover signal |
| VROC | Volume Rate of Change (14-period) |
| Synthetic VIX | Rolling std of log-returns (20-period) as volatility proxy |
| Momentum Strength Index | Custom momentum oscillator (10-period, 0–100 scale) |
| ADX | Average Directional Index (14-period) via Wilder's smoothing |
| Time features | Hour-of-day, minute-of-hour, day-of-week (cyclically encoded with sine/cosine) |

All computations are vectorized with NumPy/pandas for speed. Rows containing NaN values from indicator lookback periods are dropped.

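
A minimal sketch of the indicator table, assuming a DataFrame with the `time, high, low, close, volume` columns from the fetch step. The output column names (`sma_14`, `adx_14`, and so on) are hypothetical, the Momentum Strength Index is omitted because the plan does not define its formula, and ADX uses pandas `ewm(alpha=1/n, adjust=False)`, which follows Wilder's recursive smoothing but seeds from the first value instead of an initial n-bar average.

```python
import numpy as np
import pandas as pd


def adx(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    """Average Directional Index with Wilder-style exponential smoothing (alpha = 1/n)."""
    up, down = high.diff(), -low.diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=high.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=high.index)
    prev_close = close.shift(1)
    tr = pd.concat([high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1).max(axis=1)
    alpha = 1.0 / n
    atr = tr.ewm(alpha=alpha, adjust=False).mean()
    plus_di = 100 * plus_dm.ewm(alpha=alpha, adjust=False).mean() / atr
    minus_di = 100 * minus_dm.ewm(alpha=alpha, adjust=False).mean() / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return dx.ewm(alpha=alpha, adjust=False).mean()


def time_features(t: pd.Series) -> pd.DataFrame:
    """Sine/cosine encodings so that, e.g., 23:00 sits next to 00:00."""
    parts = {"hour": (t.dt.hour, 24), "minute": (t.dt.minute, 60), "dow": (t.dt.dayofweek, 7)}
    out = {}
    for name, (values, period) in parts.items():
        out[f"{name}_sin"] = np.sin(2 * np.pi * values / period)
        out[f"{name}_cos"] = np.cos(2 * np.pi * values / period)
    return pd.DataFrame(out)


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    out["sma_14"] = df["close"].rolling(14).mean()
    out["sma_50"] = df["close"].rolling(50).mean()
    regime = np.sign(out["sma_14"] - out["sma_50"])
    out["sma_regime"] = regime          # +1 when the fast SMA is above the slow one, -1 below
    out["sma_cross"] = regime.diff() / 2  # +1 bullish cross, -1 bearish cross, 0 when unchanged
    out["vroc_14"] = (df["volume"] / df["volume"].shift(14) - 1.0) * 100.0
    out["synthetic_vix"] = np.log(df["close"]).diff().rolling(20).std()  # realized-vol proxy
    out["adx_14"] = adx(df["high"], df["low"], df["close"], 14)
    out = out.join(time_features(df["time"]))
    return out.dropna()  # drop lookback warm-up rows
```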
---
### Labeling Engine
#### [NEW] [labeler.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/labeler.py)
Generates ground-truth labels for supervised learning:
1. For each bar, compute a potential **Buy** and **Sell** trade:
   - **Buy**: entry at `close`, SL below recent swing low (ATR-based), TP = entry + (entry - SL) (1:1 RR)
   - **Sell**: entry at `close`, SL above recent swing high (ATR-based), TP = entry - (SL - entry) (1:1 RR)
2. Walk forward through subsequent bars to determine outcome (TP hit, SL hit, or neither within N bars)
3. Apply **spread filter**: if `SL_distance < spread * 10`, label = `DO_NOTHING`
4. Final labels: `BUY_WIN`, `BUY_LOSS`, `SELL_WIN`, `SELL_LOSS`, `HOLD`, `DO_NOTHING` → simplified to 4-class: `BUY (1)`, `SELL (2)`, `HOLD (3)`, `DO_NOTHING (0)`
5. Only winning setups are labeled as BUY/SELL; losing setups become HOLD
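
A sketch of the labeling steps, with one simplification the plan leaves open: the stop distance is taken as `atr_mult * ATR` (with `atr_mult` and `horizon` as hypothetical parameters) rather than derived from swing points. ATR and spread arrays are assumed to be precomputed, with spread in price units. With a symmetric 1:1 stop and target and a bar touching both levels counted as a loss, the Buy and Sell setups at a bar cannot both win, so the order of the two checks does not matter.

```python
import numpy as np

# Class ids from the plan's 4-class scheme
DO_NOTHING, BUY, SELL, HOLD = 0, 1, 2, 3


def _wins(high, low, i, sl, tp, is_buy, horizon):
    """True if TP is reached before SL within `horizon` bars after bar i."""
    for j in range(i + 1, min(i + 1 + horizon, len(high))):
        hit_sl = low[j] <= sl if is_buy else high[j] >= sl
        hit_tp = high[j] >= tp if is_buy else low[j] <= tp
        if hit_sl:  # conservative: a bar touching both levels counts as a loss
            return False
        if hit_tp:
            return True
    return False  # neither level reached within the horizon


def make_labels(high, low, close, atr, spread, atr_mult=1.0, horizon=40):
    """Label each bar BUY/SELL if that 1:1 setup wins, DO_NOTHING if filtered, else HOLD."""
    n = len(close)
    labels = np.full(n, HOLD, dtype=np.int64)
    for i in range(n):
        sl_dist = atr[i] * atr_mult
        # Spread filter: skip setups whose stop is too tight relative to the spread
        if not np.isfinite(sl_dist) or sl_dist < spread[i] * 10:
            labels[i] = DO_NOTHING
            continue
        entry = close[i]
        if _wins(high, low, i, entry - sl_dist, entry + sl_dist, True, horizon):
            labels[i] = BUY
        elif _wins(high, low, i, entry + sl_dist, entry - sl_dist, False, horizon):
            labels[i] = SELL
    return labels
```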
---
### ML Model
#### [NEW] [model.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/model.py)
- **LightGBM** multi-class classifier (4 classes)
- Hyperparameters (starting values for tabular financial data):
  - `num_leaves=63`, `max_depth=8`, `learning_rate=0.05`, `n_estimators=500`
  - `subsample=0.8` with `subsample_freq=1` (LightGBM ignores `subsample` unless `subsample_freq > 0`), `colsample_bytree=0.8`, `min_child_samples=20`
  - `class_weight='balanced'` to handle label imbalance
- Train/validation split: 80/20 chronological (no shuffle, since this is time-series data)
- Feature importance output
- Model persistence via `joblib` (save/load `.pkl`)
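- Early stopping on the validation set (in LightGBM 4.x via the `lgb.early_stopping` callback, since the `early_stopping_rounds` argument to `fit()` was removed)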
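
A sketch of the training step using LightGBM's scikit-learn interface. The function names and the `models/lgbm.pkl` path are hypothetical; LightGBM and joblib are imported inside `train_model` so the split helper can be reused without them. Because the validation slice drives early stopping, a backtest on that same slice would be optimistic; a separate, later test slice avoids this.

```python
from pathlib import Path

import pandas as pd

LGBM_PARAMS = {
    "num_leaves": 63,
    "max_depth": 8,
    "learning_rate": 0.05,
    "n_estimators": 500,
    "subsample": 0.8,
    "subsample_freq": 1,  # without this, LightGBM ignores subsample
    "colsample_bytree": 0.8,
    "min_child_samples": 20,
    "class_weight": "balanced",
}


def chronological_split(X: pd.DataFrame, y: pd.Series, train_ratio: float = 0.8):
    """Split by position without shuffling, so validation rows are strictly later in time."""
    cut = int(len(X) * train_ratio)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


def train_model(X: pd.DataFrame, y: pd.Series, model_path: str = "models/lgbm.pkl",
                stopping_rounds: int = 50):
    """Fit a 4-class LightGBM model with early stopping, save it, and return feature importances."""
    import joblib
    import lightgbm as lgb

    X_tr, X_val, y_tr, y_val = chronological_split(X, y)
    model = lgb.LGBMClassifier(**LGBM_PARAMS)
    model.fit(
        X_tr, y_tr,
        eval_set=[(X_val, y_val)],
        callbacks=[lgb.early_stopping(stopping_rounds, verbose=False)],
    )
    importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, model_path)
    return model, importance
```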
---
### Backtesting Engine
#### [NEW] [backtester.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/backtester.py)
Vectorized backtesting with realistic execution:
- Takes model predictions and raw price data
- **Position sizing**: bet % of current balance, accounting for full SL distance
  - `position_size = balance * bet_pct / sl_distance` (in units of the underlying, so a stop-out loses `bet_pct` of the balance)
- **Random slippage**: uniform 0–2 price units applied to the entry price
- **Spread filter**: skip trade if `sl_distance < spread * 10`
- **1:1 Risk-Reward**: TP distance = SL distance
- Walk forward bar-by-bar on test set, track equity curve
- No trade limit β€” takes every valid signal
- Records all trades with entry/exit prices, PnL, timestamps
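
A bar-by-bar sketch of these execution rules, under simplifying assumptions the plan does not specify: only one position is open at a time (every valid signal is taken while flat), slippage is applied against the trade direction, SL and TP are placed 1:1 around the slipped fill, a bar touching both levels counts as a stop-out, and a trade still open after the hypothetical `max_bars` limit is closed at that bar's close. `sl_dist` is assumed to be a precomputed per-bar stop distance (for example ATR-based), and `spread` is in price units.

```python
import numpy as np
import pandas as pd

BUY, SELL = 1, 2  # class ids from the labeling step


def position_size(balance: float, bet_pct: float, sl_distance: float) -> float:
    """Units of the underlying such that hitting the SL loses bet_pct of balance."""
    return balance * bet_pct / sl_distance


def backtest(high, low, close, spread, signals, sl_dist, balance=10_000.0,
             bet_pct=0.01, max_slippage=2.0, max_bars=40, seed=0):
    """Simulate 1:1 RR trades bar by bar; returns (trades DataFrame, equity array)."""
    rng = np.random.default_rng(seed)
    trades, equity = [], [balance]
    n, i = len(close), 0
    while i < n - 1:
        d = sl_dist[i]
        # Skip non-trade signals and setups that fail the spread filter
        if signals[i] not in (BUY, SELL) or not np.isfinite(d) or d < spread[i] * 10:
            i += 1
            continue
        direction = 1 if signals[i] == BUY else -1
        entry = close[i] + direction * rng.uniform(0.0, max_slippage)  # adverse slippage
        sl, tp = entry - direction * d, entry + direction * d          # 1:1 risk-reward
        size = position_size(balance, bet_pct, d)
        exit_bar, exit_price = min(i + max_bars, n - 1), None
        for j in range(i + 1, exit_bar + 1):
            hit_sl = low[j] <= sl if direction == 1 else high[j] >= sl
            hit_tp = high[j] >= tp if direction == 1 else low[j] <= tp
            if hit_sl or hit_tp:  # SL takes precedence when both are touched
                exit_bar, exit_price = j, (sl if hit_sl else tp)
                break
        if exit_price is None:  # timed out: close at the last bar's close
            exit_price = close[exit_bar]
        pnl = direction * (exit_price - entry) * size
        balance += pnl
        equity.append(balance)
        trades.append({"entry_bar": i, "exit_bar": exit_bar, "direction": direction,
                       "entry": entry, "exit": exit_price, "size": size,
                       "pnl": pnl, "balance": balance})
        i = exit_bar  # the next signal can be taken from the exit bar onward
    return pd.DataFrame(trades), np.asarray(equity)
```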
---
### Metrics & Evaluation
#### [NEW] [metrics.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/metrics.py)
| Metric | Description |
|--------|-------------|
| Win Rate | % of trades closed at TP |
| Average Win % | Mean profit per winning trade as % of balance |
| Average Loss % | Mean loss per losing trade as % of balance |
| Sharpe Ratio | Annualized risk-adjusted return |
| Sortino Ratio | Downside-risk-adjusted return |
| Max Drawdown | Largest peak-to-trough equity decline |
| Profit Factor | Gross profit / Gross loss |
| Start Equity | Initial balance |
| End Equity | Final balance after all trades |
| Total Trades | Number of executed trades |
| Avg Trade Duration | Mean holding time in bars/minutes |
| Daily PnL Stats | Mean, std, min, and max of PnL aggregated per trading day |
| Calmar Ratio | Annualized return / Max Drawdown |
| Expectancy | Average PnL per trade |

Outputs a formatted console report and saves it to `results/report.txt`.

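
A sketch of several report metrics, assuming a per-trade PnL array, the equity curve, and a series of daily returns (for example, from the equity curve resampled to daily closes). Annualizing with 252 periods per year is a hypothetical default, and win rate is approximated as the share of trades with positive PnL, which equals the TP hit rate only when every trade exits at SL or TP. All function names are illustrative.

```python
import numpy as np


def max_drawdown(equity) -> float:
    """Largest peak-to-trough decline as a positive fraction of the running peak."""
    equity = np.asarray(equity, dtype=float)
    peaks = np.maximum.accumulate(equity)
    return float(((peaks - equity) / peaks).max())


def profit_factor(pnls) -> float:
    pnls = np.asarray(pnls, dtype=float)
    gross_profit = pnls[pnls > 0].sum()
    gross_loss = -pnls[pnls < 0].sum()
    return float("inf") if gross_loss == 0 else float(gross_profit / gross_loss)


def sharpe_ratio(daily_returns, periods_per_year: int = 252) -> float:
    r = np.asarray(daily_returns, dtype=float)
    sd = r.std(ddof=1)
    return float(r.mean() / sd * np.sqrt(periods_per_year)) if sd > 0 else float("nan")


def sortino_ratio(daily_returns, periods_per_year: int = 252) -> float:
    r = np.asarray(daily_returns, dtype=float)
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))  # downside deviation, target 0
    return float(r.mean() / downside * np.sqrt(periods_per_year)) if downside > 0 else float("nan")


def summarize(pnls, equity, daily_returns) -> dict:
    pnls = np.asarray(pnls, dtype=float)
    has_trades = len(pnls) > 0
    return {
        "total_trades": int(len(pnls)),
        "win_rate": float((pnls > 0).mean()) if has_trades else float("nan"),
        "expectancy": float(pnls.mean()) if has_trades else float("nan"),
        "profit_factor": profit_factor(pnls),
        "max_drawdown": max_drawdown(equity),
        "sharpe": sharpe_ratio(daily_returns),
        "sortino": sortino_ratio(daily_returns),
        "start_equity": float(equity[0]),
        "end_equity": float(equity[-1]),
    }
```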
---
### CLI Entry Point
#### [NEW] [main.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/main.py)
Unified CLI with subcommands:
```
python main.py fetch # Fetch 1-year data from MT5
python main.py train # Engineer features, label, train model
python main.py backtest # Run backtest on test set
python main.py evaluate # Print metrics report
python main.py run # Full pipeline: fetch → train → backtest → evaluate
```
Uses `argparse` with clear help text.

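
A minimal `argparse` sketch of this subcommand layout. The step functions are hypothetical stubs that only print what they would do, and `run` chains them in the documented order. In `main.py`, `main()` would normally be invoked under an `if __name__ == "__main__":` guard.

```python
import argparse


def fetch(args):
    print("fetch: download one year of XAUUSDc M3 bars from MT5")


def train(args):
    print("train: build features, label bars, fit LightGBM")


def backtest(args):
    print("backtest: simulate trades on the test set")


def evaluate(args):
    print("evaluate: print the metrics report")


STEPS = {"fetch": fetch, "train": train, "backtest": backtest, "evaluate": evaluate}


def run(args):
    """Full pipeline in the documented order (dicts preserve insertion order)."""
    for step in STEPS.values():
        step(args)


HELP = {
    "fetch": "Fetch 1-year data from MT5",
    "train": "Engineer features, label, train model",
    "backtest": "Run backtest on test set",
    "evaluate": "Print metrics report",
    "run": "Full pipeline: fetch, train, backtest, evaluate",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="XAUUSDc 3-minute ML trading pipeline")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, func in {**STEPS, "run": run}.items():
        sub.add_parser(name, help=HELP[name]).set_defaults(func=func)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    args.func(args)
```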
---
### Project Files
#### [NEW] [requirements.txt](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/requirements.txt)
```
MetaTrader5>=5.0.45
lightgbm>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
joblib>=1.3.0
```
#### [NEW] [LICENSE](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/LICENSE)
MIT License, author: Rembrant Oyangoren Albeos, year: 2026.
#### [NEW] [README.md](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/README.md)
Professional README with badges (Python, License, LightGBM), project description, features list, quick start, architecture overview, and configuration reference. No emojis.
#### [NEW] [GUIDE.md](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/GUIDE.md)
Step-by-step usage guide with tables for all commands, parameters, and expected outputs.
#### [NEW] [.gitignore](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/.gitignore)
Standard Python gitignore plus `data/`, `results/`, `models/`, `*.pkl`.
---
## Verification Plan
### Automated Tests
1. **Syntax validation**: run `python -m py_compile <file>` on every `.py` file to confirm there are no syntax errors
2. **Import validation**: run `python -c "import config; import data_fetcher; import features; import labeler; import model; import backtester; import metrics"` to confirm all modules load correctly
3. **Dry-run test**: run `python main.py --help` to confirm the CLI is functional
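
The syntax check can also be done in one pass with the standard-library `py_compile` module instead of invoking it file by file. `compile_all` is a hypothetical helper that returns the files that fail to compile; a small script could print those and exit non-zero when the list is not empty.

```python
import py_compile
from pathlib import Path


def compile_all(root: str = ".") -> list:
    """Byte-compile every top-level .py file under root; return (filename, error) pairs for failures."""
    failures = []
    for path in sorted(Path(root).glob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((path.name, exc.msg))
    return failures
```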
### Manual Verification
1. **User runs `python main.py fetch`** with MT5 open and logged in, confirms data CSV is created in `data/`
2. **User runs `python main.py run`** for the full pipeline, reviews the metrics report output
3. **User inspects `results/report.txt`** for the performance metrics