ML-3m-trader: XAUUSDc 3-Minute Timeframe ML Trading System

End-to-end machine learning pipeline for trading XAUUSDc (Gold) on the 3-minute timeframe. Uses MetaTrader 5 for data acquisition, LightGBM for classification, and a vectorized backtesting engine with realistic execution modeling.

User Review Required

ML Framework Choice: LightGBM. LightGBM is well suited to this task because:

Tabular classification (Buy/Sell/Hold/DoNothing) is the problem type gradient-boosted trees handle best

Training is fast, even on a CPU (an i5-7200U is sufficient)

Memory footprint is low (well within 12 GB RAM)

No GPU is required (your MX110 is not needed)

Gradient-boosted trees generally match or outperform deep learning on structured/tabular data in published benchmarks

This will run entirely on your local machine. No Google Colab needed.

MetaTrader 5 Python API only works on Windows (which you have). MT5 must be open and logged in when running the data fetch script. The MetaTrader5 pip package handles communication.

VIX Feature: The CBOE VIX index is not available from MT5, so the system will compute a synthetic VIX proxy: the rolling standard deviation of log-returns (realized volatility). This is not the same quantity as the real VIX, which is an implied-volatility index derived from S&P 500 option prices, but realized volatility is a common substitute when no implied-volatility feed is available. If you want the actual VIX, it would need a separate data source.

Proposed Changes

Project Structure

ML-3m-trader/
├── config.py             # All configuration constants
├── data_fetcher.py       # MT5 data acquisition
├── features.py           # Technical indicator computation
├── labeler.py            # Trade label generation (Buy/Sell/Hold/DoNothing)
├── model.py              # LightGBM training, prediction, persistence
├── backtester.py         # Vectorized backtesting engine
├── metrics.py            # Performance evaluation
├── main.py               # CLI entry point
├── requirements.txt
├── LICENSE
├── README.md
├── GUIDE.md              # Step-by-step usage guide with tables
└── .gitignore

Configuration

[NEW] config.py

Central configuration file containing all tunable parameters:

SYMBOL = "XAUUSDc", TIMEFRAME = mt5.TIMEFRAME_M3
Feature list, lookback periods for SMA (14, 50), VROC (14), ADX (14), Momentum SI (10), synthetic VIX (20)
Risk/reward ratio = 1.0, default bet percentage logic
Slippage range (0–2 price units), spread filter (stoploss_size >= spread * 10, with MT5's spread converted from points to price using the symbol's point size)
Train/test split ratio, model hyperparameters
Starting equity/balance
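A minimal sketch of config.py as flat module-level constants. The symbol, indicator periods, risk/reward, slippage range, spread multiplier, split ratio and LightGBM values come from this plan; BET_PCT and STARTING_BALANCE are hypothetical placeholders because the plan does not fix them. Guarding the MetaTrader5 import is an added assumption, so the non-MT5 modules can still be imported on a machine without that package.

```python
"""config.py: central configuration constants (sketch)."""

try:
    import MetaTrader5 as mt5  # Windows-only package

    TIMEFRAME = mt5.TIMEFRAME_M3
except ImportError:
    # Assumption: lets the non-MT5 modules import on machines without MetaTrader5.
    mt5 = None
    TIMEFRAME = None

SYMBOL = "XAUUSDc"

# Indicator lookback periods (from the plan)
SMA_FAST = 14
SMA_SLOW = 50
VROC_PERIOD = 14
ADX_PERIOD = 14
MOMENTUM_SI_PERIOD = 10
VIX_PROXY_PERIOD = 20

# Trade construction and execution
RISK_REWARD = 1.0            # TP distance = SL distance * RISK_REWARD
SLIPPAGE_RANGE = (0.0, 2.0)  # uniform slippage on entry, in price units
SPREAD_MULTIPLIER = 10       # skip if sl_distance < spread (in price) * SPREAD_MULTIPLIER
TRAIN_FRACTION = 0.8         # chronological train/validation split

# Hypothetical placeholders: the plan does not fix these values
BET_PCT = 0.01               # fraction of current balance risked per trade
STARTING_BALANCE = 10_000.0

LGBM_PARAMS = {
    "objective": "multiclass",
    "num_leaves": 63,
    "max_depth": 8,
    "learning_rate": 0.05,
    "n_estimators": 500,
    "subsample": 0.8,
    "subsample_freq": 1,     # bagging is only active when this is > 0
    "colsample_bytree": 0.8,
    "min_child_samples": 20,
    "class_weight": "balanced",
}
```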

Data Acquisition

[NEW] data_fetcher.py

Connects to MT5 terminal via MetaTrader5 Python package
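Fetches one year of 3-minute OHLCV bars for XAUUSDc
Returns a pandas.DataFrame with columns: time, open, high, low, close, volume, spread (volume is MT5's tick_volume, since real_volume is typically zero for CFD symbols; spread is in points)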
Saves raw data to data/raw_xauusdc_3m.csv for reproducibility
Handles MT5 connection errors gracefully
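A sketch of data_fetcher.py, assuming the terminal is already running and logged in. mt5.initialize, mt5.symbol_select, mt5.copy_rates_range, mt5.last_error and mt5.shutdown are MetaTrader5 package functions; rates_to_frame and fetch_bars are hypothetical names. The array-to-DataFrame conversion is split out so it can be tested without a terminal. If far fewer bars come back than a year of M3 data implies, the terminal's "Max bars in chart" setting is a likely cause.

```python
"""data_fetcher.py: pull M3 bars from a running MT5 terminal (sketch)."""
from datetime import datetime, timedelta, timezone
from pathlib import Path

import numpy as np
import pandas as pd

COLUMNS = ["time", "open", "high", "low", "close", "volume", "spread"]


def rates_to_frame(rates: np.ndarray) -> pd.DataFrame:
    """Convert the structured array returned by mt5.copy_rates_range to a DataFrame."""
    df = pd.DataFrame(rates)
    # Bar times are epoch seconds in the broker's server time zone.
    df["time"] = pd.to_datetime(df["time"], unit="s")
    # CFD symbols usually report real_volume = 0, so tick_volume stands in for volume.
    df = df.rename(columns={"tick_volume": "volume"})
    return df[COLUMNS]


def fetch_bars(symbol: str = "XAUUSDc", days: int = 365,
               out_path: str = "data/raw_xauusdc_3m.csv") -> pd.DataFrame:
    import MetaTrader5 as mt5  # imported lazily: the package is Windows-only

    if not mt5.initialize():
        raise RuntimeError(f"MT5 initialize() failed: {mt5.last_error()}")
    try:
        if not mt5.symbol_select(symbol, True):  # make sure it is in Market Watch
            raise RuntimeError(f"Symbol {symbol} unavailable: {mt5.last_error()}")
        end = datetime.now(timezone.utc)
        rates = mt5.copy_rates_range(symbol, mt5.TIMEFRAME_M3, end - timedelta(days=days), end)
        if rates is None or len(rates) == 0:
            raise RuntimeError(f"No bars returned: {mt5.last_error()}")
    finally:
        mt5.shutdown()

    df = rates_to_frame(rates)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)  # raw copy for reproducibility
    return df
```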

Feature Engineering

[NEW] features.py

Computes all required technical indicators using pure NumPy/Pandas (no TA-Lib dependency):

Feature	Method
SMA	Simple Moving Average (14-period)
Double Moving Average	SMA(14) and SMA(50), plus crossover signal
VROC	Volume Rate of Change (14-period)
Synthetic VIX	Rolling std of log-returns (20-period) as volatility proxy
Momentum Strength Index	Custom momentum oscillator (10-period, 0–100 scale)
ADX	Average Directional Index (14-period) via Wilder's smoothing
Time features	Hour-of-day, minute-of-hour, day-of-week (cyclically encoded as sine/cosine pairs)

All computations are vectorized with NumPy for maximum speed. NaN rows from lookback periods are dropped.
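A sketch of features.py covering the table above except the Momentum Strength Index, whose formula the plan leaves open. Column names such as sma_gap, sma_cross and vix_proxy_20 are hypothetical. Wilder's smoothing is approximated with pandas ewm(alpha=1/n, adjust=False), which seeds from the first value rather than an n-bar simple average, so early ADX values differ slightly from a textbook implementation and the difference decays over later bars.

```python
"""features.py: indicator sketch using only NumPy and pandas."""
import numpy as np
import pandas as pd


def wilder(x: pd.Series, n: int) -> pd.Series:
    """Wilder's smoothing approximated as an EMA with alpha = 1/n."""
    return x.ewm(alpha=1.0 / n, adjust=False).mean()


def adx(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    up, down = high.diff(), -low.diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=high.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=high.index)
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    atr = wilder(true_range, n)
    plus_di = 100 * wilder(plus_dm, n) / atr
    minus_di = 100 * wilder(minus_dm, n) / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di).replace(0, np.nan)
    return wilder(dx, n)


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """df needs time, high, low, close, volume columns; returns one row per usable bar."""
    close = df["close"]
    out = pd.DataFrame(index=df.index)
    out["sma_14"] = close.rolling(14).mean()
    out["sma_50"] = close.rolling(50).mean()
    gap = out["sma_14"] - out["sma_50"]
    out["sma_gap"] = gap
    # +1 on the bar the fast SMA crosses above the slow one, -1 on a cross below, else 0
    out["sma_cross"] = np.sign(gap).diff().clip(-1, 1)

    volume = df["volume"].astype(float)
    prev_volume = volume.shift(14).replace(0, np.nan)
    out["vroc_14"] = 100 * (volume - prev_volume) / prev_volume

    # Synthetic VIX proxy: realized volatility of log-returns over 20 bars
    out["vix_proxy_20"] = np.log(close).diff().rolling(20).std()
    out["adx_14"] = adx(df["high"], df["low"], close, 14)

    # Cyclical encoding so that e.g. 23:00 and 00:00 end up close together
    t = pd.to_datetime(df["time"])
    for name, value, period in [
        ("hour", t.dt.hour, 24), ("minute", t.dt.minute, 60), ("dow", t.dt.dayofweek, 7)
    ]:
        out[f"{name}_sin"] = np.sin(2 * np.pi * value / period)
        out[f"{name}_cos"] = np.cos(2 * np.pi * value / period)

    return out.dropna()  # drop warm-up rows left NaN by the lookback windows
```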

Labeling Engine

[NEW] labeler.py

Generates ground-truth labels for supervised learning:

For each bar, compute a potential Buy and Sell trade:
- Buy: entry at close, SL below recent swing low (ATR-based), TP = entry + (entry - SL) (1:1 RR)
- Sell: entry at close, SL above recent swing high (ATR-based), TP = entry - (SL - entry) (1:1 RR)
Walk forward through subsequent bars to determine outcome (TP hit, SL hit, or neither within N bars)
Apply spread filter: if SL_distance < spread * 10, label = DO_NOTHING
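Final labels: outcomes (BUY_WIN, BUY_LOSS, SELL_WIN, SELL_LOSS, neither level hit, spread-filtered) are simplified to 4 classes: BUY (1), SELL (2), HOLD (3), DO_NOTHING (0)
Only winning setups are labeled BUY/SELL; losing setups and setups that hit neither level within N bars become HOLD, and spread-filtered setups become DO_NOTHING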
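A sketch of labeler.py under simplifying assumptions: the stop sits atr_mult * ATR(14) from the entry close rather than beyond a swing point, the look-ahead window is horizon bars, and both defaults (atr_mult=1.5, horizon=20) are hypothetical. Because stop and target are symmetric, a buy win and a sell win cannot both occur; a bar touching both levels is counted as a loss for both sides (OHLC bars cannot tell which came first), so it becomes HOLD. The point argument converts MT5's spread from points to price before the spread filter, and the last horizon bars are marked -1 so they can be dropped before training.

```python
"""labeler.py: forward-walk label generation (sketch)."""
import numpy as np
import pandas as pd

DO_NOTHING, BUY, SELL, HOLD = 0, 1, 2, 3
UNLABELED = -1  # last `horizon` bars have no complete look-ahead window


def atr(df: pd.DataFrame, n: int = 14) -> pd.Series:
    prev_close = df["close"].shift(1)
    true_range = pd.concat(
        [df["high"] - df["low"], (df["high"] - prev_close).abs(), (df["low"] - prev_close).abs()],
        axis=1,
    ).max(axis=1)
    return true_range.ewm(alpha=1.0 / n, adjust=False).mean()  # Wilder-style smoothing


def label_bars(df: pd.DataFrame, point: float, atr_mult: float = 1.5,
               horizon: int = 20, spread_mult: float = 10.0) -> np.ndarray:
    close = df["close"].to_numpy(dtype=float)
    high = df["high"].to_numpy(dtype=float)
    low = df["low"].to_numpy(dtype=float)
    sl_dist = atr_mult * atr(df).to_numpy()                     # TP distance = SL distance (1:1)
    spread_price = df["spread"].to_numpy(dtype=float) * point  # MT5 spread is in points
    labels = np.full(len(df), UNLABELED, dtype=int)

    for i in range(len(df) - horizon):
        if sl_dist[i] < spread_price[i] * spread_mult:
            labels[i] = DO_NOTHING          # stop too tight relative to the spread
            continue
        upper, lower = close[i] + sl_dist[i], close[i] - sl_dist[i]
        labels[i] = HOLD                    # default: no clean win within the horizon
        for j in range(i + 1, i + 1 + horizon):
            hit_up, hit_down = high[j] >= upper, low[j] <= lower
            if hit_up and hit_down:
                break                       # ambiguous bar: both sides count as losses
            if hit_up:
                labels[i] = BUY             # buy TP reached before buy SL
                break
            if hit_down:
                labels[i] = SELL            # sell TP reached before sell SL
                break
    return labels
```

This is a plain Python double loop, which is workable for roughly a year of M3 bars but is the obvious candidate for optimization if labeling becomes slow.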

ML Model

[NEW] model.py

LightGBM multi-class classifier (4 classes)
Hyperparameters tuned for tabular financial data:
- num_leaves=63, max_depth=8, learning_rate=0.05, n_estimators=500
- subsample=0.8 with subsample_freq=1 (LightGBM only applies subsample when subsample_freq > 0), colsample_bytree=0.8, min_child_samples=20
- class_weight='balanced' to handle label imbalance
Train/validation split: 80/20 chronological (no shuffle — time series)
Feature importance output
Model persistence via joblib (save/load .pkl)
Early stopping on validation set
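A sketch of model.py using the LightGBM scikit-learn API; in lightgbm 4.x, early stopping is configured through the lgb.early_stopping callback rather than a fit() argument. train_model, chronological_split, the model path and stopping_rounds=50 are hypothetical choices. One design caveat: if the 20% validation slice drives early stopping and is also the backtest set, the backtest is no longer fully out-of-sample; reserving a third, later slice for backtesting avoids that.

```python
"""model.py: chronological split and LightGBM training (sketch)."""
from pathlib import Path

import pandas as pd


def chronological_split(X: pd.DataFrame, y: pd.Series, train_frac: float = 0.8):
    """Split by position without shuffling: every validation bar follows every training bar."""
    cut = int(len(X) * train_frac)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


def train_model(X: pd.DataFrame, y: pd.Series, path: str = "models/lgbm.pkl"):
    # Imported here so chronological_split works without these packages installed.
    import joblib
    import lightgbm as lgb

    X_tr, X_va, y_tr, y_va = chronological_split(X, y)
    model = lgb.LGBMClassifier(
        objective="multiclass", num_leaves=63, max_depth=8, learning_rate=0.05,
        n_estimators=500, subsample=0.8, subsample_freq=1, colsample_bytree=0.8,
        min_child_samples=20, class_weight="balanced",
    )
    model.fit(
        X_tr, y_tr,
        eval_set=[(X_va, y_va)],
        eval_metric="multi_logloss",
        callbacks=[lgb.early_stopping(stopping_rounds=50), lgb.log_evaluation(period=50)],
    )
    importance = pd.Series(model.feature_importances_, index=X.columns)
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)  # reload later with joblib.load(path)
    return model, importance.sort_values(ascending=False)
```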

Backtesting Engine

[NEW] backtester.py

Backtesting with realistic execution (signals are computed in one vectorized pass; trades are then simulated bar by bar):

Takes model predictions and raw price data
Position sizing: bet % of current balance, accounting for full SL distance
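- lot_value = balance * bet_pct / sl_distance, a position size in units of the underlying chosen so that a full stop-loss hit loses exactly bet_pct of the balance; divide by the symbol's contract size (trade_contract_size from mt5.symbol_info) to express it in lots
Random slippage: uniform 0–2 price units applied to the entry price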
Spread filter: skip trade if sl_distance < spread * 10
1:1 Risk-Reward: TP distance = SL distance
Walk forward bar-by-bar on test set, track equity curve
No trade limit — takes every valid signal
Records all trades with entry/exit prices, PnL, timestamps
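A sketch of backtester.py under stated assumptions: one position at a time (a new signal is only considered once the previous trade has closed, which is stricter than "takes every valid signal" when signals overlap), SL and TP fixed from the signal bar's close, slippage always adverse, and the stop checked before the target when one bar touches both. Size is computed from the slipped fill to the stop, so a stop-out always costs exactly bet_pct of balance; for example, a 10,000 balance, bet_pct = 0.01 and a 5.0 SL distance give 100 / 5.0 = 20 units of the underlying. The function name, argument names and defaults are hypothetical.

```python
"""backtester.py: bar-by-bar simulation on model signals (sketch)."""
import numpy as np
import pandas as pd

BUY, SELL = 1, 2


def backtest(df: pd.DataFrame, signals: np.ndarray, sl_dist: np.ndarray,
             bet_pct: float = 0.01, start_balance: float = 10_000.0,
             slippage: tuple = (0.0, 2.0), point: float = 0.01,
             spread_mult: float = 10.0, seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    close, high, low = (df[c].to_numpy(dtype=float) for c in ("close", "high", "low"))
    spread_price = df["spread"].to_numpy(dtype=float) * point  # MT5 spread is in points
    balance, trades, i, n = start_balance, [], 0, len(df)

    while i < n - 1:
        side = {BUY: 1, SELL: -1}.get(int(signals[i]), 0)
        if side == 0 or sl_dist[i] <= 0 or sl_dist[i] < spread_price[i] * spread_mult:
            i += 1
            continue
        sl = close[i] - side * sl_dist[i]                # levels fixed at the signal bar
        tp = close[i] + side * sl_dist[i]                # 1:1 risk-reward
        fill = close[i] + side * rng.uniform(*slippage)  # slippage always works against us
        size = balance * bet_pct / abs(fill - sl)        # full stop-out loses bet_pct of balance
        exit_price, j = close[-1], n - 1                 # default: close at end of data
        for k in range(i + 1, n):
            hit_sl = low[k] <= sl if side == 1 else high[k] >= sl
            hit_tp = high[k] >= tp if side == 1 else low[k] <= tp
            if hit_sl or hit_tp:
                exit_price, j = (sl if hit_sl else tp), k  # stop wins same-bar ties
                break
        pnl = side * (exit_price - fill) * size
        balance += pnl
        trades.append({"entry_bar": i, "exit_bar": j, "side": side, "entry": fill,
                       "exit": exit_price, "size": size, "pnl": pnl, "balance": balance})
        i = j  # one position at a time: resume scanning at the exit bar
    return pd.DataFrame(trades)
```

Timestamps can be attached by indexing df["time"] with the entry_bar and exit_bar columns; keeping seed fixed makes the slippage draws reproducible.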

Metrics & Evaluation

[NEW] metrics.py

Metric	Description
Win Rate	% of trades closed at TP
Average Win %	Mean profit per winning trade as % of balance
Average Loss %	Mean loss per losing trade as % of balance
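Sharpe Ratio	Annualized ratio of mean return to the standard deviation of returns
Sortino Ratio	Annualized ratio of mean return to downside deviation (negative returns only)
Max Drawdown	Largest peak-to-trough equity decline, as % of the peak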
Profit Factor	Gross profit / Gross loss
Start Equity	Initial balance
End Equity	Final balance after all trades
Total Trades	Number of executed trades
Avg Trade Duration	Mean holding time in bars/minutes
Daily PnL Stats	Mean, std, min, max of total PnL per trading day
Calmar Ratio	Annualized return / Max Drawdown
Expectancy	Average PnL per trade

Outputs a formatted console report and saves to results/report.txt.
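A sketch of the ratio metrics in metrics.py. It assumes r is a series of per-period returns (for example daily returns of the equity curve), a zero risk-free rate, and 252 periods per year; none of these choices are fixed by the plan. Sortino uses the root mean square of negative returns as downside deviation, and Calmar uses compound annual growth over an equity series with one value per period. Function names are hypothetical.

```python
"""metrics.py: risk and return ratios (sketch)."""
import numpy as np
import pandas as pd


def sharpe(r: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized mean / standard deviation of per-period returns (risk-free rate = 0)."""
    return float(r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year))


def sortino(r: pd.Series, periods_per_year: int = 252) -> float:
    """Like Sharpe, but divides by downside deviation (RMS of negative returns only)."""
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return float(r.mean() / downside * np.sqrt(periods_per_year))


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    return float((equity / equity.cummax() - 1.0).min())


def profit_factor(pnl: pd.Series) -> float:
    gross_loss = -pnl[pnl < 0].sum()
    return float(pnl[pnl > 0].sum() / gross_loss) if gross_loss > 0 else float("inf")


def calmar(equity: pd.Series, periods_per_year: int = 252) -> float:
    """Compound annual growth rate divided by |max drawdown|."""
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1.0 / years) - 1.0
    return float(cagr / abs(max_drawdown(equity)))


def expectancy(pnl: pd.Series) -> float:
    """Average PnL per trade."""
    return float(pnl.mean())
```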

CLI Entry Point

[NEW] main.py

Unified CLI with subcommands:

python main.py fetch       # Fetch 1-year data from MT5
python main.py train       # Engineer features, label, train model
python main.py backtest    # Run backtest on test set
python main.py evaluate    # Print metrics report
python main.py run         # Full pipeline: fetch → train → backtest → evaluate

Uses argparse with clear help text.
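A minimal argparse skeleton for main.py matching the five subcommands above. The step bodies are hypothetical placeholders that print instead of calling the pipeline modules, and the COMMANDS/PIPELINE names are illustrative. The real file would end with the usual if __name__ == "__main__": guard calling main(); it is left out here so the module can be imported and tested.

```python
"""main.py: command-line entry point (skeleton sketch)."""
import argparse

COMMANDS = {
    "fetch": "Fetch one year of 3-minute data from MT5",
    "train": "Engineer features, label, train model",
    "backtest": "Run backtest on test set",
    "evaluate": "Print metrics report",
    "run": "Full pipeline: fetch, train, backtest, evaluate",
}
PIPELINE = ["fetch", "train", "backtest", "evaluate"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="main.py", description="XAUUSDc 3-minute ML trading pipeline"
    )
    sub = parser.add_subparsers(dest="command", required=True, metavar="COMMAND")
    for name, help_text in COMMANDS.items():
        sub.add_parser(name, help=help_text, description=help_text)
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    steps = PIPELINE if args.command == "run" else [args.command]
    for step in steps:
        # Hypothetical hook: the real file would call data_fetcher, model, backtester, metrics.
        print(f"[{step}] {COMMANDS[step]}")
    return 0
```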

Project Files

[NEW] requirements.txt

MetaTrader5>=5.0.45
lightgbm>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
joblib>=1.3.0

[NEW] LICENSE

MIT License, author: Rembrant Oyangoren Albeos, year: 2026.

[NEW] README.md

Professional README with badges (Python, License, LightGBM), project description, features list, quick start, architecture overview, and configuration reference. No emojis.

[NEW] GUIDE.md

Step-by-step usage guide with tables for all commands, parameters, and expected outputs.

[NEW] .gitignore

Standard Python gitignore plus data/, results/, models/, *.pkl.

Verification Plan

Automated Tests

Syntax validation — run python -m py_compile <file> on every .py file to confirm no syntax errors
Import validation — run python -c "import config; import features; import labeler; import model; import backtester; import metrics" to confirm all modules load correctly
Dry-run test — run python main.py --help to confirm CLI is functional
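The three automated checks can be wrapped in one script; this is a sketch, and check_project.py, compile_all and run_checks are hypothetical names. py_compile.compile(..., doraise=True) raises py_compile.PyCompileError on a syntax error, and the import and --help checks run in subprocesses so a failing import does not crash the checker. On a machine without the MetaTrader5 package, the import check will fail at config unless that import is guarded.

```python
"""check_project.py: run the plan's automated checks in one go (sketch)."""
import py_compile
import subprocess
import sys
from pathlib import Path

MODULES = ["config", "features", "labeler", "model", "backtester", "metrics"]


def compile_all(root: str = ".") -> list:
    """Byte-compile every top-level .py file and return a message per failure."""
    failures = []
    for path in sorted(Path(root).glob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append(f"{path.name}: {exc.msg}")
    return failures


def run_checks(root: str = ".") -> bool:
    """Syntax check, import check, then `main.py --help`; True if all pass."""
    failures = compile_all(root)
    commands = {
        "import validation": [sys.executable, "-c", "; ".join(f"import {m}" for m in MODULES)],
        "CLI --help": [sys.executable, "main.py", "--help"],
    }
    for name, cmd in commands.items():
        proc = subprocess.run(cmd, cwd=root, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append(f"{name}: {proc.stderr.strip()}")
    for message in failures:
        print("FAIL", message)
    return not failures
```

Calling run_checks() from the project root returns True only when all three checks pass.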

Manual Verification

User runs python main.py fetch with MT5 open and logged in, confirms data CSV is created in data/
User runs python main.py run for the full pipeline, reviews the metrics report output
User inspects results/report.txt for the performance metrics