
ML-3m-trader: XAUUSDc 3-Minute Timeframe ML Trading System

End-to-end machine learning pipeline for trading XAUUSDc (Gold) on the 3-minute timeframe. Uses MetaTrader 5 for data acquisition, LightGBM for classification, and a vectorized backtesting engine with realistic execution modeling.

User Review Required

ML Framework Choice: LightGBM. LightGBM is well suited to this task because:

  • Tabular classification (Buy/Sell/Hold/DoNothing) is LightGBM's strongest domain
  • Extremely fast training, even on CPU (i5-7200U will handle it fine)
  • Low memory footprint (well within 12 GB RAM)
  • No GPU required (your MX110 is not needed)
  • Gradient-boosted tree models such as LightGBM generally match or outperform deep learning on structured/tabular data in published benchmarks

This will run entirely on your local machine. No Google Colab needed.

The MetaTrader 5 Python API only works on Windows (which you have). The MT5 terminal must be open and logged in when the data fetch script runs; the MetaTrader5 pip package handles communication with it.

VIX Feature: Since the CBOE VIX index is not directly available from MT5, the system will compute a synthetic VIX proxy from a rolling standard deviation of log returns (realized volatility). The real VIX is an implied-volatility index derived from S&P 500 option prices, so this proxy is a substitute rather than a replica; realized volatility is a common stand-in when no implied-volatility feed is available. Using the actual VIX would require a separate data source.


Proposed Changes

Project Structure

```
ML-3m-trader/
├── config.py             # All configuration constants
├── data_fetcher.py       # MT5 data acquisition
├── features.py           # Technical indicator computation
├── labeler.py            # Trade label generation (Buy/Sell/Hold/DoNothing)
├── model.py              # LightGBM training, prediction, persistence
├── backtester.py         # Vectorized backtesting engine
├── metrics.py            # Performance evaluation
├── main.py               # CLI entry point
├── requirements.txt
├── LICENSE
├── README.md
├── GUIDE.md              # Step-by-step usage guide with tables
└── .gitignore
```

Configuration

[NEW] config.py

Central configuration file containing all tunable parameters:

  • SYMBOL = "XAUUSDc", TIMEFRAME = mt5.TIMEFRAME_M3
  • Feature list, lookback periods for SMA (14, 50), VROC (14), ADX (14), Momentum SI (10)
  • Risk/reward ratio = 1.0, default bet percentage logic
  • Slippage range (0–2 units), spread filter (stoploss_size >= spread * 10)
  • Train/test split ratio, model hyperparameters
  • Starting equity/balance
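A minimal config.py sketch, assuming the settings live as plain module-level constants. The constant names are illustrative, and BET_PCT and START_BALANCE are placeholder values that the plan does not specify. The MetaTrader5 import is wrapped so that importing config still works on machines without MT5 (useful for the import check in the Verification Plan).

```python
"""config.py sketch: names are illustrative; values marked ASSUMED are placeholders."""
try:
    import MetaTrader5 as mt5
    TIMEFRAME = mt5.TIMEFRAME_M3
except ImportError:  # lets `import config` succeed where MT5 is unavailable
    mt5 = None
    TIMEFRAME = None

SYMBOL = "XAUUSDc"

# Indicator lookbacks
SMA_FAST, SMA_SLOW = 14, 50
VROC_PERIOD = 14
ADX_PERIOD = 14
MSI_PERIOD = 10              # Momentum Strength Index
VIX_WINDOW = 20              # synthetic VIX: rolling std of log returns

# Risk and execution
RISK_REWARD = 1.0
BET_PCT = 0.01               # ASSUMED: fraction of balance risked per trade
SLIPPAGE_RANGE = (0.0, 2.0)  # uniform slippage, in price units
SPREAD_MULTIPLIER = 10       # trade only if sl_distance >= spread * 10
START_BALANCE = 10_000.0     # ASSUMED starting balance

# Data split and model
TRAIN_RATIO = 0.8
LGBM_PARAMS = {
    "num_leaves": 63,
    "max_depth": 8,
    "learning_rate": 0.05,
    "n_estimators": 500,
    "subsample": 0.8,
    "subsample_freq": 1,     # LightGBM ignores subsample while this is 0 (its default)
    "colsample_bytree": 0.8,
    "min_child_samples": 20,
    "class_weight": "balanced",
}
```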

Data Acquisition

[NEW] data_fetcher.py

  • Connects to MT5 terminal via MetaTrader5 Python package
  • Fetches 1 year of 3-minute OHLCV bars for XAUUSDc
  • Returns a pandas.DataFrame with columns: time, open, high, low, close, volume (MT5's tick_volume), spread
  • Saves raw data to data/raw_xauusdc_3m.csv for reproducibility
  • Handles MT5 connection errors gracefully
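A sketch of data_fetcher.py built on the MetaTrader5 package's initialize, symbol_select, copy_rates_range, last_error, and shutdown calls. The function names fetch_bars and rates_to_frame are hypothetical, and 365 days is an assumed reading of "1 year". MT5 bar data carries tick_volume and real_volume fields rather than a plain volume field, so the sketch renames tick_volume to match the planned columns.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

import numpy as np
import pandas as pd

COLUMNS = ["time", "open", "high", "low", "close", "volume", "spread"]


def rates_to_frame(rates: np.ndarray) -> pd.DataFrame:
    """Turn the structured array from copy_rates_range into the planned layout."""
    df = pd.DataFrame(rates)
    df["time"] = pd.to_datetime(df["time"], unit="s")  # epoch seconds as returned by MT5
    df = df.rename(columns={"tick_volume": "volume"})
    return df[COLUMNS]


def fetch_bars(symbol: str = "XAUUSDc", days: int = 365,
               out_csv: str = "data/raw_xauusdc_3m.csv") -> pd.DataFrame:
    import MetaTrader5 as mt5  # Windows-only package, imported lazily

    if not mt5.initialize():
        raise RuntimeError(f"MT5 initialize() failed: {mt5.last_error()}")
    try:
        if not mt5.symbol_select(symbol, True):
            raise RuntimeError(f"Cannot select {symbol}: {mt5.last_error()}")
        date_to = datetime.now(timezone.utc)
        date_from = date_to - timedelta(days=days)
        rates = mt5.copy_rates_range(symbol, mt5.TIMEFRAME_M3, date_from, date_to)
        if rates is None or len(rates) == 0:
            raise RuntimeError(f"No bars returned: {mt5.last_error()}")
    finally:
        mt5.shutdown()  # always release the terminal connection

    df = rates_to_frame(rates)
    Path(out_csv).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_csv, index=False)
    return df
```

The terminal's history settings (such as "Max bars in chart") may cap how many bars come back, so checking len(df) after a fetch is worthwhile.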

Feature Engineering

[NEW] features.py

Computes all required technical indicators using pure NumPy/Pandas (no TA-Lib dependency):

| Feature | Method |
|---|---|
| SMA | Simple Moving Average (14-period) |
| Double Moving Average | SMA(14) and SMA(50), plus crossover signal |
| VROC | Volume Rate of Change (14-period) |
| Synthetic VIX | Rolling std of log-returns (20-period) as volatility proxy |
| Momentum Strength Index | Custom momentum oscillator (10-period, 0–100 scale) |
| ADX | Average Directional Index (14-period) via Wilder's smoothing |
| Time features | Hour-of-day, minute-of-hour, day-of-week (cyclical encoded) |

All computations are vectorized with NumPy for maximum speed. NaN rows from lookback periods are dropped.
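One way to compute most of the feature table with pandas and NumPy alone, as a sketch. Assumptions: the input has the columns produced by the fetcher; Wilder's smoothing is written as an exponential average with alpha = 1/period, which settles to the classic values after a warm-up; sma_cross is +1/-1 on the bar where SMA(14) crosses above/below SMA(50) and 0 otherwise. The custom Momentum Strength Index is left out because its formula is not specified. Column names are illustrative.

```python
import numpy as np
import pandas as pd


def wilder(x: pd.Series, period: int) -> pd.Series:
    # Wilder's smoothing as an EMA with alpha = 1/period
    return x.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()


def adx(df: pd.DataFrame, period: int = 14) -> pd.Series:
    high, low, close = df["high"], df["low"], df["close"]
    up, down = high.diff(), -low.diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=df.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=df.index)
    prev = close.shift(1)
    tr = pd.concat([high - low, (high - prev).abs(), (low - prev).abs()], axis=1).max(axis=1)
    atr = wilder(tr, period)
    plus_di = 100 * wilder(plus_dm, period) / atr
    minus_di = 100 * wilder(minus_dm, period) / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return wilder(dx, period)


def cyclical(values: pd.Series, period: int, name: str) -> pd.DataFrame:
    # Map a periodic integer (hour, minute, weekday) onto the unit circle
    angle = 2 * np.pi * values / period
    return pd.DataFrame({f"{name}_sin": np.sin(angle), f"{name}_cos": np.cos(angle)})


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    close = df["close"]
    vol = df["volume"].astype(float)
    out = pd.DataFrame(index=df.index)
    out["sma14"] = close.rolling(14).mean()
    out["sma50"] = close.rolling(50).mean()
    out["sma_cross"] = np.sign(out["sma14"] - out["sma50"]).diff() / 2
    out["vroc14"] = (vol / vol.shift(14) - 1.0) * 100
    out["vix20"] = np.log(close).diff().rolling(20).std()
    out["adx14"] = adx(df, 14)
    t = pd.to_datetime(df["time"]).dt
    for values, period, name in ((t.hour, 24, "hour"), (t.minute, 60, "minute"),
                                 (t.dayofweek, 7, "dow")):
        out = out.join(cyclical(values, period, name))
    # Drop lookback NaNs (and any inf from zero volume) as the plan describes
    return out.replace([np.inf, -np.inf], np.nan).dropna()
```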


Labeling Engine

[NEW] labeler.py

Generates ground-truth labels for supervised learning:

  1. For each bar, compute a potential Buy and Sell trade:
    • Buy: entry at close, SL below recent swing low (ATR-based), TP = entry + (entry - SL) (1:1 RR)
    • Sell: entry at close, SL above recent swing high (ATR-based), TP = entry - (SL - entry) (1:1 RR)
  2. Walk forward through subsequent bars to determine outcome (TP hit, SL hit, or neither within N bars)
  3. Apply spread filter: if SL_distance < spread * 10, label = DO_NOTHING
  4. Final labels: BUY_WIN, BUY_LOSS, SELL_WIN, SELL_LOSS, HOLD, DO_NOTHING → simplified to 4-class: BUY (1), SELL (2), HOLD (3), DO_NOTHING (0)
  5. Only winning setups are labeled as BUY/SELL; losing setups become HOLD
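The labeling steps can be sketched as follows. This swaps the "recent swing low/high" stop for a plain ATR multiple (atr_mult=1.5, an assumption) and uses a hypothetical 40-bar horizon. Further assumptions: the stop is checked first when both levels fall inside the same bar, setups that hit neither level become HOLD, and if both a buy and a sell would win, the one reaching TP first is kept. The point argument converts MT5's spread, which is quoted in points, into price units.

```python
import numpy as np
import pandas as pd

DO_NOTHING, BUY, SELL, HOLD = 0, 1, 2, 3


def atr(df: pd.DataFrame, period: int = 14) -> pd.Series:
    prev = df["close"].shift(1)
    tr = pd.concat([df["high"] - df["low"], (df["high"] - prev).abs(),
                    (df["low"] - prev).abs()], axis=1).max(axis=1)
    return tr.ewm(alpha=1.0 / period, adjust=False).mean()  # Wilder-style smoothing


def first_hit(high, low, i, horizon, tp, sl, is_buy):
    """Walk forward from bar i; return ('tp' | 'sl' | None, bar index or None)."""
    for j in range(i + 1, min(i + 1 + horizon, len(high))):
        hit_sl = low[j] <= sl if is_buy else high[j] >= sl
        hit_tp = high[j] >= tp if is_buy else low[j] <= tp
        if hit_sl:  # checked first: conservative when both levels sit inside one bar
            return "sl", j
        if hit_tp:
            return "tp", j
    return None, None


def make_labels(df: pd.DataFrame, point: float, atr_mult: float = 1.5,
                horizon: int = 40, spread_mult: int = 10) -> pd.Series:
    high, low, close = (df[c].to_numpy() for c in ("high", "low", "close"))
    spread = df["spread"].to_numpy() * point        # MT5 spread is in points
    sl_dist = (atr(df) * atr_mult).to_numpy()
    labels = np.full(len(df), HOLD, dtype=np.int8)  # losers and timeouts stay HOLD
    for i in range(len(df)):
        d = sl_dist[i]
        if d < spread[i] * spread_mult:             # spread filter
            labels[i] = DO_NOTHING
            continue
        buy, jb = first_hit(high, low, i, horizon, close[i] + d, close[i] - d, True)
        sell, js = first_hit(high, low, i, horizon, close[i] - d, close[i] + d, False)
        if buy == "tp" and (sell != "tp" or jb <= js):
            labels[i] = BUY
        elif sell == "tp":
            labels[i] = SELL
    return pd.Series(labels, index=df.index, name="label")
```

The loop is O(n × horizon) in pure Python, which is slow but easy to verify; it could later be vectorized once its output is trusted.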

ML Model

[NEW] model.py

  • LightGBM multi-class classifier (4 classes)
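  • Hyperparameters chosen as a starting point for tabular financial data:
    • num_leaves=63, max_depth=8, learning_rate=0.05, n_estimators=500
    • subsample=0.8 with subsample_freq=1 (LightGBM ignores subsample while subsample_freq is 0, its default), colsample_bytree=0.8, min_child_samples=20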
    • class_weight='balanced' to handle label imbalance
  • Train/validation split: 80/20 chronological (no shuffle, since this is a time series)
  • Feature importance output
  • Model persistence via joblib (save/load .pkl)
  • Early stopping on validation set
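A sketch of the training step using LightGBM's scikit-learn interface (LGBMClassifier with the early_stopping and log_evaluation callbacks). The 50-round early-stopping patience and the models/lgbm.pkl path are assumptions. LightGBM and joblib are imported inside train_model only so that the split helper can be used without them installed.

```python
from pathlib import Path

import pandas as pd


def chronological_split(X: pd.DataFrame, y: pd.Series, train_ratio: float = 0.8):
    """First train_ratio of rows for training, the rest for validation; no shuffling."""
    cut = int(len(X) * train_ratio)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


def train_model(X: pd.DataFrame, y: pd.Series, model_path: str = "models/lgbm.pkl"):
    import joblib
    import lightgbm as lgb

    X_tr, X_va, y_tr, y_va = chronological_split(X, y)
    model = lgb.LGBMClassifier(
        num_leaves=63, max_depth=8, learning_rate=0.05, n_estimators=500,
        subsample=0.8, subsample_freq=1, colsample_bytree=0.8,
        min_child_samples=20, class_weight="balanced",
    )
    model.fit(
        X_tr, y_tr,
        eval_set=[(X_va, y_va)],
        callbacks=[lgb.early_stopping(stopping_rounds=50), lgb.log_evaluation(period=50)],
    )
    importance = (pd.Series(model.feature_importances_, index=X.columns)
                  .sort_values(ascending=False))
    Path(model_path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, model_path)  # reload later with joblib.load(model_path)
    return model, importance
```

Because the validation slice drives early stopping, backtesting on that same slice lets the test data influence training slightly; holding out a further chronological slice for the backtest avoids this.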

Backtesting Engine

[NEW] backtester.py

Vectorized backtesting with realistic execution:

  • Takes model predictions and raw price data
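  • Position sizing: risk bet % of current balance over the full SL distance
    • lot_value = balance * bet_pct / sl_distance (a position size in units of the instrument, so a stop-out loses bet_pct of the balance before slippage; converting it to MT5 lots requires the symbol's contract size)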
  • Random slippage: uniform 0–2 XAUUSDc units applied to entry price
  • Spread filter: skip trade if sl_distance < spread * 10
  • 1:1 Risk-Reward: TP distance = SL distance
  • Walk forward bar-by-bar on test set, track equity curve
  • No trade limit: takes every valid signal
  • Records all trades with entry/exit prices, PnL, timestamps
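A simplified, loop-based sketch of these execution rules (not vectorized). Assumptions beyond the plan: one position is open at a time, TP and SL are anchored to the signal bar's close while slippage worsens only the fill, the stop is checked before the target inside a bar, open trades close at market after a hypothetical 40-bar horizon, and sl_dist is supplied per bar (for example from the ATR-based rule used for labeling). The point argument converts MT5's spread from points to price.

```python
import numpy as np
import pandas as pd

BUY, SELL = 1, 2  # class ids from the plan's labeling scheme


def backtest(df: pd.DataFrame, signals, sl_dist, point: float,
             start_balance: float = 10_000.0, bet_pct: float = 0.01,
             horizon: int = 40, slip_max: float = 2.0, spread_mult: int = 10,
             seed: int = 42) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    high, low, close = (df[c].to_numpy() for c in ("high", "low", "close"))
    spread = df["spread"].to_numpy() * point           # MT5 spread is in points
    balance, trades, i, n = start_balance, [], 0, len(df)
    while i < n - 1:
        d = sl_dist[i]
        if signals[i] not in (BUY, SELL) or d < spread[i] * spread_mult:
            i += 1                                     # no signal, or spread filter fails
            continue
        side = 1 if signals[i] == BUY else -1
        size = balance * bet_pct / d                   # lot_value from the plan
        entry = close[i] + side * rng.uniform(0.0, slip_max)  # slippage always hurts
        tp, sl = close[i] + side * d, close[i] - side * d     # 1:1 levels from signal close
        last = min(i + horizon, n - 1)
        exit_px, j = close[last], last                 # timeout: exit at market
        for k in range(i + 1, last + 1):
            hit_sl = low[k] <= sl if side == 1 else high[k] >= sl
            hit_tp = high[k] >= tp if side == 1 else low[k] <= tp
            if hit_sl or hit_tp:
                exit_px, j = (sl if hit_sl else tp), k  # SL wins ties (conservative)
                break
        pnl = side * (exit_px - entry) * size
        balance += pnl
        trades.append({"entry_time": df["time"].iloc[i], "exit_time": df["time"].iloc[j],
                       "side": "BUY" if side == 1 else "SELL", "entry": entry,
                       "exit": exit_px, "pnl": pnl, "balance": balance})
        i = j + 1                                      # one open position at a time
    return pd.DataFrame(trades)
```

The fixed seed keeps the random slippage reproducible between runs, which makes backtest results comparable across model versions.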

Metrics & Evaluation

[NEW] metrics.py

| Metric | Description |
|---|---|
| Win Rate | % of trades closed at TP |
| Average Win % | Mean profit per winning trade as % of balance |
| Average Loss % | Mean loss per losing trade as % of balance |
| Sharpe Ratio | Annualized risk-adjusted return |
| Sortino Ratio | Downside-risk-adjusted return |
| Max Drawdown | Largest peak-to-trough equity decline |
| Profit Factor | Gross profit / Gross loss |
| Start Equity | Initial balance |
| End Equity | Final balance after all trades |
| Total Trades | Number of executed trades |
| Avg Trade Duration | Mean holding time in bars/minutes |
| Daily PnL Stats | Mean, std, min, and max of per-day PnL |
| Calmar Ratio | Annualized return / Max Drawdown |
| Expectancy | Average PnL per trade |

Outputs a formatted console report and saves to results/report.txt.
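A sketch of the report calculations with pandas. It assumes a trades DataFrame with one row per closed trade and columns pnl, balance (after the trade), entry_time, and exit_time. Other conventions here are choices rather than requirements of the plan: any trade with pnl > 0 counts as a win, annualization uses 252 trading days, and Sharpe and Sortino are computed on day-end equity returns, so days without trades are skipped.

```python
import numpy as np
import pandas as pd


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak = equity.cummax()
    return float(((peak - equity) / peak).max())


def summarize(trades: pd.DataFrame, start_balance: float, periods_per_year: int = 252) -> dict:
    pnl = trades["pnl"]
    pct = 100 * pnl / (trades["balance"] - pnl)         # % of the pre-trade balance
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    equity = pd.concat([pd.Series([start_balance]), trades["balance"]], ignore_index=True)
    day = trades["exit_time"].dt.date
    daily_pnl = trades.groupby(day)["pnl"].sum()
    day_end = trades.groupby(day)["balance"].last()
    daily_ret = pd.concat([pd.Series([start_balance]), day_end],
                          ignore_index=True).pct_change().dropna()
    ann = np.sqrt(periods_per_year)
    downside = np.sqrt(np.mean(np.minimum(daily_ret, 0.0) ** 2))  # downside deviation
    cagr = (equity.iloc[-1] / start_balance) ** (periods_per_year / len(daily_ret)) - 1
    mdd = max_drawdown(equity)
    return {
        "win_rate_pct": 100 * len(wins) / len(pnl),
        "avg_win_pct": pct[pnl > 0].mean(),
        "avg_loss_pct": pct[pnl < 0].mean(),
        "sharpe": daily_ret.mean() / daily_ret.std() * ann,
        "sortino": daily_ret.mean() / downside * ann,
        "max_drawdown_pct": 100 * mdd,
        "profit_factor": wins.sum() / -losses.sum() if len(losses) else np.inf,
        "start_equity": start_balance,
        "end_equity": equity.iloc[-1],
        "total_trades": len(pnl),
        "avg_duration": (trades["exit_time"] - trades["entry_time"]).mean(),
        "daily_pnl": daily_pnl.agg(["mean", "std", "min", "max"]).to_dict(),
        "calmar": cagr / mdd if mdd > 0 else np.inf,
        "expectancy": pnl.mean(),
    }
```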


CLI Entry Point

[NEW] main.py

Unified CLI with subcommands:

```bash
python main.py fetch       # Fetch 1-year data from MT5
python main.py train       # Engineer features, label, train model
python main.py backtest    # Run backtest on test set
python main.py evaluate    # Print metrics report
python main.py run         # Full pipeline: fetch → train → backtest → evaluate
```

Uses argparse with clear help text.
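A minimal argparse skeleton for the five subcommands, as a sketch. The STEPS and HELP mappings are hypothetical names, and the handler only prints each stage; wiring the stages to the real modules is left out.

```python
import argparse

# Which pipeline stages each subcommand runs (hypothetical mapping)
STEPS = {
    "fetch": ["fetch"],
    "train": ["train"],
    "backtest": ["backtest"],
    "evaluate": ["evaluate"],
    "run": ["fetch", "train", "backtest", "evaluate"],
}
HELP = {
    "fetch": "Fetch 1-year data from MT5",
    "train": "Engineer features, label, train model",
    "backtest": "Run backtest on test set",
    "evaluate": "Print metrics report",
    "run": "Full pipeline: fetch, train, backtest, evaluate",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="main.py", description="XAUUSDc 3-minute ML trading pipeline")
    sub = parser.add_subparsers(dest="command", required=True, metavar="command")
    for name, text in HELP.items():
        sub.add_parser(name, help=text)
    return parser


def main(argv=None) -> list:
    args = build_parser().parse_args(argv)
    for step in STEPS[args.command]:
        print(f"[{step}] ...")  # placeholder: call the matching module here
    return STEPS[args.command]
```

As a script, main.py would end with the usual if __name__ == "__main__": main() guard, so python main.py --help lists the subcommands with their help text.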


Project Files

[NEW] requirements.txt

```
MetaTrader5>=5.0.45
lightgbm>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
joblib>=1.3.0
```

[NEW] LICENSE

MIT License, author: Rembrant Oyangoren Albeos, year: 2026.

[NEW] README.md

Professional README with badges (Python, License, LightGBM), project description, features list, quick start, architecture overview, and configuration reference. No emojis.

[NEW] GUIDE.md

Step-by-step usage guide with tables for all commands, parameters, and expected outputs.

[NEW] .gitignore

Standard Python gitignore plus data/, results/, models/, *.pkl.


Verification Plan

Automated Tests

  1. Syntax validation: run python -m py_compile <file> on every .py file to confirm no syntax errors
  2. Import validation: run python -c "import config; import features; import labeler; import model; import backtester; import metrics" to confirm all modules load correctly
  3. Dry-run test: run python main.py --help to confirm the CLI is functional
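The first two checks can be combined into one script, sketched here with py_compile and importlib from the standard library. check_project is a hypothetical helper; it scans only top-level .py files, matching the flat project layout.

```python
import importlib
import py_compile
import sys
from pathlib import Path

MODULES = ("config", "features", "labeler", "model", "backtester", "metrics")


def check_project(root: str = ".", modules=MODULES) -> list:
    """Compile each top-level .py file, then import the listed modules.

    Returns a list of failure messages; an empty list means both checks passed.
    """
    failures = []
    for path in sorted(Path(root).glob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append(f"syntax: {path.name}: {exc.msg}")
    root_dir = str(Path(root).resolve())
    if root_dir not in sys.path:
        sys.path.insert(0, root_dir)
    importlib.invalidate_caches()  # pick up files created since the last import
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # report any import-time error, not just ImportError
            failures.append(f"import: {name}: {exc!r}")
    return failures
```

Running print(check_project()) from the project root prints an empty list when every file compiles and every listed module imports.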

Manual Verification

  1. User runs python main.py fetch with MT5 open and logged in, and confirms the data CSV is created in data/
  2. User runs python main.py run for the full pipeline, reviews the metrics report output
  3. User inspects results/report.txt for the performance metrics