# ML-3m-trader: XAUUSDc 3-Minute Timeframe ML Trading System

End-to-end machine learning pipeline for trading XAUUSDc (Gold) on the 3-minute timeframe. Uses MetaTrader 5 for data acquisition, LightGBM for classification, and a vectorized backtesting engine with realistic execution modeling.

## User Review Required

> [!IMPORTANT]
> **ML Framework Choice: LightGBM.** LightGBM is a strong fit for this task because:
> - Tabular multi-class classification (Buy/Sell/Hold/DoNothing) is where gradient-boosted trees are strongest
> - Training is fast even on CPU; the i5-7200U should handle a year of 3-minute bars comfortably
> - Memory footprint is low (well within 12 GB RAM)
> - No GPU is required (your MX110 is not needed)
> - Gradient-boosted trees typically match or outperform deep learning on structured/tabular data in published benchmarks
>
> **This will run entirely on your local machine. No Google Colab needed.**

> [!WARNING]
> **MetaTrader 5 Python API** only works on Windows (which you have). MT5 must be open and logged in when running the data fetch script. The `MetaTrader5` pip package handles communication.

> [!NOTE]
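> **VIX Feature**: The CBOE VIX index is not available from MT5, so the system will compute a **synthetic VIX proxy**: a rolling standard deviation of XAUUSDc log-returns (realized volatility). This is a common substitute when implied-volatility data is unavailable, but the VIX itself measures the 30-day implied volatility of S&P 500 options, so the proxy tracks gold's own realized volatility rather than equity-market expectations. If you want the actual VIX, we would need a separate data source.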

---

## Proposed Changes

### Project Structure

```
ML-3m-trader/
├── config.py             # All configuration constants
├── data_fetcher.py       # MT5 data acquisition
├── features.py           # Technical indicator computation
├── labeler.py            # Trade label generation (Buy/Sell/Hold/DoNothing)
├── model.py              # LightGBM training, prediction, persistence
├── backtester.py         # Vectorized backtesting engine
├── metrics.py            # Performance evaluation
├── main.py               # CLI entry point
├── requirements.txt
├── LICENSE
├── README.md
├── GUIDE.md              # Step-by-step usage guide with tables
└── .gitignore
```

---

### Configuration

#### [NEW] [config.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/config.py)

Central configuration file containing all tunable parameters:
- `SYMBOL = "XAUUSDc"`, `TIMEFRAME = mt5.TIMEFRAME_M3`
- Feature list, lookback periods for SMA (14, 50), VROC (14), ADX (14), Momentum SI (10)
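- Risk/reward ratio = 1.0 and the default bet percentage (fraction of current balance risked per trade)
- Slippage range (0–2 units), spread filter (`stoploss_size >= spread * 10`, with `spread` converted from MT5 points to price units)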
- Train/test split ratio, model hyperparameters
- Starting equity/balance
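One way to lay out `config.py` is sketched below. The periods, risk/reward, spread multiplier, and split ratio come from this plan; `BET_PCT`, `POINT`, and `STARTING_BALANCE` are hypothetical placeholders, and the `min_sl_distance` helper is an illustrative addition showing the point-to-price conversion the spread filter needs. The MetaTrader5 import is guarded so the file can still be imported on a machine without the package.

```python
"""config.py sketch: central tunable parameters. Values marked HYPOTHETICAL are placeholders."""

try:
    import MetaTrader5 as mt5
    TIMEFRAME = mt5.TIMEFRAME_M3
except ImportError:  # package missing (e.g. non-Windows machine): keep the module importable
    mt5 = None
    TIMEFRAME = None

SYMBOL = "XAUUSDc"

# Indicator lookback periods (from this plan)
SMA_FAST, SMA_SLOW = 14, 50
VROC_PERIOD = 14
ADX_PERIOD = 14
MSI_PERIOD = 10
VOL_PROXY_PERIOD = 20          # synthetic VIX: rolling std of log-returns

# Risk and execution
RISK_REWARD = 1.0              # TP distance = SL distance
BET_PCT = 0.01                 # HYPOTHETICAL default: risk 1% of balance per trade
SLIPPAGE_RANGE = (0.0, 2.0)    # uniform slippage, interpreted here as price units
SPREAD_MULTIPLIER = 10         # trade only if sl_distance >= spread * SPREAD_MULTIPLIER
POINT = 0.001                  # HYPOTHETICAL; in practice read mt5.symbol_info(SYMBOL).point
STARTING_BALANCE = 10_000.0    # HYPOTHETICAL

# Data split
TRAIN_FRACTION = 0.8           # chronological, no shuffling


def min_sl_distance(spread_points: float) -> float:
    """Smallest stop-loss distance (price units) that passes the spread filter."""
    return spread_points * POINT * SPREAD_MULTIPLIER
```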

---

### Data Acquisition

#### [NEW] [data_fetcher.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/data_fetcher.py)

- Connects to MT5 terminal via `MetaTrader5` Python package
- Fetches one year of 3-minute OHLCV bars for XAUUSDc
- Returns a `pandas.DataFrame` with columns: `time, open, high, low, close, volume, spread` (`volume` is MT5's `tick_volume` field; `spread` is reported in points)
- Saves raw data to `data/raw_xauusdc_3m.csv` for reproducibility
- Handles MT5 connection errors gracefully
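A minimal sketch of `data_fetcher.py`, assuming the standard `MetaTrader5` calls `initialize`, `symbol_select`, `copy_rates_range`, `last_error`, and `shutdown`. The function names `fetch_rates` and `rates_to_frame` are hypothetical. The MT5 import sits inside `fetch_rates` so the conversion helper can be exercised without a running terminal.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

import numpy as np
import pandas as pd

COLUMNS = ["time", "open", "high", "low", "close", "volume", "spread"]


def rates_to_frame(rates: np.ndarray) -> pd.DataFrame:
    """Convert the structured array returned by copy_rates_* into the plan's column layout."""
    df = pd.DataFrame(rates)
    df["time"] = pd.to_datetime(df["time"], unit="s")  # MT5 bar times are epoch seconds
    df = df.rename(columns={"tick_volume": "volume"})   # real_volume is often 0 on CFD symbols
    return df[COLUMNS]


def fetch_rates(symbol: str = "XAUUSDc", days: int = 365,
                out_path: str = "data/raw_xauusdc_3m.csv") -> pd.DataFrame:
    import MetaTrader5 as mt5  # Windows-only package; imported lazily

    if not mt5.initialize():
        raise RuntimeError(f"MT5 initialize() failed: {mt5.last_error()}")
    try:
        if not mt5.symbol_select(symbol, True):  # make sure the symbol is in Market Watch
            raise RuntimeError(f"Symbol {symbol} not available: {mt5.last_error()}")
        date_to = datetime.now(timezone.utc)
        rates = mt5.copy_rates_range(symbol, mt5.TIMEFRAME_M3,
                                     date_to - timedelta(days=days), date_to)
        if rates is None or len(rates) == 0:
            raise RuntimeError(f"No M3 bars returned for {symbol}: {mt5.last_error()}")
    finally:
        mt5.shutdown()  # always release the terminal connection
    df = rates_to_frame(rates)
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    return df
```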

---

### Feature Engineering

#### [NEW] [features.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/features.py)

Computes all required technical indicators using pure NumPy/Pandas (no TA-Lib dependency):

| Feature | Method |
|---------|--------|
| SMA | Simple Moving Average (14-period) |
| Double Moving Average | SMA(14) and SMA(50), plus crossover signal |
| VROC | Volume Rate of Change (14-period) |
| Synthetic VIX | Rolling std of log-returns (20-period) as volatility proxy |
| Momentum Strength Index | Custom momentum oscillator (10-period, 0–100 scale) |
| ADX | Average Directional Index (14-period) via Wilder's smoothing |
| Time features | Hour-of-day, minute-of-hour, day-of-week (cyclical encoded) |

All computations are vectorized with NumPy for maximum speed. NaN rows from lookback periods are dropped.
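The indicator table can be sketched in pure Pandas/NumPy as follows. SMA, VROC, the log-return volatility proxy, and ADX follow their standard definitions; ADX uses the common `ewm(alpha=1/n, adjust=False)` form of Wilder's smoothing, which is seeded from the first value rather than Wilder's initial simple average, so the earliest values differ slightly. The plan does not define the Momentum Strength Index, so `momentum_strength` here is a hypothetical stand-in (upward share of total absolute movement over 10 bars). Column names such as `sma_cross` and `vol_proxy_20` are illustrative.

```python
import numpy as np
import pandas as pd


def wilder(s: pd.Series, n: int) -> pd.Series:
    """Wilder's smoothing written as an EMA with alpha = 1/n."""
    return s.ewm(alpha=1 / n, adjust=False).mean()


def true_range(df: pd.DataFrame) -> pd.Series:
    prev_close = df["close"].shift(1)
    return pd.concat([df["high"] - df["low"], (df["high"] - prev_close).abs(),
                      (df["low"] - prev_close).abs()], axis=1).max(axis=1)


def adx(df: pd.DataFrame, n: int = 14) -> pd.Series:
    up, down = df["high"].diff(), -df["low"].diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=df.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=df.index)
    atr = wilder(true_range(df), n)
    plus_di = 100 * wilder(plus_dm, n) / atr
    minus_di = 100 * wilder(minus_dm, n) / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    return wilder(dx, n)


def momentum_strength(close: pd.Series, n: int = 10) -> pd.Series:
    """HYPOTHETICAL definition: upward share of total absolute movement over n bars, 0-100."""
    diff = close.diff()
    return 100 * diff.clip(lower=0).rolling(n).sum() / diff.abs().rolling(n).sum()


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    close = out["close"]
    out["sma_14"] = close.rolling(14).mean()
    out["sma_50"] = close.rolling(50).mean()
    state = np.sign(out["sma_14"] - out["sma_50"])    # +1 fast above slow, -1 below
    out["sma_cross"] = state.diff()                   # nonzero on the bar where they cross
    out["vroc_14"] = 100 * (out["volume"] / out["volume"].shift(14) - 1)
    out["vol_proxy_20"] = np.log(close).diff().rolling(20).std()  # synthetic VIX
    out["msi_10"] = momentum_strength(close, 10)
    out["adx_14"] = adx(out, 14)
    ts = pd.to_datetime(out["time"])
    for name, values, period in (("hour", ts.dt.hour, 24), ("minute", ts.dt.minute, 60),
                                 ("dow", ts.dt.dayofweek, 7)):
        out[f"{name}_sin"] = np.sin(2 * np.pi * values / period)  # cyclical encoding
        out[f"{name}_cos"] = np.cos(2 * np.pi * values / period)
    # Lookback warm-up rows and divide-by-zero results become NaN and are dropped
    return out.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)
```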

---

### Labeling Engine

#### [NEW] [labeler.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/labeler.py)

Generates ground-truth labels for supervised learning:

1. For each bar, compute a potential **Buy** and **Sell** trade:
   - **Buy**: entry at `close`, SL below recent swing low (ATR-based), TP = entry + (entry - SL) (1:1 RR)
   - **Sell**: entry at `close`, SL above recent swing high (ATR-based), TP = entry - (SL - entry) (1:1 RR)
2. Walk forward through subsequent bars to determine outcome (TP hit, SL hit, or neither within N bars)
3. Apply **spread filter**: if `SL_distance < spread * 10`, label = `DO_NOTHING`
4. Final labels: `BUY_WIN`, `BUY_LOSS`, `SELL_WIN`, `SELL_LOSS`, `HOLD`, `DO_NOTHING` → simplified to 4-class: `BUY (1)`, `SELL (2)`, `HOLD (3)`, `DO_NOTHING (0)`
5. Only winning setups are labeled as BUY/SELL; losing setups become HOLD
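The labeling steps can be sketched as below. Several details the plan leaves open are filled with hypothetical choices: the stop sits beyond the 10-bar swing low/high plus `0.5 × ATR`, the walk-forward horizon N is 40 bars, a bar that touches both TP and SL counts as a loss, the spread filter is applied per side (a bar becomes `DO_NOTHING` only when neither side passes), and when both sides would have won, the one that reaches TP first gets the label. `point` converts MT5's spread (in points) to price units and would be read from the symbol in practice.

```python
import numpy as np
import pandas as pd

DO_NOTHING, BUY, SELL, HOLD = 0, 1, 2, 3


def atr(df: pd.DataFrame, n: int = 14) -> pd.Series:
    prev_close = df["close"].shift(1)
    tr = pd.concat([df["high"] - df["low"], (df["high"] - prev_close).abs(),
                    (df["low"] - prev_close).abs()], axis=1).max(axis=1)
    return tr.ewm(alpha=1 / n, adjust=False).mean()  # Wilder-style smoothing


def first_hit(high, low, start, horizon, tp, sl, is_buy):
    """Walk forward from bar `start`: (+1, bar) on TP, (-1, bar) on SL, (0, -1) if neither.
    If TP and SL fall inside the same bar, SL is assumed to come first (conservative)."""
    for j in range(start, min(start + horizon, len(high))):
        sl_hit = (low[j] <= sl) if is_buy else (high[j] >= sl)
        tp_hit = (high[j] >= tp) if is_buy else (low[j] <= tp)
        if sl_hit:
            return -1, j
        if tp_hit:
            return 1, j
    return 0, -1


def make_labels(df, point=0.001, horizon=40, swing_lookback=10, atr_buffer=0.5, spread_mult=10):
    high, low, close = (df[c].to_numpy(dtype=float) for c in ("high", "low", "close"))
    a = atr(df).to_numpy()
    swing_low = df["low"].rolling(swing_lookback).min().to_numpy()
    swing_high = df["high"].rolling(swing_lookback).max().to_numpy()
    min_sl = df["spread"].to_numpy() * point * spread_mult  # spread filter in price units
    labels = np.full(len(df), HOLD, dtype=np.int8)
    for i in range(len(df)):
        if np.isnan(swing_low[i]):
            labels[i] = DO_NOTHING  # not enough history for a swing level yet
            continue
        entry = close[i]
        buy_d = entry - (swing_low[i] - atr_buffer * a[i])
        sell_d = (swing_high[i] + atr_buffer * a[i]) - entry
        buy_ok = buy_d > 0 and buy_d >= min_sl[i]
        sell_ok = sell_d > 0 and sell_d >= min_sl[i]
        if not (buy_ok or sell_ok):
            labels[i] = DO_NOTHING
            continue
        # 1:1 risk-reward: TP sits as far from entry as the SL
        buy = first_hit(high, low, i + 1, horizon, entry + buy_d, entry - buy_d, True) if buy_ok else (0, -1)
        sell = first_hit(high, low, i + 1, horizon, entry - sell_d, entry + sell_d, False) if sell_ok else (0, -1)
        if buy[0] == 1 and (sell[0] != 1 or buy[1] < sell[1]):
            labels[i] = BUY
        elif sell[0] == 1 and (buy[0] != 1 or sell[1] < buy[1]):
            labels[i] = SELL
        # losing, unresolved, or same-bar ties stay HOLD
    return labels
```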

---

### ML Model

#### [NEW] [model.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/model.py)

- **LightGBM** multi-class classifier (4 classes)
- Hyperparameters tuned for tabular financial data:
  - `num_leaves=63`, `max_depth=8`, `learning_rate=0.05`, `n_estimators=500`
  - `subsample=0.8`, `subsample_freq=1`, `colsample_bytree=0.8`, `min_child_samples=20` (LightGBM ignores `subsample` unless `subsample_freq > 0`)
  - `class_weight='balanced'` to handle label imbalance
- Train/validation split: 80/20 chronological (no shuffling, since this is time-series data)
- Feature importance output
- Model persistence via `joblib` (save/load `.pkl`)
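- Early stopping on the validation set (in LightGBM 4.x this is passed as the `lgb.early_stopping(...)` callback rather than an `early_stopping_rounds` argument to `fit()`)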
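A sketch of `model.py` using the LightGBM scikit-learn API, assuming LightGBM 4.x. It adds `subsample_freq=1` so that `subsample=0.8` takes effect; the model path and `patience=50` are hypothetical. LightGBM and joblib are imported inside the functions that need them so the split helper works on its own. One caution worth considering: if the same last 20% is used both for early stopping and for the backtest, the backtest has indirectly seen that data, and because labels look up to N bars ahead, a separate hold-out slice with a gap of at least N bars between slices gives a less biased test.

```python
from pathlib import Path

import pandas as pd

# Hyperparameters from this plan; the multiclass objective is inferred from the labels
LGBM_PARAMS = dict(
    num_leaves=63, max_depth=8, learning_rate=0.05, n_estimators=500,
    subsample=0.8, subsample_freq=1, colsample_bytree=0.8,
    min_child_samples=20, class_weight="balanced", verbose=-1,
)


def chronological_split(X: pd.DataFrame, y: pd.Series, train_frac: float = 0.8):
    """Split by position (no shuffling): the first train_frac of rows train, the rest validate."""
    cut = int(len(X) * train_frac)
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


def train(X: pd.DataFrame, y: pd.Series, train_frac: float = 0.8, patience: int = 50):
    import lightgbm as lgb

    X_tr, X_va, y_tr, y_va = chronological_split(X, y, train_frac)
    model = lgb.LGBMClassifier(**LGBM_PARAMS)
    model.fit(
        X_tr, y_tr,
        eval_set=[(X_va, y_va)],
        eval_metric="multi_logloss",
        callbacks=[lgb.early_stopping(patience, verbose=False)],
    )
    importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
    return model, importance


def save_model(model, path: str = "models/lgbm_xauusdc_3m.pkl") -> None:
    import joblib

    Path(path).parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, path)


def load_model(path: str = "models/lgbm_xauusdc_3m.pkl"):
    import joblib

    return joblib.load(path)
```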

---

### Backtesting Engine

#### [NEW] [backtester.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/backtester.py)

Vectorized backtesting with realistic execution:

- Takes model predictions and raw price data
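- **Position sizing**: risk a fixed % of current balance over the full SL distance, so a stop-loss hit loses `bet_pct` of balance (before slippage)
  - `lot_value = balance * bet_pct / sl_distance`, a position size in units of the instrument (convert to MT5 lots by dividing by the symbol's contract size)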
- **Random slippage**: uniform 0–2 XAUUSDc units applied to entry price
- **Spread filter**: skip trade if `sl_distance < spread * 10`
- **1:1 Risk-Reward**: TP distance = SL distance
- Walk forward bar-by-bar on test set, track equity curve
- No trade limit: every valid signal is taken
- Records all trades with entry/exit prices, PnL, timestamps
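The execution rules above can be sketched as a bar-by-bar loop. This is an illustrative sketch, not the plan's final engine, and it makes assumptions the plan does not state: only one position is open at a time (signals during an open trade are skipped), SL and TP are fixed from the signal bar's close so slippage only worsens the fill, a trade still open after `horizon` bars exits at that bar's close, and a bar touching both levels resolves as SL first. `signals` and `sl_dist` are per-bar arrays (for example model predictions and the labeler's stop distances); `point` is a hypothetical placeholder.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd

BUY, SELL = 1, 2


@dataclass
class Trade:
    entry_bar: int
    exit_bar: int
    side: int
    entry: float
    exit: float
    size: float
    pnl: float


def backtest(df, signals, sl_dist, balance=10_000.0, bet_pct=0.01, max_slippage=2.0,
             spread_mult=10, point=0.001, horizon=40, seed=None):
    rng = np.random.default_rng(seed)
    high, low, close = (df[c].to_numpy(dtype=float) for c in ("high", "low", "close"))
    min_sl = df["spread"].to_numpy() * point * spread_mult   # spread filter in price units
    n = len(df)
    equity = np.full(n, np.nan)
    equity[0] = balance
    trades, i = [], 0
    while i < n - 1:
        side, d = signals[i], sl_dist[i]
        if side not in (BUY, SELL) or not d > 0 or d < min_sl[i]:
            i += 1
            continue
        direction = 1 if side == BUY else -1
        sl, tp = close[i] - direction * d, close[i] + direction * d   # 1:1 risk-reward
        entry = close[i] + direction * rng.uniform(0.0, max_slippage)  # adverse slippage
        size = balance * bet_pct / d   # SL hit loses bet_pct of balance (before slippage)
        last = min(i + horizon, n - 1)
        exit_bar, exit_price = last, close[last]   # time exit if neither level is hit
        for j in range(i + 1, last + 1):
            if (low[j] <= sl) if side == BUY else (high[j] >= sl):
                exit_bar, exit_price = j, sl
                break
            if (high[j] >= tp) if side == BUY else (low[j] <= tp):
                exit_bar, exit_price = j, tp
                break
        pnl = direction * (exit_price - entry) * size
        balance += pnl
        equity[exit_bar] = balance
        trades.append(Trade(i, exit_bar, int(side), float(entry), float(exit_price), size, pnl))
        i = exit_bar + 1
    return trades, pd.Series(equity, index=df.index).ffill()
```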

---

### Metrics & Evaluation

#### [NEW] [metrics.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/metrics.py)

| Metric | Description |
|--------|-------------|
| Win Rate | % of trades closed at TP |
| Average Win % | Mean profit per winning trade as % of balance |
| Average Loss % | Mean loss per losing trade as % of balance |
| Sharpe Ratio | Annualized mean return divided by the standard deviation of returns |
| Sortino Ratio | Annualized mean return divided by the downside deviation of returns |
| Max Drawdown | Largest peak-to-trough equity decline |
| Profit Factor | Gross profit / Gross loss |
| Start Equity | Initial balance |
| End Equity | Final balance after all trades |
| Total Trades | Number of executed trades |
| Avg Trade Duration | Mean holding time in bars/minutes |
| Daily PnL Stats | Mean, std, min, and max of daily PnL |
| Calmar Ratio | Annualized return / Max Drawdown |
| Expectancy | Average PnL per trade |

Outputs a formatted console report and saves to `results/report.txt`.
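Most of the metrics in the table can be computed from per-trade PnL and a timestamped equity curve, as sketched below. Assumptions: Sharpe, Sortino, and Calmar use daily returns annualized with 252 periods per year; Sortino uses downside deviation (root mean square of daily returns with positive days counted as zero); a trade counts as a win when its PnL is positive, which also counts profitable time exits; and the `balance_before` input is hypothetical. Average trade duration is omitted for brevity.

```python
import numpy as np
import pandas as pd


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the running peak."""
    peak = equity.cummax()
    return float(((equity - peak) / peak).min())


def summarize(pnl: pd.Series, balance_before: pd.Series, equity: pd.Series,
              periods_per_year: int = 252) -> dict:
    """pnl and balance_before: one entry per trade; equity: curve indexed by timestamp."""
    pct = pnl / balance_before
    wins, losses = pnl[pnl > 0], pnl[pnl < 0]
    daily_equity = equity.resample("D").last().dropna()
    daily_ret = daily_equity.pct_change().dropna()
    daily_pnl = daily_equity.diff().dropna()
    downside = np.sqrt(np.mean(np.minimum(daily_ret, 0.0) ** 2))
    ann = np.sqrt(periods_per_year)
    years = len(daily_ret) / periods_per_year
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1 if years > 0 else np.nan
    mdd = max_drawdown(equity)
    return {
        "win_rate": len(wins) / len(pnl) if len(pnl) else np.nan,
        "avg_win_pct": 100 * pct[pnl > 0].mean(),
        "avg_loss_pct": 100 * pct[pnl < 0].mean(),
        "sharpe": ann * daily_ret.mean() / daily_ret.std() if daily_ret.std() > 0 else np.nan,
        "sortino": ann * daily_ret.mean() / downside if downside > 0 else np.nan,
        "max_drawdown_pct": 100 * mdd,
        "profit_factor": wins.sum() / -losses.sum() if len(losses) else np.inf,
        "start_equity": float(equity.iloc[0]),
        "end_equity": float(equity.iloc[-1]),
        "total_trades": len(pnl),
        "calmar": cagr / abs(mdd) if mdd < 0 else np.nan,
        "expectancy": pnl.mean(),
        "daily_pnl": daily_pnl.agg(["mean", "std", "min", "max"]).to_dict(),
    }
```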

---

### CLI Entry Point

#### [NEW] [main.py](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/main.py)

Unified CLI with subcommands:

```
python main.py fetch       # Fetch 1-year data from MT5
python main.py train       # Engineer features, label, train model
python main.py backtest    # Run backtest on test set
python main.py evaluate    # Print metrics report
python main.py run         # Full pipeline: fetch → train → backtest → evaluate
```

Uses `argparse` with clear help text.
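A minimal `argparse` layout for the subcommands above, using `add_subparsers(required=True)` (Python 3.7+). The step handlers are placeholders that only print; wiring them to the modules in the project structure is left to the real implementation. `main` returns the executed steps so the dispatch can be checked.

```python
import argparse

PIPELINE = ["fetch", "train", "backtest", "evaluate"]
HELP = {
    "fetch": "Fetch 1-year data from MT5",
    "train": "Engineer features, label, train model",
    "backtest": "Run backtest on test set",
    "evaluate": "Print metrics report",
    "run": "Full pipeline: fetch, train, backtest, evaluate",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py",
                                     description="XAUUSDc 3-minute ML trading pipeline")
    sub = parser.add_subparsers(dest="command", required=True, metavar="COMMAND")
    for name, text in HELP.items():
        sub.add_parser(name, help=text)
    return parser


def main(argv=None) -> list:
    args = build_parser().parse_args(argv)
    steps = PIPELINE if args.command == "run" else [args.command]
    for step in steps:
        # Placeholder: dispatch to data_fetcher / features+labeler+model / backtester / metrics
        print(f"[{step}] running")
    return steps


if __name__ == "__main__":
    main()
```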

---

### Project Files

#### [NEW] [requirements.txt](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/requirements.txt)

```
MetaTrader5>=5.0.45
lightgbm>=4.0.0
pandas>=2.0.0
numpy>=1.24.0
scikit-learn>=1.3.0
joblib>=1.3.0
```

#### [NEW] [LICENSE](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/LICENSE)

MIT License, author: Rembrant Oyangoren Albeos, year: 2026.

#### [NEW] [README.md](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/README.md)

Professional README with badges (Python, License, LightGBM), project description, features list, quick start, architecture overview, and configuration reference. No emojis.

#### [NEW] [GUIDE.md](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/GUIDE.md)

Step-by-step usage guide with tables for all commands, parameters, and expected outputs.

#### [NEW] [.gitignore](file:///c:/Users/User/Desktop/debugrem/ML-3m-trader/.gitignore)

Standard Python gitignore plus `data/`, `results/`, `models/`, `*.pkl`.

---

## Verification Plan

### Automated Tests

1. **Syntax validation**: run `python -m py_compile <file>` on every `.py` file to confirm no syntax errors
2. **Import validation**: run `python -c "import config; import features; import labeler; import model; import backtester; import metrics"` to confirm all modules load correctly
3. **Dry-run test**: run `python main.py --help` to confirm the CLI is functional

### Manual Verification

1. **User runs `python main.py fetch`** with MT5 open and logged in, confirms data CSV is created in `data/`
2. **User runs `python main.py run`** for the full pipeline, reviews the metrics report output
3. **User inspects `results/report.txt`** for the performance metrics