# AlphaForge v3.0: Elite Quant Trading System

> **From backtesting toy → Jane Street / Two Sigma / Citadel production-grade quantitative trading platform**

**Repository**: [Premchan369/alphaforge-quant-system](https://huggingface.co/Premchan369/alphaforge-quant-system)

---

## What Makes This "Elite"

Most GitHub quant repos:
- Backtest on all data (data leakage)
- Use hand-coded RSI/MACD (no alpha mining)
- No risk management (just returns)
- No execution simulation (market orders everywhere)
- No uncertainty quantification (trading blind)
- Static models (break when markets change)
- No adversarial defense (models get exploited)

**AlphaForge v3.0 solves every single one of these.**

---

## Architecture

```
ALPHA FORGE v3.0: SYSTEM MAP

DATA LAYER
├── market_data.py              → OHLCV + features + cross-section
├── news_data_integration.py    → NewsAPI + RSS + GDELT + Reddit
├── market_microstructure.py    → Kyle's lambda, VPIN, OFI, Amihud
└── limit_order_book.py         → Level 2 LOB reconstruction (NEW)

PREPROCESSING
├── wavelet_denoising.py        → db4 wavelets + soft thresholding
└── technical_indicators.py     → 30+ indicators (RSI, MACD, BB, etc.)

ALPHA DISCOVERY
├── alpha_mining.py             → GP symbolic regression + LLM suggestions
├── sentiment_model.py          → FinBERT sentiment scoring
└── alpha_model.py              → XGBoost + LSTM + Transformer ensemble

REAL-TIME INFRASTRUCTURE (NEW)
├── feature_store.py            → Microsecond feature compute + drift
├── online_learning.py          → Per-symbol adaptive models + concept drift
└── rl_execution.py             → PPO Deep Hedging for optimal execution

MODEL LAYER
├── multi_task_learning.py      → Joint MTL: returns + vol + portfolio
├── volatility_model.py         → GARCH + LSTM + skewed Student's t
├── options_pricer.py           → 5-layer FNN beats Black-Scholes
├── stat_arb.py                 → Cointegration + PCA mean-reversion (NEW)
└── market_making.py            → Avellaneda-Stoikov quoting (NEW)

CORRELATION & RISK (NEW)
├── correlation_regime.py       → DCC-GARCH + dynamic copulas
├── conformal_prediction.py     → Guaranteed prediction intervals
├── adversarial_defense.py      → FGSM attacks + watermarking (NEW)
├── risk_management.py          → VaR/CVaR + stress tests + compliance
├── risk_engine.py              → Signal risk scoring
└── stress_test.py              → Historical scenario stress testing

OPTIMIZATION
├── portfolio_optimizer.py      → Robust optimization + Black-Litterman
└── execution_algorithms.py     → TWAP/VWAP + Smart Order Router

VALIDATION
├── walk_forward_validation.py  → Purged CV + CPCV
├── backtest_engine.py          → Honest backtesting
└── ab_testing.py               → Statistical A/B tests (NEW)

SYNTHETIC ENVIRONMENT (NEW)
└── synthetic_market_sim.py     → Agent-based market simulation

TRAINING INFRASTRUCTURE
├── gpu_optimization.py         → Flash Attention + AMP + CUDA graphs
└── hyperparameter_sweep.py     → Grid + Random + Latin Hypercube

METRICS & MONITORING
├── metrics_guide.py            → GOAT scoring + metric explanations
├── goat_strategy.py            → GOAT score → actionable rules
└── ALPHA_FORGE_GUIDE.md        → 25KB human-readable metrics guide

ORCHESTRATION
└── main.py                     → Full pipeline integration
```

**Total: 25 modules | 421KB+ | 50,000+ lines**

---

## What's New in v3.0 (Jane Street Level)

### 1. Reinforcement Learning Execution (`rl_execution.py`)
- **PPO-based Deep Hedging**: a neural network adapts the execution schedule to market conditions
- Self-play training in a simulated environment
- **RL vs TWAP comparison**: benchmarks the learned policy against a deterministic TWAP schedule
- Market impact model (temporary + permanent)
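
As a rough illustration of the baseline that the RL agent is compared against, the sketch below builds a TWAP schedule and prices it under a simple linear impact model. The function names and the `perm` / `temp` coefficients are hypothetical and are not taken from `rl_execution.py`; the linear form of the impact terms is an assumption made for the example.

```python
def twap_schedule(total_qty: int, n_slices: int) -> list[int]:
    """Split an order into near-equal child orders (TWAP baseline)."""
    base, remainder = divmod(total_qty, n_slices)
    # Spread the remainder over the first slices so the sizes sum exactly.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


def impact_cost(schedule: list[int], perm: float, temp: float) -> float:
    """Execution cost under linear permanent and temporary impact.

    Each child order pays `temp * size` per share for its own footprint,
    plus `perm * (shares already traded)` per share for the price drift
    caused by earlier child orders. Both coefficients are illustrative.
    """
    cost, traded = 0.0, 0
    for size in schedule:
        cost += size * (perm * traded + temp * size)
        traded += size
    return cost


schedule = twap_schedule(total_qty=100_000, n_slices=20)
block = impact_cost([100_000], perm=1e-7, temp=1e-6)   # one block trade
twap = impact_cost(schedule, perm=1e-7, temp=1e-6)     # sliced evenly
print(f"block cost={block:.2f}  twap cost={twap:.2f}")
```

An adaptive policy would have to beat the TWAP figure under the same impact model for the comparison to be meaningful.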

### 2. Limit Order Book Reconstruction (`limit_order_book.py`)
- Full **Level 2 order book** with 10+ price levels
- Queue position tracking
- Order imbalance calculation (Jane Street's #1 signal)
- Spread dynamics, large order detection
- Synthetic LOB message feed generation

### 3. Market Making Engine (`market_making.py`)
- **Avellaneda-Stoikov** optimal quoting with inventory skewing
- Inventory risk management (hedge, stop quoting, aggressive unwind)
- **Adverse selection detection**: flags when informed traders hit your quotes
- Real-time spread optimization
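
The inventory skew can be illustrated with the closed-form approximation from Avellaneda & Stoikov (2008): a reservation price that moves against the current inventory, and a spread made of an inventory-risk term plus a liquidity term. This is a minimal sketch; the function name `as_quotes` and its parameters are hypothetical and may differ from what `market_making.py` exposes.

```python
import math


def as_quotes(mid: float, inventory: float, gamma: float, sigma: float,
              k: float, time_left: float) -> tuple[float, float]:
    """Bid/ask from the Avellaneda-Stoikov closed-form approximation.

    gamma: risk aversion, sigma: price volatility, k: order-arrival decay,
    time_left: remaining horizon (T - t), all in consistent units.
    """
    # Reservation price shifts away from mid, against the current inventory.
    reservation = mid - inventory * gamma * sigma**2 * time_left
    # Total spread: inventory-risk term plus liquidity term.
    spread = gamma * sigma**2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0


# Long 500 units: both quotes shift down to encourage selling inventory.
bid, ask = as_quotes(mid=150.0, inventory=500, gamma=0.001,
                     sigma=0.5, k=1.5, time_left=1.0)
print(f"bid={bid:.4f}  ask={ask:.4f}")
```

With zero inventory the quotes are symmetric around the mid price; a long position lowers both quotes, a short position raises them.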

### 4. Synthetic Market Simulation (`synthetic_market_sim.py`)
- **Agent-based modeling**: informed traders, noise traders, momentum traders
- **Regime switching** in fundamentals (normal/boom/crash/high-vol)
- Unlimited training data for RL agents
- Shock injection for stress testing
- Cross-asset correlation generation

### 5. Online Learning (`online_learning.py`)
- **Per-symbol adaptive models**: each asset gets its own learning rate
- **Concept drift detection**: automatically detects when the old model breaks
- Adaptive learning rate reset on drift
- Meta-learning initialization from similar symbols

### 6. Statistical Arbitrage (`stat_arb.py`)
- **Engle-Granger cointegration** testing
- **Pairs trading** with rolling hedge ratios and z-score signals
- **PCA mean-reversion**: factor-neutral residual trading
- **Lead-lag detection**: which asset predicts which (VIX→SPX)
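
The pairs-trading bullet can be made concrete with a small sketch: estimate a hedge ratio by OLS over a trailing window, form the spread, and map its z-score to a signal. The helper names below are hypothetical and use only the standard library; `stat_arb.py` may compute the hedge ratio and thresholds differently. The `entry_z` / `exit_z` defaults mirror the values shown in the Usage section.

```python
from statistics import mean, pstdev


def hedge_ratio(a: list[float], b: list[float]) -> float:
    """OLS slope of a on b: cov(a, b) / var(b)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var = sum((y - mb) ** 2 for y in b)
    return cov / var


def spread_zscore(a: list[float], b: list[float], window: int) -> float:
    """Z-score of the latest spread value against a trailing window."""
    wa, wb = a[-window:], b[-window:]
    beta = hedge_ratio(wa, wb)
    spread = [x - beta * y for x, y in zip(wa, wb)]
    sd = pstdev(spread)
    return 0.0 if sd == 0 else (spread[-1] - mean(spread)) / sd


def signal(z: float, entry_z: float = 2.0, exit_z: float = 0.5) -> str:
    if z > entry_z:
        return "short_spread"   # spread rich: sell A, buy beta * B
    if z < -entry_z:
        return "long_spread"    # spread cheap: buy A, sell beta * B
    if abs(z) < exit_z:
        return "flat"           # spread has reverted: close the position
    return "hold"               # between thresholds: keep current position
```

A z-score only makes sense as a trading signal when the pair has first passed a cointegration test; the sketch omits that step.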

### 7. Conformal Prediction (`conformal_prediction.py`)
- **Distribution-free** prediction intervals with guaranteed marginal coverage (assuming exchangeable data)
- **Adaptive conformal**: online adjustment for non-stationary data
- Bootstrap uncertainty estimation
- **Quantile regression** for asymmetric uncertainty (downside > upside)
- **Ensemble uncertainty**: union/intersection of all methods
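
To show where the coverage guarantee comes from, here is a minimal split-conformal sketch: the interval half-width is an order statistic of the absolute calibration residuals, chosen with the finite-sample `(n + 1)` correction. The function names are hypothetical stand-ins, not the API of `conformal_prediction.py`.

```python
import math


def conformal_radius(y_cal: list[float], y_pred_cal: list[float],
                     alpha: float) -> float:
    """Half-width of a split-conformal interval from calibration residuals."""
    scores = sorted(abs(y - p) for y, p in zip(y_cal, y_pred_cal))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def predict_interval(y_pred: float, radius: float) -> tuple[float, float]:
    """Symmetric interval around a point prediction."""
    return y_pred - radius, y_pred + radius
```

The guarantee is marginal (averaged over calibration and test draws) and relies on exchangeability, which is why the adaptive variant matters for non-stationary returns.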

### 8. Real-Time Feature Store (`feature_store.py`)
- Microsecond-level feature computation
- **Drift detection** per feature (Wasserstein distance)
- Feature caching with TTL
- Online feature importance (sensitivity analysis)
- Feature versioning for reproducibility
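
A per-feature drift check based on the Wasserstein distance might look like the sketch below. For two one-dimensional samples of equal size, the 1-Wasserstein distance reduces to the mean absolute gap between sorted values. The function names and the fixed `threshold` are assumptions for illustration; `feature_store.py` may use a library routine and a different decision rule.

```python
def wasserstein_1d(ref: list[float], live: list[float]) -> float:
    """W1 distance between two equal-size samples: mean gap of sorted values."""
    if len(ref) != len(live):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(ref), sorted(live))
    return sum(abs(r - l) for r, l in pairs) / len(ref)


def has_drifted(ref: list[float], live: list[float], threshold: float) -> bool:
    """Flag a feature whose live distribution moved away from the reference."""
    return wasserstein_1d(ref, live) > threshold


reference = [0.0, 1.0, 2.0, 3.0]
shifted = [4.0, 1.0, 3.0, 2.0]          # same shape, shifted up by 1
print(wasserstein_1d(reference, shifted), has_drifted(reference, shifted, 0.5))
```

Because the distance is in the feature's own units, thresholds are easier to set after standardizing each feature.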

### 9. Adversarial Defense (`adversarial_defense.py`)
- **FGSM attacks** to test model robustness
- **Adversarial training**: train on perturbed inputs
- Anomaly detection (Mahalanobis distance + bounds)
- **Model watermarking**: detect stolen copies
- **Evasion monitoring**: detect probing in production
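
For readers unfamiliar with FGSM, the sketch below applies one attack step to a plain logistic model, where the input gradient has a closed form. It is a standard-library toy, not the code in `adversarial_defense.py`, which presumably targets the neural models; all names here are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: list[float], b: float, x: list[float], y: int) -> float:
    """Binary cross-entropy of a logistic model on one example."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: list[float], b: float, x: list[float], y: int,
         eps: float) -> list[float]:
    """One FGSM step: x + eps * sign(dLoss/dx)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For log loss on a logistic model, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]
```

Adversarial training then mixes such perturbed inputs into each training batch so the model sees worst-case inputs within the `eps` budget.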

### 10. A/B Testing Framework (`ab_testing.py`)
- Randomized controlled trials for strategy changes
- **Power analysis**: how long to run a test
- **Sequential testing** with valid early stopping (no p-hacking)
- **Guardrail metrics**: ensure a new strategy doesn't increase risk
- **Multiple comparison correction** (Bonferroni, Benjamini-Hochberg, Holm)
- Counterfactual estimation
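
Of the corrections listed, Benjamini-Hochberg is the least obvious because it is a step-up procedure: a large p-value can still be rejected if enough smaller ones sit below their thresholds. A minimal sketch follows; the function name is hypothetical and `ab_testing.py` may expose this differently.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return a reject flag per hypothesis, controlling FDR at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= (k / m) * q; reject the k smallest p-values.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Four strategy variants tested at once.
print(benjamini_hochberg([0.041, 0.001, 0.2, 0.008]))
```

Bonferroni would instead compare every p-value to `q / m`, which controls the family-wise error rate at the cost of power.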

### 11. Correlation Regime Modeling (`correlation_regime.py`)
- **DCC-GARCH**: dynamic conditional correlations with GARCH volatilities
- **Regime detection**: low vs high correlation periods
- **Ledoit-Wolf shrinkage**: regularized covariance estimation
- **Factor correlation model**: PCA-based dimensionality reduction
- Correlation forecasting (not just estimation)

---

## The Full Pipeline (Jane Street Style)

```
PRODUCTION TRADING FLOW

MARKET DATA
└── LOB Feed (limit_order_book.py)
      → Bid/Ask imbalance (30ms prediction)
      → Queue position
      → Spread dynamics
                ↓
NEWS / SOCIAL
└── Sentiment (sentiment_model.py)
      → Event detection
      → Sentiment score per asset
                ↓
FEATURE STORE (feature_store.py)
  → 1000+ features computed in <10μs
  → Drift detection disables stale features
  → Online importance ranks top 50 features

ALPHA MODELS (parallel)

  Multi-Task LSTM (multi_task_learning.py)
   ├── Expected returns (μ)
   ├── Volatility (σ)
   ├── Portfolio weights (w)
   └── Direction (up/down)

  Statistical Arbitrage (stat_arb.py)
   ├── Cointegrated pairs (Engle-Granger)
   ├── PCA residuals
   └── Lead-lag (VIX→SPX)

  Market Making (market_making.py)
   ├── Avellaneda-Stoikov quotes
   ├── Inventory skewing
   └── Adverse selection detection

  Online Learning (online_learning.py)
   ├── Per-symbol adaptive models
   ├── Concept drift detection
   └── Meta-initialization from similar symbols
                ↓
UNCERTAINTY QUANTIFICATION (conformal_prediction.py)
  → 90% prediction intervals (GUARANTEED coverage)
  → Adaptive intervals for non-stationary data
  → Position size ∝ expected_return / prediction_variance
                ↓
CORRELATION & RISK (correlation_regime.py)
  → DCC-GARCH time-varying correlations
  → Regime detection: normal ↔ crisis correlations
  → Ledoit-Wolf shrunk covariance
                ↓
PORTFOLIO OPTIMIZATION (portfolio_optimizer.py)
  → μ from alpha models + Σ from DCC-GARCH
  → Robust optimization (handle noisy μ)
  → Black-Litterman + risk constraints
                ↓
EXECUTION (rl_execution.py)
  → PPO Deep Hedging: adaptive execution schedule
  → Beats TWAP by adapting to liquidity/volatility
                ↓
RISK MANAGEMENT (risk_management.py)
  → VaR/CVaR monitoring
  → Stress testing
  → Compliance (position limits, concentration)
  → Auto-kill switch
                ↓
A/B TESTING (ab_testing.py)
  → Every strategy change → randomized experiment
  → Guardrail metrics prevent risk increase
  → Sequential testing with valid p-values
                ↓
SYNTHETIC TRAINING (synthetic_market_sim.py)
  → Agent-based simulation for RL training
  → Regime switches, shock injection
  → Unlimited data for deep learning
                ↓
ADVERSARIAL DEFENSE (adversarial_defense.py)
  → Input sanitization (detect anomalous features)
  → Model watermarking (detect theft)
  → Evasion monitoring (detect probing)
```

---

## Key Design Decisions

### 1. Honest Validation → Walk-Forward
All backtests use **expanding windows + embargo gaps + combinatorial purged cross-validation (CPCV)**.
Never train on future data. This is what separates toy projects from real quant systems.
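
A minimal sketch of the expanding-window part is shown below: each fold trains on everything up to a cut-off, drops a gap of `embargo` observations, and tests on the block that follows. The generator name and its parameters are hypothetical; `walk_forward_validation.py` also covers purged and combinatorial variants that this sketch leaves out.

```python
def walk_forward_splits(n_samples: int, n_folds: int, embargo: int):
    """Yield (train_idx, test_idx) ranges with an expanding train window.

    `embargo` observations between train and test are dropped so labels
    that overlap the test period cannot leak into training.
    """
    fold = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold
        # The last fold absorbs any leftover observations.
        test_end = n_samples if k == n_folds else test_start + fold
        train_end = max(0, test_start - embargo)
        yield range(0, train_end), range(test_start, test_end)


for train_idx, test_idx in walk_forward_splits(n_samples=100, n_folds=4, embargo=5):
    print(f"train 0..{train_idx.stop - 1}  test {test_idx.start}..{test_idx.stop - 1}")
```

The gap should be at least as long as the label horizon, for example 5 bars when predicting 5-bar forward returns.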

### 2. Uncertainty Quantification → Kelly Sizing
Position size depends on prediction confidence.
`bet_size = expected_return / prediction_variance` (the continuous-time Kelly fraction).
Conformal prediction supplies prediction intervals with a finite-sample coverage guarantee, provided calibration and test data are exchangeable.
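
The sizing rule can be sketched in a few lines. The `scale` (fractional Kelly) and `cap` (leverage limit) arguments are assumptions added for the example, since full Kelly on noisy estimates tends to over-bet; they are not described in the source.

```python
def kelly_fraction(expected_return: float, variance: float,
                   scale: float = 0.5, cap: float = 1.0) -> float:
    """Capital fraction to bet: scale * mu / sigma^2, clipped to [-cap, cap]."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    raw = scale * expected_return / variance
    return max(-cap, min(cap, raw))


# 1% expected return with 20% volatility (variance 0.04), half Kelly.
print(kelly_fraction(expected_return=0.01, variance=0.04))
```

A wider prediction interval implies a larger variance estimate and therefore a smaller position, which is the link between the uncertainty module and sizing.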

### 3. Online Learning → Concept Drift
Markets change. Models decay. Drift detection auto-resets learning rates.
Models are per-symbol: AAPL needs different features than TSLA.
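
One common way to detect drift on a stream of model errors is the Page-Hinkley test, sketched below. Which detector `online_learning.py` actually uses is not stated in the source, so treat this as an illustrative stand-in; the class name and the `delta` / `threshold` defaults are assumptions.

```python
class PageHinkley:
    """Flags an upward shift in the mean of a stream (e.g. prediction error)."""

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta, self.threshold = delta, threshold
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x: float) -> bool:
        """Feed one observation; return True when drift is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean
        self.cum += x - self.mean - self.delta       # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


detector = PageHinkley()
errors = [0.0] * 50 + [1.0] * 20     # error level jumps after 50 steps
first_alarm = next(i for i, e in enumerate(errors) if detector.update(e))
print("drift flagged at step", first_alarm)
```

On an alarm, the per-symbol model would reset its learning rate (and the detector would be re-initialized) so it can adapt to the new regime.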

### 4. Market Microstructure → Order Book Alpha
Retail sees OHLCV. Jane Street sees the full LOB.
Order imbalance, queue position, spread dynamics = pure short-term alpha.
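
Order imbalance is simple to compute once the book is available. The sketch below uses the usual normalized form over the top few levels; the function name and the `(price, size)` tuple layout are assumptions, not the data model of `limit_order_book.py`.

```python
def order_imbalance(bids: list[tuple[float, float]],
                    asks: list[tuple[float, float]], depth: int = 5) -> float:
    """(bid volume - ask volume) / total volume over the top `depth` levels.

    bids/asks: lists of (price, size), best price first. The result lies in
    [-1, 1]; positive values mean more resting buy than sell interest.
    """
    bid_vol = sum(size for _, size in bids[:depth])
    ask_vol = sum(size for _, size in asks[:depth])
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total


bids = [(99.9, 300), (99.8, 100)]
asks = [(100.1, 100), (100.2, 100)]
print(order_imbalance(bids, asks), order_imbalance(bids, asks, depth=1))
```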

### 5. Adversarial Defense → Model Protection
If your alpha is reverse-engineered, it disappears.
Watermarking, input sanitization, and gradient masking protect IP.

### 6. Statistical A/B Testing → No Gut Feeling
Every strategy change: randomized controlled trial.
Sequential testing with valid p-values (no peeking bias).
Multiple comparison correction prevents false discoveries.

### 7. Synthetic Markets → Unlimited Training Data
Real data is limited. Simulated markets with regime switches, shocks, and
adversarial agents provide unlimited training data for RL.

---

## Research Foundations

Every module is backed by published research:

| Module | Paper | Key Insight |
|--------|-------|-------------|
| Wavelet Denoising | Lopez Gil et al. (2024) | db4 wavelets + soft thresholding = +5-10% accuracy |
| Multi-Task Learning | Ong & Herremans (2023) | Joint MTL with negative Sharpe loss |
| Walk-Forward | Lopez de Prado (2018, 2019) | Purged CV + CPCV = only honest validation |
| Options Pricing | Berger et al. (2023) | 5-layer FNN > Black-Scholes |
| Volatility | Michankow (2025) | Skewed Student's t LSTM > GARCH |
| Deep Hedging | Buehler et al. (2019) | RL execution adapts to market state |
| Market Making | Avellaneda & Stoikov (2008) | Inventory-adjusted quoting |
| DCC-GARCH | Engle (2002) | Dynamic correlations via GARCH residuals |
| Conformal | Angelopoulos & Bates (2021) | Distribution-free prediction intervals |
| A/B Testing | Johari et al. (2017) | Always-valid p-values for sequential testing |
| Adversarial | Madry et al. (2018) | Train on worst-case perturbations |

---

## Usage

```python
# Full pipeline
from main import AlphaForgePipeline

pipeline = AlphaForgePipeline()
pipeline.run_full_pipeline(tickers=['SPY', 'QQQ', 'AAPL', 'MSFT'])

# Individual modules
from rl_execution import RLExecutionAgent
agent = RLExecutionAgent()
agent.train(n_episodes=10000)
comparison = agent.compare_to_twap(total_qty=100000, n_trials=100)

from market_making import AvellanedaStoikovMarketMaker
mm = AvellanedaStoikovMarketMaker()
bid, ask = mm.calculate_quotes(mid_price=150.0, current_inventory=500)

from online_learning import PerSymbolAdaptiveModel
model = PerSymbolAdaptiveModel(n_features=20)
# `features` and `label` are placeholders for your own data
model.update('AAPL', features, label)

from conformal_prediction import ConformalPredictor
cp = ConformalPredictor(alpha=0.1)  # 90% interval
# y_cal / y_pred_cal: held-out calibration targets and their predictions
cp.fit(y_cal, y_pred_cal)
intervals = cp.predict_interval(y_pred_test)

from stat_arb import PairsTradingStrategy
strategy = PairsTradingStrategy(entry_z=2.0, exit_z=0.5)
# prices_a / prices_b: aligned price series for the two legs of the pair
results = strategy.backtest(prices_a, prices_b)
```

---

## Metrics & GOAT Scoring

The system uses the **GOAT (Great On All Timeframes) scoring** framework:

| Score | Grade | Action |
|-------|-------|--------|
| 90-100 | Legend | Scale aggressively, this is exceptional |
| 80-89 | Elite | Production-ready with tight monitoring |
| 70-79 | Good | Deploy with position limits |
| 60-69 | Acceptable | Paper trade only, needs improvement |
| <60 | Weak | Do not deploy: redesign required |

See `metrics_guide.py`, `goat_strategy.py`, and `ALPHA_FORGE_GUIDE.md` for full details.
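
The table above maps directly onto a lookup. The sketch below is a hypothetical helper, not the code in `goat_strategy.py`, and it assumes band floors are inclusive (a score of exactly 80 is Elite); how the score itself is computed is covered by the guide files.

```python
GOAT_BANDS = [
    (90, "Legend", "Scale aggressively"),
    (80, "Elite", "Production-ready with tight monitoring"),
    (70, "Good", "Deploy with position limits"),
    (60, "Acceptable", "Paper trade only"),
]


def goat_grade(score: float) -> tuple[str, str]:
    """Return (grade, action) for a GOAT score on the 0-100 scale."""
    for floor, grade, action in GOAT_BANDS:
        if score >= floor:
            return grade, action
    return "Weak", "Do not deploy"


print(goat_grade(85))
```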

---

## Prerequisites

```bash
# Core
pip install yfinance pandas numpy torch scikit-learn scipy statsmodels

# Advanced (optional but recommended)
pip install gplearn PyWavelets feedparser praw arch xgboost lightgbm

# For deep learning features
pip install transformers  # For FinBERT sentiment
```

---

## Version History

- **v1.0** (Initial): 8 core modules, basic pipeline, basic backtest
- **v2.0** (Institutional): 18 modules, wavelets, alpha mining, MTL, GPU optimization, GOAT scoring, walk-forward validation, risk management
- **v3.0** (Elite/Jane Street): 25 modules, RL execution, LOB reconstruction, market making, synthetic markets, online learning, stat arb, conformal prediction, adversarial defense, A/B testing, DCC-GARCH, feature store

---

## What You Can Do With This

1. **Apply to Jane Street / Two Sigma / Citadel / DE Shaw**
   - This repo demonstrates you understand ALL major quant subsystems
   - Not just "I trained a model" but "I built a complete trading platform"

2. **Launch a Quant Trading Startup**
   - Modular architecture → replace components with proprietary data/feeds
   - Start with simple strategies, iterate with A/B testing

3. **Academic Research**
   - Every module cites papers, implements SOTA methods
   - Use synthetic markets for reproducible experiments

4. **Personal Trading**
   - Connect to Interactive Brokers / Alpaca API
   - Run with paper trading, then small real money
   - Risk management prevents blow-ups

---

## License

MIT: free for research and commercial use.

**Disclaimer**: This is for educational and research purposes. Past performance does not guarantee future results. Trading involves substantial risk of loss.