| # AlphaForge v3.0: Elite Quant Trading System |
|
|
| > **From backtesting toy → Jane Street / Two Sigma / Citadel production-grade quantitative trading platform** |
|
|
| **Repository**: [Premchan369/alphaforge-quant-system](https://huggingface.co/Premchan369/alphaforge-quant-system) |
|
|
| --- |
|
|
| ## What Makes This "Elite" |
|
|
| Most GitHub quant repos: |
| - Backtest on all data (data leakage) |
| - Use hand-coded RSI/MACD (no alpha mining) |
| - No risk management (just returns) |
| - No execution simulation (market orders everywhere) |
| - No uncertainty quantification (trading blind) |
| - Static models (break when markets change) |
| - No adversarial defense (models get exploited) |
|
|
| **AlphaForge v3.0 solves every single one of these.** |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| ┌─────────────────────────────────────────────────────────────────────────────┐ |
| │                        ALPHA FORGE v3.0 - SYSTEM MAP                        │ |
| ├─────────────────────────────────────────────────────────────────────────────┤ |
| │                                                                             │ |
| │  DATA LAYER                                                                 │ |
| │  ├── market_data.py             → OHLCV + features + cross-section          │ |
| │  ├── news_data_integration.py   → NewsAPI + RSS + GDELT + Reddit            │ |
| │  ├── market_microstructure.py   → Kyle's lambda, VPIN, OFI, Amihud          │ |
| │  └── limit_order_book.py        → Level 2 LOB reconstruction (NEW)          │ |
| │                                                                             │ |
| │  PREPROCESSING                                                              │ |
| │  ├── wavelet_denoising.py       → db4 wavelets + soft thresholding          │ |
| │  └── technical_indicators.py    → 30+ indicators (RSI, MACD, BB, etc.)      │ |
| │                                                                             │ |
| │  ALPHA DISCOVERY                                                            │ |
| │  ├── alpha_mining.py            → GP symbolic regression + LLM suggestions  │ |
| │  ├── sentiment_model.py         → FinBERT sentiment scoring                 │ |
| │  └── alpha_model.py             → XGBoost + LSTM + Transformer ensemble     │ |
| │                                                                             │ |
| │  REAL-TIME INFRASTRUCTURE (NEW)                                             │ |
| │  ├── feature_store.py           → Microsecond feature compute + drift       │ |
| │  ├── online_learning.py         → Per-symbol adaptive models + concept drift│ |
| │  └── rl_execution.py            → PPO Deep Hedging for optimal execution    │ |
| │                                                                             │ |
| │  MODEL LAYER                                                                │ |
| │  ├── multi_task_learning.py     → Joint MTL: returns + vol + portfolio      │ |
| │  ├── volatility_model.py        → GARCH + LSTM + skewed Student's t         │ |
| │  ├── options_pricer.py          → 5-layer FNN beats Black-Scholes           │ |
| │  ├── stat_arb.py                → Cointegration + PCA mean-reversion (NEW)  │ |
| │  └── market_making.py           → Avellaneda-Stoikov quoting (NEW)          │ |
| │                                                                             │ |
| │  CORRELATION & RISK (NEW)                                                   │ |
| │  ├── correlation_regime.py      → DCC-GARCH + dynamic copulas               │ |
| │  ├── conformal_prediction.py    → Guaranteed prediction intervals           │ |
| │  ├── adversarial_defense.py     → FGSM attacks + watermarking (NEW)         │ |
| │  ├── risk_management.py         → VaR/CVaR + stress tests + compliance      │ |
| │  ├── risk_engine.py             → Signal risk scoring                       │ |
| │  └── stress_test.py             → Historical scenario stress testing        │ |
| │                                                                             │ |
| │  OPTIMIZATION                                                               │ |
| │  ├── portfolio_optimizer.py     → Robust optimization + Black-Litterman     │ |
| │  └── execution_algorithms.py    → TWAP/VWAP + Smart Order Router            │ |
| │                                                                             │ |
| │  VALIDATION                                                                 │ |
| │  ├── walk_forward_validation.py → Purged CV + combinatorial CPCV            │ |
| │  ├── backtest_engine.py         → Honest backtesting                        │ |
| │  └── ab_testing.py              → Statistical A/B tests (NEW)               │ |
| │                                                                             │ |
| │  SYNTHETIC ENVIRONMENT (NEW)                                                │ |
| │  └── synthetic_market_sim.py    → Agent-based market simulation             │ |
| │                                                                             │ |
| │  TRAINING INFRASTRUCTURE                                                    │ |
| │  ├── gpu_optimization.py        → Flash Attention + AMP + CUDA graphs       │ |
| │  └── hyperparameter_sweep.py    → Grid + Random + Latin Hypercube           │ |
| │                                                                             │ |
| │  METRICS & MONITORING                                                       │ |
| │  ├── metrics_guide.py           → GOAT scoring + metric explanations        │ |
| │  ├── goat_strategy.py           → GOAT score → actionable rules             │ |
| │  └── ALPHA_FORGE_GUIDE.md       → 25KB human-readable metrics guide         │ |
| │                                                                             │ |
| │  ORCHESTRATION                                                              │ |
| │  └── main.py                    → Full pipeline integration                 │ |
| │                                                                             │ |
| └─────────────────────────────────────────────────────────────────────────────┘ |
| ``` |
|
|
| **Total: 25 modules | 421KB+ | 50,000+ lines** |
|
|
| --- |
|
|
| ## What's New in v3.0 (Jane Street Level) |
|
|
| ### 1. Reinforcement Learning Execution (`rl_execution.py`) |
| - **PPO-based Deep Hedging**: a neural network adapts the execution schedule to market conditions |
| - Self-play training in a simulated environment |
| - **RL vs TWAP comparison**: benchmarks the learned policy against a deterministic TWAP schedule |
| - Market impact model (temporary + permanent) |
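
To make the temporary-plus-permanent impact idea concrete, here is a minimal sketch of how an execution schedule could be priced under a linear impact model in the spirit of Almgren and Chriss. The function names, the coefficients `eta` and `gamma`, and the linear form itself are assumptions for illustration; they are not taken from `rl_execution.py`.

```python
def schedule_cost(child_qtys, eta=1e-6, gamma=1e-7):
    """Implementation shortfall of a buy schedule under linear impact.

    Temporary impact: a child order of size q pays eta * q per share.
    Permanent impact: every share already traded lifts the price by gamma.
    Both coefficients are hypothetical placeholders.
    """
    cost = 0.0
    traded = 0.0
    for q in child_qtys:
        permanent_shift = gamma * traded   # drift caused by earlier fills
        temporary_shift = eta * q          # slippage on this slice only
        cost += q * (permanent_shift + temporary_shift)
        traded += q
    return cost


def twap_schedule(total_qty, n_slices):
    """Equal-sized slices: the deterministic baseline a learned policy is compared to."""
    return [total_qty / n_slices] * n_slices


twap = twap_schedule(100_000, 10)
front_loaded = [50_000] + [50_000 / 9] * 9
print(schedule_cost(twap), schedule_cost(front_loaded))
```

With static linear impact, equal slices are already cost-minimizing, so a learned policy can only improve on TWAP when impact or liquidity varies over time, which is the case the RL agent targets.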
| |
| ### 2. Limit Order Book Reconstruction (`limit_order_book.py`) |
| - Full **Level 2 order book** with 10+ price levels |
| - Queue position tracking |
| - Order imbalance calculation (a widely used short-horizon signal) |
| - Spread dynamics, large order detection |
| - Synthetic LOB message feed generation |
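
As an illustration of the order-imbalance and spread features listed above, the following sketch computes them from a snapshot of the book. The `(price, size)` tuple layout and the function names are assumptions, not the interface of `limit_order_book.py`.

```python
def order_imbalance(bids, asks, depth=5):
    """Volume imbalance over the top `depth` levels, in [-1, 1].

    bids and asks are lists of (price, size) tuples sorted best-first.
    Positive values mean more resting size on the bid side.
    """
    bid_vol = sum(size for _, size in bids[:depth])
    ask_vol = sum(size for _, size in asks[:depth])
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total


def spread_and_mid(bids, asks):
    """Quoted spread and mid price from the best bid and best ask."""
    best_bid, best_ask = bids[0][0], asks[0][0]
    return best_ask - best_bid, (best_ask + best_bid) / 2


bids = [(99.99, 300), (99.98, 200)]
asks = [(100.01, 100), (100.02, 400)]
print(order_imbalance(bids, asks, depth=1), spread_and_mid(bids, asks))
```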
| |
| ### 3. Market Making Engine (`market_making.py`) |
| - **Avellaneda-Stoikov** optimal quoting with inventory skewing |
| - Inventory risk management (hedge, stop quoting, aggressive unwind) |
| - **Adverse selection detection**: flags when informed traders are hitting your quotes |
| - Real-time spread optimization |
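
The quoting rule from Avellaneda and Stoikov (2008) can be sketched in a few lines: the reservation price is `mid - q * gamma * sigma^2 * (T - t)` and the total spread is `gamma * sigma^2 * (T - t) + (2 / gamma) * ln(1 + gamma / k)`. The function name and argument names below are assumptions and may not match `market_making.py`.

```python
import math


def avellaneda_stoikov_quotes(mid, inventory, gamma, sigma, k, time_left):
    """Return (bid, ask) from the closed-form Avellaneda-Stoikov approximation.

    gamma: risk aversion, sigma: price volatility,
    k: order-arrival decay parameter, time_left: T - t.
    """
    # Long inventory pushes the reservation price below mid, which skews
    # both quotes down and makes it more likely the position is sold off.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    spread = gamma * sigma ** 2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2, reservation + spread / 2


flat = avellaneda_stoikov_quotes(150.0, 0, gamma=0.1, sigma=2.0, k=1.5, time_left=1.0)
long_inv = avellaneda_stoikov_quotes(150.0, 5, gamma=0.1, sigma=2.0, k=1.5, time_left=1.0)
print(flat, long_inv)
```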
|
|
| ### 4. Synthetic Market Simulation (`synthetic_market_sim.py`) |
| - **Agent-based modeling**: informed traders, noise traders, momentum traders |
| - **Regime switching** in fundamentals (normal/boom/crash/high-vol) |
| - Unlimited training data for RL agents |
| - Shock injection for stress testing |
| - Cross-asset correlation generation |
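
The regime-switching idea can be illustrated without any agents at all. The sketch below is a deliberately simpler stand-in for the agent-based simulator: a two-regime geometric random walk. The regime parameters, the switching probability, and the function name are hypothetical.

```python
import math
import random

# (mean log-return, volatility) per step; hypothetical values
REGIMES = {"normal": (0.0002, 0.01), "crash": (-0.003, 0.04)}


def simulate_prices(n_steps, p_switch=0.02, start=100.0, seed=0):
    """Simulate a price path whose drift and volatility depend on a hidden regime."""
    rng = random.Random(seed)
    regime = "normal"
    price = start
    path, labels = [], []
    for _ in range(n_steps):
        if rng.random() < p_switch:            # occasional regime flip
            regime = "crash" if regime == "normal" else "normal"
        mu, sigma = REGIMES[regime]
        price *= math.exp(rng.gauss(mu, sigma))  # log-normal step keeps prices positive
        path.append(price)
        labels.append(regime)
    return path, labels


path, labels = simulate_prices(500)
print(path[-1], labels.count("crash"))
```

Because the generator is seeded, the same path can be replayed for reproducible experiments.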
|
|
| ### 5. Online Learning (`online_learning.py`) |
| - **Per-symbol adaptive models**: each asset gets its own learning rate |
| - **Concept drift detection**: automatically detects when the old model breaks |
| - Adaptive learning rate reset on drift |
| - Meta-learning initialization from similar symbols |
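
One common way to detect concept drift is the Page-Hinkley test on the stream of prediction errors. The sketch below assumes that technique and hypothetical names; the document does not say which detector `online_learning.py` uses.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=1.0):
        self.delta = delta          # tolerated drift per observation
        self.threshold = threshold  # alarm level
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x):
        """Feed one observation; return True when drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def track_learning_rate(errors, base_lr=0.05, decay=0.99):
    """Decay the learning rate over time and reset it whenever drift is detected."""
    detector = PageHinkley()
    lr = base_lr
    drift_points = []
    for t, err in enumerate(errors):
        if detector.update(err):
            drift_points.append(t)
            detector.reset()
            lr = base_lr        # learn fast again after a regime change
        else:
            lr *= decay
    return drift_points, lr


errors = [0.1] * 50 + [1.0] * 50   # error level jumps at t = 50
print(track_learning_rate(errors))
```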
| |
| ### 6. Statistical Arbitrage (`stat_arb.py`) |
| - **Engle-Granger cointegration** testing |
| - **Pairs trading** with rolling hedge ratios and z-score signals |
| - **PCA mean-reversion**: factor-neutral residual trading |
| - **Lead-lag detection**: which asset predicts which (VIX→SPX) |
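
The pairs-trading logic (hedge ratio, spread z-score, entry and exit thresholds) can be sketched with the standard library alone. This is an assumed, simplified version: the hedge ratio is an in-window OLS slope, and none of the names below come from `stat_arb.py`.

```python
from statistics import mean, pstdev


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var


def spread_zscore(prices_a, prices_b, window=20):
    """Z-score of the latest hedged spread over the trailing window."""
    a, b = prices_a[-window:], prices_b[-window:]
    beta = ols_slope(b, a)                      # rolling hedge ratio
    spread = [ai - beta * bi for ai, bi in zip(a, b)]
    sd = pstdev(spread)
    return 0.0 if sd == 0 else (spread[-1] - mean(spread)) / sd


def next_position(z, position, entry_z=2.0, exit_z=0.5):
    """+1 = long spread, -1 = short spread, 0 = flat."""
    if position == 0:
        if z > entry_z:
            return -1          # spread is rich: short A, long B
        if z < -entry_z:
            return 1           # spread is cheap: long A, short B
        return 0
    return 0 if abs(z) < exit_z else position
```

A cointegration test such as Engle-Granger should gate which pairs are traded at all; the z-score alone says nothing about whether the spread is stationary.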
|
|
| ### 7. Conformal Prediction (`conformal_prediction.py`) |
| - **Distribution-free** prediction intervals with guaranteed marginal coverage (assuming exchangeable data) |
| - **Adaptive conformal**: online adjustment for non-stationary data |
| - Bootstrap uncertainty estimation |
| - **Quantile regression** for asymmetric uncertainty (downside > upside) |
| - **Ensemble uncertainty**: union/intersection of all methods |
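
Split conformal prediction reduces to taking a finite-sample-corrected quantile of the absolute calibration residuals. The sketch below shows that step; the function names are assumptions and do not reflect the `ConformalPredictor` interface. The coverage guarantee is marginal and relies on exchangeability, which financial time series can violate, hence the adaptive variant mentioned above.

```python
import math


def conformal_quantile(abs_residuals, alpha=0.1):
    """Order statistic giving at least (1 - alpha) coverage on exchangeable data."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))   # 1-based rank with the +1 correction
    if rank > n:
        return float("inf")                   # too few calibration points for this alpha
    return sorted(abs_residuals)[rank - 1]


def predict_interval(y_pred, q):
    """Symmetric interval around a point prediction."""
    return y_pred - q, y_pred + q


calibration_residuals = [i / 10 for i in range(1, 20)]   # 19 toy residuals
q = conformal_quantile(calibration_residuals, alpha=0.1)
print(q, predict_interval(5.0, q))
```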
| |
| ### 8. Real-Time Feature Store (`feature_store.py`) |
| - Microsecond-level feature computation |
| - **Drift detection** per feature (Wasserstein distance) |
| - Feature caching with TTL |
| - Online feature importance (sensitivity analysis) |
| - Feature versioning for reproducibility |
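
For one-dimensional samples of equal size, the Wasserstein-1 distance is simply the mean absolute difference between the sorted values, which makes a per-feature drift check cheap. The names and the fixed threshold below are assumptions, not the `feature_store.py` API.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def feature_drifted(reference, live, threshold):
    """Flag a feature whose live distribution has moved away from the reference window."""
    return wasserstein_1d(reference, live) > threshold


reference = [1, 2, 3]
live = [2, 3, 4]
print(wasserstein_1d(reference, live), feature_drifted(reference, live, threshold=0.5))
```

In practice the threshold would be scaled by the feature's own dispersion, since the distance is expressed in the feature's units.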
|
|
| ### 9. Adversarial Defense (`adversarial_defense.py`) |
| - **FGSM attacks** to test model robustness |
| - **Adversarial training**: train on perturbed inputs |
| - Anomaly detection (Mahalanobis distance + bounds) |
| - **Model watermarking**: detect stolen copies |
| - **Evasion monitoring**: detect probing in production |
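
FGSM perturbs each input by `eps` in the direction of the sign of the loss gradient. For a logistic model with cross-entropy loss, that gradient with respect to the input is `(p - y) * w`, so the attack can be sketched without a deep learning framework. The toy model and the function names are assumptions for illustration.

```python
import math


def predict(w, b, x):
    """Probability of the positive class under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: one signed step that increases the loss."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]                 # d(loss)/dx for cross-entropy
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
print(predict(w, b, x), predict(w, b, x_adv))
```

Adversarial training then mixes such `x_adv` samples, with their original labels, into each training batch.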
| |
| ### 10. A/B Testing Framework (`ab_testing.py`) |
| - Randomized controlled trials for strategy changes |
| - **Power analysis**: how long to run the test |
| - **Sequential testing** with valid early stopping (no p-hacking) |
| - **Guardrail metrics**: ensure the new strategy doesn't increase risk |
| - **Multiple comparison correction** (Bonferroni, Benjamini-Hochberg, Holm) |
| - Counterfactual estimation |
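
To show how the multiple-comparison corrections differ, here is a minimal sketch of Bonferroni next to the Benjamini-Hochberg step-up procedure. The function names are assumptions and are not taken from `ab_testing.py`.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values below alpha / m (controls family-wise error)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff = rank                 # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


p_values = [0.02, 0.02, 0.5]
print(bonferroni(p_values), benjamini_hochberg(p_values))
```

On this toy input Bonferroni rejects nothing while Benjamini-Hochberg rejects the first two hypotheses, which illustrates why the less conservative procedure is often preferred when many strategy variants are screened at once.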
|
|
| ### 11. Correlation Regime Modeling (`correlation_regime.py`) |
| - **DCC-GARCH**: dynamic conditional correlations with GARCH volatilities |
| - **Regime detection**: low vs high correlation periods |
| - **Ledoit-Wolf shrinkage**: regularized covariance estimation |
| - **Factor correlation model**: PCA-based dimensionality reduction |
| - Correlation forecasting (not just estimation) |
| |
| --- |
| |
| ## The Full Pipeline (Jane Street Style) |
| |
| ``` |
| ┌──────────────────────────────────────────────────────────────────────────┐ |
| │                         PRODUCTION TRADING FLOW                          │ |
| ├──────────────────────────────────────────────────────────────────────────┤ |
| │                                                                          │ |
| │  MARKET DATA ─┬─────────────────────────────────────────┐                │ |
| │               │ LOB Feed (limit_order_book.py)          │                │ |
| │               │ → Bid/Ask imbalance (30ms prediction)   │                │ |
| │               │ → Queue position                        │                │ |
| │               │ → Spread dynamics                       │                │ |
| │               └─────────────────────────────┬───────────┘                │ |
| │                                             │                            │ |
| │  NEWS / SOCIAL ─┬───────────────────────────┴──────────┐                 │ |
| │                 │ Sentiment (sentiment_model.py)       │                 │ |
| │                 │ → Event detection                    │                 │ |
| │                 │ → Sentiment score per asset          │                 │ |
| │                 └───────────────────────────┬──────────┘                 │ |
| │                                             ↓                            │ |
| │  FEATURE STORE (feature_store.py)                                        │ |
| │  → 1000+ features computed in <10μs                                      │ |
| │  → Drift detection disables stale features                               │ |
| │  → Online importance ranks top 50 features                               │ |
| │                                                                          │ |
| │  ┌────────────────────────────────────────────────────────────────────┐  │ |
| │  │                      ALPHA MODELS (parallel)                       │  │ |
| │  │                                                                    │  │ |
| │  │  Multi-Task LSTM (multi_task_learning.py)                          │  │ |
| │  │    ├── Expected returns (μ)                                        │  │ |
| │  │    ├── Volatility (σ)                                              │  │ |
| │  │    ├── Portfolio weights (w)                                       │  │ |
| │  │    └── Direction (up/down)                                         │  │ |
| │  │                                                                    │  │ |
| │  │  Statistical Arbitrage (stat_arb.py)                               │  │ |
| │  │    ├── Cointegrated pairs (Engle-Granger)                          │  │ |
| │  │    ├── PCA residuals                                               │  │ |
| │  │    └── Lead-lag (VIX→SPX)                                          │  │ |
| │  │                                                                    │  │ |
| │  │  Market Making (market_making.py)                                  │  │ |
| │  │    ├── Avellaneda-Stoikov quotes                                   │  │ |
| │  │    ├── Inventory skewing                                           │  │ |
| │  │    └── Adverse selection detection                                 │  │ |
| │  │                                                                    │  │ |
| │  │  Online Learning (online_learning.py)                              │  │ |
| │  │    ├── Per-symbol adaptive models                                  │  │ |
| │  │    ├── Concept drift detection                                     │  │ |
| │  │    └── Meta-initialization from similar symbols                    │  │ |
| │  │                                                                    │  │ |
| │  └────────────────────────────────────────────────────────────────────┘  │ |
| │                                    ↓                                     │ |
| │  UNCERTAINTY QUANTIFICATION (conformal_prediction.py)                    │ |
| │  → 90% prediction intervals (GUARANTEED coverage)                        │ |
| │  → Adaptive intervals for non-stationary data                            │ |
| │  → Position size ∝ expected_return / prediction_variance                 │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  CORRELATION & RISK (correlation_regime.py)                              │ |
| │  → DCC-GARCH time-varying correlations                                   │ |
| │  → Regime detection: normal → crisis correlations                        │ |
| │  → Ledoit-Wolf shrunk covariance                                         │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  PORTFOLIO OPTIMIZATION (portfolio_optimizer.py)                         │ |
| │  → μ from alpha models + Σ from DCC-GARCH                                │ |
| │  → Robust optimization (handle noisy μ)                                  │ |
| │  → Black-Litterman + risk constraints                                    │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  EXECUTION (rl_execution.py)                                             │ |
| │  → PPO Deep Hedging: adaptive execution schedule                         │ |
| │  → Beats TWAP by adapting to liquidity/volatility                        │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  RISK MANAGEMENT (risk_management.py)                                    │ |
| │  → VaR/CVaR monitoring                                                   │ |
| │  → Stress testing                                                        │ |
| │  → Compliance (position limits, concentration)                           │ |
| │  → Auto-kill switch                                                      │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  A/B TESTING (ab_testing.py)                                             │ |
| │  → Every strategy change → randomized experiment                         │ |
| │  → Guardrail metrics prevent risk increase                               │ |
| │  → Sequential testing with valid p-values                                │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  SYNTHETIC TRAINING (synthetic_market_sim.py)                            │ |
| │  → Agent-based simulation for RL training                                │ |
| │  → Regime switches, shock injection                                      │ |
| │  → Unlimited data for deep learning                                      │ |
| │                                                                          │ |
| │                                    ↓                                     │ |
| │  ADVERSARIAL DEFENSE (adversarial_defense.py)                            │ |
| │  → Input sanitization (detect anomalous features)                        │ |
| │  → Model watermarking (detect theft)                                     │ |
| │  → Evasion monitoring (detect probing)                                   │ |
| │                                                                          │ |
| └──────────────────────────────────────────────────────────────────────────┘ |
| ``` |
| |
| --- |
| |
| ## Key Design Decisions |
| |
| ### 1. Honest Validation: Walk-Forward |
| All backtests use an **expanding window, embargo gaps, and combinatorial purged cross-validation (CPCV)**. |
| The model never trains on future data. This is what separates toy projects from real quant systems. |
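
The expanding-window-plus-embargo scheme can be sketched as a simple index generator. This covers plain walk-forward splitting only, not full CPCV, and the function name and fold layout are assumptions rather than the contents of `walk_forward_validation.py`.

```python
def walk_forward_splits(n_samples, n_folds, embargo):
    """Yield (train_range, test_range) pairs with an expanding training window.

    The first `embargo` observations after each training window are dropped,
    so labels that overlap the training period cannot leak into the test set.
    """
    fold = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold
        test_start = train_end + embargo
        test_end = min(train_end + fold, n_samples)
        if test_start < test_end:
            yield range(0, train_end), range(test_start, test_end)


splits = list(walk_forward_splits(n_samples=100, n_folds=4, embargo=5))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

The embargo length should be at least as long as the label horizon, for example 5 bars when predicting 5-bar-ahead returns.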
| |
| ### 2. Uncertainty Quantification: Kelly Sizing |
| Position size depends on prediction confidence. |
| `bet_size = expected_return / prediction_variance` (the Kelly criterion for approximately Gaussian returns). |
| Conformal prediction supplies prediction intervals with finite-sample coverage guarantees, provided the data are exchangeable. |
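
As a worked instance of the sizing rule, the sketch below applies `expected_return / prediction_variance` and then scales and caps the result. The fractional-Kelly multiplier and the leverage cap are assumptions added for illustration (full Kelly is very sensitive to estimation error); they are not stated in the source.

```python
def kelly_fraction(expected_return, variance, fraction=0.5, cap=1.0):
    """Fraction of capital to allocate, signed by the direction of the forecast.

    fraction: fractional-Kelly multiplier (0.5 = "half Kelly"), hypothetical default.
    cap: maximum absolute allocation, hypothetical default.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    raw = expected_return / variance      # full Kelly for Gaussian returns
    scaled = fraction * raw
    return max(-cap, min(cap, scaled))


print(kelly_fraction(0.01, 0.04))                 # modest edge, wide uncertainty
print(kelly_fraction(0.01, 0.04, fraction=1.0))   # full Kelly
print(kelly_fraction(0.05, 0.01))                 # large raw size, clipped by the cap
```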
|
|
| ### 3. Online Learning: Concept Drift |
| Markets change. Models decay. Drift detection auto-resets learning rates. |
| Models are per-symbol: AAPL needs different features than TSLA. |
|
|
| ### 4. Market Microstructure: Order Book Alpha |
| Retail sees OHLCV. Jane Street sees the full LOB. |
| Order imbalance, queue position, spread dynamics = pure short-term alpha. |
|
|
| ### 5. Adversarial Defense: Model Protection |
| If your alpha is reverse-engineered, it disappears. |
| Watermarking, input sanitization, and gradient masking protect IP. |
|
|
| ### 6. Statistical A/B Testing: No Gut Feeling |
| Every strategy change: randomized controlled trial. |
| Sequential testing with valid p-values (no peeking bias). |
| Multiple comparison correction prevents false discoveries. |
|
|
| ### 7. Synthetic Markets: Unlimited Training Data |
| Real data is limited. Simulated markets with regime switches, shocks, |
| and adversarial agents provide unlimited training data for RL. |
|
|
| --- |
|
|
| ## Research Foundations |
|
|
| The core modules draw on published research: |
|
|
| | Module | Paper | Key Insight | |
| |--------|-------|-------------| |
| | Wavelet Denoising | Lopez Gil et al. (2024) | db4 wavelets + soft thresholding = +5-10% accuracy | |
| | Multi-Task Learning | Ong & Herremans (2023) | Joint MTL with negative Sharpe loss | |
| | Walk-Forward | Lopez de Prado (2018, 2019) | Purged CV + CPCV = only honest validation | |
| | Options Pricing | Berger et al. (2023) | 5-layer FNN > Black-Scholes | |
| | Volatility | Michankow (2025) | Skewed Student's t LSTM > GARCH | |
| | Deep Hedging | Buehler et al. (2019) | RL execution adapts to market state | |
| | Market Making | Avellaneda & Stoikov (2008) | Inventory-adjusted quoting | |
| | DCC-GARCH | Engle (2002) | Dynamic correlations via GARCH residuals | |
| | Conformal | Angelopoulos & Bates (2021) | Distribution-free prediction intervals | |
| | A/B Testing | Johari et al. (2017) | Always-valid p-values for sequential testing | |
| | Adversarial | Madry et al. (2018) | Train on worst-case perturbations | |
|
|
| --- |
|
|
| ## Usage |
|
|
| ```python |
| # Full pipeline |
| from main import AlphaForgePipeline |
|
| pipeline = AlphaForgePipeline() |
| pipeline.run_full_pipeline(tickers=['SPY', 'QQQ', 'AAPL', 'MSFT']) |
|
| # Individual modules |
| # RL execution: train the agent, then benchmark it against a TWAP schedule |
| from rl_execution import RLExecutionAgent |
| agent = RLExecutionAgent() |
| agent.train(n_episodes=10000) |
| comparison = agent.compare_to_twap(total_qty=100000, n_trials=100) |
|
| # Market making: quote around the mid price, skewed by current inventory |
| from market_making import AvellanedaStoikovMarketMaker |
| mm = AvellanedaStoikovMarketMaker() |
| bid, ask = mm.calculate_quotes(mid_price=150.0, current_inventory=500) |
|
| # Online learning: `features` and `label` are placeholders for your own data |
| from online_learning import PerSymbolAdaptiveModel |
| model = PerSymbolAdaptiveModel(n_features=20) |
| model.update('AAPL', features, label) |
|
| # Conformal prediction: `y_cal`, `y_pred_cal`, `y_pred_test` are placeholder arrays |
| from conformal_prediction import ConformalPredictor |
| cp = ConformalPredictor(alpha=0.1) # 90% interval |
| cp.fit(y_cal, y_pred_cal) |
| intervals = cp.predict_interval(y_pred_test) |
|
| # Pairs trading: `prices_a` and `prices_b` are placeholder price series |
| from stat_arb import PairsTradingStrategy |
| strategy = PairsTradingStrategy(entry_z=2.0, exit_z=0.5) |
| results = strategy.backtest(prices_a, prices_b) |
| ``` |
|
|
| --- |
|
|
| ## Metrics & GOAT Scoring |
|
|
| The system uses the **GOAT (Great On All Timeframes) scoring** framework: |
|
|
| | Score | Grade | Action | |
| |-------|-------|--------| |
| | 90-100 | Legend | Scale aggressively, this is exceptional | |
| | 80-89 | Elite | Production-ready with tight monitoring | |
| | 70-79 | Good | Deploy with position limits | |
| | 60-69 | Acceptable | Paper trade only, needs improvement | |
| | <60 | Weak | Do not deploy; redesign required | |
|
|
| See `metrics_guide.py`, `goat_strategy.py`, and `ALPHA_FORGE_GUIDE.md` for full details. |
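
The grade table above translates directly into a lookup. The sketch below is an assumed helper, not code from `goat_strategy.py`; how the score itself is computed is left to the guide, and fractional scores are assumed to fall into the band whose lower bound they meet.

```python
GOAT_BANDS = [
    (90, "Legend", "Scale aggressively, this is exceptional"),
    (80, "Elite", "Production-ready with tight monitoring"),
    (70, "Good", "Deploy with position limits"),
    (60, "Acceptable", "Paper trade only, needs improvement"),
    (0, "Weak", "Do not deploy; redesign required"),
]


def goat_grade(score):
    """Map a 0-100 GOAT score to its (grade, action) pair from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, grade, action in GOAT_BANDS:
        if score >= lower_bound:
            return grade, action


print(goat_grade(83))
```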
|
|
| --- |
|
|
| ## Prerequisites |
|
|
| ```bash |
| # Core |
| pip install yfinance pandas numpy torch scikit-learn scipy statsmodels |
| |
| # Advanced (optional but recommended) |
| pip install gplearn PyWavelets feedparser praw arch xgboost lightgbm |
| |
| # For deep learning features |
| pip install transformers # For FinBERT sentiment |
| ``` |
|
|
| --- |
|
|
| ## Version History |
|
|
| - **v1.0** (Initial): 8 core modules, basic pipeline, basic backtest |
| - **v2.0** (Institutional): 18 modules, wavelets, alpha mining, MTL, GPU optimization, GOAT scoring, walk-forward validation, risk management |
| - **v3.0** (Elite/Jane Street): 25 modules, RL execution, LOB reconstruction, market making, synthetic markets, online learning, stat arb, conformal prediction, adversarial defense, A/B testing, DCC-GARCH, feature store |
|
|
| --- |
|
|
| ## What You Can Do With This |
|
|
| 1. **Apply to Jane Street / Two Sigma / Citadel / DE Shaw** |
| - This repo demonstrates you understand ALL major quant subsystems |
| - Not just "I trained a model" but "I built a complete trading platform" |
|
|
| 2. **Launch a Quant Trading Startup** |
| - Modular architecture: replace components with proprietary data/feeds |
| - Start with simple strategies, iterate with A/B testing |
|
|
| 3. **Academic Research** |
| - Every module cites papers, implements SOTA methods |
| - Use synthetic markets for reproducible experiments |
|
|
| 4. **Personal Trading** |
| - Connect to Interactive Brokers / Alpaca API |
| - Run with paper trading, then small real money |
| - Risk management prevents blow-ups |
|
|
| --- |
|
|
| ## License |
|
|
| MIT: free for research and commercial use. |
|
|
| **Disclaimer**: This is for educational and research purposes. Past performance does not guarantee future results. Trading involves substantial risk of loss. |
|
|