Return Forecasting & Risk Models
Abstract
This document provides a detailed reference for every return forecasting model, covariance estimator, and risk model available in the Portfolio Engine. Each model is described with its mathematical formulation, implementation location, configurable parameters, and integration points. For the high-level pipeline context, see PIPELINE.md.
1. Model Selection
Models are selected via the model parameter in the configuration:
| ID | Name | Module | Description |
|---|---|---|---|
| 1 | CAPM | models.py → model_capm() | Capital Asset Pricing Model |
| 2 | Black-Litterman | models.py → model_black_litterman() | Market equilibrium prior |
| 3 | Bayesian Shrinkage | models.py → model_bayesian_blend() | James-Stein shrinkage estimator |
| 4 | Fama-French | models.py → model_fama_french() | Multi-factor regression model |
| 5 | ML Stacking Ensemble | models.py → ensemble_return_forecast() | XGBoost + ElasticNet + Ridge meta-learner |
| 6 | End-to-End SPO+ | e2e_forecast_model.py | Differentiable optimization with gradient flow |
| 7 | Regime-Adaptive Blend | forecast_generation.py | Dynamic mix of CAPM, BL, and Bayesian |
2. Model 1 — CAPM
models.py → model_capm()
The Capital Asset Pricing Model estimates expected returns as:
E[R_i] = R_f + β_i · (E[R_m] - R_f)
where β is computed via exponentially weighted OLS regression of each asset's returns against the benchmark (SPY). The exponential weighting uses a 2-year half-life, giving more recent observations greater influence.
Parameters:
- rfr: Risk-free rate (from ^TNX or config default)
- periods: Annualisation factor (252 for daily, 12 for monthly)
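As a rough illustration of the exponentially weighted beta described above, the following sketch computes a weighted OLS slope with NumPy and plugs it into the CAPM equation. The function names `ew_beta` and `capm_expected_return`, the simulated data, and the conversion of the 2-year half-life to 504 daily observations are assumptions made for this example, not the engine's actual code.

```python
import numpy as np

def ew_beta(asset_excess, market_excess, half_life):
    """Exponentially weighted OLS slope of asset on market excess returns."""
    asset_excess = np.asarray(asset_excess, dtype=float)
    market_excess = np.asarray(market_excess, dtype=float)
    n = len(asset_excess)
    # Age 0 is the most recent observation; weights halve every `half_life` periods.
    age = np.arange(n - 1, -1, -1)
    w = 0.5 ** (age / half_life)
    w = w / w.sum()
    mx = np.sum(w * market_excess)
    my = np.sum(w * asset_excess)
    cov = np.sum(w * (market_excess - mx) * (asset_excess - my))
    var = np.sum(w * (market_excess - mx) ** 2)
    return cov / var

def capm_expected_return(beta, rfr, market_return):
    """E[R_i] = R_f + beta_i * (E[R_m] - R_f), all inputs annualised."""
    return rfr + beta * (market_return - rfr)

# Simulated daily data with a true beta of 1.3 (illustrative only).
rng = np.random.default_rng(0)
mkt = rng.normal(0.0004, 0.01, 1000)
asset = 1.3 * mkt + rng.normal(0.0, 0.005, 1000)

beta = ew_beta(asset, mkt, half_life=504)  # assumes 252 trading days per year
mu = capm_expected_return(beta, rfr=0.04, market_return=0.09)
```

Older observations still contribute, but an observation two years old carries half the weight of the latest one.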
3. Model 2 — Black-Litterman
models.py → model_black_litterman()
Computes the equilibrium implied excess returns (π) from the market-cap-weighted portfolio and the covariance matrix:
π = δ · Σ · w_mkt
where δ is the risk aversion coefficient and w_mkt are the market-cap weights. The engine uses a conditional ERP (Equity Risk Premium) derived from the current risk-free rate via get_conditional_erp(), which implements a floored, piecewise-linear decay function:
ERP = max(0.02, LONG_RUN_ERP - max(0, rfr - 0.035) × 2.0)
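This ensures the ERP contracts as rates rise above the 3.5% neutral level: each additional percentage point of risk-free rate removes two percentage points of ERP, down to a floor of 2%. The behaviour is consistent with a DCF framework.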
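The two formulas above can be sketched as follows. The value of `LONG_RUN_ERP` (5%) and the calibration of δ from the ERP and the market portfolio variance are assumptions for illustration; the document does not state how the engine links the two. The function body simply transcribes the ERP formula.

```python
import numpy as np

LONG_RUN_ERP = 0.05  # assumed value; the engine's constant is not given in this document

def get_conditional_erp(rfr, long_run_erp=LONG_RUN_ERP):
    """ERP = max(0.02, LONG_RUN_ERP - max(0, rfr - 0.035) * 2.0)."""
    return max(0.02, long_run_erp - max(0.0, rfr - 0.035) * 2.0)

def implied_excess_returns(delta, cov, w_mkt):
    """Reverse optimisation: pi = delta * Sigma * w_mkt."""
    return delta * (cov @ w_mkt)

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w_mkt = np.array([0.6, 0.4])

erp = get_conditional_erp(rfr=0.045)
# One common calibration (an assumption here): choose delta so that the
# market portfolio's implied excess return equals the conditional ERP.
delta = erp / (w_mkt @ cov @ w_mkt)
pi = implied_excess_returns(delta, cov, w_mkt)
```

With this calibration, the market-weighted average of π equals the conditional ERP by construction.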
4. Model 3 — Bayesian Shrinkage (James-Stein)
models.py → model_bayesian_blend()
Shrinks the sample mean toward the CAPM prior using the James-Stein estimator:
E[R] = α · μ_CAPM + (1 - α) · μ_hist
where α is the shrinkage intensity. When a Normal-Inverse-Wishart (NIW) prior state exists on disk (niw_prior_state.pkl), the model performs a full Bayesian update using the conjugate posterior update equations:
κ_n = κ_0 + n
μ_n = (κ_0 · μ_0 + n · x̄) / κ_n
This sequential learning approach enables the model to improve across pipeline runs.
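A minimal sketch of the mean update above, assuming the prior mean μ_0 is the CAPM forecast and x̄ is the sample mean of the new observations. The function name `niw_update_mean` is hypothetical, and the covariance part of the NIW update is omitted.

```python
import numpy as np

def niw_update_mean(mu_0, kappa_0, x_bar, n):
    """Conjugate update of the NIW mean parameters after n new observations."""
    kappa_n = kappa_0 + n
    mu_n = (kappa_0 * np.asarray(mu_0) + n * np.asarray(x_bar)) / kappa_n
    return mu_n, kappa_n

mu_0 = np.array([0.08, 0.06])   # prior mean (e.g. a CAPM forecast)
x_bar = np.array([0.12, 0.02])  # sample mean from the new data
mu_n, kappa_n = niw_update_mean(mu_0, kappa_0=60, x_bar=x_bar, n=20)

# The update is the shrinkage formula with alpha = kappa_0 / kappa_n.
alpha = 60 / kappa_n
blend = alpha * mu_0 + (1 - alpha) * x_bar
```

Because κ_n grows with every run, the prior pseudo-count persisted to disk gives later updates a progressively firmer anchor.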
5. Model 4 — Fama-French
models.py → model_fama_french()
Runs a multi-factor time-series regression for each asset against the Fama-French 5 factors plus Momentum:
R_i - R_f = α + β_MKT · (R_m - R_f) + β_SMB · SMB + β_HML · HML + β_RMW · RMW + β_CMA · CMA + β_MOM · MOM + ε
Factor data is downloaded from Kenneth French's Dartmouth data library via data.py → fetch_fama_french_factors(). The expected return is computed as the factor risk premiums (rolling average of factor returns) multiplied by each asset's factor loadings.
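The regression and the expected-return step can be sketched with plain NumPy least squares. The helper names, the simulated factor data, and the 252-day premium window are assumptions for illustration; the engine's own window length is not stated here.

```python
import numpy as np

FACTORS = ["MKT", "SMB", "HML", "RMW", "CMA", "MOM"]

def factor_loadings(excess_returns, factors):
    """OLS of an asset's excess returns on the factor returns; returns (alpha, betas)."""
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    return coef[0], coef[1:]

def expected_excess_return(betas, factors, window):
    """Factor loadings times the trailing-average factor premiums (alpha is excluded)."""
    premiums = factors[-window:].mean(axis=0)
    return betas @ premiums

# Simulated, noise-free data so the loadings are recovered exactly.
rng = np.random.default_rng(0)
factors = rng.normal(0.0003, 0.006, size=(500, len(FACTORS)))
true_betas = np.array([1.1, 0.3, -0.2, 0.1, 0.0, 0.4])
excess = 0.0001 + factors @ true_betas

alpha, betas = factor_loadings(excess, factors)
mu_excess = expected_excess_return(betas, factors, window=252)
```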
Regime Awareness: In stressed regimes (severity ≥ 2.0), the model automatically reduces the weight on momentum (MOM) because momentum crashes are well-documented during regime transitions (Daniel & Moskowitz, 2016).
6. Model 5 — ML Stacking Ensemble
models.py → ensemble_return_forecast()
This is the most complex model in the engine. It implements a three-layer stacking ensemble:
Layer 1: Feature Engineering (data.py → build_ml_features())
For each asset, the engine constructs a feature matrix from:
| Feature | Lookback | Description |
|---|---|---|
| mom_1m | 22 days | 1-month geometric momentum |
| mom_3m | 64 days | 3-month geometric momentum |
| mom_6m | 127 days | 6-month geometric momentum |
| rev_5d | 6 days | 5-day mean reversion signal |
| vol_21d | 21 days | Realised volatility |
| beta_63d | 63 days | Rolling market beta |
| smb_21d | 21 days | Size factor exposure |
| hml_21d | 21 days | Value factor exposure |
| mkt_rf_21d | 21 days | Market excess return |
| rmw_21d | 21 days | Profitability factor |
| cma_21d | 21 days | Investment factor |
| put_call_ratio | Point-in-time | Options flow sentiment (see below) |
| iv_skew | Point-in-time | Implied volatility skew |
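Non-overlapping targets: The target variable is the forward 21-day return. Forward windows that start on consecutive days share most of their observations, which induces strong serial correlation in the targets, so the dataset is sampled at non-overlapping strides of horizon days (21 here).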
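The sampling rule can be illustrated with a short sketch. The function name `non_overlapping_samples` and the synthetic price path are assumptions for this example.

```python
import numpy as np

def non_overlapping_samples(prices, horizon=21):
    """Return sample start indices and forward returns over disjoint windows."""
    prices = np.asarray(prices, dtype=float)
    # Step by `horizon` so that no two forward windows share a day.
    starts = np.arange(0, len(prices) - horizon, horizon)
    forward = prices[starts + horizon] / prices[starts] - 1.0
    return starts, forward

# 100 days of prices growing 0.1% per day (illustrative only).
prices = 100.0 * 1.001 ** np.arange(100)
starts, forward = non_overlapping_samples(prices, horizon=21)
```

Features would be taken at each index in `starts`, so every row pairs information available at that date with a return window that no other row uses.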
Layer 2: Base Learners
Two base models are trained per asset:
- XGBoost (gradient-boosted trees): Captures non-linear interactions.
- ElasticNet: L1+L2 regularised linear model for robustness against multicollinearity.
Both are trained with time-series-aware cross-validation to prevent look-ahead bias.
Layer 3: Ridge Meta-Learner
A Ridge regression model combines the out-of-fold predictions from both base learners into a single expected return forecast. This stacking architecture exploits the complementary strengths of tree-based and linear models.
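The stacking mechanics can be sketched without the actual learners. In the example below, two closed-form ridge regressions with different penalties stand in for XGBoost and ElasticNet purely to keep the code dependency-free; that substitution, the expanding-window fold scheme, and all function names are assumptions, not the engine's implementation. What it shows is that each out-of-fold prediction comes from a model fit only on earlier data, and that the meta-learner is trained on those predictions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge with an unpenalised intercept; returns (intercept, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

def ridge_predict(model, X):
    intercept, coef = model
    return intercept + X @ coef

def expanding_oof(X, y, alpha, n_splits=4):
    """Out-of-fold predictions: each block is predicted from earlier blocks only."""
    bounds = np.linspace(0, len(y), n_splits + 2).astype(int)
    oof = np.full(len(y), np.nan)  # the first block has no earlier data, so it stays NaN
    for i in range(1, n_splits + 1):
        train_end, test_end = bounds[i], bounds[i + 1]
        model = ridge_fit(X[:train_end], y[:train_end], alpha)
        oof[train_end:test_end] = ridge_predict(model, X[train_end:test_end])
    return oof

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -0.3, 0.0, 0.2, 0.0]) + rng.normal(scale=0.1, size=200)

# Stand-ins for the two base learners (the engine uses XGBoost and ElasticNet).
oof_a = expanding_oof(X, y, alpha=0.1)
oof_b = expanding_oof(X, y, alpha=50.0)

# Meta-learner: ridge on the stacked out-of-fold predictions.
mask = ~np.isnan(oof_a)
Z = np.column_stack([oof_a[mask], oof_b[mask]])
meta = ridge_fit(Z, y[mask], alpha=1.0)
stacked = ridge_predict(meta, Z)
```

Training the meta-learner on in-sample base predictions instead would reward whichever base model overfits most, which is why the out-of-fold step matters.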
Alternative Data Integration
When using Model 5, the engine automatically fetches real-time options flow data via alternative_data.py → fetch_options_sentiment(). For each asset, it extracts:
- Put/Call Volume Ratio: Measures directional sentiment. Values > 1.0 indicate bearish positioning.
- Implied Volatility Skew: The average IV of puts minus the average IV of calls. Positive skew indicates elevated fear of downside.
These features are injected into the features_dict and consumed by both the XGBoost/ElasticNet ensemble and the Transformer deep learning model.
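The two options features follow directly from their definitions. The sketch below assumes the inputs are total put and call volumes plus arrays of per-contract implied volatilities; the function name `options_sentiment` and the input layout are hypothetical and do not describe fetch_options_sentiment() itself.

```python
import numpy as np

def options_sentiment(put_volume, call_volume, put_ivs, call_ivs):
    """Compute the put/call volume ratio and the put-minus-call IV skew."""
    # Guard against a zero call volume rather than dividing by zero.
    ratio = put_volume / call_volume if call_volume > 0 else float("nan")
    skew = float(np.mean(put_ivs) - np.mean(call_ivs))
    return {"put_call_ratio": ratio, "iv_skew": skew}

features = options_sentiment(
    put_volume=1200,
    call_volume=1000,
    put_ivs=[0.30, 0.34],
    call_ivs=[0.25, 0.27],
)
```

In this example both signals point the same way: a ratio above 1.0 and a positive skew each indicate defensive positioning.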
Deep Learning Layer — Noise-Filtered Transformer
dl_models.py → NoiseFilteredTransformer
When PyTorch is available, the ensemble is augmented with a Transformer sequence model:
- Conv1D Noise Filter: A 1D convolutional layer with kernel size 3 and GELU activation smooths noisy daily features before they reach the self-attention mechanism.
- Transformer Encoder: 2 attention heads, 2 encoder layers, feedforward dimension of 128. Processes 60-day sliding windows.
- Cross-Asset Training: A single model is trained across all assets simultaneously using CrossAssetSequenceDataset, which stacks sequences from the entire universe. This prevents overfitting on individual assets and allows the model to learn universal temporal patterns.
The Transformer's prediction is blended 50/50 with the Ridge meta-learner output.
7. Model 6 — End-to-End Differentiable Optimization (SPO+)
e2e_forecast_model.py
Implements Smart Predict-then-Optimize (SPO+) where the neural network's loss function is the downstream portfolio objective itself. Gradients flow through the CVXPY layer (via cvxpylayers) back into the forecast network:
Forecast Network → Expected Returns → CVXPY Layer → Portfolio Weights → Realised Sharpe Loss
This eliminates the two-stage estimation-then-optimisation pipeline and directly optimises for the decision quality.
8. Model 7 — Regime-Adaptive Factor Blend
forecast_generation.py → RegimeAdaptiveStrategy
Dynamically blends CAPM, Black-Litterman, and Bayesian Shrinkage forecasts based on the HMM regime severity score:
E[R] = w_CAPM · E[R]_CAPM + w_BL · E[R]_BL + w_Bayes · E[R]_Bayes
The weights are controlled by a sigmoid function centered at severity = 2.0:
| Regime | w_CAPM | w_BL | w_Bayes |
|---|---|---|---|
| Calm (sev ≈ 1.0) | 25% | 30% | 45% |
| Crash (sev ≥ 3.0) | 15% | 70% | 15% |
In crisis regimes, the model anchors heavily to the BL equilibrium prior (a conservative, diversified estimate) rather than trusting volatile historical data.
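One way to realise the sigmoid weighting is to interpolate between the calm and crash rows of the table. The steepness value and the linear interpolation between the two weight vectors are assumptions for this sketch; only the centre at severity 2.0 and the two endpoint rows come from the text above.

```python
import numpy as np

# Order: w_CAPM, w_BL, w_Bayes (rows of the table above)
CALM = np.array([0.25, 0.30, 0.45])
CRASH = np.array([0.15, 0.70, 0.15])

def regime_weights(severity, center=2.0, steepness=4.0):
    """Blend calm and crash weights with a sigmoid in the severity score."""
    s = 1.0 / (1.0 + np.exp(-steepness * (severity - center)))
    return (1.0 - s) * CALM + s * CRASH

def blended_forecast(severity, mu_capm, mu_bl, mu_bayes):
    w = regime_weights(severity)
    return w[0] * mu_capm + w[1] * mu_bl + w[2] * mu_bayes

w_mid = regime_weights(2.0)  # exactly halfway between the two rows
```

Since both rows sum to 100%, any sigmoid mix of them also sums to 100%, so no renormalisation is needed.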
9. Covariance Estimation
Hybrid Covariance (models.py → build_hybrid_covariance())
The covariance matrix is estimated using a Ledoit-Wolf shrinkage estimator that blends the sample covariance with a structured target (constant correlation model):
Σ_shrunk = α · Σ_target + (1 - α) · Σ_sample
The shrinkage intensity α is determined analytically. For portfolios containing bonds, the engine constructs a hybrid block-diagonal covariance matrix that separates equity and fixed-income risk structures.
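The shrinkage step can be sketched as below. The constant-correlation target keeps each asset's sample variance and replaces every pairwise correlation with the average sample correlation. The shrinkage intensity is passed in as a plain argument here; the analytical Ledoit-Wolf formula for α and the block-diagonal bond handling are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def constant_correlation_target(sample_cov):
    """Target with sample variances and a single average correlation."""
    std = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(std, std)
    n = len(std)
    r_bar = (corr.sum() - n) / (n * (n - 1))  # mean off-diagonal correlation
    target = r_bar * np.outer(std, std)
    np.fill_diagonal(target, std ** 2)
    return target

def shrink_covariance(sample_cov, alpha):
    """Sigma_shrunk = alpha * Sigma_target + (1 - alpha) * Sigma_sample."""
    target = constant_correlation_target(sample_cov)
    return alpha * target + (1.0 - alpha) * sample_cov

std = np.array([0.2, 0.3, 0.1])
corr = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.8],
                 [0.2, 0.8, 1.0]])
sample_cov = corr * np.outer(std, std)
shrunk = shrink_covariance(sample_cov, alpha=0.3)
```

Variances are untouched by the shrinkage; only the correlation structure is pulled toward the common average.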
GARCH Scaling (models.py → garch_scale_covariance())
When garch_enabled = True, the engine fits a GARCH(1,1) model to each asset's return series:
σ²_t = ω + α · ε²_{t-1} + β · σ²_{t-1}
The ratio of the conditional variance to the unconditional variance produces a scaling factor applied to the diagonal of the covariance matrix. This captures short-term volatility clustering.
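The scaling idea can be sketched with the GARCH(1,1) recursion itself, taking ω, α and β as already-fitted inputs (parameter estimation is outside this example). Two assumptions are made: the recursion starts at the unconditional variance ω / (1 − α − β), and the ratios are applied symmetrically as square-root factors so that each variance scales by exactly its ratio while correlations are unchanged. The engine's exact treatment of off-diagonal terms is not specified here.

```python
import numpy as np

def garch_variance_ratio(returns, omega, alpha, beta):
    """Ratio of the latest conditional variance to the unconditional variance."""
    uncond = omega / (1.0 - alpha - beta)  # requires alpha + beta < 1
    sigma2 = uncond                        # start at the long-run level
    for eps in returns:
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return sigma2 / uncond

def garch_scale(cov, ratios):
    """Scale variances by `ratios` while leaving correlations unchanged."""
    d = np.sqrt(np.asarray(ratios, dtype=float))
    return cov * np.outer(d, d)

omega, alpha, beta = 1e-6, 0.08, 0.90
calm = np.full(50, np.sqrt(omega / (1.0 - alpha - beta)))  # shocks at the long-run vol
shocked = np.append(calm, 0.05)                            # one large final shock

ratio_calm = garch_variance_ratio(calm, omega, alpha, beta)
ratio_shocked = garch_variance_ratio(shocked, omega, alpha, beta)

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
scaled = garch_scale(cov, [4.0, 1.0])
```

A ratio above 1 means recent shocks have pushed the conditional variance above its long-run level, so the asset's risk is scaled up for the next optimisation.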
Random Matrix Theory Noise Filtering
The eigenvalues of the correlation matrix are compared against the Marchenko-Pastur distribution. Eigenvalues below the theoretical upper bound are considered noise and are shrunk toward 1.0.
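A sketch of the filtering step, under stated assumptions: the Marchenko-Pastur upper edge for a correlation matrix of N assets estimated from T observations is taken as (1 + √(N/T))², eigenvalues below it are moved toward 1.0 by a hypothetical `shrink` fraction (1.0 replaces them entirely), and the result is rescaled to a unit diagonal. The function names and the rescaling step are illustrative choices.

```python
import numpy as np

def mp_upper_bound(n_assets, n_obs):
    """Upper edge of the Marchenko-Pastur spectrum for unit-variance noise."""
    return (1.0 + np.sqrt(n_assets / n_obs)) ** 2

def filter_correlation(corr, n_obs, shrink=1.0):
    """Shrink noise eigenvalues toward 1.0 and restore a unit diagonal."""
    lam_max = mp_upper_bound(corr.shape[0], n_obs)
    vals, vecs = np.linalg.eigh(corr)
    noise = vals < lam_max
    vals = np.where(noise, (1.0 - shrink) * vals + shrink * 1.0, vals)
    filtered = (vecs * vals) @ vecs.T  # V diag(vals) V^T
    d = np.sqrt(np.diag(filtered))
    return filtered / np.outer(d, d)

# Simulated returns with one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
rets = 0.5 * common + rng.normal(size=(500, 10))
corr = np.corrcoef(rets, rowvar=False)

clean = filter_correlation(corr, n_obs=500)
```

Eigenvalues above the bound, such as the one carried by the common factor here, pass through unchanged.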
10. Black-Litterman Integration Bridge
bl_bridge.py → compute_bl_posterior()
When using Model 5 (ML Ensemble), the ML expected returns are not used directly. Instead, they are treated as investor views in the Black-Litterman framework:
- The Black-Litterman equilibrium prior (π) is computed from the covariance matrix.
- The ML forecast and its uncertainty matrix form the view.
- The posterior expected returns are a precision-weighted blend of the prior and the views.
This architecture prevents ML overconfidence from dominating the allocation. In stressed regimes, the uncertainty on ML views is scaled up via scale_uncertainty_by_regime(), further anchoring to the conservative equilibrium prior.
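The precision-weighted blend can be written out for the simplest case, where the ML forecast is an absolute view on every asset (pick matrix equal to the identity). The function name `bl_posterior_mean`, the value of τ, and the use of a plain multiplier to mimic regime scaling are assumptions; this is not the signature of compute_bl_posterior() or scale_uncertainty_by_regime().

```python
import numpy as np

def bl_posterior_mean(pi, cov, q, omega, tau=0.05):
    """Black-Litterman posterior mean with one absolute view per asset (P = I)."""
    prior_prec = np.linalg.inv(tau * cov)  # precision of the equilibrium prior
    view_prec = np.linalg.inv(omega)       # precision of the ML views
    post_cov = np.linalg.inv(prior_prec + view_prec)
    return post_cov @ (prior_prec @ pi + view_prec @ q)

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
pi = np.array([0.05, 0.07])       # equilibrium prior
q = np.array([0.10, 0.02])        # ML forecast treated as the view
omega = 0.001 * np.eye(2)         # view uncertainty (illustrative)

normal = bl_posterior_mean(pi, cov, q, omega)
stressed = bl_posterior_mean(pi, cov, q, 4.0 * omega)  # uncertainty scaled up
```

Inflating the view uncertainty lowers the weight on the ML forecast, so the stressed posterior sits closer to π than the normal one.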
References
- Black, F., & Litterman, R. (1992). Global portfolio optimization. Financial Analysts Journal, 48(5), 28–43.
- Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. KDD 2016.
- Daniel, K., & Moskowitz, T. J. (2016). Momentum crashes. Journal of Financial Economics, 122(2), 221–247.
- Elmachtoub, A. N., & Grigas, P. (2022). Smart "Predict, then Optimize." Management Science, 68(1), 9–26.
- Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
- James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium, 1, 361–379.
- Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.
- Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3), 425–442.
- Vaswani, A., et al. (2017). Attention is all you need. NeurIPS 2017.