Return Forecasting & Risk Models

Abstract

This document provides a detailed reference for every return forecasting model, covariance estimator, and risk model available in the Portfolio Engine. Each model is described with its mathematical formulation, implementation location, configurable parameters, and integration points. For the high-level pipeline context, see PIPELINE.md.


1. Model Selection

Models are selected via the model parameter in the configuration:

| ID | Name | Module | Description |
|----|------|--------|-------------|
| 1 | CAPM | models.py → model_capm() | Capital Asset Pricing Model |
| 2 | Black-Litterman | models.py → model_black_litterman() | Market equilibrium prior |
| 3 | Bayesian Shrinkage | models.py → model_bayesian_blend() | James-Stein shrinkage estimator |
| 4 | Fama-French | models.py → model_fama_french() | Multi-factor regression model |
| 5 | ML Stacking Ensemble | models.py → ensemble_return_forecast() | XGBoost + ElasticNet + Ridge meta-learner |
| 6 | End-to-End SPO+ | e2e_forecast_model.py | Differentiable optimization with gradient flow |
| 7 | Regime-Adaptive Blend | forecast_generation.py | Dynamic mix of CAPM, BL, and Bayesian |

2. Model 1 — CAPM

models.py → model_capm()

The Capital Asset Pricing Model estimates expected returns as:

E[R_i] = R_f + β_i · (E[R_m] - R_f)

where β_i is computed via an exponentially weighted least-squares regression of each asset's returns against the benchmark (SPY). The exponential weighting uses a 2-year half-life, so an observation from two years ago carries half the weight of the most recent one.

Parameters:

  • rfr: Risk-free rate (from ^TNX or config default)
  • periods: Annualisation factor (252 for daily, 12 for monthly)
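As a rough illustration of the beta estimate described above, the sketch below computes an exponentially weighted regression slope with NumPy. The function names, and the choice of 504 trading days for the 2-year half-life, are assumptions made for this example rather than the contents of model_capm().

```python
import numpy as np

def ew_beta(asset_returns, bench_returns, half_life=504):
    """Exponentially weighted regression slope of asset on benchmark returns.

    half_life is in observations; 504 assumes daily data and a 2-year half-life.
    """
    asset = np.asarray(asset_returns, dtype=float)
    bench = np.asarray(bench_returns, dtype=float)
    age = np.arange(len(bench))[::-1]        # 0 for the most recent observation
    w = 0.5 ** (age / half_life)             # weight halves every half_life steps
    w /= w.sum()
    a_mean, b_mean = w @ asset, w @ bench    # weighted means
    cov = w @ ((asset - a_mean) * (bench - b_mean))
    var = w @ ((bench - b_mean) ** 2)
    return cov / var

def capm_expected_return(beta, rfr, market_return):
    """E[R_i] = R_f + beta_i * (E[R_m] - R_f), all in the same (e.g. annual) units."""
    return rfr + beta * (market_return - rfr)
```

With a beta of 1.5, a 4% risk-free rate, and a 9% expected market return, the second function gives 0.04 + 1.5 × 0.05 = 11.5%.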

3. Model 2 — Black-Litterman

models.py → model_black_litterman()

Computes the equilibrium implied excess returns (π) from the market-cap-weighted portfolio and the covariance matrix:

π = δ · Σ · w_mkt

where δ is the risk aversion coefficient and w_mkt are the market-cap weights. The engine uses a conditional ERP (Equity Risk Premium) derived from the current risk-free rate via get_conditional_erp(), which applies a piecewise-linear decay with a floor:

ERP = max(0.02, LONG_RUN_ERP - max(0, rfr - 0.035) × 2.0)

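The ERP stays at LONG_RUN_ERP while the risk-free rate is at or below the 3.5% neutral level, then contracts by 2 percentage points for every point the rate rises above it, down to a 2% floor. This is consistent with a DCF framework.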
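The ERP formula can be written as a one-line function. This is a sketch, not the engine's get_conditional_erp(): the value of LONG_RUN_ERP is not given in this document, so the 5% default below is a placeholder assumption, and the function name is hypothetical.

```python
def conditional_erp_sketch(rfr, long_run_erp=0.05, neutral_rate=0.035,
                           slope=2.0, floor=0.02):
    """ERP = max(floor, long_run_erp - max(0, rfr - neutral_rate) * slope).

    long_run_erp=0.05 is a placeholder; the engine's LONG_RUN_ERP is not
    stated in this document.
    """
    excess_rate = max(0.0, rfr - neutral_rate)
    return max(floor, long_run_erp - excess_rate * slope)
```

Under the placeholder value, a 3% risk-free rate leaves the ERP at 5%, a 4.5% rate lowers it to 3%, and a 6% rate hits the 2% floor.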


4. Model 3 — Bayesian Shrinkage (James-Stein)

models.py → model_bayesian_blend()

Shrinks the sample mean toward the CAPM prior using the James-Stein estimator:

E[R] = α · μ_CAPM + (1 - α) · μ_hist

where α is the shrinkage intensity. When a Normal-Inverse-Wishart (NIW) prior state exists on disk (niw_prior_state.pkl), the model performs a full Bayesian update using the conjugate posterior update equations:

κ_n = κ_0 + n
μ_n = (κ_0 · μ_0 + n · x̄) / κ_n

This sequential learning approach enables the model to improve across pipeline runs.
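A minimal sketch of the two update rules above, using NumPy. Only the mean and κ updates from this section are shown; the function names are illustrative, and the handling of the NIW scale matrix and degrees of freedom is omitted.

```python
import numpy as np

def shrinkage_blend(mu_capm, mu_hist, alpha):
    """E[R] = alpha * mu_CAPM + (1 - alpha) * mu_hist."""
    return alpha * np.asarray(mu_capm, float) + (1.0 - alpha) * np.asarray(mu_hist, float)

def niw_mean_update(mu_0, kappa_0, returns):
    """Conjugate update of the prior mean given an (n, assets) block of returns."""
    returns = np.asarray(returns, dtype=float)
    n = returns.shape[0]
    x_bar = returns.mean(axis=0)
    kappa_n = kappa_0 + n
    mu_n = (kappa_0 * np.asarray(mu_0, float) + n * x_bar) / kappa_n
    return mu_n, kappa_n
```

Feeding the returned (μ_n, κ_n) back in as the next prior gives the same result as a single update on all the data, which is what makes the sequential, run-by-run learning coherent.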


5. Model 4 — Fama-French

models.py → model_fama_french()

Runs a multi-factor time-series regression for each asset against the Fama-French 5 factors plus Momentum:

R_i - R_f = α + β_MKT · (R_m - R_f) + β_SMB · SMB + β_HML · HML + β_RMW · RMW + β_CMA · CMA + β_MOM · MOM + ε

Factor data is downloaded from Kenneth French's Dartmouth data library via data.py → fetch_fama_french_factors(). The expected return is computed as the factor risk premiums (rolling average of factor returns) multiplied by each asset's factor loadings.

Regime Awareness: In stressed regimes (severity ≥ 2.0), the model automatically reduces the weight on momentum (MOM) because momentum crashes are well-documented during regime transitions (Daniel & Moskowitz, 2016).
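The regression and the loadings-times-premia step can be sketched with ordinary least squares in NumPy. The function names and the `window` argument are assumptions for illustration; the regime-dependent momentum down-weighting is not shown because its size is not specified here.

```python
import numpy as np

def factor_loadings(excess_returns, factors):
    """OLS of one asset's excess returns on a (T, K) factor matrix with intercept.

    Returns (alpha, betas) where betas has one entry per factor column.
    """
    factors = np.asarray(factors, dtype=float)
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, np.asarray(excess_returns, float), rcond=None)
    return coef[0], coef[1:]

def factor_expected_excess_return(betas, factors, window):
    """Loadings times factor premia, with premia as the mean of the last `window` rows."""
    premia = np.asarray(factors, dtype=float)[-window:].mean(axis=0)
    return float(betas @ premia)
```

The intercept α is estimated but deliberately left out of the forecast, matching the description of the expected return as premia multiplied by loadings.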


6. Model 5 — ML Stacking Ensemble

models.py → ensemble_return_forecast()

This is the most complex model in the engine. It implements a three-layer stacking ensemble:

Layer 1: Feature Engineering (data.py → build_ml_features())

For each asset, the engine constructs a feature matrix from:

| Feature | Lookback | Description |
|---------|----------|-------------|
| mom_1m | 22 days | 1-month geometric momentum |
| mom_3m | 64 days | 3-month geometric momentum |
| mom_6m | 127 days | 6-month geometric momentum |
| rev_5d | 6 days | 5-day mean reversion signal |
| vol_21d | 21 days | Realised volatility |
| beta_63d | 63 days | Rolling market beta |
| smb_21d | 21 days | Size factor exposure |
| hml_21d | 21 days | Value factor exposure |
| mkt_rf_21d | 21 days | Market excess return |
| rmw_21d | 21 days | Profitability factor |
| cma_21d | 21 days | Investment factor |
| put_call_ratio | Point-in-time | Options flow sentiment (see below) |
| iv_skew | Point-in-time | Implied volatility skew |

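Non-overlapping targets: The target variable is the forward 21-day return. Forward windows that start on consecutive days share most of their observations, so their targets are strongly serially correlated. To avoid this, the dataset keeps one sample every horizon days, so that no two target windows overlap.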
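A small sketch of the non-overlapping sampling, assuming a single price series and a 21-day horizon. The function name is illustrative, and features for each sample would have to be built from data up to the sample's start index only.

```python
import numpy as np

def non_overlapping_targets(prices, horizon=21):
    """Return sample start indices and forward `horizon`-day simple returns.

    Starts are spaced `horizon` apart, so no two target windows share a day.
    """
    prices = np.asarray(prices, dtype=float)
    starts = np.arange(0, len(prices) - horizon, horizon)
    forward = prices[starts + horizon] / prices[starts] - 1.0
    return starts, forward
```

For 64 daily prices this yields three samples starting at indices 0, 21, and 42, instead of 43 heavily overlapping ones.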

Layer 2: Base Learners

Two base models are trained per asset:

  • XGBoost (gradient-boosted trees): Captures non-linear interactions.
  • ElasticNet: L1+L2 regularised linear model for robustness against multicollinearity.

Both are trained with time-series-aware cross-validation to prevent look-ahead bias.

Layer 3: Ridge Meta-Learner

A Ridge regression model combines the out-of-fold predictions from both base learners into a single expected return forecast. This stacking architecture exploits the complementary strengths of tree-based and linear models.
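The stacking flow can be sketched with scikit-learn. To keep the example dependency-light, GradientBoostingRegressor stands in for XGBoost, and all hyperparameters and function names are assumptions rather than the engine's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.model_selection import TimeSeriesSplit

def _make_base_learners():
    # Stand-in for XGBoost plus an ElasticNet; hyperparameters are illustrative.
    return [
        GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=0),
        ElasticNet(alpha=1e-3, l1_ratio=0.5),
    ]

def fit_stack(X, y, n_splits=4):
    """Fit base learners and a Ridge meta-learner on out-of-fold predictions."""
    oof = np.full((len(y), 2), np.nan)
    # Each fold trains on the past and predicts the block that follows it.
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        for j, model in enumerate(_make_base_learners()):
            oof[test_idx, j] = model.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
    covered = ~np.isnan(oof).any(axis=1)   # the first block never appears in a test fold
    meta = Ridge(alpha=1.0).fit(oof[covered], y[covered])
    bases = [m.fit(X, y) for m in _make_base_learners()]  # refit on all data
    return bases, meta

def predict_stack(bases, meta, X_new):
    """Combine base predictions through the meta-learner."""
    base_preds = np.column_stack([m.predict(X_new) for m in bases])
    return meta.predict(base_preds)
```

Training the meta-learner only on out-of-fold rows means it sees base predictions made without access to the corresponding targets, which is the point of the time-series-aware split.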

Alternative Data Integration

When using Model 5, the engine automatically fetches real-time options flow data via alternative_data.py → fetch_options_sentiment(). For each asset, it extracts:

  • Put/Call Volume Ratio: Measures directional sentiment. Values > 1.0 indicate bearish positioning.
  • Implied Volatility Skew: The average IV of puts minus the average IV of calls. Positive skew indicates elevated fear of downside.

These features are injected into the features_dict and consumed by both the XGBoost/ElasticNet ensemble and the Transformer deep learning model.
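The two sentiment features reduce to simple arithmetic once the option chain is in hand. The sketch below assumes the volumes and implied volatilities have already been fetched; the function name and input shape are hypothetical and do not describe fetch_options_sentiment().

```python
def options_sentiment_features(put_volume, call_volume, put_ivs, call_ivs):
    """Compute put/call volume ratio and IV skew (mean put IV minus mean call IV)."""
    if call_volume > 0:
        put_call_ratio = put_volume / call_volume
    else:
        put_call_ratio = float("nan")   # undefined without call volume
    iv_skew = sum(put_ivs) / len(put_ivs) - sum(call_ivs) / len(call_ivs)
    return {"put_call_ratio": put_call_ratio, "iv_skew": iv_skew}
```

For example, 1,200 puts against 1,000 calls gives a ratio of 1.2 (bearish positioning), and mean IVs of 32% for puts and 26% for calls give a positive skew of 0.06.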

Deep Learning Layer — Noise-Filtered Transformer

dl_models.py → NoiseFilteredTransformer

When PyTorch is available, the ensemble is augmented with a Transformer sequence model:

  1. Conv1D Noise Filter: A 1D convolutional layer with kernel size 3 and GELU activation smooths noisy daily features before they reach the self-attention mechanism.
  2. Transformer Encoder: 2 attention heads, 2 encoder layers, feedforward dimension of 128. Processes 60-day sliding windows.
  3. Cross-Asset Training: A single model is trained across all assets simultaneously using CrossAssetSequenceDataset, which stacks sequences from the entire universe. This prevents overfitting on individual assets and allows the model to learn universal temporal patterns.

The Transformer's prediction is blended 50/50 with the Ridge meta-learner output.
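The cross-asset stacking of 60-day windows and the final 50/50 blend can be illustrated without PyTorch. This NumPy sketch only shows the data layout; the function names are assumptions and it is not the CrossAssetSequenceDataset implementation.

```python
import numpy as np

def stack_cross_asset_windows(features_by_asset, window=60):
    """Stack sliding windows from every asset into one (N, window, F) array.

    features_by_asset maps a ticker to a (T, F) feature array.
    """
    chunks = []
    for feats in features_by_asset.values():
        feats = np.asarray(feats, dtype=float)
        if len(feats) < window:
            continue                      # not enough history for one window
        # sliding_window_view gives (T - window + 1, F, window); move time to axis 1
        view = np.lib.stride_tricks.sliding_window_view(feats, window, axis=0)
        chunks.append(np.moveaxis(view, -1, 1))
    return np.concatenate(chunks, axis=0)

def blend_forecasts(transformer_pred, ridge_pred):
    """Equal-weight blend of the Transformer and Ridge meta-learner forecasts."""
    return 0.5 * transformer_pred + 0.5 * ridge_pred
```

Two assets with 100 and 80 days of history contribute 41 and 21 windows respectively, so a single model sees 62 training sequences rather than two small per-asset sets.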


7. Model 6 — End-to-End Differentiable Optimization (SPO+)

e2e_forecast_model.py

Implements Smart Predict-then-Optimize (SPO+) where the neural network's loss function is the downstream portfolio objective itself. Gradients flow through the CVXPY layer (via cvxpylayers) back into the forecast network:

Forecast Network → Expected Returns → CVXPY Layer → Portfolio Weights → Realised Sharpe Loss

This eliminates the two-stage estimation-then-optimisation pipeline and directly optimises for the decision quality.


8. Model 7 — Regime-Adaptive Factor Blend

forecast_generation.py → RegimeAdaptiveStrategy

Dynamically blends CAPM, Black-Litterman, and Bayesian Shrinkage forecasts based on the HMM regime severity score:

E[R] = w_CAPM · E[R]_CAPM + w_BL · E[R]_BL + w_Bayes · E[R]_Bayes

The weights are controlled by a sigmoid function centered at severity = 2.0:

| Regime | w_CAPM | w_BL | w_Bayes |
|--------|--------|------|---------|
| Calm (sev ≈ 1.0) | 25% | 30% | 45% |
| Crash (sev ≥ 3.0) | 15% | 70% | 15% |

In crisis regimes, the model anchors heavily to the BL equilibrium prior (a conservative, diversified estimate) rather than trusting volatile historical data.
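One way to obtain weights like those in the table is to interpolate between the calm and crash rows with a sigmoid of the severity score. The sketch below does that; the steepness value and the linear interpolation are assumptions, so the outputs only approximate the table at the extremes.

```python
import math

CALM = {"capm": 0.25, "bl": 0.30, "bayes": 0.45}
CRASH = {"capm": 0.15, "bl": 0.70, "bayes": 0.15}

def regime_weights(severity, center=2.0, steepness=4.0):
    """Blend CALM and CRASH weights with a sigmoid centered at `center`.

    steepness is an illustrative value, not taken from the engine.
    """
    s = 1.0 / (1.0 + math.exp(-steepness * (severity - center)))
    return {k: (1.0 - s) * CALM[k] + s * CRASH[k] for k in CALM}

def blend_returns(weights, er_capm, er_bl, er_bayes):
    """E[R] = w_CAPM * E[R]_CAPM + w_BL * E[R]_BL + w_Bayes * E[R]_Bayes."""
    return weights["capm"] * er_capm + weights["bl"] * er_bl + weights["bayes"] * er_bayes
```

Because both endpoint rows sum to 100%, any interpolation between them also sums to 100%, so no renormalisation is needed.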


9. Covariance Estimation

Hybrid Covariance (models.py → build_hybrid_covariance())

The covariance matrix is estimated using a Ledoit-Wolf shrinkage estimator that blends the sample covariance with a structured target (constant correlation model):

Σ_shrunk = α · Σ_target + (1 - α) · Σ_sample

The shrinkage intensity α is determined analytically. For portfolios containing bonds, the engine constructs a hybrid block-diagonal covariance matrix that separates equity and fixed-income risk structures.
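The shrinkage blend toward a constant-correlation target can be sketched as follows. The analytic Ledoit-Wolf formula for α is not reproduced here, so α is passed in as an argument; the function names are illustrative.

```python
import numpy as np

def constant_correlation_target(sample_cov):
    """Target with the sample variances and one average pairwise correlation."""
    sample_cov = np.asarray(sample_cov, dtype=float)
    std = np.sqrt(np.diag(sample_cov))
    corr = sample_cov / np.outer(std, std)
    n = corr.shape[0]
    avg_corr = (corr.sum() - n) / (n * (n - 1))   # mean of off-diagonal entries
    target = avg_corr * np.outer(std, std)
    np.fill_diagonal(target, std ** 2)            # keep sample variances
    return target

def shrink_covariance(sample_cov, alpha):
    """Sigma_shrunk = alpha * Sigma_target + (1 - alpha) * Sigma_sample."""
    sample_cov = np.asarray(sample_cov, dtype=float)
    return alpha * constant_correlation_target(sample_cov) + (1.0 - alpha) * sample_cov
```

The diagonal is unchanged for any α, since the target shares the sample variances; only the correlations are pulled toward their average.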

GARCH Scaling (models.py → garch_scale_covariance())

When garch_enabled = True, the engine fits a GARCH(1,1) model to each asset's return series:

σ²_t = ω + α · ε²_{t-1} + β · σ²_{t-1}

The ratio of the conditional variance to the unconditional variance produces a scaling factor applied to the diagonal of the covariance matrix. This captures short-term volatility clustering.
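The scaling factor can be illustrated by running the GARCH(1,1) recursion with already-fitted parameters. Parameter estimation is not shown, and the way the factor is applied below (rescaling volatilities so that variances scale by the ratio and correlations are preserved) is an assumption about one reasonable implementation.

```python
import numpy as np

def garch_variance_ratio(residuals, omega, alpha, beta):
    """One-step-ahead conditional variance divided by the unconditional variance."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    unconditional = omega / (1.0 - alpha - beta)
    sigma2 = unconditional                      # start at the long-run level
    for eps in np.asarray(residuals, dtype=float):
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return sigma2 / unconditional

def scale_covariance(cov, ratios):
    """Scale each variance by its ratio while leaving correlations unchanged."""
    d = np.sqrt(np.asarray(ratios, dtype=float))
    return np.asarray(cov, dtype=float) * np.outer(d, d)
```

A ratio above 1 signals that current volatility sits above its long-run level, for example right after a large shock, and a ratio below 1 signals an unusually quiet period.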

Random Matrix Theory Noise Filtering

The eigenvalues of the correlation matrix are compared against the Marchenko-Pastur distribution. Eigenvalues below the theoretical upper bound are considered noise and are shrunk toward 1.0.
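For N assets and T observations, the Marchenko-Pastur upper bound for a pure-noise correlation matrix is λ₊ = (1 + √(N/T))². The sketch below applies it; the `shrink` argument (1.0 replaces noise eigenvalues with exactly 1.0) and the final rescaling to a unit diagonal are assumptions about details this section does not specify.

```python
import numpy as np

def mp_upper_bound(n_assets, n_obs):
    """Largest eigenvalue expected from pure noise: (1 + sqrt(N / T))^2."""
    return (1.0 + np.sqrt(n_assets / n_obs)) ** 2

def denoise_correlation(corr, n_obs, shrink=1.0):
    """Shrink eigenvalues below the Marchenko-Pastur bound toward 1.0."""
    corr = np.asarray(corr, dtype=float)
    vals, vecs = np.linalg.eigh(corr)
    noise = vals < mp_upper_bound(corr.shape[0], n_obs)
    vals = np.where(noise, vals + shrink * (1.0 - vals), vals)
    cleaned = (vecs * vals) @ vecs.T            # V diag(vals) V^T
    d = np.sqrt(np.diag(cleaned))
    return cleaned / np.outer(d, d)             # restore a unit diagonal
```

With 100 assets and 400 observations the bound is (1 + 0.5)² = 2.25, so only eigenvalues above 2.25 are treated as signal.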


10. Black-Litterman Integration Bridge

bl_bridge.py → compute_bl_posterior()

When using Model 5 (ML Ensemble), the ML expected returns are not used directly. Instead, they are treated as investor views in the Black-Litterman framework:

  1. The Black-Litterman equilibrium prior (π) is computed from the covariance matrix.
  2. The ML forecast and its uncertainty matrix form the view.
  3. The posterior expected returns are a precision-weighted blend of the prior and the views.

This architecture prevents ML overconfidence from dominating the allocation. In stressed regimes, the uncertainty on ML views is scaled up via scale_uncertainty_by_regime(), further anchoring to the conservative equilibrium prior.
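The three steps above correspond to the standard Black-Litterman posterior mean with one absolute view per asset. In the sketch below, the prior scaling τ = 0.05 and the function name are assumptions, and the views are taken to be the ML forecasts with a supplied uncertainty matrix Ω.

```python
import numpy as np

def bl_posterior_mean(cov, w_mkt, delta, views, view_uncertainty, tau=0.05):
    """Precision-weighted blend of the equilibrium prior and absolute views.

    Assumes one view per asset (pick matrix = identity); tau is illustrative.
    """
    cov = np.asarray(cov, dtype=float)
    pi = delta * cov @ np.asarray(w_mkt, float)           # step 1: pi = delta * Sigma * w_mkt
    prior_precision = np.linalg.inv(tau * cov)
    view_precision = np.linalg.inv(np.asarray(view_uncertainty, float))  # step 2
    posterior_cov = np.linalg.inv(prior_precision + view_precision)
    # Step 3: weight prior and views by their precisions.
    return posterior_cov @ (prior_precision @ pi + view_precision @ np.asarray(views, float))
```

Inflating Ω, as the regime scaling does in stressed markets, lowers the view precision and moves the posterior back toward π; shrinking Ω moves it toward the ML forecast.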


References

  • Black, F., & Litterman, R. (1992). Global portfolio optimization. Financial Analysts Journal, 48(5), 28–43.
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. KDD 2016.
  • Daniel, K., & Moskowitz, T. J. (2016). Momentum crashes. Journal of Financial Economics, 122(2), 221–247.
  • Elmachtoub, A. N., & Grigas, P. (2022). Smart "Predict, then Optimize." Management Science, 68(1), 9–26.
  • Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
  • James, W., & Stein, C. (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium, 1, 361–379.
  • Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.
  • Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19(3), 425–442.
  • Vaswani, A., et al. (2017). Attention is all you need. NeurIPS 2017.