# Pipeline Architecture
## Abstract
The Portfolio Engine implements a multi-stage pipeline that transforms raw market data into optimised portfolio allocations, validated through rigorous out-of-sample econometric testing, and exported as interactive reports. This document describes the full execution flow, the data structures that mediate inter-stage communication, the mathematical validation framework, and the report generation subsystem. It serves as the architectural reference for understanding how the engine's components compose into a coherent analytical system.
---
## 1. Pipeline Overview
The engine is orchestrated by the `PortfolioPipeline` class in `core_engine.py`, which implements a four-stage execution model:
```
┌──────────────────────────────────────────────────────────────────────────────┐
│                               Pipeline Stages                                │
│                                                                              │
│  ┌─────────────┐   ┌──────────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │   Stage 1   │──▶│     Stage 2      │──▶│   Stage 3   │──▶│   Stage 4   │  │
│  │ load_data() │   │ run_validation() │   │ optimize()  │   │  reports()  │  │
│  └─────────────┘   └──────────────────┘   └─────────────┘   └─────────────┘  │
│                                                                              │
│   Data Fetch        Walk-Forward CV        Full-Sample       HTML + CSV      │
│   Regime Detect     Econometric Tests      Optimisation      PDF Export      │
│   Risk Aversion     DM / Christoffersen    Sensitivity       Serve           │
│   Adjustment        PSR / DSR              Stress Test                       │
└──────────────────────────────────────────────────────────────────────────────┘
```
### Entry Point
```python
def run_engine(overrides=None):
    pipeline = PortfolioPipeline(overrides=overrides)
    pipeline.load_data()
    val_bundle = pipeline.run_validation()
    opt_bundle = pipeline.optimize()
    pipeline.generate_reports(val_bundle, opt_bundle)
```
The `overrides` dictionary enables headless execution from the API layer (`api.py`), test harnesses, or scheduled batch jobs, bypassing the interactive CLI wizard.
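As an illustration of headless configuration, the following minimal sketch layers an `overrides` dictionary over a set of defaults. `DEFAULTS` and `resolve_config` are hypothetical names, not part of the engine; the keys come from the configuration table in Section 7, and the default values for `model`, `allocation_engine` and `return_frequency` are assumptions.

```python
# Hypothetical defaults for illustration; the real values live in config.py.
DEFAULTS = {
    "model": 1,
    "allocation_engine": 1,
    "hmm_regime": True,
    "dynamic_risk": True,
    "return_frequency": "daily",
}


def resolve_config(overrides=None):
    """Return DEFAULTS with caller-supplied overrides applied on top."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Fail fast on typos instead of silently ignoring a setting.
        raise KeyError(f"unknown override(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


config = resolve_config({"return_frequency": "monthly", "hmm_regime": False})
```

Rejecting unknown keys is a design choice of this sketch: a batch job with a misspelled option then fails immediately rather than running with defaults.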
---
## 2. Stage 1: Data Loading (`load_data`)
### 2.1 Data Sources
| Source | Target | Module |
|---------------------------|-----------------------|---------------------|
| Yahoo Finance / DB Cache | Daily OHLCV prices | `data.py` |
| Kenneth French Library | Fama-French factors | `data.py` |
| FRED / ^TNX proxy | Risk-free rate series | `data.py` |
| PostgreSQL / SQLite | Cached price data | `database.py` |
### 2.2 Data Validation
- **Minimum History:** Assets must have ≥ 2× `trading_days_per_year` (default: 504 business days) of return history to be included. Assets with insufficient history are silently dropped.
- **Missing Data:** The returns DataFrame is built with `pd.DataFrame.dropna()`, which drops every date on which any asset has a missing value, so that all assets share a common date index.
- **Frequency Conversion:** When `return_frequency = 'monthly'`, daily returns are geometrically compounded to monthly via `build_monthly_returns()`.
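The history filter and the geometric compounding step can be sketched in plain Python. This is not the engine's `build_monthly_returns()`; `has_enough_history` and `compound_to_monthly` are hypothetical helpers, and the 252-day year is the assumed default.

```python
from collections import defaultdict
from datetime import date

TRADING_DAYS_PER_YEAR = 252  # assumed default


def has_enough_history(returns, min_years=2):
    """True if the series holds at least min_years of daily observations."""
    return len(returns) >= min_years * TRADING_DAYS_PER_YEAR


def compound_to_monthly(daily):
    """Geometrically compound {date: return} into {(year, month): return}."""
    growth = defaultdict(lambda: 1.0)
    for day, r in sorted(daily.items()):
        growth[(day.year, day.month)] *= 1.0 + r
    return {month: g - 1.0 for month, g in growth.items()}


daily = {
    date(2024, 1, 30): 0.01,
    date(2024, 1, 31): 0.02,
    date(2024, 2, 1): -0.01,
}
monthly = compound_to_monthly(daily)
```

Compounding multiplies the daily growth factors within each month, so January above yields 1.01 × 1.02 − 1 rather than the arithmetic sum 0.03.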
### 2.3 Regime Detection
If `hmm_regime = True` (default), the engine fits a Hidden Markov Model to benchmark returns via `regime_detection.detect_volatility_regime()`. The detected regime (Bull, Normal, Crash) informs:
- Dynamic risk aversion adjustment (Stages 2 and 3).
- PID volatility target in the research cybernetic ensemble.
- Report visualisation annotations.
### 2.4 Dynamic Risk Aversion
If `dynamic_risk = True` (default), the VIX level is used to adjust the user's stated risk aversion via `regime_detection.dynamic_risk_aversion()`. This implements a counter-cyclical risk management policy: risk aversion increases during high-volatility episodes, reducing exposure before drawdowns deepen.
---
## 3. Stage 2: Walk-Forward Validation (`run_validation`)
### 3.1 Expanding Window Cross-Validation
The engine performs expanding-window (walk-forward) backtesting via `backtest.expanding_window_backtest()`:
1. An initial training window of `OOS_TRAIN_DAYS` (total days − 252) is established.
2. The model is trained on the expanding window and produces out-of-sample weights.
3. Weights are rebalanced every `trading_days / 4` periods (quarterly).
4. An out-of-sample equity curve is constructed from realised returns.
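Because each set of weights is fitted only on data that precedes the period in which it is applied, this methodology avoids look-ahead bias. It is a widely used standard for strategy validation in quantitative finance (Bailey et al., 2014).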
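The window schedule described in the steps above can be sketched as a small generator. `expanding_window_splits` is a hypothetical helper, not `backtest.expanding_window_backtest()`, and it assumes the initial window equals the total number of observations minus 252.

```python
def expanding_window_splits(n_obs, initial_train, rebalance_every):
    """Yield (train_end, test_end) index pairs for a walk-forward backtest.

    Training always uses observations [0, train_end); the weights fitted on
    that window are then held over [train_end, test_end).
    """
    train_end = initial_train
    while train_end < n_obs:
        test_end = min(train_end + rebalance_every, n_obs)
        yield train_end, test_end
        train_end = test_end  # the window expands; it never slides


trading_days = 252
n_obs = 1260  # five years of daily data
splits = list(expanding_window_splits(n_obs, n_obs - 252, trading_days // 4))
```

With these inputs the final year is split into four quarterly holding periods, each trained on everything that came before it.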
### 3.2 Econometric Tests
The validation stage runs four statistical tests on the out-of-sample returns:
#### Christoffersen Conditional Coverage Test
Tests whether Value-at-Risk (VaR) exceedances are both correctly calibrated (unconditional coverage) and serially independent (no volatility clustering in violations). A joint likelihood ratio statistic is computed:
```
LR_cc = LR_uc + LR_ind ~ χ²(2)
```
**Pass Criterion:** p-value > 0.05 for both components.
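A minimal sketch of the two likelihood-ratio components, assuming the input is a 0/1 sequence marking VaR exceedances. The function name and the result keys are hypothetical; the chi-square p-values use the closed forms available for one and two degrees of freedom, so no external library is needed.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_test(hits, alpha=0.05):
    """Conditional coverage test on a 0/1 sequence of VaR exceedances."""
    n1 = sum(hits)
    n0 = len(hits) - n1
    pi_hat = n1 / len(hits)
    # Unconditional coverage: does the hit rate match the VaR level alpha?
    lr_uc = -2.0 * (
        _xlogy(n0, 1 - alpha) + _xlogy(n1, alpha)
        - _xlogy(n0, 1 - pi_hat) - _xlogy(n1, pi_hat)
    )
    # Independence: first-order Markov transition counts n[prev][cur].
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits, hits[1:]):
        n[prev][cur] += 1
    pi01 = n[0][1] / max(n[0][0] + n[0][1], 1)
    pi11 = n[1][1] / max(n[1][0] + n[1][1], 1)
    pi = (n[0][1] + n[1][1]) / (len(hits) - 1)
    ll_null = _xlogy(n[0][0] + n[1][0], 1 - pi) + _xlogy(n[0][1] + n[1][1], pi)
    ll_alt = (
        _xlogy(n[0][0], 1 - pi01) + _xlogy(n[0][1], pi01)
        + _xlogy(n[1][0], 1 - pi11) + _xlogy(n[1][1], pi11)
    )
    lr_ind = -2.0 * (ll_null - ll_alt)
    return {
        "LR_uc": lr_uc,
        "LR_ind": lr_ind,
        "LR_cc": lr_uc + lr_ind,
        # Chi-square survival functions in closed form for 1 and 2 d.o.f.
        "p_uc": math.erfc(math.sqrt(max(lr_uc, 0.0) / 2.0)),
        "p_ind": math.erfc(math.sqrt(max(lr_ind, 0.0) / 2.0)),
        "p_cc": math.exp(-max(lr_uc + lr_ind, 0.0) / 2.0),
    }


spread = christoffersen_test([1 if i % 20 == 10 else 0 for i in range(100)])
clustered = christoffersen_test([0] * 90 + [1] * 10)
```

Evenly spread exceedances at the nominal 5% rate pass both components, while ten consecutive exceedances are rejected by the independence component.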
#### Diebold-Mariano Test
Tests whether the engine's expected return model statistically outperforms a naive historical mean baseline in terms of out-of-sample prediction accuracy:
```
DM = d̄ / σ̂(d) ~ N(0, 1)
```
where d_t = |e₁_t| − |e₂_t| is the loss differential (MAE loss function). Newey-West variance estimation makes the test robust to heteroskedasticity and autocorrelation in the loss differential.
**Pass Criterion:** p-value < 0.05 and the engine's model wins.
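The statistic can be sketched as follows, assuming e₁ holds the engine model's forecast errors and e₂ the naive baseline's, so that a negative value favours the engine. `diebold_mariano` and its `lags` parameter are hypothetical; the long-run variance uses Bartlett weights, as in the Newey-West estimator.

```python
import math
from statistics import NormalDist, fmean


def diebold_mariano(e_model, e_naive, lags=0):
    """DM statistic on absolute-error loss with a Newey-West variance.

    A negative statistic means the model's errors are smaller on average.
    """
    d = [abs(a) - abs(b) for a, b in zip(e_model, e_naive)]
    n = len(d)
    d_bar = fmean(d)
    dev = [x - d_bar for x in d]
    # Long-run variance: gamma_0 plus Bartlett-weighted autocovariances.
    lrv = sum(x * x for x in dev) / n
    for k in range(1, lags + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    dm = d_bar / math.sqrt(lrv / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(dm)))
    return dm, p_value


e_model = [0.1, -0.2, 0.1, -0.1, 0.2, -0.1, 0.1, -0.2]
e_naive = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.4]
dm, p_value = diebold_mariano(e_model, e_naive, lags=1)
```

The sketch does not guard against a constant loss differential, which would make the variance zero.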
#### Probabilistic Sharpe Ratio (PSR)
Accounts for the non-normality of returns (skewness and kurtosis) when evaluating whether the observed Sharpe ratio is statistically distinguishable from a benchmark value of zero (Bailey & López de Prado, 2012):
```
PSR = Φ[(SR − SR*) · √(n−1) / √(1 − γ₃·SR + (γ₄−1)/4 · SR²)]
```
where γ₃ and γ₄ are the sample skewness and kurtosis.
**Pass Criterion:** PSR > 0.95 (95% confidence that the true Sharpe exceeds zero).
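The PSR formula translates directly into a few lines of Python. The function name is hypothetical, and the sketch assumes a per-period (not annualised) Sharpe ratio and raw kurtosis, which equals 3 for a normal distribution.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe_ratio(sr, n, skew, kurt, sr_benchmark=0.0):
    """PSR: probability that the true Sharpe ratio exceeds sr_benchmark.

    sr and sr_benchmark are per-period values; kurt is the raw kurtosis.
    """
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    z = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)


# One year of daily data with a daily Sharpe ratio of 0.1.
psr_normal = probabilistic_sharpe_ratio(0.1, 253, skew=0.0, kurt=3.0)
psr_fat_tails = probabilistic_sharpe_ratio(0.1, 253, skew=-1.0, kurt=6.0)
```

Negative skewness and excess kurtosis widen the denominator, so the same observed Sharpe ratio earns a lower PSR when returns are left-skewed and fat-tailed.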
#### Deflated Sharpe Ratio (DSR)
Adjusts for multiple testing bias when the engine evaluates K candidate models (Bailey & López de Prado, 2014). The expected maximum Sharpe ratio under the null hypothesis (all models have zero alpha) is:
```
E[max(SR)] ≈ √(2·ln(K)) − [γ + ln(π/2)] / [2·√(2·ln(K))]
```
The DSR then tests whether the observed Sharpe significantly exceeds this multiple-testing threshold.
**Pass Criterion:** DSR > 0.95.
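The sketch below evaluates the closed-form threshold quoted above and plugs it into the PSR formula as the benchmark. It assumes that γ in the threshold is the Euler-Mascheroni constant and that the threshold is scaled by the standard deviation of Sharpe estimates across the K trials, as in Bailey & López de Prado (2014); both function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329  # assumed meaning of gamma in the threshold


def expected_max_sharpe(k):
    """Closed-form threshold from the expression above, for k > 1 trials."""
    root = math.sqrt(2.0 * math.log(k))
    return root - (EULER_MASCHERONI + math.log(math.pi / 2.0)) / (2.0 * root)


def deflated_sharpe_ratio(sr, n, skew, kurt, k, sr_std):
    """PSR evaluated against the multiple-testing threshold.

    sr_std is the standard deviation of Sharpe estimates across the k
    trials; scaling the threshold by it is an assumption of this sketch.
    """
    sr_star = sr_std * expected_max_sharpe(k)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    return NormalDist().cdf((sr - sr_star) * math.sqrt(n - 1) / denom)


dsr = deflated_sharpe_ratio(0.1, 253, skew=0.0, kurt=3.0, k=10, sr_std=0.02)
```

Because the benchmark rises with K, a Sharpe ratio selected from many candidates needs to clear a higher bar than one from a single pre-specified model.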
### 3.3 Output
The validation stage produces a `ValidationBundle` dataclass:
```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ValidationBundle:
    oos_eq: pd.Series            # Out-of-sample equity curve
    oos_bench_curve: pd.Series   # Benchmark equity curve
    oos_port_rets: pd.Series     # Out-of-sample portfolio returns
    wf_ann_ret: float            # Walk-forward annualised return
    var_results: dict            # Christoffersen test results
    dm_results: dict             # Diebold-Mariano test results
    psr_results: dict            # Probabilistic Sharpe Ratio
    dsr_results: dict            # Deflated Sharpe Ratio
```
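A consumer of the bundle might apply the pass criteria from Section 3.2 to the four result dictionaries as sketched below. The dictionary keys (`p_uc`, `p_ind`, `p_value`, `model_wins`, `psr`, `dsr`) are hypothetical, since this document does not specify the layout of the result dictionaries.

```python
def validation_passes(var_results, dm_results, psr_results, dsr_results):
    """Apply the Section 3.2 pass criteria; all dictionary keys are assumed."""
    return {
        "christoffersen": var_results["p_uc"] > 0.05 and var_results["p_ind"] > 0.05,
        "diebold_mariano": dm_results["p_value"] < 0.05 and dm_results["model_wins"],
        "psr": psr_results["psr"] > 0.95,
        "dsr": dsr_results["dsr"] > 0.95,
    }


verdict = validation_passes(
    var_results={"p_uc": 0.40, "p_ind": 0.22},
    dm_results={"p_value": 0.01, "model_wins": True},
    psr_results={"psr": 0.97},
    dsr_results={"dsr": 0.91},
)
```

Note the opposite directions: the Christoffersen test passes when the null is not rejected (p > 0.05), whereas the Diebold-Mariano test passes when it is rejected (p < 0.05) in the engine's favour.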
---
## 4. Stage 3: Full-Sample Optimisation (`optimize`)
### 4.1 Solver Invocation
The full historical dataset is passed to `solver.build_and_optimize()`, which:
1. Computes expected returns using the selected model (CAPM, BL, Fama-French, Bayesian, or ML Stacking).
2. Estimates the covariance matrix with Ledoit-Wolf shrinkage and optional GARCH scaling.
3. Formulates and solves the convex optimisation problem via the CVXPY engine.
4. Applies the 7-stage constraint relaxation cascade if the initial formulation is infeasible (see `docs/RELAXATION_CASCADE.md`).
### 4.2 Sensitivity & Stress Analysis
Post-optimisation, the engine runs two diagnostic analyses:
- **Sensitivity Analysis** (`analytics.portfolio_sensitivity`): Perturbs expected returns by ±10% and re-solves, measuring the weight response range per asset. Assets with >15pp swings are flagged as "fragile."
- **Stress Testing** (`analytics.portfolio_stress_test`): Evaluates portfolio impact under historical crash scenarios (e.g., 2008 GFC, 2020 COVID, rate shock, tech crash).
If fragile allocations are detected and the allocation engine is Mean-Variance (engine 1), a stability penalty is added to the objective function and the solver is re-invoked.
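The perturb-and-re-solve loop behind the sensitivity analysis can be sketched with a stand-in optimiser. `toy_solver` and `sensitivity_report` are hypothetical and far simpler than `analytics.portfolio_sensitivity`; the sketch also assumes that expected returns are bumped one asset at a time.

```python
def toy_solver(exp_rets, variances):
    """Stand-in optimiser: long-only weights proportional to mu / sigma^2."""
    raw = {a: max(mu, 0.0) / variances[a] for a, mu in exp_rets.items()}
    total = sum(raw.values())
    return {a: x / total for a, x in raw.items()}


def sensitivity_report(exp_rets, variances, solve=toy_solver, bump=0.10, limit=0.15):
    """Bump each expected return by +/-bump, re-solve, and record weight ranges."""
    base = solve(exp_rets, variances)
    lo, hi = dict(base), dict(base)
    for asset in exp_rets:
        for sign in (-1.0, 1.0):
            shocked = dict(exp_rets)
            shocked[asset] *= 1.0 + sign * bump
            weights = solve(shocked, variances)
            for a, w in weights.items():
                lo[a], hi[a] = min(lo[a], w), max(hi[a], w)
    ranges = {a: hi[a] - lo[a] for a in base}
    fragile = [a for a, r in ranges.items() if r > limit]
    return {"base": base, "range": ranges, "fragile": fragile}


exp_rets = {"A": 0.08, "B": 0.06, "C": 0.04}
variances = {"A": 0.04, "B": 0.02, "C": 0.01}
report = sensitivity_report(exp_rets, variances)
strict = sensitivity_report(exp_rets, variances, limit=0.04)
```

The toy solver responds smoothly to its inputs, so nothing is flagged at the 15pp limit; a constrained mean-variance solver can swing far more, which is what the fragility flag is meant to catch.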
### 4.3 Output
```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class OptimizationBundle:
    weights: pd.Series           # Final target weights
    exp_rets: pd.Series          # Expected returns per asset
    cov_mat: pd.DataFrame        # Covariance matrix
    vol: float                   # Portfolio volatility
    corr_matrix: pd.DataFrame    # Correlation matrix
    betas: pd.Series             # Market betas
    model_info: dict             # Model metadata
    sens_report: dict            # Sensitivity analysis
    stress_report: dict          # Stress test results
    n_fragile: int               # Count of fragile allocations
```
---
## 5. Stage 4: Report Generation (`generate_reports`)
### 5.1 Architecture
Report generation follows a three-layer architecture:
```
┌─────────────────────────────────────────────────┐
│            report.py (Orchestrator)             │
│   Coordinates data → template → file pipeline   │
├────────────────┬────────────────────────────────┤
│ report_data.py │ report_html.py                 │
│ (Data Layer)   │ (Rendering Layer)              │
│ Formats all    │ Injects variables into         │
│ mathematical   │ report_template.html           │
│ outputs into   │ static template                │
│ template vars  │                                │
└────────────────┴────────────────────────────────┘
```
### 5.2 Report Data Layer: `report_data.py`
The `prepare_template_variables()` function is the largest single function in the codebase (~675 lines). It transforms raw mathematical outputs into presentation-ready HTML fragments and Chart.js data payloads. Key computations include:
- **Advanced Risk Metrics:** CVaR (95%), Conditional Drawdown-at-Risk (CDaR), Mean Absolute Deviation (MAD), and semi-deviation.
- **Transition Comparisons:** When the user provides current holdings, the report computes before/after comparisons for all metrics.
- **Chart Payload:** A JSON dictionary consumed by Chart.js for interactive equity curves, allocation pie charts, efficient frontier plots, Monte Carlo fan charts, and risk contribution bar charts.
- **Narrative Generation:** `narrative.py` produces a natural-language summary of the portfolio strategy, market conditions, and key risk factors.
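Several of the advanced risk metrics listed above have short textbook definitions, sketched here in plain Python. These helpers are hypothetical and are not the code in `report_data.py`; the tail size is rounded to a whole number of observations, with at least one.

```python
import math
from statistics import fmean


def _tail_size(n, level):
    """Number of observations in the worst (1 - level) share, at least one."""
    return max(1, round(n * (1.0 - level)))


def cvar(returns, level=0.95):
    """Mean loss over the worst (1 - level) share of observations."""
    losses = sorted((-r for r in returns), reverse=True)
    return fmean(losses[: _tail_size(len(losses), level)])


def mean_abs_deviation(returns):
    """Average absolute distance from the mean return."""
    mu = fmean(returns)
    return fmean(abs(r - mu) for r in returns)


def semi_deviation(returns):
    """Root-mean-square of below-mean deviations only."""
    mu = fmean(returns)
    return math.sqrt(fmean(min(r - mu, 0.0) ** 2 for r in returns))


def cdar(returns, level=0.95):
    """Mean of the worst (1 - level) share of drawdowns along the path."""
    equity, peak, drawdowns = 1.0, 1.0, []
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        drawdowns.append(1.0 - equity / peak)
    worst = sorted(drawdowns, reverse=True)
    return fmean(worst[: _tail_size(len(worst), level)])


rets = [0.01, -0.02, 0.015, -0.05, 0.03, 0.0, -0.01, 0.02, 0.005, -0.03]
metrics = {
    "cvar_95": cvar(rets),
    "mad": mean_abs_deviation(rets),
    "semi_dev": semi_deviation(rets),
    "cdar_95": cdar(rets),
}
```

CVaR and CDaR share the same tail-averaging idea; the difference is that CVaR averages single-period losses while CDaR averages drawdowns measured from the running peak.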
### 5.3 HTML Rendering: `report_html.py`
The rendering layer substitutes template variables into `report_template.html`, a 26KB static template with Chart.js initialisation scripts. The template uses CSS-in-HTML styling with a dark theme optimised for screen presentation.
### 5.4 Export Formats
| Format | Module | Content |
|----------|----------------|--------------------------------------------|
| HTML | `report.py` | Interactive report with Chart.js |
| PDF | `exports.py` | Static rendering via headless browser |
| CSV | `exports.py` | Tabular weight/allocation summary |
| Excel | `exports.py` | Multi-sheet workbook (optional) |
---
## 6. Data Flow Diagram
```
External APIs ──▶ data.py ──▶ PostgreSQL/SQLite
                     │
              ┌──────┴──────┐
              │ core_engine │
              │ load_data() │
              └──────┬──────┘
                     │
       ┌─────────────┼─────────────┐
       ▼             ▼             ▼
   solver.py    backtest.py  validation.py
       │             │             │
       ▼             ▼             ▼
   OptBundle     ValBundle   Test Results
       │             │             │
       └─────┬───────┴─────────────┘
             ▼
      report_data.py ──▶ report_html.py
                               │
                               ▼
                         output/*.html
                         output/*.csv
                         output/*.pdf
```
---
## 7. Configuration-Driven Behaviour
The pipeline's behaviour is heavily parameterised via `config.py`. Key configuration axes include:
| Parameter | Effect |
|--------------------------|-------------------------------------------------------|
| `model` (1–7)            | Selects expected return model (see `docs/MODELS.md`)  |
| `allocation_engine` (1–3)| Mean-Variance (CVXPY), HRP, or Exact Risk Parity (see `docs/ALLOCATION_ENGINES.md`) |
| `max_assets` | Cardinality constraint: max number of non-zero positions |
| `garch_enabled` | Enables GARCH(1,1) covariance scaling |
| `cvar_enabled` | Adds CVaR tail-risk constraint to CVXPY formulation |
| `tax_enabled` | Activates tax-aware optimisation with cost-basis tracking |
| `hmm_regime` | Enables HMM regime detection |
| `dynamic_risk` | Enables VIX-based risk aversion adjustment |
| `with_futures` | Enables futures overlay optimisation |
| `return_frequency` | Daily or monthly return aggregation |
---
## 8. Error Handling & Graceful Degradation
The pipeline employs multiple fallback mechanisms:
1. **Constraint Relaxation Cascade:** 7-stage progressive constraint relaxation (see `docs/RELAXATION_CASCADE.md`).
2. **Data Fallback:** If PostgreSQL is unreachable, the engine falls back to local SQLite.
3. **Model Fallback:** If ML ensemble training fails, the engine falls back to CAPM.
4. **Report Fallback:** If PDF export fails (no headless browser), only HTML is generated.
These mechanisms are intended to let the pipeline produce output even under degraded conditions, although the output may be reduced, for example an HTML report without the PDF export.
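The fallback pattern shared by these mechanisms can be sketched as an ordered list of candidates that is tried until one succeeds. `first_success`, `train_ml_ensemble` and `capm_expected_returns` are hypothetical names used only to illustrate the model fallback.

```python
def first_success(candidates):
    """Run (name, fn) pairs in order; return the first result that succeeds."""
    errors = []
    for name, fn in candidates:
        try:
            return name, fn()
        except Exception as exc:  # degrade instead of aborting the run
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fallbacks failed: " + "; ".join(errors))


def train_ml_ensemble():
    raise RuntimeError("training diverged")  # simulate a failed fit


def capm_expected_returns():
    return {"A": 0.06, "B": 0.04}


used, exp_rets = first_success([
    ("ml_stacking", train_ml_ensemble),
    ("capm", capm_expected_returns),
])
```

Returning the name of the candidate that succeeded lets the report state which model actually produced the numbers, so a silent downgrade does not go unnoticed.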
---
## References
- Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2014). Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. *Notices of the AMS*, 61(5), 458–471.
- Bailey, D. H., & López de Prado, M. (2012). The Sharpe ratio efficient frontier. *Journal of Risk*, 15(2), 3–44.
- Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting, and non-normality. *Journal of Portfolio Management*, 40(5), 94–107.
- Christoffersen, P. (1998). Evaluating interval forecasts. *International Economic Review*, 39(4), 841–862.
- Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. *Journal of Business & Economic Statistics*, 13(3), 253–263.
- Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. *Journal of Multivariate Analysis*, 88(2), 365–411.
- Markowitz, H. (1952). Portfolio selection. *Journal of Finance*, 7(1), 77–91.