# Pipeline Architecture

## Abstract

The Portfolio Engine implements a multi-stage pipeline that transforms raw market data into optimised portfolio allocations, validated through rigorous out-of-sample econometric testing, and exported as interactive reports. This document describes the full execution flow, the data structures that mediate inter-stage communication, the mathematical validation framework, and the report generation subsystem. It serves as the architectural reference for understanding how the engine's components compose into a coherent analytical system.

---

## 1. Pipeline Overview

The engine is orchestrated by the `PortfolioPipeline` class in `core_engine.py`, which implements a four-stage execution model:

```
┌────────────────────────────────────────────────────────────────────────────────────┐
│                                  Pipeline Stages                                   │
│                                                                                    │
│  ┌─────────────┐   ┌──────────────────┐   ┌────────────┐   ┌────────────────────┐  │
│  │   Stage 1   │──▶│     Stage 2      │──▶│  Stage 3   │──▶│      Stage 4       │  │
│  │ load_data() │   │ run_validation() │   │ optimize() │   │ generate_reports() │  │
│  └─────────────┘   └──────────────────┘   └────────────┘   └────────────────────┘  │
│                                                                                    │
│  Data Fetch        Walk-Forward CV        Full-Sample      HTML + CSV              │
│  Regime Detect     Econometric Tests      Optimisation     PDF Export              │
│  Risk Aversion     DM / Christoffersen    Sensitivity      Serve                   │
│  Adjustment        PSR / DSR              Stress Test                              │
└────────────────────────────────────────────────────────────────────────────────────┘
```

### Entry Point

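```python
def run_engine(overrides=None):
    """Run the four pipeline stages in order; `overrides` replaces interactive CLI input."""
    pipeline = PortfolioPipeline(overrides=overrides)
    pipeline.load_data()                               # Stage 1: data, regime, risk aversion
    val_bundle = pipeline.run_validation()             # Stage 2: walk-forward CV + econometric tests
    opt_bundle = pipeline.optimize()                   # Stage 3: full-sample optimisation + diagnostics
    pipeline.generate_reports(val_bundle, opt_bundle)  # Stage 4: HTML / CSV / PDF exports
```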

The `overrides` dictionary enables headless execution from the API layer (`api.py`), test harnesses, or scheduled batch jobs, bypassing the interactive CLI wizard.

---

## 2. Stage 1: Data Loading (`load_data`)

### 2.1 Data Sources

| Source                    | Target                | Module              |
|---------------------------|-----------------------|---------------------|
| Yahoo Finance / DB Cache  | Daily OHLCV prices    | `data.py`           |
| Kenneth French Library    | Fama-French factors   | `data.py`           |
| FRED / ^TNX proxy         | Risk-free rate series | `data.py`           |
| PostgreSQL / SQLite       | Cached price data     | `database.py`       |

### 2.2 Data Validation

- **Minimum History:** Assets must have at least 2 × `trading_days_per_year` observations of return history (504 business days at the default of 252) to be included. Assets with insufficient history are silently dropped.
- **Missing Data:** Returns DataFrames are passed through `pd.DataFrame.dropna()`, which removes every date on which any asset has a missing value, so all assets share a common date index.
- **Frequency Conversion:** When `return_frequency = 'monthly'`, daily returns are geometrically compounded to monthly via `build_monthly_returns()`.
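The three rules can be sketched in pandas. The helpers `clean_returns` and `to_monthly` are hypothetical names, not the functions in `data.py`, and the sketch assumes simple (not log) returns and 252 trading days per year.

```python
import pandas as pd


def clean_returns(prices: pd.DataFrame, trading_days_per_year: int = 252) -> pd.DataFrame:
    """Daily simple returns, keeping only assets with >= 2 years of history and common dates."""
    rets = prices / prices.shift(1) - 1
    history = rets.count()                                  # non-missing returns per asset
    keep = history[history >= 2 * trading_days_per_year].index
    return rets[keep].dropna()                              # drop any date with a missing value


def to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Geometrically compound daily returns within each calendar month: prod(1 + r) - 1."""
    return (1 + daily).groupby(daily.index.to_period("M")).prod() - 1
```

Compounding with `prod(1 + r) - 1` rather than summing daily returns keeps monthly figures consistent with the equity curve.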

### 2.3 Regime Detection

If `hmm_regime = True` (default), the engine fits a Hidden Markov Model to benchmark returns via `regime_detection.detect_volatility_regime()`. The detected regime (Bull, Normal, Crash) informs:

- Dynamic risk aversion adjustment (Stages 2 and 3).
- PID volatility target in the research cybernetic ensemble.
- Report visualisation annotations.

### 2.4 Dynamic Risk Aversion

If `dynamic_risk = True` (default), the VIX level is used to adjust the user's stated risk aversion via `regime_detection.dynamic_risk_aversion()`. This implements a counter-cyclical risk management policy: risk aversion increases during high-volatility episodes, reducing exposure before drawdowns deepen.
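The sketch below shows one way such a counter-cyclical rule could look. The linear ramp and its thresholds (VIX 15 and 35, at most doubling risk aversion) are invented for illustration and are not the rule implemented by `regime_detection.dynamic_risk_aversion()`.

```python
def adjust_risk_aversion(base_aversion: float, vix: float,
                         calm_vix: float = 15.0, stressed_vix: float = 35.0,
                         max_multiplier: float = 2.0) -> float:
    """Hypothetical counter-cyclical rule: scale risk aversion up as the VIX rises.

    At or below calm_vix the stated risk aversion is unchanged; at or above stressed_vix
    it is multiplied by max_multiplier; in between the multiplier rises linearly.
    """
    if vix <= calm_vix:
        multiplier = 1.0
    elif vix >= stressed_vix:
        multiplier = max_multiplier
    else:
        fraction = (vix - calm_vix) / (stressed_vix - calm_vix)
        multiplier = 1.0 + fraction * (max_multiplier - 1.0)
    return base_aversion * multiplier
```

A higher risk-aversion coefficient penalises variance more heavily in the optimiser, so the same expected returns produce a lower-volatility allocation when the VIX is elevated.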

---

## 3. Stage 2: Walk-Forward Validation (`run_validation`)

### 3.1 Expanding Window Cross-Validation

The engine performs expanding-window (walk-forward) backtesting via `backtest.expanding_window_backtest()`:

1. An initial training window of `OOS_TRAIN_DAYS` (total days − 252) is established, reserving the final 252 trading days (about one year) for out-of-sample evaluation.
2. The model is trained on the expanding window and produces out-of-sample weights.
3. Weights are rebalanced every `trading_days / 4` periods (quarterly).
4. An out-of-sample equity curve is constructed from realised returns.

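Because each set of weights is estimated only from data available before the period in which it is held, this methodology prevents look-ahead bias; it is a standard approach to strategy validation in quantitative finance (Bailey et al., 2014).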
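The schedule in steps 1 to 4 can be sketched as a generator of index ranges. `walk_forward_windows` is a hypothetical helper, not the signature of `backtest.expanding_window_backtest()`, and it assumes 252 trading days per year, so the first model is fit on everything except the final year and refit every 63 observations.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n_obs: int, oos_days: int = 252,
                         trading_days: int = 252) -> Iterator[Tuple[int, int]]:
    """Yield (train_end, test_end) positions for an expanding-window backtest.

    Training always uses observations [0, train_end); the resulting weights are held,
    unchanged, over the out-of-sample block [train_end, test_end).
    """
    step = trading_days // 4             # quarterly rebalance: 63 observations
    train_end = n_obs - oos_days         # initial window, i.e. OOS_TRAIN_DAYS
    if train_end <= 0:
        raise ValueError("history is shorter than the out-of-sample period")
    while train_end < n_obs:
        test_end = min(train_end + step, n_obs)
        yield train_end, test_end
        train_end = test_end             # window expands to include the block just scored
```

For each pair, weights estimated on `returns.iloc[:train_end]` are held over `returns.iloc[train_end:test_end]`, so no observation used to fit a rebalance is also used to score it.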

### 3.2 Econometric Tests

The validation stage runs four statistical tests on the out-of-sample returns:

#### Christoffersen Conditional Coverage Test

Tests whether Value-at-Risk (VaR) exceedances are both correctly calibrated (unconditional coverage) and serially independent (no volatility clustering in violations). A joint likelihood ratio statistic is computed:

```
LR_cc = LR_uc + LR_ind ~ χ²(2)
```

**Pass Criterion:** p-value > 0.05 for both components.
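A standard-library sketch of the three statistics, following Christoffersen (1998). It assumes `hits` is a 0/1 list in which 1 marks a period whose loss exceeded the VaR forecast and `p` is the VaR tail probability (5% for a 95% VaR); the function name is hypothetical. The χ² p-values use closed forms: the survival function is `erfc(√(x/2))` for one degree of freedom and `exp(−x/2)` for two.

```python
import math
from typing import Dict, Sequence


def _xlogy(x: float, y: float) -> float:
    """x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def christoffersen_test(hits: Sequence[int], p: float = 0.05) -> Dict[str, float]:
    """Conditional coverage test for a 0/1 series of VaR exceedances."""
    t = len(hits)
    n1 = sum(hits)
    n0 = t - n1
    pi_hat = n1 / t
    # LR_uc: does the observed exceedance rate match the nominal rate p?
    lr_uc = -2 * ((_xlogy(n0, 1 - p) + _xlogy(n1, p))
                  - (_xlogy(n0, 1 - pi_hat) + _xlogy(n1, pi_hat)))

    # Transition counts n[i][j]: state i in the previous period, state j in this one.
    n = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1

    def share(zeros: int, ones: int) -> float:
        return ones / (zeros + ones) if zeros + ones else 0.0

    pi01 = share(n[0][0], n[0][1])                       # P(hit | no hit last period)
    pi11 = share(n[1][0], n[1][1])                       # P(hit | hit last period)
    pi2 = share(n[0][0] + n[1][0], n[0][1] + n[1][1])    # P(hit), ignoring last period
    log_l_null = _xlogy(n[0][0] + n[1][0], 1 - pi2) + _xlogy(n[0][1] + n[1][1], pi2)
    log_l_markov = (_xlogy(n[0][0], 1 - pi01) + _xlogy(n[0][1], pi01)
                    + _xlogy(n[1][0], 1 - pi11) + _xlogy(n[1][1], pi11))
    # LR_ind: are exceedances serially independent (no clustering)?
    lr_ind = -2 * (log_l_null - log_l_markov)

    lr_uc, lr_ind = max(lr_uc, 0.0), max(lr_ind, 0.0)    # clip tiny negative rounding errors
    lr_cc = lr_uc + lr_ind
    return {
        "lr_uc": lr_uc, "p_uc": math.erfc(math.sqrt(lr_uc / 2)),      # chi2(1) survival
        "lr_ind": lr_ind, "p_ind": math.erfc(math.sqrt(lr_ind / 2)),   # chi2(1) survival
        "lr_cc": lr_cc, "p_cc": math.exp(-lr_cc / 2),                  # chi2(2) survival
    }
```

A model whose exceedances arrive at the right rate but in bursts passes the unconditional part and fails the independence part, which is why both components are checked.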

#### Diebold-Mariano Test

Tests whether the engine's expected return model statistically outperforms a naive historical mean baseline in terms of out-of-sample prediction accuracy:

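```
DM = d̄ / σ̂(d̄) ~ N(0, 1)
```

where d_t = |e₁_t| − |e₂_t| is the loss differential under an absolute-error (MAE) loss, e₁_t and e₂_t are the forecast errors of the engine's model and of the baseline, d̄ is the sample mean of d_t, and σ̂(d̄) is its standard error. The standard error is estimated with the Newey-West estimator, which makes the test robust to heteroskedasticity and autocorrelation in the loss differential.

**Pass Criterion:** p-value < 0.05 and the engine's model wins (d̄ < 0, i.e. a lower mean absolute error than the baseline).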
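A standard-library sketch of the statistic for the absolute-error loss. The function name, the argument order (engine errors first) and the `T^(1/3)` truncation lag for the Newey-West estimator are assumptions; the engine may choose the lag differently. A negative statistic means the engine's model has the lower mean absolute error.

```python
import math
from statistics import NormalDist
from typing import Optional, Sequence


def diebold_mariano(e_model: Sequence[float], e_base: Sequence[float],
                    max_lag: Optional[int] = None) -> dict:
    """Two-sided DM test on absolute-error loss with a Newey-West standard error."""
    d = [abs(a) - abs(b) for a, b in zip(e_model, e_base)]   # loss differential d_t
    t = len(d)
    d_bar = sum(d) / t
    if max_lag is None:
        max_lag = int(t ** (1 / 3))   # assumed bandwidth rule

    def autocov(k: int) -> float:
        return sum((d[i] - d_bar) * (d[i - k] - d_bar) for i in range(k, t)) / t

    # Newey-West long-run variance: Bartlett-weighted sum of autocovariances
    lrv = autocov(0) + 2 * sum((1 - k / (max_lag + 1)) * autocov(k)
                               for k in range(1, max_lag + 1))
    if lrv <= 0:
        raise ValueError("loss differential has zero estimated variance")
    dm = d_bar / math.sqrt(lrv / t)                  # mean differential over its standard error
    p_value = 2 * (1 - NormalDist().cdf(abs(dm)))
    return {"dm_stat": dm, "p_value": p_value, "model_wins": d_bar < 0}
```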

#### Probabilistic Sharpe Ratio (PSR)

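Accounts for the non-normality of returns (skewness and kurtosis) when evaluating whether the observed Sharpe ratio is statistically distinguishable from a benchmark value of zero (Bailey & López de Prado, 2012):

```
PSR = Φ[(SR − SR*) · √(n − 1) / √(1 − γ₃·SR + (γ₄ − 1)/4 · SR²)]
```

where Φ is the standard normal CDF, SR is the observed (non-annualised) Sharpe ratio over n return observations, SR* is the benchmark Sharpe ratio (zero here), and γ₃ and γ₄ are the sample skewness and (non-excess) kurtosis; γ₄ = 3 for normally distributed returns.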

**Pass Criterion:** PSR > 0.95 (95% confidence that the true Sharpe exceeds zero).
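The formula translates directly into code. This sketch assumes `returns` are per-period excess returns and uses population moments; the engine may use bias-corrected sample moments or a different risk-free adjustment.

```python
import math
from statistics import NormalDist
from typing import Sequence


def probabilistic_sharpe_ratio(returns: Sequence[float], sr_benchmark: float = 0.0) -> float:
    """PSR (Bailey & López de Prado, 2012) for per-period excess returns.

    SR, skewness and kurtosis are measured at the sampling frequency of `returns`
    (no annualisation); kurtosis is raw (3 for a normal distribution), not excess.
    """
    n = len(returns)
    mean = sum(returns) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    sr = mean / sd
    skew = sum((r - mean) ** 3 for r in returns) / (n * sd ** 3)
    kurt = sum((r - mean) ** 4 for r in returns) / (n * sd ** 4)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr_benchmark) * math.sqrt(n - 1) / denom)
```

Negative skewness or fat tails enlarge the denominator, so the same Sharpe ratio earns a lower PSR than it would under normality.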

#### Deflated Sharpe Ratio (DSR)

Adjusts for multiple-testing bias when the engine evaluates K candidate models (Bailey & López de Prado, 2014). The expected maximum Sharpe ratio under the null hypothesis (all models have zero alpha) is approximated as:

```
E[max(SR)] ≈ √(2·ln K) − [γ + ln(π/2)] / [2·√(2·ln K)]
```

where γ ≈ 0.5772 is the Euler-Mascheroni constant (not to be confused with the skewness γ₃ and kurtosis γ₄ above). The DSR then tests whether the observed Sharpe ratio significantly exceeds this multiple-testing threshold, i.e. the best result expected from K skill-less trials.

**Pass Criterion:** DSR > 0.95.
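For comparison, Bailey & López de Prado (2014) write the threshold as SR₀ = √V[SR_k] · ((1 − γ)·Φ⁻¹(1 − 1/K) + γ·Φ⁻¹(1 − 1/(K·e))), where V[SR_k] is the variance of the Sharpe estimates across the K trials, and define the DSR as the PSR with SR* = SR₀. The sketch below implements that formulation; it may not match the approximation the engine uses, and the caller must supply the cross-trial variance `sr_variance`.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """SR_0: expected maximum Sharpe ratio of n_trials strategies with zero true Sharpe."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    inv = NormalDist().inv_cdf
    return math.sqrt(sr_variance) * ((1 - EULER_GAMMA) * inv(1 - 1 / n_trials)
                                     + EULER_GAMMA * inv(1 - 1 / (n_trials * math.e)))


def deflated_sharpe_ratio(sr: float, n_obs: int, skew: float, kurt: float,
                          n_trials: int, sr_variance: float) -> float:
    """PSR with the benchmark set to SR_0 (per-period SR; kurt is raw, 3 for a normal)."""
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)
```

The more candidate models are tried, the higher SR₀ rises, so an identical observed Sharpe ratio yields a lower DSR.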

### 3.3 Output

The validation stage produces a `ValidationBundle` dataclass:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ValidationBundle:
    oos_eq: pd.Series            # Out-of-sample equity curve
    oos_bench_curve: pd.Series   # Benchmark equity curve
    oos_port_rets: pd.Series     # Out-of-sample portfolio returns
    wf_ann_ret: float            # Walk-forward annualised return
    var_results: dict            # Christoffersen test results
    dm_results: dict             # Diebold-Mariano test results
    psr_results: dict            # Probabilistic Sharpe Ratio
    dsr_results: dict            # Deflated Sharpe Ratio
```

---

## 4. Stage 3: Full-Sample Optimisation (`optimize`)

### 4.1 Solver Invocation

The full historical dataset is passed to `solver.build_and_optimize()`, which:

1. Computes expected returns using the selected model (CAPM, BL, Fama-French, Bayesian, or ML Stacking).
2. Estimates the covariance matrix with Ledoit-Wolf shrinkage and optional GARCH scaling.
3. Formulates and solves the convex optimisation problem via the CVXPY engine.
4. Applies the 7-stage constraint relaxation cascade if the initial formulation is infeasible (see `docs/RELAXATION_CASCADE.md`).
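Step 2 can be illustrated with the original Ledoit-Wolf (2004) estimator, which shrinks the sample covariance towards a scaled identity matrix with a data-driven intensity. This is a sketch under that assumption: the engine may use a different shrinkage target (for example constant correlation) or a library implementation such as scikit-learn's `LedoitWolf`, and the optional GARCH scaling is omitted.

```python
from typing import Tuple

import numpy as np


def ledoit_wolf_identity(returns: np.ndarray) -> Tuple[np.ndarray, float]:
    """Shrink the sample covariance towards mu * I (Ledoit & Wolf, 2004).

    returns: T x N array (rows are observations). Returns (covariance, intensity in [0, 1]).
    """
    x = returns - returns.mean(axis=0)            # demean each asset
    t, n = x.shape
    s = x.T @ x / t                               # sample covariance, 1/T normalisation
    mu = np.trace(s) / n                          # average variance sets the target scale
    target = mu * np.eye(n)
    d2 = np.sum((s - target) ** 2) / n            # distance of S from the target
    # estimation noise in S: mean squared distance of each outer product x_k x_k' from S
    b2_bar = sum(np.sum((np.outer(row, row) - s) ** 2) for row in x) / (n * t ** 2)
    b2 = min(b2_bar, d2)
    shrinkage = b2 / d2 if d2 > 0 else 1.0
    return shrinkage * target + (1 - shrinkage) * s, float(shrinkage)
```

Shrinking towards the scaled identity preserves the total variance (the trace) while pulling extreme eigenvalues towards their mean, which keeps the matrix well conditioned when the number of assets is large relative to the sample length.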

### 4.2 Sensitivity & Stress Analysis

Post-optimisation, the engine runs two diagnostic analyses:

- **Sensitivity Analysis** (`analytics.portfolio_sensitivity`): Perturbs expected returns by ±10% and re-solves, measuring the weight response range per asset. Assets whose weights swing by more than 15 percentage points are flagged as "fragile."
- **Stress Testing** (`analytics.portfolio_stress_test`): Evaluates portfolio impact under historical crash scenarios (e.g., 2008 GFC, 2020 COVID, rate shock, tech crash).

If fragile allocations are detected and the allocation engine is Mean-Variance (engine 1), a stability penalty is added to the objective function and the solver is re-invoked.
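The perturbation loop can be sketched independently of the solver. The sketch assumes each asset's expected return is scaled by 1 ± 10% one asset at a time, and that the "weight response range" is the spread between an asset's largest and smallest weight across the base run and all perturbed runs. `solve` stands in for `solver.build_and_optimize()`, and `proportional_solver` is a toy placeholder, not the engine's optimiser.

```python
from typing import Callable, Dict

import numpy as np


def sensitivity_ranges(exp_rets: np.ndarray,
                       solve: Callable[[np.ndarray], np.ndarray],
                       bump: float = 0.10,
                       fragile_threshold: float = 0.15) -> Dict[str, np.ndarray]:
    """Per-asset weight spread across base and +/- bumped expected-return scenarios."""
    base = np.asarray(exp_rets, dtype=float)
    runs = [solve(base)]
    for i in range(base.size):
        for sign in (1.0, -1.0):
            bumped = base.copy()
            bumped[i] *= 1.0 + sign * bump     # relative +/-10% shock to one asset
            runs.append(solve(bumped))
    weights = np.vstack(runs)                  # (1 + 2N) x N matrix of weight vectors
    spread = weights.max(axis=0) - weights.min(axis=0)
    return {"range": spread, "fragile": spread > fragile_threshold}


def proportional_solver(mu: np.ndarray) -> np.ndarray:
    """Toy long-only stand-in for the real solver: weights proportional to positive mu."""
    raw = np.clip(mu, 0.0, None)
    return raw / raw.sum()
```

A solver that concentrates on the highest-ranked asset flips its entire allocation when two expected returns are within 10% of each other, which is exactly the behaviour the 15-point threshold is meant to catch.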

### 4.3 Output

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class OptimizationBundle:
    weights: pd.Series           # Final target weights
    exp_rets: pd.Series          # Expected returns per asset
    cov_mat: pd.DataFrame        # Covariance matrix
    vol: float                   # Portfolio volatility
    corr_matrix: pd.DataFrame    # Correlation matrix
    betas: pd.Series             # Market betas
    model_info: dict             # Model metadata
    sens_report: dict            # Sensitivity analysis
    stress_report: dict          # Stress test results
    n_fragile: int               # Count of fragile allocations
```

---

## 5. Stage 4: Report Generation (`generate_reports`)

### 5.1 Architecture

Report generation follows a three-layer architecture:

```
┌──────────────────────────────────────────────────┐
│             report.py (Orchestrator)             │
│   Coordinates data → template → file pipeline    │
├────────────────┬─────────────────────────────────┤
│ report_data.py │ report_html.py                  │
│ (Data Layer)   │ (Rendering Layer)               │
│ Formats all    │ Injects variables into          │
│ mathematical   │ report_template.html            │
│ outputs into   │ static template                 │
│ template vars  │                                 │
└────────────────┴─────────────────────────────────┘
```

### 5.2 Report Data Layer: `report_data.py`

The `prepare_template_variables()` function is the largest single function in the codebase (~675 lines). It transforms raw mathematical outputs into presentation-ready HTML fragments and Chart.js data payloads. Key computations include:

- **Advanced Risk Metrics:** CVaR (95%), Conditional Drawdown-at-Risk (CDaR), Mean Absolute Deviation (MAD), and semi-deviation.
- **Transition Comparisons:** When the user provides current holdings, the report computes before/after comparisons for all metrics.
- **Chart Payload:** A JSON dictionary consumed by Chart.js for interactive equity curves, allocation pie charts, efficient frontier plots, Monte Carlo fan charts, and risk contribution bar charts.
- **Narrative Generation:** `narrative.py` produces a natural-language summary of the portfolio strategy, market conditions, and key risk factors.
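The four advanced risk metrics have simple historical estimators. This sketch assumes discrete tail averages for CVaR and CDaR (the mean of the worst 5% of periods or drawdowns), drawdowns measured on the compounded equity curve with the initial capital as the first peak, and semi-deviation measured below the mean; `report_data.py` may use different conventions, such as a zero threshold for semi-deviation.

```python
from typing import Dict

import numpy as np


def _tail_mean(values: np.ndarray, alpha: float) -> float:
    """Average of the largest ceil((1 - alpha) * n) values."""
    k = max(1, int(np.ceil(round((1 - alpha) * values.size, 10))))  # round() absorbs float noise
    return float(np.sort(values)[-k:].mean())


def risk_metrics(returns: np.ndarray, alpha: float = 0.95) -> Dict[str, float]:
    """CVaR, CDaR, MAD and semi-deviation for a 1-D array of periodic returns."""
    r = np.asarray(returns, dtype=float)
    equity = np.cumprod(1 + r)                               # growth of one unit of capital
    peak = np.maximum(np.maximum.accumulate(equity), 1.0)    # initial capital counts as a peak
    drawdown = 1 - equity / peak
    dev = r - r.mean()
    return {
        "cvar": _tail_mean(-r, alpha),                       # mean loss in the worst 5% of periods
        "cdar": _tail_mean(drawdown, alpha),                 # mean of the worst 5% of drawdowns
        "mad": float(np.mean(np.abs(dev))),
        "semi_deviation": float(np.sqrt(np.mean(np.minimum(dev, 0.0) ** 2))),
    }
```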

### 5.3 HTML Rendering: `report_html.py`

The rendering layer substitutes template variables into `report_template.html`, a 26KB static template with Chart.js initialisation scripts. The template uses CSS-in-HTML styling with a dark theme optimised for screen presentation.
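The substitution step can be pictured with Python's `string.Template`. This is a stand-in: the actual placeholder syntax of `report_template.html` is not described here, and the names `title`, `chart_json` and `render_report` are hypothetical. `safe_substitute` leaves unknown placeholders and stray `$` signs untouched instead of raising, which matters when a template also contains JavaScript.

```python
import json
from string import Template

REPORT_TEMPLATE = """<h1>${title}</h1>
<script>const chartData = ${chart_json};</script>"""


def render_report(template_text: str, variables: dict, chart_payload: dict) -> str:
    """Fill HTML placeholders and embed the Chart.js payload as a JSON literal."""
    values = dict(variables, chart_json=json.dumps(chart_payload))
    return Template(template_text).safe_substitute(values)
```

Serialising the payload with `json.dumps` produces a valid JavaScript object literal, so the Chart.js initialisation code can read it directly.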

### 5.4 Export Formats

| Format   | Module         | Content                                    |
|----------|----------------|--------------------------------------------|
| HTML     | `report.py`    | Interactive report with Chart.js           |
| PDF      | `exports.py`   | Static rendering via headless browser      |
| CSV      | `exports.py`   | Tabular weight/allocation summary          |
| Excel    | `exports.py`   | Multi-sheet workbook (optional)            |

---

## 6. Data Flow Diagram

```
External APIs ──▶ data.py ──▶ PostgreSQL/SQLite
                                      │
                               ┌──────┴──────┐
                               │ core_engine │
                               │ load_data() │
                               └──────┬──────┘
                                      │
                        ┌─────────────┼─────────────┐
                        ▼             ▼             ▼
                    solver.py    backtest.py  validation.py
                        │             │             │
                        ▼             ▼             ▼
                    OptBundle     ValBundle   Test Results
                        │             │             │
                        └─────────────┼─────────────┘
                                      ▼
                               report_data.py ──▶ report_html.py
                                      │
                                      ▼
                                output/*.html
                                output/*.csv
                                output/*.pdf
```

---

## 7. Configuration-Driven Behaviour

The pipeline's behaviour is heavily parameterised via `config.py`. Key configuration axes include:

| Parameter                | Effect                                                |
|--------------------------|-------------------------------------------------------|
| `model` (1–7)            | Selects expected return model (see `docs/MODELS.md`)  |
| `allocation_engine` (1–3)| Mean-Variance (CVXPY), HRP, or Exact Risk Parity (see `docs/ALLOCATION_ENGINES.md`) |
| `max_assets`             | Cardinality constraint: max number of non-zero positions |
| `garch_enabled`          | Enables GARCH(1,1) covariance scaling                |
| `cvar_enabled`           | Adds CVaR tail-risk constraint to CVXPY formulation  |
| `tax_enabled`            | Activates tax-aware optimisation with cost-basis tracking |
| `hmm_regime`             | Enables HMM regime detection                         |
| `dynamic_risk`           | Enables VIX-based risk aversion adjustment           |
| `with_futures`           | Enables futures overlay optimisation                 |
| `return_frequency`       | Daily or monthly return aggregation                  |

---

## 8. Error Handling & Graceful Degradation

The pipeline employs multiple fallback mechanisms:

1. **Constraint Relaxation Cascade:** 7-stage progressive constraint relaxation (see `docs/RELAXATION_CASCADE.md`).
2. **Data Fallback:** If PostgreSQL is unreachable, the engine falls back to local SQLite.
3. **Model Fallback:** If ML ensemble training fails, the engine falls back to CAPM.
4. **Report Fallback:** If PDF export fails (no headless browser), only HTML is generated.

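These mechanisms let the pipeline produce output under degraded conditions rather than abort, at the cost of a less complete result (for example, CAPM expected returns instead of the ML ensemble, or an HTML report without a PDF).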
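All four mechanisms follow the same try-in-order pattern. A generic sketch with hypothetical names (`first_successful`, `open_price_cache`); the engine's actual fallback code is not shown in this document. The SQLite branch uses the standard-library `sqlite3` module, and the PostgreSQL connector is passed in as a callable so that no particular driver is assumed.

```python
import logging
import sqlite3
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def first_successful(attempts: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Call each (name, factory) in order and return the first result that does not raise."""
    errors: List[str] = []
    for name, factory in attempts:
        try:
            return name, factory()
        except Exception as exc:  # degrade to the next option rather than abort the run
            log.warning("%s failed (%s); falling back", name, exc)
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all fallbacks failed: " + "; ".join(errors))


def open_price_cache(connect_postgres: Callable[[], object],
                     sqlite_path: str = "price_cache.db") -> Tuple[str, object]:
    """Hypothetical: prefer PostgreSQL, fall back to a local SQLite file."""
    return first_successful([
        ("postgres", connect_postgres),
        ("sqlite", lambda: sqlite3.connect(sqlite_path)),
    ])
```

Returning the name of the branch that succeeded lets the caller record which path was used, so a report produced under degraded conditions can say so.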

---

## References

- Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2014). Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. *Notices of the AMS*, 61(5), 458–471.
- Bailey, D. H., & López de Prado, M. (2012). The Sharpe ratio efficient frontier. *Journal of Risk*, 15(2), 3–44.
- Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting, and non-normality. *Journal of Portfolio Management*, 40(5), 94–107.
- Christoffersen, P. (1998). Evaluating interval forecasts. *International Economic Review*, 39(4), 841–862.
- Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. *Journal of Business & Economic Statistics*, 13(3), 253–263.
- Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. *Journal of Multivariate Analysis*, 88(2), 365–411.
- Markowitz, H. (1952). Portfolio selection. *Journal of Finance*, 7(1), 77–91.