# Pipeline Architecture

## Abstract

The Portfolio Engine implements a multi-stage pipeline that transforms raw market data into optimised portfolio allocations, validates them through out-of-sample econometric testing, and exports the results as interactive reports. This document describes the full execution flow, the data structures that mediate inter-stage communication, the mathematical validation framework, and the report generation subsystem. It serves as the architectural reference for understanding how the engine's components compose into a coherent analytical system.

---

## 1. Pipeline Overview

The engine is orchestrated by the `PortfolioPipeline` class in `core_engine.py`, which implements a four-stage execution model:

```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│                                   Pipeline Stages                                   │
│                                                                                     │
│  ┌─────────────┐   ┌──────────────────┐   ┌─────────────┐   ┌────────────────────┐  │
│  │   Stage 1   │──▶│     Stage 2      │──▶│   Stage 3   │──▶│      Stage 4       │  │
│  │ load_data() │   │ run_validation() │   │ optimize()  │   │ generate_reports() │  │
│  └─────────────┘   └──────────────────┘   └─────────────┘   └────────────────────┘  │
│                                                                                     │
│  Data Fetch        Walk-Forward CV        Full-Sample       HTML + CSV              │
│  Regime Detect     Econometric Tests      Optimisation      PDF Export              │
│  Risk Aversion     DM / Christoffersen    Sensitivity       Serve                   │
│  Adjustment        PSR / DSR              Stress Test                               │
└─────────────────────────────────────────────────────────────────────────────────────┘
```

### Entry Point

```python
def run_engine(overrides=None):
    """Run the four pipeline stages in order."""
    pipeline = PortfolioPipeline(overrides=overrides)
    pipeline.load_data()                               # Stage 1: data, regimes, risk aversion
    val_bundle = pipeline.run_validation()             # Stage 2: walk-forward CV + econometric tests
    opt_bundle = pipeline.optimize()                   # Stage 3: full-sample optimisation
    pipeline.generate_reports(val_bundle, opt_bundle)  # Stage 4: HTML / CSV / PDF output
```

The `overrides` dictionary enables headless execution from the API layer (`api.py`), test harnesses, or scheduled batch jobs, bypassing the interactive CLI wizard.

---

## 2. Stage 1: Data Loading (`load_data`)

### 2.1 Data Sources

| Source                    | Target                | Module              |
|---------------------------|-----------------------|---------------------|
| Yahoo Finance / DB Cache  | Daily OHLCV prices    | `data.py`           |
| Kenneth French Library    | Fama-French factors   | `data.py`           |
| FRED / ^TNX proxy         | Risk-free rate series | `data.py`           |
| PostgreSQL / SQLite       | Cached price data     | `database.py`       |

### 2.2 Data Validation

- **Minimum History:** Assets must have at least 2 × `trading_days_per_year` (default: 504 business days) of return history to be included. Assets with insufficient history are silently dropped.
- **Missing Data:** Returns DataFrames are aligned with `pd.DataFrame.dropna()`, which drops every date on which any asset lacks a return, so all assets share a common date index.
- **Frequency Conversion:** When `return_frequency = 'monthly'`, daily returns are geometrically compounded to monthly via `build_monthly_returns()`.

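The two preparation rules above can be sketched in pandas as follows. This is an illustrative re-implementation, not the code in `data.py`: the function names `filter_min_history` and `to_monthly` are hypothetical, and the 252-day default is inferred from the 504-day threshold. Monthly compounding multiplies the daily gross returns within each calendar month.

```python
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed default; 2 x 252 = 504 matches the text


def filter_min_history(returns: pd.DataFrame,
                       trading_days_per_year: int = TRADING_DAYS_PER_YEAR) -> pd.DataFrame:
    """Drop assets with fewer than 2 x trading_days_per_year observed returns,
    then keep only the dates on which every remaining asset has a return."""
    min_obs = 2 * trading_days_per_year
    counts = returns.notna().sum()              # observed returns per asset
    keep = counts[counts >= min_obs].index      # assets with enough history
    return returns[keep].dropna()               # common date index across survivors


def to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Geometrically compound daily simple returns into calendar-month returns."""
    growth = (1.0 + daily).groupby(daily.index.to_period("M")).prod()
    return growth - 1.0
```

Filtering before `dropna()` matters: aligning first would let one short-history asset truncate the common index for every other asset.
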
### 2.3 Regime Detection

If `hmm_regime = True` (default), the engine fits a Hidden Markov Model to benchmark returns via `regime_detection.detect_volatility_regime()`. The detected regime (Bull, Normal, or Crash) informs:

- Dynamic risk aversion adjustment (Stages 2 and 3).
- The PID volatility target in the research cybernetic ensemble.
- Report visualisation annotations.

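How the fitted hidden states become named regimes is not specified here. One plausible convention, shown below as a hypothetical sketch, ranks the HMM's states by the realised volatility of the returns assigned to each. Fitting the HMM itself is out of scope, and both `label_states_by_volatility` and the calm-to-volatile naming order are assumptions.

```python
import numpy as np

# Assumed naming convention: lowest-volatility state first.
REGIME_NAMES = ("Bull", "Normal", "Crash")


def label_states_by_volatility(returns, states):
    """Map integer hidden-state ids to regime names by ranking each state's realised
    volatility: the calmest state becomes "Bull" and the most volatile "Crash"."""
    returns = np.asarray(returns, dtype=float)
    states = np.asarray(states)
    ids = np.unique(states)
    if ids.size != len(REGIME_NAMES):
        raise ValueError("expected exactly three hidden states")
    vols = [returns[states == s].std() for s in ids]
    ranked = ids[np.argsort(vols)]                  # state ids, calmest first
    mapping = {int(s): name for s, name in zip(ranked, REGIME_NAMES)}
    return [mapping[int(s)] for s in states]
```

HMM state ids are arbitrary between fits, so a post-hoc ranking like this keeps the labels stable across re-estimations.
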
### 2.4 Dynamic Risk Aversion

If `dynamic_risk = True` (default), the VIX level is used to adjust the user's stated risk aversion via `regime_detection.dynamic_risk_aversion()`. This implements a counter-cyclical risk management policy: risk aversion increases during high-volatility episodes, reducing exposure before drawdowns deepen.

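The exact VIX-to-risk-aversion mapping used by `dynamic_risk_aversion()` is not documented here. A minimal sketch of a counter-cyclical rule, assuming linear scaling by VIX relative to a reference level with a floor and a cap (all three numbers are illustrative, not the engine's settings):

```python
def scaled_risk_aversion(base_lambda: float, vix: float,
                         vix_ref: float = 20.0, floor: float = 0.5, cap: float = 3.0) -> float:
    """Hypothetical rule: multiply the stated risk aversion by VIX / vix_ref,
    clipped to [floor, cap] so extreme VIX readings cannot dominate."""
    if base_lambda <= 0 or vix <= 0:
        raise ValueError("risk aversion and VIX must be positive")
    multiplier = min(max(vix / vix_ref, floor), cap)
    return base_lambda * multiplier
```

For example, a user with stated risk aversion 2.0 would be treated as 4.0 when VIX is 40 under these assumed parameters.
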
---

## 3. Stage 2: Walk-Forward Validation (`run_validation`)

### 3.1 Expanding Window Cross-Validation

The engine performs expanding-window (walk-forward) backtesting via `backtest.expanding_window_backtest()`:

1. An initial training window of `OOS_TRAIN_DAYS` (total days minus 252) is established.
2. The model is trained on the expanding window and produces out-of-sample weights.
3. Weights are rebalanced every `trading_days / 4` trading days (63 with the default of 252, i.e. quarterly).
4. An out-of-sample equity curve is constructed from realised returns.

This methodology prevents look-ahead bias, because every set of weights is estimated only from data available before the period in which it is held, and it is a standard approach to strategy validation in quantitative finance (Bailey et al., 2014).

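The walk-forward loop can be sketched as below. This is a simplified illustration, not `backtest.expanding_window_backtest()`: the inverse-volatility `fit` function stands in for the engine's return model and optimiser, and weights are held at their targets within each block (no intra-quarter drift or transaction costs).

```python
import pandas as pd


def inverse_vol_weights(train: pd.DataFrame) -> pd.Series:
    """Placeholder model: weights proportional to 1 / volatility on the training window."""
    inv_vol = 1.0 / train.std(ddof=1)
    return inv_vol / inv_vol.sum()


def walk_forward(returns: pd.DataFrame, initial_train: int,
                 rebalance_every: int, fit=inverse_vol_weights):
    """Refit on all data before each test block, then hold the resulting
    weights over the next `rebalance_every` rows."""
    oos_blocks = []
    for start in range(initial_train, len(returns), rebalance_every):
        train = returns.iloc[:start]                        # strictly past data
        test = returns.iloc[start:start + rebalance_every]  # next out-of-sample block
        weights = fit(train)
        oos_blocks.append(test @ weights)                   # daily portfolio returns
    oos_rets = pd.concat(oos_blocks)
    equity = (1.0 + oos_rets).cumprod()                     # out-of-sample equity curve
    return oos_rets, equity
```

With the settings in the text, `initial_train=len(returns) - 252` and `rebalance_every=252 // 4` give a 252-day out-of-sample period split into four quarterly blocks.
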
### 3.2 Econometric Tests

The validation stage runs four statistical tests on the out-of-sample returns:

#### Christoffersen Conditional Coverage Test

Tests whether Value-at-Risk (VaR) exceedances are both correctly calibrated (unconditional coverage) and serially independent (no volatility clustering in violations). A joint likelihood ratio statistic is computed:

```math
LR_{cc} = LR_{uc} + LR_{ind} \sim \chi^2(2)
```

**Pass Criterion:** p-value > 0.05 for both components.

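A compact sketch of the test on a 0/1 series of VaR exceedances (1 = the loss exceeded VaR), following Christoffersen (1998): `lr_uc` compares the observed hit rate with the nominal rate `alpha`, and `lr_ind` compares a first-order Markov model of the hits with an i.i.d. one. The function name and return format are illustrative, not those of the engine.

```python
import numpy as np
from scipy.special import xlogy
from scipy.stats import chi2


def christoffersen_test(hits, alpha=0.05):
    """Conditional coverage test; xlogy treats 0 * log(0) as 0."""
    hits = np.asarray(hits, dtype=int)
    n = hits.size
    n1 = int(hits.sum())
    n0 = n - n1
    pi_hat = n1 / n

    # Unconditional coverage: observed hit rate vs. nominal alpha.
    lr_uc = -2.0 * ((xlogy(n0, 1 - alpha) + xlogy(n1, alpha))
                    - (xlogy(n0, 1 - pi_hat) + xlogy(n1, pi_hat)))

    # Independence: transition counts between consecutive days.
    prev, curr = hits[:-1], hits[1:]
    n00 = int(np.sum((prev == 0) & (curr == 0)))
    n01 = int(np.sum((prev == 0) & (curr == 1)))
    n10 = int(np.sum((prev == 1) & (curr == 0)))
    n11 = int(np.sum((prev == 1) & (curr == 1)))
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi1 = (n01 + n11) / (n00 + n01 + n10 + n11)
    ll_null = xlogy(n00 + n10, 1 - pi1) + xlogy(n01 + n11, pi1)
    ll_markov = (xlogy(n00, 1 - pi01) + xlogy(n01, pi01)
                 + xlogy(n10, 1 - pi11) + xlogy(n11, pi11))
    lr_ind = -2.0 * (ll_null - ll_markov)

    lr_cc = lr_uc + lr_ind
    return {"lr_uc": lr_uc, "p_uc": chi2.sf(lr_uc, 1),
            "lr_ind": lr_ind, "p_ind": chi2.sf(lr_ind, 1),
            "lr_cc": lr_cc, "p_cc": chi2.sf(lr_cc, 2)}
```

A run of 50 consecutive breaches at the end of a 1,000-day sample has the right 5% rate but fails the independence component decisively, which is exactly the clustering this test is designed to catch.
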
#### Diebold-Mariano Test

Tests whether the engine's expected return model statistically outperforms a naive historical-mean baseline in out-of-sample prediction accuracy:

```math
DM = \frac{\bar{d}}{\hat{\sigma}(\bar{d})} \sim N(0, 1)
```

where $d_t = |e_{1,t}| - |e_{2,t}|$ is the loss differential between the two models' forecast errors under mean absolute error (MAE) loss, $\bar{d}$ is its sample mean, and $\hat{\sigma}(\bar{d})$ is the Newey-West standard error of $\bar{d}$, which makes the test robust to heteroskedasticity and autocorrelation in $d_t$.

**Pass Criterion:** p-value < 0.05 and the engine's model has the lower mean loss.

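A minimal sketch of the statistic with MAE loss and a Bartlett-kernel Newey-West variance. The default lag rule `floor(4 * (n / 100) ** (2 / 9))` is a common rule of thumb and an assumption here; the engine's lag choice and function name are not documented.

```python
import numpy as np
from scipy.stats import norm


def diebold_mariano(e_model, e_base, lags=None):
    """Return (DM statistic, two-sided p-value). DM < 0 means the model's
    absolute errors are smaller on average than the baseline's."""
    d = np.abs(np.asarray(e_model, float)) - np.abs(np.asarray(e_base, float))
    n = d.size
    if lags is None:
        lags = int(np.floor(4 * (n / 100) ** (2 / 9)))  # rule-of-thumb bandwidth
    d_bar = d.mean()
    dc = d - d_bar
    lrv = dc @ dc / n                                    # lag-0 autocovariance
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)                    # Bartlett kernel weight
        lrv += 2.0 * weight * (dc[k:] @ dc[:-k]) / n
    dm = d_bar / np.sqrt(lrv / n)
    p_value = 2.0 * norm.sf(abs(dm))
    return dm, p_value
```

The Bartlett weights keep the long-run variance estimate non-negative, which a plain sum of autocovariances does not guarantee.
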
#### Probabilistic Sharpe Ratio (PSR)

Accounts for the non-normality of returns (skewness and kurtosis) when evaluating whether the observed Sharpe ratio is statistically distinguishable from a benchmark value $SR^*$, here zero (Bailey & López de Prado, 2012):

```math
PSR = \Phi\left[\frac{(SR - SR^*)\sqrt{n - 1}}{\sqrt{1 - \gamma_3 \cdot SR + \frac{\gamma_4 - 1}{4} \cdot SR^2}}\right]
```

where $SR$ is the observed, non-annualised Sharpe ratio, $n$ is the number of return observations, $\Phi$ is the standard normal CDF, and $\gamma_3$ and $\gamma_4$ are the sample skewness and kurtosis ($\gamma_4 = 3$ for normally distributed returns).

**Pass Criterion:** PSR > 0.95 (95% confidence that the true Sharpe ratio exceeds zero).

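The PSR formula translates directly into code. This sketch assumes per-period (non-annualised) Sharpe ratios and uses SciPy's sample skewness and non-excess kurtosis; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew


def probabilistic_sharpe_ratio(returns, sr_star=0.0):
    """PSR (Bailey & López de Prado, 2012) at the sampling frequency of `returns`."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)
    g3 = skew(r)
    g4 = kurtosis(r, fisher=False)  # raw kurtosis: 3 for a normal distribution
    denom = np.sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2)
    return norm.cdf((sr - sr_star) * np.sqrt(n - 1) / denom)
```

Setting `sr_star` equal to the observed Sharpe ratio returns exactly 0.5, a quick sanity check on any implementation.
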
#### Deflated Sharpe Ratio (DSR)

Adjusts for multiple-testing bias when the engine evaluates $K$ candidate models (Bailey & López de Prado, 2014). The expected maximum Sharpe ratio under the null hypothesis (all models have zero alpha) is:

```math
E[\max(SR)] \approx \sqrt{2 \ln K} - \frac{\gamma + \ln(\pi / 2)}{2\sqrt{2 \ln K}}
```

where $\gamma \approx 0.5772$ is the Euler-Mascheroni constant. The DSR then tests whether the observed Sharpe ratio significantly exceeds this multiple-testing threshold.

**Pass Criterion:** DSR > 0.95.

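As a point of comparison, Bailey & López de Prado (2014) approximate the threshold as $\sqrt{V[\{SR_k\}]}\,\big((1-\gamma)\Phi^{-1}(1 - 1/K) + \gamma\,\Phi^{-1}(1 - 1/(Ke))\big)$, scaling by the dispersion of the $K$ trial Sharpe ratios, and then evaluate the PSR against that threshold instead of zero. The sketch below implements that formulation, which is a different approximation from the closed form above and may not match the engine's code; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, norm, skew

EULER_GAMMA = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sr_std: float) -> float:
    """Expected maximum of n_trials zero-skill Sharpe estimates with dispersion sr_std."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = ((1 - EULER_GAMMA) * norm.ppf(1 - 1 / n_trials)
         + EULER_GAMMA * norm.ppf(1 - 1 / (n_trials * np.e)))
    return sr_std * z


def deflated_sharpe_ratio(returns, trial_srs):
    """PSR evaluated against the multiple-testing threshold (per-period Sharpe ratios)."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)
    sr0 = expected_max_sharpe(len(trial_srs), np.std(trial_srs, ddof=1))
    g3, g4 = skew(r), kurtosis(r, fisher=False)
    denom = np.sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2)
    return norm.cdf((sr - sr0) * np.sqrt(n - 1) / denom)
```

The more candidate models are tried, and the more their Sharpe ratios disperse, the higher the bar the selected model must clear.
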
### 3.3 Output

The validation stage produces a `ValidationBundle` dataclass:

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class ValidationBundle:
    oos_eq: pd.Series           # Out-of-sample equity curve
    oos_bench_curve: pd.Series  # Benchmark equity curve
    oos_port_rets: pd.Series    # Out-of-sample portfolio returns
    wf_ann_ret: float           # Walk-forward annualised return
    var_results: dict           # Christoffersen test results
    dm_results: dict            # Diebold-Mariano test results
    psr_results: dict           # Probabilistic Sharpe Ratio
    dsr_results: dict           # Deflated Sharpe Ratio
```

---

## 4. Stage 3: Full-Sample Optimisation (`optimize`)

### 4.1 Solver Invocation

The full historical dataset is passed to `solver.build_and_optimize()`, which:

1. Computes expected returns using the selected model (CAPM, Black-Litterman, Fama-French, Bayesian, or ML Stacking).
2. Estimates the covariance matrix with Ledoit-Wolf shrinkage and optional GARCH scaling.
3. Formulates and solves the convex optimisation problem with CVXPY.
4. Applies the 7-stage constraint relaxation cascade if the initial formulation is infeasible (see `docs/RELAXATION_CASCADE.md`).

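Step 2 can be illustrated with the Ledoit & Wolf (2004) estimator, which shrinks the sample covariance toward a scaled identity matrix by a data-driven intensity. This NumPy sketch is illustrative only: the engine's shrinkage target may differ, GARCH scaling is not shown, and scikit-learn's `sklearn.covariance.LedoitWolf` provides a maintained implementation of the same estimator.

```python
import numpy as np


def ledoit_wolf(returns):
    """Shrink the sample covariance toward mu * I; returns (covariance, intensity)."""
    X = np.asarray(returns, dtype=float)
    X = X - X.mean(axis=0)                 # demean each asset
    n, p = X.shape
    S = X.T @ X / n                        # sample covariance (1/n normalisation)
    mu = np.trace(S) / p
    target = mu * np.eye(p)                # scaled-identity shrinkage target
    d2 = np.sum((S - target) ** 2)         # squared distance of S from the target
    b2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in X) / n**2
    b2 = min(b2_bar, d2)                   # estimated error in S, capped at d2
    shrinkage = b2 / d2
    return shrinkage * target + (1.0 - shrinkage) * S, shrinkage
```

Because the target is positive definite, any positive intensity yields an invertible matrix even with fewer observations than assets, which is what keeps the optimiser well-posed.
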
### 4.2 Sensitivity & Stress Analysis

After optimisation, the engine runs two diagnostic analyses:

- **Sensitivity Analysis** (`analytics.portfolio_sensitivity`): Perturbs expected returns by ±10% and re-solves, measuring each asset's weight response range. Assets whose weight swings by more than 15 percentage points are flagged as "fragile."
- **Stress Testing** (`analytics.portfolio_stress_test`): Evaluates the portfolio's impact under historical crash scenarios (e.g., 2008 GFC, 2020 COVID, rate shock, tech crash).

If fragile allocations are detected and the allocation engine is Mean-Variance (engine 1), a stability penalty is added to the objective function and the solver is re-invoked.

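The sensitivity check can be sketched with a stand-in solver. The closed-form unconstrained mean-variance weights below (w ∝ Σ⁻¹μ, normalised to sum to one) replace the engine's CVXPY formulation purely for illustration, and bumping one asset's expected return at a time by a relative ±10% is an assumption about how the perturbation is applied.

```python
import numpy as np


def mv_weights(mu, cov):
    """Stand-in solver: unconstrained mean-variance weights, scaled to sum to 1."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()


def sensitivity_report(mu, cov, bump=0.10, threshold=0.15, solver=mv_weights):
    """Bump each expected return by +/- bump (relative), re-solve, and record each
    asset's min-to-max weight range; ranges above `threshold` are flagged fragile."""
    mu = np.asarray(mu, dtype=float)
    base = solver(mu, cov)
    w_min, w_max = base.copy(), base.copy()
    for i in range(mu.size):
        for factor in (1.0 - bump, 1.0 + bump):
            mu_bumped = mu.copy()
            mu_bumped[i] *= factor
            w = solver(mu_bumped, cov)
            w_min, w_max = np.minimum(w_min, w), np.maximum(w_max, w)
    swing = w_max - w_min
    return swing, swing > threshold
```

Bumping every expected return by the same factor would leave these normalised weights unchanged, which is why the sketch perturbs assets individually. Highly correlated assets produce the largest swings, because Σ⁻¹ amplifies small differences in their expected returns.
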
### 4.3 Output

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class OptimizationBundle:
    weights: pd.Series          # Final target weights
    exp_rets: pd.Series         # Expected returns per asset
    cov_mat: pd.DataFrame       # Covariance matrix
    vol: float                  # Portfolio volatility
    corr_matrix: pd.DataFrame   # Correlation matrix
    betas: pd.Series            # Market betas
    model_info: dict            # Model metadata
    sens_report: dict           # Sensitivity analysis
    stress_report: dict         # Stress test results
    n_fragile: int              # Count of fragile allocations
```

---

## 5. Stage 4: Report Generation (`generate_reports`)

### 5.1 Architecture

Report generation follows a three-layer architecture:

```
┌─────────────────────────────────────────────────┐
│            report.py (Orchestrator)             │
│   Coordinates data → template → file pipeline   │
├────────────────┬────────────────────────────────┤
│ report_data.py │ report_html.py                 │
│ (Data Layer)   │ (Rendering Layer)              │
│ Formats all    │ Injects variables into         │
│ mathematical   │ report_template.html           │
│ outputs into   │ static template                │
│ template vars  │                                │
└────────────────┴────────────────────────────────┘
```

### 5.2 Report Data Layer: `report_data.py`

The `prepare_template_variables()` function is the largest single function in the codebase (~675 lines). It transforms raw mathematical outputs into presentation-ready HTML fragments and Chart.js data payloads. Key computations include:

- **Advanced Risk Metrics:** CVaR (95%), Conditional Drawdown-at-Risk (CDaR), Mean Absolute Deviation (MAD), and semi-deviation.
- **Transition Comparisons:** When the user provides current holdings, the report computes before/after comparisons for all metrics.
- **Chart Payload:** A JSON dictionary consumed by Chart.js for interactive equity curves, allocation pie charts, efficient frontier plots, Monte Carlo fan charts, and risk contribution bar charts.
- **Narrative Generation:** `narrative.py` produces a natural-language summary of the portfolio strategy, market conditions, and key risk factors.

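Textbook historical versions of the four risk metrics are sketched below. The 95% level, NumPy's default linear quantile interpolation, and measuring semi-deviation below the mean (rather than below zero or a target return) are assumptions; `report_data.py` may use different conventions.

```python
import numpy as np


def risk_metrics(returns, level=0.95):
    """Historical tail and dispersion metrics for a series of periodic returns."""
    r = np.asarray(returns, dtype=float)
    tail = 1.0 - level

    # CVaR: average loss over periods at or below the (1 - level) return quantile.
    var_cut = np.quantile(r, tail)
    cvar = -r[r <= var_cut].mean()

    # CDaR: average of the worst (1 - level) share of drawdowns.
    equity = np.cumprod(1.0 + r)
    drawdown = 1.0 - equity / np.maximum.accumulate(equity)
    dd_cut = np.quantile(drawdown, level)
    cdar = drawdown[drawdown >= dd_cut].mean()

    mad = np.mean(np.abs(r - r.mean()))                             # mean absolute deviation
    semi_dev = np.sqrt(np.mean(np.minimum(r - r.mean(), 0.0) ** 2))  # downside vs. the mean
    return {"cvar": cvar, "cdar": cdar, "mad": mad, "semi_deviation": semi_dev}
```

CVaR and CDaR apply the same tail-averaging idea to two different loss series: single-period returns and the running drawdown of the equity curve.
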
### 5.3 HTML Rendering: `report_html.py`

The rendering layer substitutes template variables into `report_template.html`, a 26 KB static template that contains the Chart.js initialisation scripts. The template embeds its CSS directly in the HTML and uses a dark theme optimised for screen presentation.

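The placeholder syntax of `report_template.html` is not described here, so this sketch assumes Python's `string.Template` `$name` placeholders purely for illustration. It shows the two kinds of substitution the section describes: escaped HTML text and a JSON payload embedded in a `<script>` tag for Chart.js. `REPORT_TEMPLATE` and `render_report` are hypothetical.

```python
import html
import json
from string import Template

# Hypothetical miniature template; the real report_template.html is not reproduced here.
REPORT_TEMPLATE = Template(
    "<h1>$title</h1>\n"
    '<canvas id="equity"></canvas>\n'
    "<script>const chartData = $chart_json;</script>\n"
)


def render_report(title: str, chart_payload: dict) -> str:
    """Substitute presentation-ready values into the template."""
    chart_json = json.dumps(chart_payload).replace("</", "<\\/")  # keep "</script>" out of the JS
    return REPORT_TEMPLATE.substitute(title=html.escape(title), chart_json=chart_json)
```

`substitute` raises `KeyError` for any placeholder left unfilled, which surfaces template/data mismatches early instead of shipping a report with blanks.
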
### 5.4 Export Formats

| Format   | Module         | Content                                    |
|----------|----------------|--------------------------------------------|
| HTML     | `report.py`    | Interactive report with Chart.js           |
| PDF      | `exports.py`   | Static rendering via headless browser      |
| CSV      | `exports.py`   | Tabular weight/allocation summary          |
| Excel    | `exports.py`   | Multi-sheet workbook (optional)            |

---

## 6. Data Flow Diagram

```
External APIs ──▶ data.py ──▶ PostgreSQL/SQLite
                                      │
                               ┌──────┴──────┐
                               │ core_engine │
                               │ load_data() │
                               └──────┬──────┘
                                      │
                        ┌─────────────┼─────────────┐
                        ▼             ▼             ▼
                    solver.py    backtest.py  validation.py
                        │             │             │
                        ▼             ▼             ▼
                    OptBundle     ValBundle   Test Results
                        │             │             │
                        └─────────────┼─────────────┘
                                      ▼
                               report_data.py ──▶ report_html.py
                                                        │
                                                        ▼
                                                  output/*.html
                                                  output/*.csv
                                                  output/*.pdf
```

---

## 7. Configuration-Driven Behaviour

The pipeline's behaviour is heavily parameterised via `config.py`. Key configuration axes include:

| Parameter                 | Effect |
|---------------------------|--------|
| `model` (1-7)             | Selects the expected return model (see `docs/MODELS.md`) |
| `allocation_engine` (1-3) | Mean-Variance (CVXPY), HRP, or Exact Risk Parity (see `docs/ALLOCATION_ENGINES.md`) |
| `max_assets`              | Cardinality constraint: maximum number of non-zero positions |
| `garch_enabled`           | Enables GARCH(1,1) covariance scaling |
| `cvar_enabled`            | Adds a CVaR tail-risk constraint to the CVXPY formulation |
| `tax_enabled`             | Activates tax-aware optimisation with cost-basis tracking |
| `hmm_regime`              | Enables HMM regime detection |
| `dynamic_risk`            | Enables VIX-based risk aversion adjustment |
| `with_futures`            | Enables futures overlay optimisation |
| `return_frequency`        | Daily or monthly return aggregation |

---

## 8. Error Handling & Graceful Degradation

The pipeline employs multiple fallback mechanisms:

1. **Constraint Relaxation Cascade:** 7-stage progressive constraint relaxation (see `docs/RELAXATION_CASCADE.md`).
2. **Data Fallback:** If PostgreSQL is unreachable, the engine falls back to local SQLite.
3. **Model Fallback:** If ML ensemble training fails, the engine falls back to CAPM.
4. **Report Fallback:** If PDF export fails (e.g., no headless browser is available), only HTML is generated.

These mechanisms are designed so that the pipeline still produces output under degraded conditions.

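The data fallback (item 2) follows a simple try-then-degrade pattern, sketched below with the standard library only. `connect_with_fallback`, the `portfolio_cache.db` path, and passing the primary connector in as a callable are hypothetical; `database.py` may structure this differently.

```python
import logging
import sqlite3
from typing import Any, Callable, Tuple

logger = logging.getLogger(__name__)


def connect_with_fallback(connect_primary: Callable[[], Any],
                          sqlite_path: str = "portfolio_cache.db") -> Tuple[Any, str]:
    """Try the primary (e.g. PostgreSQL) connection; on failure, fall back to SQLite.

    Returns the open connection and the name of the backend that was used.
    """
    try:
        return connect_primary(), "primary"
    except Exception as exc:  # broad on purpose: any connection failure triggers the fallback
        logger.warning("Primary database unavailable (%s); using SQLite at %s", exc, sqlite_path)
        return sqlite3.connect(sqlite_path), "sqlite"
```

In a real setup `connect_primary` could wrap a PostgreSQL driver's connect call; injecting it as a callable keeps the fallback path testable without a database server.
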
---

## References

- Bailey, D. H., Borwein, J. M., López de Prado, M., & Zhu, Q. J. (2014). Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance. *Notices of the AMS*, 61(5), 458–471.
- Bailey, D. H., & López de Prado, M. (2012). The Sharpe ratio efficient frontier. *Journal of Risk*, 15(2), 3–44.
- Bailey, D. H., & López de Prado, M. (2014). The deflated Sharpe ratio: Correcting for selection bias, backtest overfitting, and non-normality. *Journal of Portfolio Management*, 40(5), 94–107.
- Christoffersen, P. (1998). Evaluating interval forecasts. *International Economic Review*, 39(4), 841–862.
- Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. *Journal of Business & Economic Statistics*, 13(3), 253–263.
- Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. *Journal of Multivariate Analysis*, 88(2), 365–411.
- Markowitz, H. (1952). Portfolio selection. *Journal of Finance*, 7(1), 77–91.