# Portfolio Engine: Complete Architecture Reference
## Abstract
This document is the master reference for the entire Portfolio Engine codebase. It describes every module, its purpose, its key functions, and how it connects to the rest of the system. When a topic is explained in full depth in a dedicated document, this file links to it rather than duplicating the content. After reading this document, you should understand what every file does, how data flows through the system, and where to find detailed explanations of each subsystem.
---
## 1. System Overview
The Portfolio Engine is an institutional-grade quantitative portfolio allocation system. It ingests market data, estimates expected returns and risk, solves a constrained convex optimization problem to produce target portfolio weights, validates those weights via out-of-sample econometric tests, and generates interactive HTML reports.
```
┌────────────────────────────────────────────────────────────────────────┐
│                            Portfolio Engine                            │
│                                                                        │
│ ┌───────────┐   ┌───────────────┐   ┌──────────────┐   ┌─────────────┐ │
│ │   Data    │──▶│ Risk & Return │──▶│    Convex    │──▶│ Reporting & │ │
│ │ Ingestion │   │   Modeling    │   │ Optimization │   │  Analytics  │ │
│ └───────────┘   └───────────────┘   └──────────────┘   └─────────────┘ │
│       │                 │                  │                  │        │
│ data.py         models.py           solver.py          report.py       │
│ database.py     dl_models.py        cvxpy_engine.py    analytics.py    │
│ alternative_    forecast_           hrp_engine.py      validation.py   │
│ data.py         generation.py       erc_engine.py      backtest.py     │
└────────────────────────────────────────────────────────────────────────┘
```
---
## 2. Complete File Map
Every Python file in the project, grouped by functional layer.
### 2.1 Orchestration & Entry Points
| File | Purpose |
|------|---------|
| `main.py` | CLI entry point; invokes the pipeline |
| `core_engine.py` | `PortfolioPipeline` class: the orchestrator (validate → optimize → report) |
| `config.py` | Configuration Facade importing schema, IO, logging, and constants |
| `config_schema.py`| Pydantic `AppConfig` and validation rules |
| `config_io.py` | File loading/saving for configuration dictionaries |
| `constants.py` | Centralized magic numbers, UI formatting, and mapping dictionaries |
| `logger.py` | JSON rotating log configuration |
| `core_types.py` | Shared dataclasses: `PortfolioState`, `ForecastResult`, `CovarianceResult`, `OptimizationResult`, `OptimizationError`, `EngineConfig`, etc. |
| `api.py` | FastAPI REST endpoints for headless/programmatic execution |
| `server.py` | Lightweight HTTP server to serve generated HTML reports |
| `dashboard.py` | Interactive CLI wizard for portfolio configuration |
### 2.2 Data Ingestion & Persistence
| File | Purpose |
|------|---------|
| `data.py` | Market data fetching (yfinance), Fama-French factor download, ML feature engineering (`build_ml_features()`), credit spread proxies, extended history stitching, and block bootstrapping |
| `data_repository.py` | **[NEW]** `DataRepository` class. Centralized abstraction layer responsible for invoking data fetchers, cleaning returned series, standardizing timestamps, and returning a unified `DataSnapshot` for the engine. |
| `database.py` | SQLAlchemy ORM models (`DailyPrice`, `DailyYield`), PostgreSQL/SQLite connection pooling via `get_pg_engine()`, schema initialization |
| `alternative_data.py` | **[NEW]** Options flow sentiment: Put/Call volume ratios, Implied Volatility skew extraction from yfinance options chains. Parallelized across assets |
| `fixed_income.py` | Bond pricing: clean price from yield, duration, convexity, and synthetic historical price generation for direct bonds |
| `futures_data.py` | Futures continuous contract construction via Panama Canal stitching method |
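
The bond quantities listed for `fixed_income.py` (price from yield, duration, convexity) follow from standard discounted-cash-flow formulas. The sketch below is illustrative only: `bond_analytics` and `BondAnalytics` are hypothetical names, not the module's actual API, and it assumes a fixed-coupon bullet bond valued on a coupon date, so accrued interest is zero and clean price equals dirty price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BondAnalytics:
    price: float              # price per `face` units, on a coupon date
    macaulay_duration: float  # in years
    modified_duration: float  # in years
    convexity: float          # in years squared


def bond_analytics(coupon_rate: float, ytm: float, years: int,
                   freq: int = 2, face: float = 100.0) -> BondAnalytics:
    """Price a fixed-coupon bullet bond from its yield (hypothetical helper)."""
    n = years * freq                  # remaining coupon periods
    y = ytm / freq                    # periodic yield
    coupon = face * coupon_rate / freq

    price = 0.0
    weighted_time = 0.0               # sum of t * PV(cash flow), t in periods
    convexity_sum = 0.0               # sum of t * (t + 1) * PV(cash flow)
    for t in range(1, n + 1):
        cash_flow = coupon + (face if t == n else 0.0)
        pv = cash_flow / (1.0 + y) ** t
        price += pv
        weighted_time += t * pv
        convexity_sum += t * (t + 1) * pv

    macaulay = weighted_time / price / freq
    modified = macaulay / (1.0 + y)
    convexity = convexity_sum / (price * (1.0 + y) ** 2 * freq ** 2)
    return BondAnalytics(price, macaulay, modified, convexity)
```

A bond whose coupon rate equals its yield prices at par, and a zero-coupon bond has a Macaulay duration equal to its maturity; both make quick sanity checks for any implementation.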
#### Data Flow
```
Yahoo Finance ──┐
FRED API ───────┤
Kenneth French ─┤──▶ data.py ──▶ PostgreSQL/SQLite ──▶ data_repository.py ──▶ core_engine.load_data()
Options Chains ─┘                                              │
                                                        ┌──────┴──────┐
                                                        │ DataSnapshot│
                                                        │ (returns_df,│
                                                        │  ff_df, rfr,│
                                                        │  yield_df)  │
                                                        └─────────────┘
```
The **feature engineering pipeline** (`build_ml_features()`) transforms raw returns into a per-asset feature matrix with momentum, volatility, factor exposure, and alternative data columns. Non-overlapping sampling prevents serial correlation in the training target. See [MODELS.md](MODELS.md) § 6 for the full feature list.
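
To illustrate why non-overlapping sampling matters: if the target is an `h`-period forward return and samples are taken every period, adjacent targets share `h - 1` observations and are correlated by construction. Stepping a full horizon between samples removes that overlap. The helper below is a minimal sketch of the idea; `non_overlapping_targets`, `lookback`, and `horizon` are hypothetical names and are not taken from `build_ml_features()`.

```python
from typing import List, Tuple


def non_overlapping_targets(returns: List[float], lookback: int,
                            horizon: int) -> List[Tuple[int, float]]:
    """Return (sample_index, forward_return) pairs with disjoint target windows."""
    samples = []
    t = lookback                          # first index with a full feature window
    while t + horizon <= len(returns):
        growth = 1.0
        for r in returns[t:t + horizon]:  # compound the next `horizon` returns
            growth *= 1.0 + r
        samples.append((t, growth - 1.0))
        t += horizon                      # step a full horizon: windows never overlap
    return samples
```

With ten observations, `lookback=2`, and `horizon=3`, the sample indices are 2 and 5; a third window starting at index 8 would run past the end of the data and is dropped.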
### 2.3 Return Forecasting & Risk Modeling
> **Detailed reference:** [MODELS.md](MODELS.md)
| File | Purpose |
|------|---------|
| `models.py` | All 7 return forecasting models (CAPM, BL, Bayesian, FF, ML Ensemble, E2E, Regime-Adaptive), covariance estimation (Ledoit-Wolf, hybrid block-diagonal), GARCH scaling, and the meta-learner stacking pipeline |
| `dl_models.py` | **[NEW]** PyTorch `NoiseFilteredTransformer` (Conv1D + Transformer Encoder), `CrossAssetSequenceDataset`, and `train_cross_asset_transformer()` training loop |
| `forecast_generation.py` | `_generate_forecasts()`: the Strategy Pattern router that selects and executes the correct model, applies fixed-income overrides, and returns a `ForecastResult` |
| `bl_bridge.py` | Black-Litterman integration bridge: `compute_bl_posterior()` combines ML views with the BL equilibrium prior; `scale_uncertainty_by_regime()` modulates view confidence |
| `e2e_forecast_model.py` | End-to-End Differentiable Optimization (Model 6): forecast network, differentiable CVXPY layer, SPO+ loss training |
| `regime_detection.py` | Hidden Markov Model (HMM) regime classifier for benchmark returns; `dynamic_risk_aversion()` VIX-based risk adjustment |
| `bayesian_online.py` | Bayesian Online Change-Point Detection (BOCD) for structural break identification in return series |
| `generative_scenarios.py` | Monte Carlo scenario generation from fitted covariance models |
| `math_utils.py` | Shared mathematical utilities: `compute_risk_contributions()` for marginal risk decomposition |
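
As a reference point for the risk decomposition mentioned under `math_utils.py`, the textbook definition of an asset's contribution to portfolio volatility is $RC_i = w_i (\Sigma w)_i / \sigma_p$, and the contributions sum to $\sigma_p$. The sketch below implements that definition with plain lists; `risk_contributions` is a hypothetical stand-in, and the real `compute_risk_contributions()` may differ in signature and normalisation.

```python
import math
from typing import List


def risk_contributions(weights: List[float], cov: List[List[float]]) -> List[float]:
    """Per-asset contribution to portfolio volatility (Euler decomposition)."""
    n = len(weights)
    # (Sigma w)_i: covariance of asset i with the portfolio
    sigma_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    port_var = sum(weights[i] * sigma_w[i] for i in range(n))
    if port_var <= 0.0:
        raise ValueError("portfolio variance must be positive")
    port_vol = math.sqrt(port_var)
    return [weights[i] * sigma_w[i] / port_vol for i in range(n)]
```

For two uncorrelated assets held at equal weight with variances 0.04 and 0.01, the first asset contributes four times as much risk as the second, which is the imbalance that the risk parity engines are designed to remove.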
### 2.4 Portfolio Optimization
> **Detailed reference:** [ALLOCATION_ENGINES.md](ALLOCATION_ENGINES.md)
| File | Purpose |
|------|---------|
| `solver.py` | Master optimization router: `build_and_optimize()` for single-period; `multi_period_optimize()` for MPC stochastic programming. Routes to Engine 1, 2, or 3. Computes efficient frontier, risk contributions, and sensitivity analysis |
| `cvxpy_engine.py` | `CVXPYOptimizationEngine`: Mean-Variance quadratic programming with full constraint suite, 7-stage relaxation cascade, cardinality heuristic, CVaR tail-risk, and Almgren-Chriss market impact |
| `hrp_engine.py` | Hierarchical Risk Parity: agglomerative clustering, quasi-diagonalisation, recursive bisection, and tax-aware blending |
| `erc_engine.py` | **[NEW]** Exact True Risk Parity: Spinu logarithmic barrier formulation via CVXPY (SCS/ECOS solver) |
| `constraints.py` | Constraint pre-processing: `check_and_fix_bounds()` for sanitising user inputs, `make_nearest_psd()` for covariance matrix repair |
| `differentiable_optimizer.py` | `cvxpylayers`-based differentiable portfolio layer for gradient flow in Model 6 |
| `futures_overlay.py` | Futures overlay optimizer: beta hedge, duration hedge, or volatility dampening via ES/MES futures |
| `safety.py` | Pre-trade safety checks: position limits, concentration alerts, and drawdown circuit breakers |
#### Optimization Flow
```
forecast_generation.py
        │
        ▼
ForecastResult (exp_rets, covariance, betas, garch_info)
        │
        ▼
solver.py ├─ allocation_engine == 1 ──▶ cvxpy_engine.py (Mean-Variance)
          ├─ allocation_engine == 2 ──▶ hrp_engine.py (HRP)
          └─ allocation_engine == 3 ──▶ erc_engine.py (Exact Risk Parity)
        │
        ▼
OptimizationResult (weights, model_info, risk_contributions, ef_curve)
```
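
The routing step in the flow above can be pictured as a lookup from `allocation_engine` to an engine callable. This is a sketch of the pattern only: the three stand-in functions and `route` are hypothetical and return labels rather than weights, and the actual dispatch inside `solver.py` may be structured differently.

```python
from typing import Callable, Dict

EngineFn = Callable[[dict], str]


# Stand-ins for the three engines; each would normally return an OptimizationResult.
def run_mean_variance(forecast: dict) -> str:
    return "cvxpy_engine"


def run_hrp(forecast: dict) -> str:
    return "hrp_engine"


def run_exact_risk_parity(forecast: dict) -> str:
    return "erc_engine"


ENGINES: Dict[int, EngineFn] = {
    1: run_mean_variance,
    2: run_hrp,
    3: run_exact_risk_parity,
}


def route(allocation_engine: int, forecast: dict) -> str:
    """Select the engine for `allocation_engine` and run it on the forecast."""
    try:
        engine = ENGINES[allocation_engine]
    except KeyError:
        raise ValueError(
            f"allocation_engine must be 1, 2, or 3; got {allocation_engine}"
        ) from None
    return engine(forecast)
```

A table-driven router keeps the engine list in one place, so adding an engine means adding one dictionary entry rather than another branch.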
### 2.5 Validation & Econometrics
| File | Purpose |
|------|---------|
| `validation.py` | Four econometric tests: Christoffersen Conditional Coverage, Diebold-Mariano, Probabilistic Sharpe Ratio (PSR), Deflated Sharpe Ratio (DSR). See [PIPELINE.md](PIPELINE.md) § 3 for mathematical formulations |
| `backtest.py` | Walk-forward expanding window cross-validation (`expanding_window_backtest()`), Monte Carlo simulation (`monte_carlo()`), and rolling performance metrics |
| `analytics.py` | Portfolio sensitivity analysis (±10% return perturbation), historical stress testing (2008 GFC, 2020 COVID, rate shock, tech crash), behavioural diagnostics |
| `risk_attribution.py` | Factor exposure decomposition, marginal VaR, CVaR component attribution, and stress correlation analysis |
| `overlay_analytics.py` | Futures overlay analytics: aggregated overlay returns, margin call simulation |
| `simulation.py` | Monte Carlo and historical simulation engines for risk budgeting |
| `audit_reproducibility.py` | Bit-exact reproducibility verification: hashes inputs and outputs across runs |
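
Of the four tests in `validation.py`, the Probabilistic Sharpe Ratio is the most compact to state: it is the probability that the true Sharpe ratio exceeds a benchmark $SR^*$, given the observed per-period Sharpe ratio, skewness $\gamma_3$, and kurtosis $\gamma_4$ over $n$ observations. The sketch below follows the Bailey and López de Prado formula. It assumes the inputs are per-period excess returns and uses population moments; `probabilistic_sharpe_ratio` is a hypothetical name, and the project's implementation may annualise or estimate moments differently.

```python
import math
from statistics import NormalDist
from typing import Sequence


def probabilistic_sharpe_ratio(returns: Sequence[float],
                               benchmark_sr: float = 0.0) -> float:
    """P(true Sharpe > benchmark_sr), using per-period (non-annualised) Sharpe."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(returns) / n
    # Population central moments of the return series
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    if m2 == 0.0:
        raise ValueError("returns have zero variance")
    sr = mean / math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2              # raw kurtosis (normal distribution = 3)
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    z = (sr - benchmark_sr) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)
```

A series with zero mean has an observed Sharpe ratio of zero, so its PSR against a zero benchmark is exactly 0.5; raising the benchmark lowers the PSR.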
### 2.6 Reporting & Output
> **Detailed reference:** [OUTPUT.md](OUTPUT.md)
| File | Purpose |
|------|---------|
| `report.py` | Report orchestrator: coordinates data preparation → HTML rendering → file output |
| `report_data.py` | `prepare_template_variables()`: transforms mathematical outputs into HTML fragments and Chart.js data payloads (~675 lines) |
| `report_html.py` | HTML rendering layer: substitutes template variables into `report_template.html` |
| `report_template.html` | 26KB static HTML template with Chart.js initialization, dark theme, and responsive CSS |
| `report_chart.py` | Chart.js payload generators for equity curves, pie charts, efficient frontiers, Monte Carlo fans |
| `chart_data.py` | Lightweight chart data serialization utilities |
| `model_visuals.py` | Model-specific visualization helpers (factor exposure plots, GARCH regime charts) |
| `narrative.py` | Natural-language narrative generation summarising portfolio strategy and market conditions |
| `table_builder.py` | HTML table construction utilities |
| `exports.py` | CSV, Excel, and PDF export (`export_csv()`, `export_excel()`) |
| `report_builders/` | Modular HTML section builders for performance, risk, and tax report sections |
### 2.7 Execution & Infrastructure
| File | Purpose |
|------|---------|
| `execution.py` | IBKR execution stubs, order management, and paper trading interface (19KB, not yet production-connected) |
| `Dockerfile` | Container image definition (Python 3.11-slim) |
| `docker-compose.yml` | Local development environment with PostgreSQL 15 and Redis 7 |
| `deploy/helm/` | Helm chart for Kubernetes deployment (see [DEPLOY.md](DEPLOY.md)) |
| `pyproject.toml` | Project metadata, pytest configuration, and build system |
| `requirements.txt` | Python dependency manifest |
| `setup.py` | Legacy setuptools configuration |
| `.github/workflows/ci.yml` | GitHub Actions CI pipeline (lint, type-check, test) |
| `.pre-commit-config.yaml` | Pre-commit hooks configuration |
### 2.8 Research & Experimental
> **Detailed reference:** [RESEARCH.md](RESEARCH.md)
| File | Purpose |
|------|---------|
| `research/dreamer/` | DreamerV2 world-model RL agent adapted for financial time series |
| `research/cybernetic.py` | PID volatility controller and adaptive risk setpoint |
| `research/cybernetic_ensemble.py` | Three-layer cybernetic control hierarchy |
| `run_simulation.py` | Standalone simulation script for research experiments |
| `debug_validation.py` | Debugging utilities for validation pipeline |
### 2.9 Tests
> **Detailed reference:** [TESTS.md](TESTS.md)
| File | Purpose |
|------|---------|
| `tests/test_optimize.py` | Constraint logic, mean-variance, HRP, multi-period optimization |
| `tests/test_simulate.py` | End-to-end integration test |
| `tests/test_e2e.py` | Differentiable optimization pipeline |
| `tests/test_models.py` | Return model correctness |
| `tests/test_analytics.py` | Backtest engine, Sharpe, Sortino, Calmar |
| `tests/test_data.py` | Data fetching, missing-data handling |
| `tests/test_validation.py` | Econometric test statistical properties |
| `tests/test_new_features.py` | **[NEW]** Transformer training/inference, options flow sentiment extraction, and exact risk parity mathematical verification |
| `test_audit.py` | Reproducibility audit |
| `test_perf.py` | Performance benchmarks |
---
## 3. Configuration System
**`config.py` → `AppConfig`**
The engine is driven by a Pydantic-validated configuration schema. The `AppConfig` class enforces type safety and cross-field validation (e.g., `single_asset_min` ≤ `single_asset_max`). Configuration is loaded from `output/portfolio_config.json`, merged with `constraints.json`, and can be overridden programmatically via the API or CLI.
Key configuration axes:
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model` | int (1–7) | 5 | Return forecasting model selection |
| `allocation_engine` | int (1–3) | 1 | Optimization engine: 1=MV, 2=HRP, 3=ERC |
| `max_assets` | int | None | Cardinality constraint (max non-zero positions) |
| `risk_free_rate` | float | 0.04 | Annual risk-free rate |
| `single_asset_min` | float | -1.0 | Min weight per asset (negative = shorting) |
| `single_asset_max` | float | 0.40 | Max weight per asset |
| `sector_limit` | float | 0.40 | Max aggregate weight per sector |
| `gross_leverage_cap` | float | 2.0 | Maximum gross leverage (L1 norm of weights) |
| `max_turnover` | float | 3.0 | Maximum total turnover per rebalance |
| `garch_enabled` | bool | True | Enable GARCH(1,1) covariance scaling |
| `cvar_enabled` | bool | True | Enable CVaR tail-risk constraint |
| `tax_enabled` | bool | False | Enable tax-aware optimization |
| `hmm_regime` | bool | True | Enable HMM regime detection |
| `dynamic_risk` | bool | True | VIX-based risk aversion adjustment |
| `with_futures` | bool | False | Enable futures overlay |
| `extended_history` | bool | False | Extended history via proxy stitching |
See `config_schema.py` (re-exported through the `config.py` facade) for the full schema and validation rules.
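
The load, merge, and validate sequence described in this section can be sketched with the standard library alone. The real schema is a Pydantic `AppConfig`; the frozen dataclass below only mirrors a few fields and defaults from the table. `SketchConfig` and `build_config` are hypothetical names, and the precedence shown (config file, then constraints file, then programmatic overrides) is an assumption about how the merge is ordered.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SketchConfig:
    model: int = 5
    allocation_engine: int = 1
    single_asset_min: float = -1.0
    single_asset_max: float = 0.40
    gross_leverage_cap: float = 2.0

    def __post_init__(self) -> None:
        if not 1 <= self.model <= 7:
            raise ValueError("model must be in 1..7")
        if not 1 <= self.allocation_engine <= 3:
            raise ValueError("allocation_engine must be in 1..3")
        # Cross-field rule: the lower bound may not exceed the upper bound
        if self.single_asset_min > self.single_asset_max:
            raise ValueError("single_asset_min must be <= single_asset_max")


def build_config(file_cfg: Dict[str, Any], constraints: Dict[str, Any],
                 overrides: Optional[Dict[str, Any]] = None) -> SketchConfig:
    """Merge the three sources (later wins) and validate the result."""
    merged = {**file_cfg, **constraints, **(overrides or {})}
    known = {f.name for f in fields(SketchConfig)}
    # Unknown keys are ignored here; a stricter schema might reject them instead
    return SketchConfig(**{k: v for k, v in merged.items() if k in known})
```

Validating after the merge, rather than per source, means a bound set in one file is checked against a bound set in another.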
---
## 4. Data Structures
The engine communicates between layers via typed dataclasses defined in `core_types.py`:
### PortfolioState
Tracks the current portfolio: `total_capital`, `current_weights`, `cost_basis`, `tax_rates`, `gain_fractions`, and `tickers`. Created empty for new portfolios or loaded from `portfolio_state.json`.
### ForecastResult
Output of `_generate_forecasts()`: contains `expected_returns`, `covariance_result`, `betas`, `garch_info`, `js_alpha`, `capm_rets`, `ff_betas`, `periods`, `historical_returns`, and `feature_importances`.
### CovarianceResult
Wraps the covariance matrix with its derived correlation matrix, per-asset volatility series, and the Ledoit-Wolf shrinkage intensity α.
### OptimizationResult
Final output: `weights`, `expected_returns`, `covariance_matrix`, `volatility`, `correlation_matrix`, `betas`, and a `model_info` dictionary containing all metadata (risk contributions, efficient frontier, relaxation log, binding constraints, duration, GARCH info, feature importances, etc.).
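
As an illustration of how one of these containers might look, the sketch below builds a covariance wrapper whose correlation matrix and volatilities are derived from the covariance matrix itself. The class name `CovarianceSketch`, its field names, and `from_covariance` are assumptions made for the example; the actual `CovarianceResult` in `core_types.py` may use different names and array types.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass(frozen=True)
class CovarianceSketch:
    covariance: Matrix
    correlation: Matrix
    volatility: List[float]
    shrinkage: float          # Ledoit-Wolf shrinkage intensity, supplied by the caller


def from_covariance(cov: Matrix, shrinkage: float) -> CovarianceSketch:
    """Derive volatilities and correlations from a covariance matrix."""
    n = len(cov)
    vol = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (vol[i] * vol[j]) for j in range(n)] for i in range(n)]
    return CovarianceSketch(cov, corr, vol, shrinkage)
```

Deriving the dependent fields in one constructor function keeps them consistent with the covariance matrix they were computed from.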
---
## 5. Concurrency & Thread Safety
| Mechanism | Location | Purpose |
|-----------|----------|---------|
| `_yf_lock` (threading.Lock) | `data.py` | Rate-limits yfinance API calls to max 2/sec |
| `_ML_CACHE_LOCK` (threading.Lock) | `models.py` | Thread-safe caching of trained ML ensemble models |
| `_ef_cache_lock` (threading.Lock) | `solver.py` | Thread-safe efficient frontier LRU cache |
| `ThreadPoolExecutor` | `data.py`, `alternative_data.py` | Parallel data fetching (max 10 workers) |
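
One way a single lock can rate-limit calls made from a thread pool is to hold the lock while waiting for the next permitted time slot, so concurrent callers queue behind it. The sketch below shows that pattern under stated assumptions: `RateLimiter` and `fetch_all` are hypothetical names, and how `_yf_lock` is actually used in `data.py` is not specified here. A limit of 2 calls per second corresponds to `min_interval=0.5`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class RateLimiter:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval: float) -> None:
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._next_allowed = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                time.sleep(delay)      # sleep while holding the lock so callers queue up
                now = time.monotonic()
            self._next_allowed = now + self._min_interval


def fetch_all(tickers: List[str], fetch_one: Callable[[str], str],
              limiter: RateLimiter, max_workers: int = 10) -> List[str]:
    """Fetch every ticker in parallel, spacing the calls through the limiter."""
    def task(ticker: str) -> str:
        limiter.wait()
        return fetch_one(ticker)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, tickers))   # map preserves input order
```

Only the wait is serialised; the fetch itself runs outside the lock, so slow responses still overlap across workers.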
---
## 6. Graceful Degradation
The engine is designed to always produce output, even under degraded conditions:
| Failure Mode | Fallback |
|-------------|----------|
| PostgreSQL unreachable | Falls back to local SQLite |
| ML ensemble training fails | Falls back to CAPM expected returns |
| PyTorch not installed | Transformer predictions silently skipped; ensemble uses only XGBoost + ElasticNet |
| Options data fetch fails | Returns neutral defaults (PCR=1.0, skew=0.0) |
| GARCH fitting fails | Uses unconditional covariance (no scaling) |
| Fama-French download fails | Models 4/5 fall back to CAPM |
| CVXPY solver infeasible | 7-stage constraint relaxation cascade |
| All constraints infeasible | 100% cash allocation |
| PDF export fails | Only HTML report generated |
| MPC multi-period fails | Falls back to single-period optimization |
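
Most rows in the table above share one shape: try the preferred path, and on failure log the error and move to the next option. A generic version of that pattern is sketched below; `first_success` is a hypothetical helper, not a function from the codebase, and the engine may implement each fallback inline instead.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("fallback_sketch")


def first_success(steps: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Run the steps in order and return (name, result) of the first that succeeds."""
    last_error: Optional[Exception] = None
    for name, step in steps:
        try:
            return name, step()
        except Exception as exc:   # deliberately broad: any failure triggers the fallback
            logger.warning("%s failed (%s); trying next fallback", name, exc)
            last_error = exc
    raise RuntimeError("all fallbacks failed") from last_error
```

Returning the name of the step that succeeded lets the caller record which path produced the output, which matters when a report needs to state that it was built from degraded inputs.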
---
## 7. Dependency Graph
```
core_engine.py
├── config.py
├── core_types.py
├── data.py
│   ├── database.py
│   ├── fixed_income.py
│   └── alternative_data.py
├── regime_detection.py
├── solver.py
│   ├── forecast_generation.py
│   │   ├── models.py
│   │   │   └── dl_models.py
│   │   └── bl_bridge.py
│   ├── cvxpy_engine.py
│   │   └── constraints.py
│   ├── hrp_engine.py
│   └── erc_engine.py
├── backtest.py
├── validation.py
├── analytics.py
│   └── risk_attribution.py
├── report.py
│   ├── report_data.py
│   ├── report_html.py
│   └── report_builders/
├── exports.py
├── futures_overlay.py
│   ├── futures_data.py
│   └── overlay_analytics.py
├── execution.py
└── server.py
```
**Circular dependency rule:** The dependency chain follows a strict unidirectional flow: `config` → `core_types` → `data` → `models` → `solver` → `analytics` → `report`. Lazy imports are used in `forecast_generation.py` and `solver.py` to avoid circular references at module load time.
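
A lazy import in this sense is usually a function-local import: the `import` statement runs when the function is first called rather than when the module is loaded. The sketch below shows the mechanism with the standard-library `json` module standing in for a sibling project module; `render_weights` is a hypothetical function, and the exact imports deferred in `forecast_generation.py` and `solver.py` are not shown here.

```python
def render_weights(weights: dict) -> str:
    """Serialise weights; `json` stands in for a sibling project module."""
    # Function-local import: executed on the first call, not when this module
    # loads, so importing this module never triggers the other module's import.
    import json

    return json.dumps(weights, sort_keys=True)
```

After the first call the imported module is cached in `sys.modules`, so the repeated `import` statement costs little more than a dictionary lookup.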
---
## 8. Document Index
| Document | What It Covers |
|----------|----------------|
| **[ARCHITECTURE.md](ARCHITECTURE.md)** (this file) | Master reference: complete file map, data flow, dependency graph |
| [PIPELINE.md](PIPELINE.md) | 4-stage pipeline execution model, data flow diagrams, configuration axes |
| [MODELS.md](MODELS.md) | All 7 return forecasting models, covariance estimators, GARCH, BL bridge, alternative data, and Transformer |
| [ALLOCATION_ENGINES.md](ALLOCATION_ENGINES.md) | Mean-Variance (CVXPY), HRP, and Exact Risk Parity engines; cardinality constraints; MPC multi-period |
| [RELAXATION_CASCADE.md](RELAXATION_CASCADE.md) | 7-stage progressive constraint relaxation for infeasible CVXPY solves |
| [TESTS.md](TESTS.md) | Test suite design, mocking strategy, property-based testing, and full test inventory |
| [OUTPUT.md](OUTPUT.md) | Output directory structure and artefact descriptions |
| [DEPLOY.md](DEPLOY.md) | Docker, Helm, Kubernetes, CI/CD, and production considerations |
| [RESEARCH.md](RESEARCH.md) | Experimental modules: PID controller, Dreamer RL, cybernetic ensemble |