Portfolio Engine: Complete Architecture Reference
Abstract
This document is the master reference for the entire Portfolio Engine codebase. It describes every module, its purpose, its key functions, and how it connects to the rest of the system. When a topic is explained in full depth in a dedicated document, this file links to it rather than duplicating the content. After reading this document, you should understand what every file does, how data flows through the system, and where to find detailed explanations of each subsystem.
1. System Overview
The Portfolio Engine is an institutional-grade quantitative portfolio allocation system. It ingests market data, estimates expected returns and risk, solves a constrained convex optimization problem to produce target portfolio weights, validates those weights via out-of-sample econometric tests, and generates interactive HTML reports.
Data Ingestion → Risk & Return Modeling → Convex Optimization → Reporting & Analytics

| Layer | Key modules |
|---|---|
| Data Ingestion | data.py, database.py, alternative_data.py |
| Risk & Return Modeling | models.py, dl_models.py, forecast_generation.py |
| Convex Optimization | solver.py, cvxpy_engine.py, hrp_engine.py, erc_engine.py |
| Reporting & Analytics | report.py, analytics.py, validation.py, backtest.py |
2. Complete File Map
Every Python file in the project, grouped by functional layer.
2.1 Orchestration & Entry Points
| File | Purpose |
|---|---|
| main.py | CLI entry point; invokes the pipeline |
| core_engine.py | PortfolioPipeline class: the orchestrator (validate → optimize → report) |
| config.py | Configuration facade importing schema, I/O, logging, and constants |
| config_schema.py | Pydantic AppConfig and validation rules |
| config_io.py | File loading/saving for configuration dictionaries |
| constants.py | Centralized magic numbers, UI formatting, and mapping dictionaries |
| logger.py | Rotating JSON log configuration |
| core_types.py | Shared dataclasses: PortfolioState, ForecastResult, CovarianceResult, OptimizationResult, OptimizationError, EngineConfig, etc. |
| api.py | FastAPI REST endpoints for headless/programmatic execution |
| server.py | Lightweight HTTP server that serves generated HTML reports |
| dashboard.py | Interactive CLI wizard for portfolio configuration |
2.2 Data Ingestion & Persistence
| File | Purpose |
|---|---|
| data.py | Market data fetching (yfinance), Fama-French factor download, ML feature engineering (build_ml_features()), credit spread proxies, extended history stitching, and block bootstrapping |
| data_repository.py | [NEW] DataRepository class: a centralized abstraction layer that invokes data fetchers, cleans the returned series, standardizes timestamps, and returns a unified DataSnapshot for the engine |
| database.py | SQLAlchemy ORM models (DailyPrice, DailyYield), PostgreSQL/SQLite connection pooling via get_pg_engine(), and schema initialization |
| alternative_data.py | [NEW] Options flow sentiment: put/call volume ratios and implied volatility skew extracted from yfinance options chains, parallelized across assets |
| fixed_income.py | Bond pricing: clean price from yield, duration, convexity, and synthetic historical price generation for direct bonds |
| futures_data.py | Continuous futures contract construction via the Panama Canal stitching method |
Data Flow
Yahoo Finance, FRED API, Kenneth French, options chains → data.py → PostgreSQL/SQLite → data_repository.py → core_engine.load_data()

data_repository.py packages the cleaned data as a DataSnapshot (returns_df, ff_df, rfr, yield_df), which is what core_engine.load_data() receives.
The feature engineering pipeline (build_ml_features()) transforms raw returns into a per-asset feature matrix with momentum, volatility, factor exposure, and alternative data columns. Training targets are sampled on non-overlapping windows, which prevents the serial correlation that overlapping forward-return targets would introduce. See MODELS.md § 6 for the full feature list.
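A minimal sketch of the non-overlapping sampling idea, assuming a pandas DataFrame of daily returns with one column per asset. The function make_features(), the 63-day and 21-day windows, and the two features shown are hypothetical illustrations rather than the engine's actual feature list; factor exposure and alternative data columns are omitted.

```python
import numpy as np
import pandas as pd


def make_features(returns: pd.DataFrame, horizon: int = 21) -> pd.DataFrame:
    """Hypothetical stand-in for build_ml_features().

    Returns one row per (sample date, asset) with trailing features and a
    forward-return target. Rows are taken every `horizon` days, so the forward
    windows of consecutive samples never share a day.
    """
    frames = []
    for asset in returns.columns:
        r = returns[asset]
        feats = pd.DataFrame(
            {
                # Trailing 63-day summed return: a simple momentum proxy.
                "mom_63": r.rolling(63).sum(),
                # Trailing 21-day volatility, annualised with sqrt(252).
                "vol_21": r.rolling(21).std() * np.sqrt(252),
                # Target: summed return over the NEXT `horizon` days (t+1 .. t+horizon).
                "target": r.rolling(horizon).sum().shift(-horizon),
            },
            index=returns.index,
        )
        # Drop warm-up/tail NaNs, then keep every `horizon`-th row (non-overlapping).
        frames.append(feats.dropna().iloc[::horizon].assign(asset=asset))
    return pd.concat(frames)
```

Sampling every `horizon` rows costs data, but it keeps adjacent targets independent of each other's return days, so cross-validation error estimates are not inflated by overlap.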
2.3 Return Forecasting & Risk Modeling
Detailed reference: MODELS.md
| File | Purpose |
|---|---|
| models.py | All 7 return forecasting models (CAPM, BL, Bayesian, FF, ML Ensemble, E2E, Regime-Adaptive), covariance estimation (Ledoit-Wolf, hybrid block-diagonal), GARCH scaling, and the meta-learner stacking pipeline |
| dl_models.py | [NEW] PyTorch NoiseFilteredTransformer (Conv1D + Transformer encoder), CrossAssetSequenceDataset, and the train_cross_asset_transformer() training loop |
| forecast_generation.py | _generate_forecasts(): the Strategy-pattern router that selects and executes the correct model, applies fixed-income overrides, and returns a ForecastResult |
| bl_bridge.py | Black-Litterman integration bridge: compute_bl_posterior() combines ML views with the BL equilibrium prior; scale_uncertainty_by_regime() modulates view confidence |
| e2e_forecast_model.py | End-to-end differentiable optimization (Model 6): forecast network, differentiable CVXPY layer, and SPO+ loss training |
| regime_detection.py | Hidden Markov Model (HMM) regime classifier for benchmark returns; dynamic_risk_aversion() for VIX-based risk adjustment |
| bayesian_online.py | Bayesian Online Change-Point Detection (BOCD) for identifying structural breaks in return series |
| generative_scenarios.py | Monte Carlo scenario generation from fitted covariance models |
| math_utils.py | Shared mathematical utilities: compute_risk_contributions() for marginal risk decomposition |
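The risk decomposition behind compute_risk_contributions() is usually the Euler allocation of portfolio volatility, RC_i = w_i (Σw)_i / √(wᵀΣw), whose contributions sum exactly to the portfolio volatility. The sketch below assumes that definition; the function name and the (absolute, fractional) return pair are hypothetical and may not match math_utils.py.

```python
import numpy as np


def risk_contributions(weights, cov):
    """Hypothetical analogue of math_utils.compute_risk_contributions().

    Euler decomposition of portfolio volatility:
        RC_i = w_i * (Σw)_i / sqrt(wᵀΣw),   with   sum_i RC_i = sqrt(wᵀΣw).
    Returns (absolute contributions, fractional contributions summing to 1).
    """
    w = np.asarray(weights, dtype=float)
    sigma = np.asarray(cov, dtype=float)
    sigma_w = sigma @ w                  # Σw; dividing by portfolio vol gives marginal risk
    port_vol = np.sqrt(w @ sigma_w)      # sqrt(wᵀΣw)
    rc = w * sigma_w / port_vol
    return rc, rc / port_vol
```

For example, with equal weights and variances 0.04 and 0.01, the higher-variance asset carries 80% of the risk despite holding 50% of the capital.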
2.4 Portfolio Optimization
Detailed reference: ALLOCATION_ENGINES.md
| File | Purpose |
|---|---|
| solver.py | Master optimization router: build_and_optimize() for single-period optimization and multi_period_optimize() for MPC stochastic programming. Routes to Engine 1, 2, or 3, and computes the efficient frontier, risk contributions, and sensitivity analysis |
| cvxpy_engine.py | CVXPYOptimizationEngine: mean-variance quadratic programming with the full constraint suite, 7-stage relaxation cascade, cardinality heuristic, CVaR tail-risk constraint, and Almgren-Chriss market impact |
| hrp_engine.py | Hierarchical Risk Parity: agglomerative clustering, quasi-diagonalisation, recursive bisection, and tax-aware blending |
| erc_engine.py | [NEW] Exact risk parity (equal risk contribution): Spinu logarithmic-barrier formulation solved via CVXPY (SCS/ECOS) |
| constraints.py | Constraint pre-processing: check_and_fix_bounds() sanitises user inputs; make_nearest_psd() repairs the covariance matrix |
| differentiable_optimizer.py | cvxpylayers-based differentiable portfolio layer for gradient flow in Model 6 |
| futures_overlay.py | Futures overlay optimizer: beta hedge, duration hedge, or volatility dampening via ES/MES futures |
| safety.py | Pre-trade safety checks: position limits, concentration alerts, and drawdown circuit breakers |
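Spinu's formulation makes exact risk parity a convex problem: minimise f(y) = ½ yᵀΣy − Σᵢ bᵢ log yᵢ over y > 0, then normalise w = y / Σy. At the optimum yᵢ(Σy)ᵢ = bᵢ, so each asset's risk contribution is proportional to its budget bᵢ. erc_engine.py solves this with CVXPY. The sketch below swaps in a damped Newton method in plain NumPy on the same objective so that it does not depend on a conic solver. erc_weights() and its parameters are hypothetical.

```python
import numpy as np


def erc_weights(cov, budgets=None, tol=1e-14, max_iter=100):
    """Risk-budgeting weights via Spinu's convex objective (Newton's method, not CVXPY)."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    b = np.full(n, 1.0 / n) if budgets is None else np.asarray(budgets, dtype=float)

    def f(v):
        # 0.5 * vᵀΣv - Σ b_i log(v_i)
        return 0.5 * v @ cov @ v - b @ np.log(v)

    y = 1.0 / np.sqrt(np.diag(cov))  # inverse-volatility starting point
    for _ in range(max_iter):
        grad = cov @ y - b / y
        hess = cov + np.diag(b / y**2)  # positive definite whenever y > 0
        step = np.linalg.solve(hess, grad)
        decrement = grad @ step  # squared Newton decrement
        if decrement / 2 < tol:
            break
        t = 1.0
        while t > 1e-12:  # backtracking: stay in y > 0 and require sufficient decrease
            y_new = y - t * step
            if np.all(y_new > 0) and f(y_new) <= f(y) - 0.25 * t * decrement:
                break
            t *= 0.5
        else:
            break  # no acceptable step; treat the current point as converged
        y = y_new
    return y / y.sum()
```

With a diagonal covariance matrix and equal budgets, the solution reduces to inverse-volatility weights. This is a quick sanity check for any ERC implementation.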
Optimization Flow
forecast_generation.py produces a ForecastResult (exp_rets, covariance, betas, garch_info), which solver.py routes according to allocation_engine:

| allocation_engine | Engine | Method |
|---|---|---|
| 1 | cvxpy_engine.py | Mean-Variance |
| 2 | hrp_engine.py | HRP |
| 3 | erc_engine.py | Exact Risk Parity |

The selected engine's output is packaged as an OptimizationResult (weights, model_info, risk_contributions, ef_curve).
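The routing step can be pictured as a registry lookup keyed on allocation_engine. This is a hypothetical sketch: route(), the Engine alias, and the shape check are assumptions rather than solver.py's actual code. The engines are passed in as a mapping so the sketch stays independent of CVXPY.

```python
from typing import Callable, Mapping

import numpy as np

# An engine maps (expected returns, covariance) to a weight vector.
Engine = Callable[[np.ndarray, np.ndarray], np.ndarray]


def route(allocation_engine: int, engines: Mapping[int, Engine], mu, cov) -> np.ndarray:
    """Dispatch to the engine registered under `allocation_engine` (1=MV, 2=HRP, 3=ERC)."""
    try:
        engine = engines[allocation_engine]
    except KeyError:
        raise ValueError(
            f"allocation_engine must be one of {sorted(engines)}, got {allocation_engine}"
        ) from None
    weights = np.asarray(engine(mu, cov), dtype=float)
    # Every engine must return exactly one weight per asset.
    if weights.shape != (len(mu),):
        raise ValueError(f"engine returned shape {weights.shape}, expected ({len(mu)},)")
    return weights
```

Keeping the registry explicit makes an unsupported engine number fail loudly at the routing step instead of deep inside an optimizer.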
2.5 Validation & Econometrics
| File | Purpose |
|---|---|
| validation.py | Four econometric tests: Christoffersen conditional coverage, Diebold-Mariano, Probabilistic Sharpe Ratio (PSR), and Deflated Sharpe Ratio (DSR). See PIPELINE.md § 3 for the mathematical formulations |
| backtest.py | Walk-forward expanding-window cross-validation (expanding_window_backtest()), Monte Carlo simulation (monte_carlo()), and rolling performance metrics |
| analytics.py | Portfolio sensitivity analysis (±10% return perturbation), historical stress testing (2008 GFC, 2020 COVID, rate shock, tech crash), and behavioural diagnostics |
| risk_attribution.py | Factor exposure decomposition, marginal VaR, CVaR component attribution, and stress correlation analysis |
| overlay_analytics.py | Futures overlay analytics: aggregated overlay returns and margin call simulation |
| simulation.py | Monte Carlo and historical simulation engines for risk budgeting |
| audit_reproducibility.py | Bit-exact reproducibility verification: hashes inputs and outputs across runs |
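As a concrete example of one of the validation.py tests, the Probabilistic Sharpe Ratio in its standard Bailey and López de Prado form is Φ((SR̂ − SR*)·√(n−1) / √(1 − γ₃·SR̂ + (γ₄−1)/4·SR̂²)), where SR̂ is the per-period Sharpe ratio, γ₃ the skewness, and γ₄ the (non-excess) kurtosis. The sketch assumes excess returns as input. The function name is hypothetical, and validation.py may use different moment estimators or annualisation.

```python
import math

import numpy as np


def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """P(true Sharpe > sr_benchmark) given n observed per-period excess returns."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    sr = r.mean() / r.std(ddof=1)            # per-period (not annualised) Sharpe ratio
    z = (r - r.mean()) / r.std(ddof=0)
    skew = np.mean(z**3)
    kurt = np.mean(z**4)                     # raw kurtosis: 3.0 for a normal distribution
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    stat = (sr - sr_benchmark) * math.sqrt(n - 1) / denom
    return 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF
```

Because kurtosis is always at least 1 + skew², the term under the square root is non-negative. Negative skew and fat tails widen the denominator and pull PSR toward 0.5.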
2.6 Reporting & Output
Detailed reference: OUTPUT.md
| File | Purpose |
|---|---|
| report.py | Report orchestrator: coordinates data preparation → HTML rendering → file output |
| report_data.py | prepare_template_variables(): transforms mathematical outputs into HTML fragments and Chart.js data payloads (~675 lines) |
| report_html.py | HTML rendering layer: substitutes template variables into report_template.html |
| report_template.html | 26 KB static HTML template with Chart.js initialization, dark theme, and responsive CSS |
| report_chart.py | Chart.js payload generators for equity curves, pie charts, efficient frontiers, and Monte Carlo fans |
| chart_data.py | Lightweight chart data serialization utilities |
| model_visuals.py | Model-specific visualization helpers (factor exposure plots, GARCH regime charts) |
| narrative.py | Natural-language narrative generation summarising portfolio strategy and market conditions |
| table_builder.py | HTML table construction utilities |
| exports.py | CSV, Excel, and PDF export (export_csv(), export_excel()) |
| report_builders/ | Modular HTML section builders for performance, risk, and tax report sections |
2.7 Execution & Infrastructure
| File | Purpose |
|---|---|
| execution.py | IBKR execution stubs, order management, and paper trading interface (19 KB; not yet connected to production) |
| Dockerfile | Container image definition (Python 3.11-slim) |
| docker-compose.yml | Local development environment with PostgreSQL 15 and Redis 7 |
| deploy/helm/ | Helm chart for Kubernetes deployment (see DEPLOY.md) |
| pyproject.toml | Project metadata, pytest configuration, and build system |
| requirements.txt | Python dependency manifest |
| setup.py | Legacy setuptools configuration |
| .github/workflows/ci.yml | GitHub Actions CI pipeline (lint, type-check, test) |
| .pre-commit-config.yaml | Pre-commit hooks configuration |
2.8 Research & Experimental
Detailed reference: RESEARCH.md
| File | Purpose |
|---|---|
| research/dreamer/ | DreamerV2 world-model RL agent adapted for financial time series |
| research/cybernetic.py | PID volatility controller and adaptive risk setpoint |
| research/cybernetic_ensemble.py | Three-layer cybernetic control hierarchy |
| run_simulation.py | Standalone simulation script for research experiments |
| debug_validation.py | Debugging utilities for the validation pipeline |
2.9 Tests
Detailed reference: TESTS.md
| File | Purpose |
|---|---|
| tests/test_optimize.py | Constraint logic, mean-variance, HRP, multi-period optimization |
| tests/test_simulate.py | End-to-end integration test |
| tests/test_e2e.py | Differentiable optimization pipeline |
| tests/test_models.py | Return model correctness |
| tests/test_analytics.py | Backtest engine, Sharpe, Sortino, Calmar |
| tests/test_data.py | Data fetching, missing-data handling |
| tests/test_validation.py | Statistical properties of the econometric tests |
| tests/test_new_features.py | [NEW] Transformer training/inference, options flow sentiment extraction, and mathematical verification of exact risk parity |
| test_audit.py | Reproducibility audit |
| test_perf.py | Performance benchmarks |
3. Configuration System
config.py: AppConfig
The engine is driven by a Pydantic-validated configuration schema. The AppConfig class enforces type safety and cross-field validation (e.g., single_asset_min ≤ single_asset_max). Configuration is loaded from output/portfolio_config.json, merged with constraints.json, and can be overridden programmatically via the API or CLI.
Key configuration axes:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | int (1-7) | 5 | Return forecasting model selection |
| allocation_engine | int (1-3) | 1 | Optimization engine: 1 = MV, 2 = HRP, 3 = ERC |
| max_assets | int or None | None | Cardinality constraint (maximum number of non-zero positions) |
| risk_free_rate | float | 0.04 | Annual risk-free rate |
| single_asset_min | float | -1.0 | Minimum weight per asset (negative values allow shorting) |
| single_asset_max | float | 0.40 | Maximum weight per asset |
| sector_limit | float | 0.40 | Maximum aggregate weight per sector |
| gross_leverage_cap | float | 2.0 | Maximum gross leverage (L1 norm of the weights) |
| max_turnover | float | 3.0 | Maximum total turnover per rebalance |
| garch_enabled | bool | True | Enable GARCH(1,1) covariance scaling |
| cvar_enabled | bool | True | Enable the CVaR tail-risk constraint |
| tax_enabled | bool | False | Enable tax-aware optimization |
| hmm_regime | bool | True | Enable HMM regime detection |
| dynamic_risk | bool | True | Enable VIX-based risk aversion adjustment |
| with_futures | bool | False | Enable the futures overlay |
| extended_history | bool | False | Extend history via proxy stitching |
See config.py for the full schema and validation rules.
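The load-merge-validate flow can be sketched with a standard-library dataclass that stands in for the Pydantic AppConfig. Unlike Pydantic, a dataclass does no type coercion. ConfigSketch and load_config() are hypothetical, and only a subset of the fields above is included. The merge precedence (portfolio_config.json, then constraints.json, then API/CLI overrides, with later sources winning) is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical subset of AppConfig with the defaults from the table above."""
    model: int = 5
    allocation_engine: int = 1
    max_assets: Optional[int] = None
    risk_free_rate: float = 0.04
    single_asset_min: float = -1.0
    single_asset_max: float = 0.40
    gross_leverage_cap: float = 2.0

    def __post_init__(self) -> None:
        # Range checks on individual fields.
        if not 1 <= self.model <= 7:
            raise ValueError(f"model must be in 1..7, got {self.model}")
        if self.allocation_engine not in (1, 2, 3):
            raise ValueError(f"allocation_engine must be 1, 2 or 3, got {self.allocation_engine}")
        if self.max_assets is not None and self.max_assets < 1:
            raise ValueError("max_assets must be a positive integer or None")
        # Cross-field rule: single_asset_min <= single_asset_max.
        if self.single_asset_min > self.single_asset_max:
            raise ValueError("single_asset_min must be <= single_asset_max")


def load_config(base: dict, constraints: dict, overrides: dict) -> ConfigSketch:
    """Merge sources (later ones win) and validate; unknown keys raise TypeError."""
    merged = {**base, **constraints, **overrides}
    return ConfigSketch(**merged)
```

In practice, base and constraints would come from json.load() on the two files. Validating once after the merge ensures that cross-field rules see the final combined values.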
4. Data Structures
The engine communicates between layers via typed dataclasses defined in core_types.py:
PortfolioState
Tracks the current portfolio: total_capital, current_weights, cost_basis, tax_rates, gain_fractions, and tickers. Created empty for new portfolios or loaded from portfolio_state.json.
ForecastResult
Output of _generate_forecasts(): contains expected_returns, covariance_result, betas, garch_info, js_alpha, capm_rets, ff_betas, periods, historical_returns, and feature_importances.
CovarianceResult
Wraps the covariance matrix with its derived correlation matrix, per-asset volatility series, and the Ledoit-Wolf shrinkage intensity α.
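The fields of CovarianceResult can be illustrated with a NumPy sketch of Ledoit-Wolf shrinkage toward a scaled identity target, the same estimator scikit-learn's LedoitWolf implements. CovarianceSketch and ledoit_wolf() are hypothetical. The engine's models.py may use a different shrinkage target; the file map also mentions a hybrid block-diagonal estimator.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CovarianceSketch:
    covariance: np.ndarray
    correlation: np.ndarray
    volatility: np.ndarray
    shrinkage: float  # Ledoit-Wolf intensity α in [0, 1]


def ledoit_wolf(returns: np.ndarray) -> CovarianceSketch:
    """Shrink the sample covariance of a (T, N) return array toward mu * I."""
    X = returns - returns.mean(axis=0)
    t, n = X.shape
    S = X.T @ X / t                                  # sample covariance (1/T normalisation)
    mu = np.trace(S) / n                             # target: mu * identity
    delta = np.sum((S - mu * np.eye(n)) ** 2) / n    # squared distance of S from the target
    # Estimated variance of the entries of S around their expectation.
    beta = (np.sum(np.sum(X**2, axis=1) ** 2) / t - np.sum(S**2)) / (n * t)
    alpha = 0.0 if delta == 0 else min(beta, delta) / delta
    cov = (1 - alpha) * S + alpha * mu * np.eye(n)
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    return CovarianceSketch(cov, corr, vol, float(alpha))
```

Capping beta at delta keeps α in [0, 1], so the estimate is always a convex combination of the sample covariance and the target.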
OptimizationResult
Final output: weights, expected_returns, covariance_matrix, volatility, correlation_matrix, betas, and a model_info dictionary containing all metadata (risk contributions, efficient frontier, relaxation log, binding constraints, duration, GARCH info, feature importances, etc.).
5. Concurrency & Thread Safety
| Mechanism | Location | Purpose |
|---|---|---|
| _yf_lock (threading.Lock) | data.py | Rate-limits yfinance API calls to at most 2 per second |
| _ML_CACHE_LOCK (threading.Lock) | models.py | Thread-safe caching of trained ML ensemble models |
| _ef_cache_lock (threading.Lock) | solver.py | Thread-safe LRU cache for efficient frontier results |
| ThreadPoolExecutor | data.py, alternative_data.py | Parallel data fetching (up to 10 workers) |
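A lock-based rate limiter combined with a thread pool is one way to get parallel fetching while capping the request rate. This sketch assumes that _yf_lock enforces a minimum spacing between calls; the engine's exact mechanism may differ. RateLimiter and fetch_all() are hypothetical, and fetch_one stands in for a real yfinance call. At 2 calls per second, min_interval is 0.5 s.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Spaces successive calls at least `min_interval` seconds apart across all threads."""

    def __init__(self, min_interval: float = 0.5):
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._next_allowed = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = self._next_allowed - now
            if delay > 0:
                # Sleeping while holding the lock makes other callers queue behind this slot.
                time.sleep(delay)
            self._next_allowed = max(now, self._next_allowed) + self._min_interval


def fetch_all(tickers, fetch_one, limiter: RateLimiter, max_workers: int = 10) -> dict:
    """Fetch tickers in parallel, with every call passing through the shared limiter."""

    def task(ticker):
        limiter.wait()
        return ticker, fetch_one(ticker)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, tickers))
```

The pool still helps under a rate limit: network latency of in-flight requests overlaps, and only the request start times are serialised.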
6. Graceful Degradation
The engine is designed to always produce output, even under degraded conditions:
| Failure Mode | Fallback |
|---|---|
| PostgreSQL unreachable | Falls back to local SQLite |
| ML ensemble training fails | Falls back to CAPM expected returns |
| PyTorch not installed | Transformer predictions silently skipped; ensemble uses only XGBoost + ElasticNet |
| Options data fetch fails | Returns neutral defaults (PCR=1.0, skew=0.0) |
| GARCH fitting fails | Uses unconditional covariance (no scaling) |
| Fama-French download fails | Models 4/5 fall back to CAPM |
| CVXPY problem infeasible | 7-stage constraint relaxation cascade |
| All constraints infeasible | 100% cash allocation |
| PDF export fails | Only HTML report generated |
| MPC multi-period fails | Falls back to single-period optimization |
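Most rows in this table follow the same pattern: try the primary path, log the failure, and return a safe default. A decorator is one compact way to express that. with_fallback(), options_sentiment(), and the dictionary keys below are hypothetical; only the neutral defaults (PCR = 1.0, skew = 0.0) come from the table.

```python
import functools
import logging

logger = logging.getLogger("portfolio_engine.sketch")


def with_fallback(default_factory):
    """If the wrapped call raises, log it and return default_factory(*args, **kwargs)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                logger.warning("%s failed; using fallback", func.__name__, exc_info=True)
                return default_factory(*args, **kwargs)

        return wrapper

    return decorator


@with_fallback(lambda ticker: {"put_call_ratio": 1.0, "iv_skew": 0.0})
def options_sentiment(ticker: str) -> dict:
    # Simulated outage: a real implementation would query the options chain here.
    raise ConnectionError(f"options chain unavailable for {ticker}")
```

Catching a broad Exception matches the "always produce output" goal. A stricter variant would catch only the network and parsing errors that each data source is known to raise.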
7. Dependency Graph
- core_engine.py
  - config.py
  - core_types.py
  - data.py
    - database.py
    - fixed_income.py
    - alternative_data.py
  - regime_detection.py
  - solver.py
    - forecast_generation.py
      - models.py
        - dl_models.py
      - bl_bridge.py
    - cvxpy_engine.py
      - constraints.py
    - hrp_engine.py
    - erc_engine.py
  - backtest.py
  - validation.py
  - analytics.py
    - risk_attribution.py
  - report.py
    - report_data.py
    - report_html.py
    - report_builders/
  - exports.py
  - futures_overlay.py
    - futures_data.py
    - overlay_analytics.py
  - execution.py
  - server.py
Circular dependency rule: dependencies flow in one direction only, config → core_types → data → models → solver → analytics → report. forecast_generation.py and solver.py use lazy (function-level) imports to avoid circular references at module load time.
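The one-directional rule can be checked mechanically by ranking the layers and flagging any import that points to a later layer. This is a hypothetical sketch, not a tool that exists in the repository. The import map would in practice be collected by parsing each module (for example, with the standard ast module). Here it is passed in as a plain dictionary.

```python
LAYERS = ["config", "core_types", "data", "models", "solver", "analytics", "report"]


def layer_violations(imports: dict[str, set[str]], layers=LAYERS) -> list[tuple[str, str]]:
    """Return (importer, imported) pairs where a module imports from a later layer."""
    rank = {name: i for i, name in enumerate(layers)}
    violations = []
    for module, deps in imports.items():
        for dep in sorted(deps):
            # Only ranked modules are checked; a module may import anything earlier in LAYERS.
            if module in rank and dep in rank and rank[dep] > rank[module]:
                violations.append((module, dep))
    return violations
```

For example, data importing solver is reported as a violation, while report importing solver is allowed.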
8. Document Index
| Document | What It Covers |
|---|---|
| ARCHITECTURE.md (this file) | Master reference: complete file map, data flow, dependency graph |
| PIPELINE.md | 4-stage pipeline execution model, data flow diagrams, configuration axes |
| MODELS.md | All 7 return forecasting models, covariance estimators, GARCH, BL bridge, alternative data, and Transformer |
| ALLOCATION_ENGINES.md | Mean-Variance (CVXPY), HRP, and Exact Risk Parity engines; cardinality constraints; MPC multi-period |
| RELAXATION_CASCADE.md | 7-stage progressive constraint relaxation for infeasible CVXPY solves |
| TESTS.md | Test suite design, mocking strategy, property-based testing, and full test inventory |
| OUTPUT.md | Output directory structure and artefact descriptions |
| DEPLOY.md | Docker, Helm, Kubernetes, CI/CD, and production considerations |
| RESEARCH.md | Experimental modules: PID controller, Dreamer RL, cybernetic ensemble |