Portfolio Engine — Complete Architecture Reference

Abstract

This document is the master reference for the entire Portfolio Engine codebase. It describes every module, its purpose, its key functions, and how it connects to the rest of the system. When a topic is explained in full depth in a dedicated document, this file links to it rather than duplicating the content. After reading this document, you should understand what every file does, how data flows through the system, and where to find detailed explanations of each subsystem.

1. System Overview

The Portfolio Engine is an institutional-grade quantitative portfolio allocation system. It ingests market data, estimates expected returns and risk, solves a constrained convex optimization problem to produce target portfolio weights, validates those weights via out-of-sample econometric tests, and generates interactive HTML reports.

┌─────────────────────────────────────────────────────────────────────────┐
│                           Portfolio Engine                              │
│                                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────┐  ┌──────────────┐  │
│  │ Data     │─▶│ Risk & Return│─▶│ Convex        │─▶│ Reporting &  │  │
│  │ Ingestion│  │ Modeling     │  │ Optimization  │  │ Analytics    │  │
│  └──────────┘  └──────────────┘  └───────────────┘  └──────────────┘  │
│       │              │                   │                  │           │
│   data.py        models.py          solver.py          report.py       │
│   database.py    dl_models.py       cvxpy_engine.py    analytics.py    │
│   alternative_  forecast_           hrp_engine.py      validation.py   │
│    data.py       generation.py      erc_engine.py      backtest.py     │
└─────────────────────────────────────────────────────────────────────────┘

2. Complete File Map

Every file in the project, grouped by functional layer. Most are Python modules; templates and infrastructure files are listed alongside them.

2.1 Orchestration & Entry Points

File	Purpose
`main.py`	CLI entry point; invokes the pipeline
`core_engine.py`	`PortfolioPipeline` class — the orchestrator (validate → optimize → report)
`config.py`	Configuration Facade importing schema, IO, logging, and constants
`config_schema.py`	Pydantic `AppConfig` and validation rules
`config_io.py`	File loading/saving for configuration dictionaries
`constants.py`	Centralized magic numbers, UI formatting, and mapping dictionaries
`logger.py`	JSON rotating log configuration
`core_types.py`	Shared dataclasses: `PortfolioState`, `ForecastResult`, `CovarianceResult`, `OptimizationResult`, `OptimizationError`, `EngineConfig`, etc.
`api.py`	FastAPI REST endpoints for headless/programmatic execution
`server.py`	Lightweight HTTP server to serve generated HTML reports
`dashboard.py`	Interactive CLI wizard for portfolio configuration

2.2 Data Ingestion & Persistence

File	Purpose
`data.py`	Market data fetching (yfinance), Fama-French factor download, ML feature engineering (`build_ml_features()`), credit spread proxies, extended history stitching, and block bootstrapping
`data_repository.py`	[NEW] `DataRepository` class. Centralized abstraction layer responsible for invoking data fetchers, cleaning returned series, standardizing timestamps, and returning a unified `DataSnapshot` for the engine.
`database.py`	SQLAlchemy ORM models (`DailyPrice`, `DailyYield`), PostgreSQL/SQLite connection pooling via `get_pg_engine()`, schema initialization
`alternative_data.py`	[NEW] Options flow sentiment: Put/Call volume ratios, Implied Volatility skew extraction from yfinance options chains. Parallelized across assets
`fixed_income.py`	Bond pricing: clean price from yield, duration, convexity, and synthetic historical price generation for direct bonds
`futures_data.py`	Futures continuous contract construction via Panama Canal stitching method
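The bond quantities attributed to `fixed_income.py` (price from yield, duration, convexity) follow textbook formulas. The sketch below is a minimal illustration, not the module's code: the function name `bond_metrics` and its parameters are hypothetical, and it assumes a fixed-coupon bond valued on a coupon date, so accrued interest is zero and the clean price equals the dirty price.

```python
from typing import Tuple


def bond_metrics(face: float, coupon_rate: float, ytm: float,
                 years: float, freq: int = 2) -> Tuple[float, float, float, float]:
    """Return (price, Macaulay duration, modified duration, convexity).

    Hypothetical helper: durations are in years, convexity in years squared.
    """
    n_periods = int(round(years * freq))
    coupon = face * coupon_rate / freq
    y = ytm / freq  # periodic yield

    price = 0.0
    time_weighted_pv = 0.0
    convexity_sum = 0.0
    for k in range(1, n_periods + 1):
        cash_flow = coupon + (face if k == n_periods else 0.0)
        pv = cash_flow / (1.0 + y) ** k
        price += pv
        time_weighted_pv += (k / freq) * pv   # payment time in years, weighted by PV
        convexity_sum += k * (k + 1) * pv     # second-derivative weights, in periods

    macaulay = time_weighted_pv / price
    modified = macaulay / (1.0 + y)
    convexity = convexity_sum / (price * (1.0 + y) ** 2 * freq ** 2)
    return price, macaulay, modified, convexity


# A bond whose coupon rate equals its yield prices at par.
par_price, par_mac, par_mod, par_conv = bond_metrics(100.0, 0.05, 0.05, 10)
```

For a small yield change Δy, the relative price change is approximately `-modified * Δy + 0.5 * convexity * Δy**2`, which is one way to generate synthetic price histories from a yield series.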

Data Flow

Yahoo Finance ──┐
FRED API ───────┤
Kenneth French ─┤──▶ data.py ──▶ PostgreSQL/SQLite ──▶ data_repository.py ──▶ core_engine.load_data()
Options Chains ─┘                                           │
                                                     ┌──────┴──────┐
                                                     │ DataSnapshot│
                                                     │ (returns_df,│
                                                     │  ff_df, rfr,│
                                                     │  yield_df)  │
                                                     └─────────────┘

The feature engineering pipeline (build_ml_features()) transforms raw returns into a per-asset feature matrix with momentum, volatility, factor exposure, and alternative data columns. Training targets are sampled over non-overlapping windows, so consecutive labels share no return observations; this avoids the serial correlation that overlapping forward returns would otherwise introduce into the target. See MODELS.md § 6 for the full feature list.
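The non-overlapping sampling idea can be sketched in a few lines. The helper name `non_overlapping_targets` and the `horizon` parameter are hypothetical and are not taken from `data.py`; the real `build_ml_features()` may construct its targets differently.

```python
from typing import List, Tuple


def non_overlapping_targets(returns: List[float], horizon: int) -> List[Tuple[int, float]]:
    """Return (start_index, compounded forward return) pairs.

    Each target compounds `horizon` consecutive returns, and successive
    windows start `horizon` steps apart, so no observation is used twice.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    targets = []
    for start in range(0, len(returns) - horizon + 1, horizon):
        growth = 1.0
        for r in returns[start:start + horizon]:
            growth *= 1.0 + r
        targets.append((start, growth - 1.0))
    return targets


daily = [0.01, -0.02, 0.03, 0.00, 0.01, 0.02, -0.01]
samples = non_overlapping_targets(daily, horizon=3)
# Windows cover indices 0-2 and 3-5; the trailing partial window is dropped.
```

Stepping the window start by 1 instead of `horizon` would produce targets that share `horizon - 1` observations with their neighbours, which is the overlap the pipeline is described as avoiding.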

2.3 Return Forecasting & Risk Modeling

Detailed reference: MODELS.md

File	Purpose
`models.py`	All 7 return forecasting models (CAPM, Black-Litterman, Bayesian, Fama-French, ML Ensemble, End-to-End, Regime-Adaptive), covariance estimation (Ledoit-Wolf, hybrid block-diagonal), GARCH scaling, and the meta-learner stacking pipeline
`dl_models.py`	[NEW] PyTorch `NoiseFilteredTransformer` (Conv1D + Transformer Encoder), `CrossAssetSequenceDataset`, and `train_cross_asset_transformer()` training loop
`forecast_generation.py`	`_generate_forecasts()` — the Strategy Pattern router that selects and executes the correct model, applies fixed-income overrides, and returns a `ForecastResult`
`bl_bridge.py`	Black-Litterman integration bridge: `compute_bl_posterior()` combines ML views with the BL equilibrium prior; `scale_uncertainty_by_regime()` modulates view confidence
`e2e_forecast_model.py`	End-to-End Differentiable Optimization (Model 6): forecast network, differentiable CVXPY layer, SPO+ loss training
`regime_detection.py`	Hidden Markov Model (HMM) regime classifier for benchmark returns; `dynamic_risk_aversion()` VIX-based risk adjustment
`bayesian_online.py`	Bayesian Online Change-Point Detection (BOCD) for structural break identification in return series
`generative_scenarios.py`	Monte Carlo scenario generation from fitted covariance models
`math_utils.py`	Shared mathematical utilities: `compute_risk_contributions()` for marginal risk decomposition
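The risk decomposition mentioned for `math_utils.py` is commonly computed with the Euler allocation: asset i contributes w_i (Σw)_i / σ_p, and the contributions sum to the portfolio volatility σ_p. The sketch below illustrates that standard formula; the function name and signature are assumptions, since the source names `compute_risk_contributions()` without documenting its interface.

```python
import math
from typing import List


def risk_contributions(weights: List[float], cov: List[List[float]]) -> List[float]:
    """Per-asset contribution to portfolio volatility (Euler decomposition)."""
    n = len(weights)
    # (cov @ w)_i: covariance of asset i with the portfolio
    cov_with_portfolio = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    variance = sum(weights[i] * cov_with_portfolio[i] for i in range(n))
    if variance <= 0.0:
        raise ValueError("portfolio variance must be positive")
    vol = math.sqrt(variance)
    return [weights[i] * cov_with_portfolio[i] / vol for i in range(n)]


example_weights = [0.6, 0.4]
example_cov = [[0.04, 0.006], [0.006, 0.09]]
contributions = risk_contributions(example_weights, example_cov)
portfolio_vol = sum(contributions)  # contributions add up to the portfolio volatility
```

A risk parity engine such as the one in `erc_engine.py` targets weights for which these contributions are equal across assets.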

2.4 Portfolio Optimization

Detailed reference: ALLOCATION_ENGINES.md

File	Purpose
`solver.py`	Master optimization router: `build_and_optimize()` for single-period; `multi_period_optimize()` for MPC stochastic programming. Routes to Engine 1, 2, or 3. Computes efficient frontier, risk contributions, and sensitivity analysis
`cvxpy_engine.py`	`CVXPYOptimizationEngine` — Mean-Variance quadratic programming with full constraint suite, 7-stage relaxation cascade, cardinality heuristic, CVaR tail-risk, and Almgren-Chriss market impact
`hrp_engine.py`	Hierarchical Risk Parity: agglomerative clustering, quasi-diagonalisation, recursive bisection, and tax-aware blending
`erc_engine.py`	[NEW] Exact Risk Parity (ERC): Spinu logarithmic barrier formulation via CVXPY (SCS/ECOS solver)
`constraints.py`	Constraint pre-processing: `check_and_fix_bounds()` for sanitising user inputs, `make_nearest_psd()` for covariance matrix repair
`differentiable_optimizer.py`	`cvxpylayers`-based differentiable portfolio layer for gradient flow in Model 6
`futures_overlay.py`	Futures overlay optimizer: beta hedge, duration hedge, or volatility dampening via ES/MES futures
`safety.py`	Pre-trade safety checks: position limits, concentration alerts, and drawdown circuit breakers

Optimization Flow

forecast_generation.py
        │
        ▼
    ForecastResult (expected_returns, covariance_result, betas, garch_info)
        │
        ▼
    solver.py ── allocation_engine == 1 ──▶ cvxpy_engine.py (Mean-Variance)
              ├─ allocation_engine == 2 ──▶ hrp_engine.py (HRP)
              └─ allocation_engine == 3 ──▶ erc_engine.py (Exact Risk Parity)
        │
        ▼
    OptimizationResult (weights, model_info with risk contributions and efficient frontier)
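The routing step in the flow above can be pictured as a dispatch table keyed by `allocation_engine`. This is only a sketch of the pattern: the `route` function, the `ENGINES` table, and the equal-weight placeholder are hypothetical stand-ins, and the real `solver.py` passes a full `ForecastResult` to each engine rather than an asset count.

```python
from typing import Callable, Dict, List, Tuple


def _placeholder_engine(n_assets: int) -> List[float]:
    # Stand-in for a real optimizer: returns equal weights.
    return [1.0 / n_assets] * n_assets


# Engine ids follow the document: 1 = Mean-Variance, 2 = HRP, 3 = Exact Risk Parity.
ENGINES: Dict[int, Tuple[str, Callable[[int], List[float]]]] = {
    1: ("mean_variance", _placeholder_engine),  # cvxpy_engine.py
    2: ("hrp", _placeholder_engine),            # hrp_engine.py
    3: ("erc", _placeholder_engine),            # erc_engine.py
}


def route(allocation_engine: int, n_assets: int) -> Tuple[str, List[float]]:
    """Select the engine for the configured id and run it."""
    if allocation_engine not in ENGINES:
        raise ValueError(f"unknown allocation_engine: {allocation_engine}")
    name, engine = ENGINES[allocation_engine]
    return name, engine(n_assets)


engine_name, weights = route(allocation_engine=2, n_assets=4)
```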

2.5 Validation & Econometrics

File	Purpose
`validation.py`	Four econometric tests: Christoffersen Conditional Coverage, Diebold-Mariano, Probabilistic Sharpe Ratio (PSR), Deflated Sharpe Ratio (DSR). See PIPELINE.md § 3 for mathematical formulations
`backtest.py`	Walk-forward expanding window cross-validation (`expanding_window_backtest()`), Monte Carlo simulation (`monte_carlo()`), and rolling performance metrics
`analytics.py`	Portfolio sensitivity analysis (±10% return perturbation), historical stress testing (2008 GFC, 2020 COVID, rate shock, tech crash), behavioural diagnostics
`risk_attribution.py`	Factor exposure decomposition, marginal VaR, CVaR component attribution, and stress correlation analysis
`overlay_analytics.py`	Futures overlay analytics: aggregated overlay returns, margin call simulation
`simulation.py`	Monte Carlo and historical simulation engines for risk budgeting
`audit_reproducibility.py`	Bit-exact reproducibility verification: hashes inputs and outputs across runs
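Of the four tests in `validation.py`, the Probabilistic Sharpe Ratio has a compact closed form (Bailey and López de Prado): PSR = Φ((SR − SR*) √(n − 1) / √(1 − γ₃·SR + (γ₄ − 1)/4 · SR²)). The sketch below implements that published formula; the function name and argument names are assumptions and do not describe the project's own interface. The Sharpe ratios are per-period (not annualized) and `kurtosis` is the raw kurtosis, which is 3 for normal returns.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe_ratio(sr: float, sr_benchmark: float, n_obs: int,
                               skew: float = 0.0, kurtosis: float = 3.0) -> float:
    """Probability that the true Sharpe ratio exceeds `sr_benchmark`."""
    if n_obs < 2:
        raise ValueError("need at least two observations")
    # Variance term of the Sharpe ratio estimator, adjusted for non-normality
    adjustment = 1.0 - skew * sr + (kurtosis - 1.0) / 4.0 * sr ** 2
    if adjustment <= 0.0:
        raise ValueError("invalid moment combination")
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(adjustment)
    return NormalDist().cdf(z)


psr_at_benchmark = probabilistic_sharpe_ratio(0.1, 0.1, n_obs=252)
psr_above_zero = probabilistic_sharpe_ratio(0.2, 0.0, n_obs=252)
```

Negative skew or fat tails increase the adjustment term and therefore lower the PSR for the same observed Sharpe ratio.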

2.6 Reporting & Output

Detailed reference: OUTPUT.md

File	Purpose
`report.py`	Report orchestrator: coordinates data preparation → HTML rendering → file output
`report_data.py`	`prepare_template_variables()` — transforms mathematical outputs into HTML fragments and Chart.js data payloads (~675 lines)
`report_html.py`	HTML rendering layer: substitutes template variables into `report_template.html`
`report_template.html`	26 KB static HTML template with Chart.js initialization, dark theme, and responsive CSS
`report_chart.py`	Chart.js payload generators for equity curves, pie charts, efficient frontiers, Monte Carlo fans
`chart_data.py`	Lightweight chart data serialization utilities
`model_visuals.py`	Model-specific visualization helpers (factor exposure plots, GARCH regime charts)
`narrative.py`	Natural-language narrative generation summarising portfolio strategy and market conditions
`table_builder.py`	HTML table construction utilities
`exports.py`	CSV, Excel, and PDF export (`export_csv()`, `export_excel()`)
`report_builders/`	Modular HTML section builders for performance, risk, and tax report sections

2.7 Execution & Infrastructure

File	Purpose
`execution.py`	IBKR execution stubs, order management, and paper trading interface (19 KB; not yet connected to production)
`Dockerfile`	Container image definition (Python 3.11-slim)
`docker-compose.yml`	Local development environment with PostgreSQL 15 and Redis 7
`deploy/helm/`	Helm chart for Kubernetes deployment (see DEPLOY.md)
`pyproject.toml`	Project metadata, pytest configuration, and build system
`requirements.txt`	Python dependency manifest
`setup.py`	Legacy setuptools configuration
`.github/workflows/ci.yml`	GitHub Actions CI pipeline (lint, type-check, test)
`.pre-commit-config.yaml`	Pre-commit hooks configuration

2.8 Research & Experimental

Detailed reference: RESEARCH.md

File	Purpose
`research/dreamer/`	DreamerV2 world-model RL agent adapted for financial time series
`research/cybernetic.py`	PID volatility controller and adaptive risk setpoint
`research/cybernetic_ensemble.py`	Three-layer cybernetic control hierarchy
`run_simulation.py`	Standalone simulation script for research experiments
`debug_validation.py`	Debugging utilities for validation pipeline

2.9 Tests

Detailed reference: TESTS.md

File	Purpose
`tests/test_optimize.py`	Constraint logic, mean-variance, HRP, multi-period optimization
`tests/test_simulate.py`	End-to-end integration test
`tests/test_e2e.py`	Differentiable optimization pipeline
`tests/test_models.py`	Return model correctness
`tests/test_analytics.py`	Backtest engine, Sharpe, Sortino, Calmar
`tests/test_data.py`	Data fetching, missing-data handling
`tests/test_validation.py`	Econometric test statistical properties
`tests/test_new_features.py`	[NEW] Transformer training/inference, options flow sentiment extraction, and exact risk parity mathematical verification
`test_audit.py`	Reproducibility audit
`test_perf.py`	Performance benchmarks

3. Configuration System

config.py → AppConfig

The engine is driven by a Pydantic-validated configuration schema. The AppConfig class enforces type safety and cross-field validation (e.g., single_asset_min ≤ single_asset_max). Configuration is loaded from output/portfolio_config.json, merged with constraints.json, and can be overridden programmatically via the API or CLI.

Key configuration axes:

Parameter	Type	Default	Description
`model`	int (1–7)	5	Return forecasting model selection
`allocation_engine`	int (1–3)	1	Optimization engine: 1=MV, 2=HRP, 3=ERC
`max_assets`	int	None	Cardinality constraint (max non-zero positions)
`risk_free_rate`	float	0.04	Annual risk-free rate
`single_asset_min`	float	-1.0	Min weight per asset (negative = shorting)
`single_asset_max`	float	0.40	Max weight per asset
`sector_limit`	float	0.40	Max aggregate weight per sector
`gross_leverage_cap`	float	2.0	Maximum gross leverage (L1 norm of weights)
`max_turnover`	float	3.0	Maximum total turnover per rebalance
`garch_enabled`	bool	True	Enable GARCH(1,1) covariance scaling
`cvar_enabled`	bool	True	Enable CVaR tail-risk constraint
`tax_enabled`	bool	False	Enable tax-aware optimization
`hmm_regime`	bool	True	Enable HMM regime detection
`dynamic_risk`	bool	True	VIX-based risk aversion adjustment
`with_futures`	bool	False	Enable futures overlay
`extended_history`	bool	False	Extended history via proxy stitching

See config_schema.py for the full schema and validation rules; config.py re-exports them as a facade.
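Cross-field validation of the kind described above can be sketched without any dependencies. The real `AppConfig` is a Pydantic model; the stand-in below uses a standard-library dataclass purely to keep the example self-contained. The class name `AppConfigSketch` is hypothetical, and only a few fields and their documented defaults and ranges are reproduced.

```python
from dataclasses import dataclass


@dataclass
class AppConfigSketch:
    """Dependency-free stand-in for a subset of the Pydantic AppConfig."""
    model: int = 5
    allocation_engine: int = 1
    single_asset_min: float = -1.0
    single_asset_max: float = 0.40

    def __post_init__(self) -> None:
        # Range checks taken from the parameter table (1-7 and 1-3).
        if not 1 <= self.model <= 7:
            raise ValueError("model must be between 1 and 7")
        if not 1 <= self.allocation_engine <= 3:
            raise ValueError("allocation_engine must be between 1 and 3")
        # Cross-field rule named in the text: min must not exceed max.
        if self.single_asset_min > self.single_asset_max:
            raise ValueError("single_asset_min must be <= single_asset_max")


default_config = AppConfigSketch()
long_only = AppConfigSketch(single_asset_min=0.0, single_asset_max=0.25)
```

In Pydantic the same cross-field rule would live in a model-level validator, so that invalid combinations are rejected when the JSON configuration is loaded.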

4. Data Structures

The engine communicates between layers via typed dataclasses defined in core_types.py:

PortfolioState

Tracks the current portfolio: total_capital, current_weights, cost_basis, tax_rates, gain_fractions, and tickers. Created empty for new portfolios or loaded from portfolio_state.json.

ForecastResult

Output of _generate_forecasts(): contains expected_returns, covariance_result, betas, garch_info, js_alpha, capm_rets, ff_betas, periods, historical_returns, and feature_importances.

CovarianceResult

Wraps the covariance matrix with its derived correlation matrix, per-asset volatility series, and the Ledoit-Wolf shrinkage intensity α.
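The derived quantities follow directly from the covariance matrix: volatility is the square root of each diagonal entry, and correlation is covariance divided by the product of the two volatilities. The sketch below shows one way such a wrapper could be built; the class name `CovarianceResultSketch`, its field names, and `wrap_covariance` are hypothetical, and every asset is assumed to have strictly positive variance.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass(frozen=True)
class CovarianceResultSketch:
    covariance: Matrix
    correlation: Matrix
    volatility: List[float]
    shrinkage: float  # Ledoit-Wolf intensity, supplied by the estimator


def wrap_covariance(cov: Matrix, shrinkage: float) -> CovarianceResultSketch:
    """Derive volatility and correlation from a covariance matrix."""
    n = len(cov)
    vol = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (vol[i] * vol[j]) for j in range(n)] for i in range(n)]
    return CovarianceResultSketch(cov, corr, vol, shrinkage)


cov_result = wrap_covariance([[0.04, 0.006], [0.006, 0.09]], shrinkage=0.2)
```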

OptimizationResult

Final output: weights, expected_returns, covariance_matrix, volatility, correlation_matrix, betas, and a model_info dictionary containing all metadata (risk contributions, efficient frontier, relaxation log, binding constraints, duration, GARCH info, feature importances, etc.).

5. Concurrency & Thread Safety

Mechanism	Location	Purpose
`_yf_lock` (threading.Lock)	`data.py`	Rate-limits yfinance API calls to max 2/sec
`_ML_CACHE_LOCK` (threading.Lock)	`models.py`	Thread-safe caching of trained ML ensemble models
`_ef_cache_lock` (threading.Lock)	`solver.py`	Thread-safe efficient frontier LRU cache
`ThreadPoolExecutor`	`data.py`, `alternative_data.py`	Parallel data fetching (max 10 workers)
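A lock on its own only serializes callers; limiting the call rate also needs a record of when the next call is allowed. The sketch below shows one common way to combine the two. It is an assumption about how a 2 calls/sec limit could be enforced, not a description of `_yf_lock` in `data.py`; the `RateLimiter` class and its injectable `clock` and `sleep` parameters are hypothetical.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Allow at most `max_per_sec` calls per second across all threads."""

    def __init__(self, max_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = 1.0 / max_per_sec
        self._lock = threading.Lock()
        self._next_allowed = 0.0
        self._clock = clock
        self._sleep = sleep

    def wait(self) -> None:
        # Holding the lock while sleeping makes other threads queue behind us.
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_allowed - now)
            if delay > 0.0:
                self._sleep(delay)
            self._next_allowed = now + delay + self._min_interval


yf_limiter = RateLimiter(max_per_sec=2.0)  # worker threads call yf_limiter.wait() before each request
```

Each worker in a `ThreadPoolExecutor` would call `wait()` immediately before its network request, so the pool can be larger than the permitted request rate without exceeding it.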

6. Graceful Degradation

The engine is designed to always produce output, even under degraded conditions:

Failure Mode	Fallback
PostgreSQL unreachable	Falls back to local SQLite
ML ensemble training fails	Falls back to CAPM expected returns
PyTorch not installed	Transformer predictions silently skipped; ensemble uses only XGBoost + ElasticNet
Options data fetch fails	Returns neutral defaults (PCR=1.0, skew=0.0)
GARCH fitting fails	Uses unconditional covariance (no scaling)
Fama-French download fails	Models 4/5 fall back to CAPM
CVXPY solver infeasible	7-stage constraint relaxation cascade
All constraints infeasible	100% cash allocation
PDF export fails	Only HTML report generated
MPC multi-period fails	Falls back to single-period optimization
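Most rows in the table share one pattern: try the preferred path, log the failure, and move to the next option. A minimal sketch of that pattern follows. The helper `first_successful` and the two forecast functions are hypothetical and only mimic the "ML ensemble fails, fall back to CAPM" row; the engine's own fallback logic is not shown in this document.

```python
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("fallback_sketch")


def first_successful(steps: Sequence[Tuple[str, Callable[[], List[float]]]]
                     ) -> Tuple[str, List[float]]:
    """Run steps in order and return the first result that does not raise."""
    last_error: Optional[Exception] = None
    for name, step in steps:
        try:
            return name, step()
        except Exception as exc:  # degrade instead of aborting the run
            logger.warning("%s failed (%s); trying next fallback", name, exc)
            last_error = exc
    raise RuntimeError("all fallbacks failed") from last_error


def ml_ensemble_forecast() -> List[float]:
    raise RuntimeError("training did not converge")  # simulated failure


def capm_forecast() -> List[float]:
    return [0.05, 0.07]  # illustrative numbers only


used, expected_returns = first_successful([
    ("ml_ensemble", ml_ensemble_forecast),
    ("capm", capm_forecast),
])
```

Returning the name of the step that succeeded lets the report record which fallback was actually used.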

7. Dependency Graph

core_engine.py
├── config.py
├── core_types.py
├── data.py
│   ├── database.py
│   ├── fixed_income.py
│   └── alternative_data.py
├── regime_detection.py
├── solver.py
│   ├── forecast_generation.py
│   │   ├── models.py
│   │   │   └── dl_models.py
│   │   └── bl_bridge.py
│   ├── cvxpy_engine.py
│   │   └── constraints.py
│   ├── hrp_engine.py
│   └── erc_engine.py
├── backtest.py
├── validation.py
├── analytics.py
│   └── risk_attribution.py
├── report.py
│   ├── report_data.py
│   ├── report_html.py
│   └── report_builders/
├── exports.py
├── futures_overlay.py
│   ├── futures_data.py
│   └── overlay_analytics.py
├── execution.py
└── server.py

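Circular dependency rule: Imports flow in one direction along the chain config ← core_types ← data ← models ← solver ← analytics ← report, where each module may import only from modules to its left. Where a reference would otherwise run against this order, forecast_generation.py and solver.py use lazy imports (deferred until first use) so that no circular reference arises at module load time.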
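A lazy import in Python usually means moving the `import` statement from module level into the function that needs it. The sketch below shows the mechanism with a standard-library module standing in for a project module; the function name is hypothetical, and whether `solver.py` uses exactly this form is an assumption.

```python
import sys
from typing import List


def average_forecast(values: List[float]) -> float:
    # Importing here, not at the top of the file, defers loading until the
    # first call. Two modules that reference each other can then both finish
    # loading before either one needs the other's contents.
    import statistics  # stand-in for a project module such as models.py
    return statistics.fmean(values)


result = average_forecast([1.0, 2.0, 3.0])
loaded_after_call = "statistics" in sys.modules
```

After the first call the module is cached in `sys.modules`, so the import statement inside the function costs only a dictionary lookup on later calls.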

8. Document Index

Document	What It Covers
ARCHITECTURE.md (this file)	Master reference — complete file map, data flow, dependency graph
PIPELINE.md	4-stage pipeline execution model, data flow diagrams, configuration axes
MODELS.md	All 7 return forecasting models, covariance estimators, GARCH, BL bridge, alternative data, and Transformer
ALLOCATION_ENGINES.md	Mean-Variance (CVXPY), HRP, and Exact Risk Parity engines; cardinality constraints; MPC multi-period
RELAXATION_CASCADE.md	7-stage progressive constraint relaxation for infeasible CVXPY solves
TESTS.md	Test suite design, mocking strategy, property-based testing, and full test inventory
OUTPUT.md	Output directory structure and artefact descriptions
DEPLOY.md	Docker, Helm, Kubernetes, CI/CD, and production considerations
RESEARCH.md	Experimental modules: PID controller, Dreamer RL, cybernetic ensemble