# Quantitative Portfolio Builder: Architecture & Data Flow

This document maps out the core data processing pipeline, the stochastic optimizer flow, and the post-trade analytics orchestration within the Engine.

## High-Level Architecture

The Engine separates concerns into distinct layers: Data Ingestion, Risk Modeling, Convex Optimization, Execution & Backtesting, and Post-Trade Reporting.

```mermaid
flowchart TD
    %% Define external data sources
    db[("PostgreSQL / SQLite")]
    yfinance[("Yahoo Finance (API)")]
    fred[("FRED (Macro Data)")]

    %% Ingestion Layer
    subgraph DataLayer ["Data Ingestion & Pre-Processing"]
        db --> |"Raw Pricing"| df_fetch(fetch_data)
        yfinance -.-> |"Fallback"| df_fetch
        fred --> |"Risk Free Rate"| rfr(fetch_risk_free_rate)
        df_fetch --> cleaning[Missing Value Imputation]
        cleaning --> returns[Calculate Daily Returns]
    end

    %% Modeling Layer
    subgraph QuantModels ["Risk & Return Modeling"]
        returns --> ewma[Covariance Estimation]
        returns --> garch[GARCH Volatility Regime]
        returns --> ff[Fama-French Factor Betas]
        returns --> hmm[HMM Regime Detection]
        ewma --> rmt[RMT Noise Filtering]
    end

    %% Optimization Layer
    subgraph SolverEngine ["Convex Optimization (cvxpy)"]
        direction TB
        rmt --> cov[Clean Covariance Matrix]
        garch --> cov
        cov --> cvx_setup[Build CVX Objective]
        ff --> expected_rets[Calculate Expected Returns]
        expected_rets --> cvx_setup
        cvx_setup --> constraints[Apply Bounds, Sectors, Turnover, Risk Limit]
        constraints --> cvx_solve["Solve ECOS/SCS"]
        cvx_solve --> target_weights[Target Asset Weights]
    end

    %% Execution & Simulation
    subgraph Execution ["Execution & Backtesting"]
        target_weights --> hifo[HIFO Lot Manager]
        target_weights --> exec_cost[Almgren-Chriss Impact]
        hifo --> tax[Tax Drag Calculation]
        exec_cost --> net_curve[Net Equity Curve]
        tax --> net_curve
    end

    %% Post Trade Reporting
    subgraph Reporting ["Reporting & Analytics Builders"]
        target_weights --> mc[Monte Carlo Simulation]
        target_weights --> mvar[Marginal VaR]
        target_weights --> cvar[Component CVaR]
        net_curve --> perf_builder[html_performance.py]
        mc --> risk_builder[html_risk.py]
        cvar --> risk_builder
        mvar --> risk_builder
        hifo --> tax_builder[html_tax.py]
        perf_builder --> report_orchestrator(report_data.py)
        risk_builder --> report_orchestrator
        tax_builder --> report_orchestrator
    end

    %% Connections between major blocks
    returns --> SolverEngine
    target_weights --> Reporting
    net_curve --> Reporting
```

## The "cfg" Dictionary and EngineConfig

Historically, the engine passed a mutable `cfg` dictionary throughout the entire codebase. This is being replaced by the `EngineConfig` dataclass (defined in `core_types.py`). `EngineConfig` ensures all mathematical parameters (e.g. `garch_enabled`, `cvar_alpha`, `max_turnover`) are strictly typed and immutable during the optimization step.

## Circular Dependency Resolution

The core analytical dependency chain follows a strict unidirectional flow to avoid circular imports:

`utils.metrics` ← `analytics.py` ← `solver.py` ← `report_data.py` (which orchestrates the HTML builders inside `report_builders/`).

Imports flow from right to left: a module may import from those to its left, never the reverse.
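One way to guard the unidirectional import chain described in this section is a small acyclicity check over the module graph. The sketch below uses the standard library's `graphlib`; the `IMPORTS` mapping is a hand-written assumption that mirrors the chain above, not something extracted from the real codebase, and `import_order` and `has_cycle` are hypothetical helper names.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical, hand-maintained import map: module -> modules it imports.
# It mirrors the chain in this section and is not read from the codebase.
IMPORTS = {
    "utils.metrics": set(),
    "analytics": {"utils.metrics"},
    "solver": {"analytics"},
    "report_data": {"solver"},
}


def import_order(imports):
    """Return modules dependency-first; raises CycleError on a circular import."""
    return list(TopologicalSorter(imports).static_order())


def has_cycle(imports):
    """Return True if the import map contains a circular dependency."""
    try:
        import_order(imports)
    except CycleError:
        return True
    return False


if __name__ == "__main__":
    print(import_order(IMPORTS))
    # Simulate a regression: utils.metrics starts importing report_data.
    broken = {**IMPORTS, "utils.metrics": {"report_data"}}
    print(has_cycle(broken))
```

A check like this could run in CI so that a new import pointing the wrong way fails fast instead of surfacing as an `ImportError` at runtime.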
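To make the `cfg` to `EngineConfig` migration described earlier more concrete, here is a minimal sketch of a frozen dataclass. Only the field names `garch_enabled`, `cvar_alpha`, and `max_turnover` come from this document; their types, default values, the validation rules, and the `from_legacy_cfg` helper are assumptions for illustration and may not match what `core_types.py` actually defines.

```python
from dataclasses import FrozenInstanceError, dataclass, fields


@dataclass(frozen=True)
class EngineConfig:
    # Field names come from the document; types and defaults are assumed.
    garch_enabled: bool = True
    cvar_alpha: float = 0.95
    max_turnover: float = 0.20

    def __post_init__(self):
        # Assumed sanity checks; dataclasses do not enforce annotations at runtime.
        if not 0.0 < self.cvar_alpha < 1.0:
            raise ValueError("cvar_alpha must lie strictly between 0 and 1")
        if self.max_turnover < 0.0:
            raise ValueError("max_turnover must be non-negative")

    @classmethod
    def from_legacy_cfg(cls, cfg):
        """Hypothetical bridge: build a config from the old dict, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in cfg.items() if k in known})


if __name__ == "__main__":
    config = EngineConfig.from_legacy_cfg({"cvar_alpha": 0.99, "legacy_flag": 1})
    try:
        config.max_turnover = 0.5  # rejected: frozen instances cannot be mutated
    except FrozenInstanceError:
        print("EngineConfig is immutable")
```

With `frozen=True`, any attempt to reassign a field raises `FrozenInstanceError`, which is what prevents the solver from seeing a parameter change mid-optimization. Type annotations alone are not checked at runtime, so explicit validation in `__post_init__` (or a static type checker) is needed for the "strictly typed" guarantee.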