
Quantitative Portfolio Builder: Architecture & Data Flow

This document maps out the core data processing pipeline, the stochastic optimizer flow, and the post-trade analytics orchestration within the Engine.

High-Level Architecture

The Engine separates concerns into distinct layers: Data Ingestion, Risk Modeling, Convex Optimization, Execution and Backtesting, and Post-Trade Reporting.

```mermaid
flowchart TD
    %% Define external data sources
    db[(PostgreSQL / SQLite)]
    yfinance[("Yahoo Finance (API)")]
    fred[("FRED (Macro Data)")]

    %% Ingestion Layer
    subgraph DataLayer ["Data Ingestion & Pre-Processing"]
        db --> |"Raw Pricing"| df_fetch(fetch_data)
        yfinance -.-> |"Fallback"| df_fetch
        fred --> |"Risk Free Rate"| rfr(fetch_risk_free_rate)

        df_fetch --> cleaning[Missing Value Imputation]
        cleaning --> returns[Calculate Daily Returns]
    end

    %% Modeling Layer
    subgraph QuantModels ["Risk & Return Modeling"]
        returns --> ewma[Covariance Estimation]
        returns --> garch[GARCH Volatility Regime]
        returns --> ff[Fama-French Factor Betas]
        returns --> hmm[HMM Regime Detection]

        ewma --> rmt[RMT Noise Filtering]
    end

    %% Optimization Layer
    subgraph SolverEngine ["Convex Optimization (cvxpy)"]
        direction TB
        rmt --> cov[Clean Covariance Matrix]
        garch --> cov

        cov --> cvx_setup[Build CVX Objective]
        ff --> expected_rets[Calculate Expected Returns]

        expected_rets --> cvx_setup

        cvx_setup --> constraints[Apply Bounds, Sectors, Turnover, Risk Limit]
        constraints --> cvx_solve[Solve ECOS/SCS]
        cvx_solve --> target_weights[Target Asset Weights]
    end

    %% Execution & Simulation
    subgraph Execution ["Execution & Backtesting"]
        target_weights --> hifo[HIFO Lot Manager]
        target_weights --> exec_cost[Almgren-Chriss Impact]

        hifo --> tax[Tax Drag Calculation]
        exec_cost --> net_curve[Net Equity Curve]
        tax --> net_curve
    end

    %% Post Trade Reporting
    subgraph Reporting ["Reporting & Analytics Builders"]
        target_weights --> mc[Monte Carlo Simulation]
        target_weights --> mvar[Marginal VaR]
        target_weights --> cvar[Component CVaR]

        net_curve --> perf_builder[html_performance.py]
        mc --> risk_builder[html_risk.py]
        cvar --> risk_builder
        mvar --> risk_builder
        hifo --> tax_builder[html_tax.py]

        perf_builder --> report_orchestrator(report_data.py)
        risk_builder --> report_orchestrator
        tax_builder --> report_orchestrator
    end

    %% Connections between major blocks
    returns --> SolverEngine
    target_weights --> Reporting
    net_curve --> Reporting
```
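
The path from daily returns through covariance estimation to RMT noise filtering can be sketched as follows. The flowchart names these stages but not the methods behind them, so this sketch makes several assumptions: an EWMA estimator with decay `lam=0.94`, eigenvalue clipping at the Marchenko-Pastur upper edge, and the function names `daily_returns`, `ewma_covariance`, and `rmt_clip`, none of which are taken from the Engine's code.

```python
import numpy as np


def daily_returns(prices: np.ndarray) -> np.ndarray:
    """Simple daily returns from a (T, N) matrix of prices."""
    return prices[1:] / prices[:-1] - 1.0


def ewma_covariance(returns: np.ndarray, lam: float = 0.94) -> np.ndarray:
    """Exponentially weighted covariance; the newest row gets the largest weight.

    lam=0.94 is an assumed decay factor, not a value from the Engine.
    """
    n_obs = returns.shape[0]
    weights = lam ** np.arange(n_obs - 1, -1, -1)  # ordered oldest -> newest
    weights /= weights.sum()
    demeaned = returns - weights @ returns
    return (demeaned * weights[:, None]).T @ demeaned


def rmt_clip(cov: np.ndarray, n_obs: int) -> np.ndarray:
    """Flatten correlation eigenvalues below the Marchenko-Pastur upper edge.

    Eigenvalues under (1 + sqrt(N / T))^2 are treated as noise and replaced
    by their average, which keeps the trace of the correlation matrix.
    """
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    eigvals, eigvecs = np.linalg.eigh(corr)

    q = cov.shape[0] / n_obs
    lambda_max = (1.0 + np.sqrt(q)) ** 2
    noise = eigvals < lambda_max
    if noise.any():
        eigvals = eigvals.copy()
        eigvals[noise] = eigvals[noise].mean()

    cleaned = (eigvecs * eigvals) @ eigvecs.T
    # Rescale to a unit diagonal, then restore the original variances.
    scale = np.sqrt(np.diag(cleaned))
    cleaned = cleaned / np.outer(scale, scale)
    return cleaned * np.outer(std, std)
```

The cleaned matrix keeps the original variances on its diagonal and stays symmetric and positive semi-definite, which is what a convex solver needs for a quadratic risk term. Note that `n_obs` here is the nominal sample count; with exponential weighting the effective sample size is smaller, so a production filter would likely adjust for that.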

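The HIFO Lot Manager in the Execution block feeds the tax drag calculation. HIFO (highest in, first out) sells the lots with the highest cost basis first, which minimizes the realized gain on a sale. A minimal sketch of that selection rule, assuming a hypothetical `Lot` record and `sell_hifo` helper that are not part of the Engine's actual API:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Lot:
    shares: float
    cost_basis: float  # purchase price per share


def sell_hifo(lots: List[Lot], shares_to_sell: float,
              sale_price: float) -> Tuple[float, List[Lot]]:
    """Sell from the highest cost-basis lots first.

    Returns (realized_gain, remaining_lots); remaining lots are listed
    in descending cost-basis order.
    """
    remaining: List[Lot] = []
    realized = 0.0
    left = shares_to_sell
    for lot in sorted(lots, key=lambda l: l.cost_basis, reverse=True):
        take = min(lot.shares, left)
        realized += take * (sale_price - lot.cost_basis)
        left -= take
        if lot.shares - take > 0:
            remaining.append(Lot(lot.shares - take, lot.cost_basis))
    if left > 1e-12:
        raise ValueError("not enough shares held to fill the sale")
    return realized, remaining


lots = [Lot(10, 100.0), Lot(10, 150.0), Lot(10, 120.0)]
gain, left_over = sell_hifo(lots, shares_to_sell=15, sale_price=130.0)
# The 150.0 lot is sold in full at a loss, then half of the 120.0 lot.
```

The realized gain returned here is the input a tax drag step would multiply by the applicable rate; the rate schedule itself is outside this sketch.
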
The "cfg" Dictionary and EngineConfig

Historically, the engine passed a mutable cfg dictionary throughout the entire codebase. This is being replaced by the EngineConfig dataclass (defined in core_types.py). EngineConfig ensures all mathematical parameters (e.g. garch_enabled, cvar_alpha, max_turnover) are strictly typed and immutable during the optimization step.
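
A frozen dataclass is one way to get the behaviour described above. The sketch below uses the three parameter names mentioned in the text; the default values, the range checks, and the `from_cfg` helper for converting a legacy dictionary are assumptions for illustration and may differ from what core_types.py actually defines.

```python
from dataclasses import FrozenInstanceError, dataclass, fields, replace


@dataclass(frozen=True)
class EngineConfig:
    # Field names come from the text above; defaults are illustrative only.
    garch_enabled: bool = True
    cvar_alpha: float = 0.95
    max_turnover: float = 0.20

    def __post_init__(self) -> None:
        # Annotations are not enforced at runtime, so validate explicitly.
        if not 0.0 < self.cvar_alpha < 1.0:
            raise ValueError("cvar_alpha must lie strictly between 0 and 1")
        if self.max_turnover < 0.0:
            raise ValueError("max_turnover must be non-negative")

    @classmethod
    def from_cfg(cls, cfg: dict) -> "EngineConfig":
        """Hypothetical bridge from the legacy cfg dict; rejects unknown keys."""
        unknown = set(cfg) - {f.name for f in fields(cls)}
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        return cls(**cfg)


config = EngineConfig.from_cfg({"garch_enabled": False, "cvar_alpha": 0.99})
tighter = replace(config, max_turnover=0.10)  # new object; config is unchanged
```

Assigning to a field, for example `config.cvar_alpha = 0.5`, raises `FrozenInstanceError`, so a parameter cannot drift while the optimizer is running. Variations are created with `dataclasses.replace`, which returns a new instance.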

Circular Dependency Resolution

The core analytical dependency chain follows a strict unidirectional flow to avoid circular imports: utils.metrics → analytics.py → solver.py → report_data.py (which orchestrates the HTML builders inside report_builders/).
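
A one-way import rule like this can be checked mechanically. The sketch below is an assumption about how such a check might look, not an existing tool in the repository: it takes a hand-written map of which module imports which and reports any import that points at the same layer or a later one. The `LAYER_ORDER` list mirrors the chain above; the example import maps are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Earlier entries may be imported by later ones, never the reverse.
LAYER_ORDER = ["utils.metrics", "analytics", "solver", "report_data"]


def find_violations(imports: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (module, dependency) pairs that break the one-way flow."""
    rank = {name: i for i, name in enumerate(LAYER_ORDER)}
    violations = []
    for module, deps in imports.items():
        for dep in sorted(deps):
            if module in rank and dep in rank and rank[dep] >= rank[module]:
                violations.append((module, dep))
    return violations


clean_graph = {
    "analytics": {"utils.metrics"},
    "solver": {"analytics", "utils.metrics"},
    "report_data": {"solver"},
}
broken_graph = {**clean_graph, "analytics": {"utils.metrics", "solver"}}
```

Here `find_violations(clean_graph)` returns an empty list, while `find_violations(broken_graph)` flags `analytics` importing `solver`, the kind of back-edge that would reintroduce a circular import.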