# Quantitative Portfolio Builder: Architecture & Data Flow
This document maps out the core data processing pipeline, the stochastic optimizer flow, and the post-trade analytics orchestration within the Engine.
## High-Level Architecture
The Engine separates concerns into distinct layers: Data Ingestion, Risk Modeling, Convex Optimization, Execution & Backtesting, and Post-Trade Reporting.
```mermaid
flowchart TD
%% Define external data sources
db[(PostgreSQL / SQLite)]
yfinance[("Yahoo Finance (API)")]
fred[("FRED (Macro Data)")]
%% Ingestion Layer
subgraph DataLayer ["Data Ingestion & Pre-Processing"]
db --> |"Raw Pricing"| df_fetch(fetch_data)
yfinance -.-> |"Fallback"| df_fetch
fred --> |"Risk Free Rate"| rfr(fetch_risk_free_rate)
df_fetch --> cleaning[Missing Value Imputation]
cleaning --> returns[Calculate Daily Returns]
end
%% Modeling Layer
subgraph QuantModels ["Risk & Return Modeling"]
returns --> ewma[Covariance Estimation]
returns --> garch[GARCH Volatility Regime]
returns --> ff[Fama-French Factor Betas]
returns --> hmm[HMM Regime Detection]
ewma --> rmt[RMT Noise Filtering]
end
%% Optimization Layer
subgraph SolverEngine ["Convex Optimization (cvxpy)"]
direction TB
rmt --> cov[Clean Covariance Matrix]
garch --> cov
cov --> cvx_setup[Build CVX Objective]
ff --> expected_rets[Calculate Expected Returns]
expected_rets --> cvx_setup
cvx_setup --> constraints[Apply Bounds, Sectors, Turnover, Risk Limit]
constraints --> cvx_solve[Solve ECOS/SCS]
cvx_solve --> target_weights[Target Asset Weights]
end
%% Execution & Simulation
subgraph Execution ["Execution & Backtesting"]
target_weights --> hifo[HIFO Lot Manager]
target_weights --> exec_cost[Almgren-Chriss Impact]
hifo --> tax[Tax Drag Calculation]
exec_cost --> net_curve[Net Equity Curve]
tax --> net_curve
end
%% Post Trade Reporting
subgraph Reporting ["Reporting & Analytics Builders"]
target_weights --> mc[Monte Carlo Simulation]
target_weights --> mvar[Marginal VaR]
target_weights --> cvar[Component CVaR]
net_curve --> perf_builder[html_performance.py]
mc --> risk_builder[html_risk.py]
cvar --> risk_builder
mvar --> risk_builder
hifo --> tax_builder[html_tax.py]
perf_builder --> report_orchestrator(report_data.py)
risk_builder --> report_orchestrator
tax_builder --> report_orchestrator
end
%% Connections between major blocks
returns --> SolverEngine
target_weights --> Reporting
net_curve --> Reporting
```
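The covariance path in the diagram (Covariance Estimation, then RMT Noise Filtering) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the Engine's code: it assumes the estimator is an exponentially weighted (RiskMetrics-style) covariance, as the `ewma` node id suggests, with a placeholder decay of `lam=0.94`, and that "RMT Noise Filtering" means Marchenko-Pastur eigenvalue clipping on the correlation matrix. `ewma_covariance` and `rmt_clip` are hypothetical names.

```python
from __future__ import annotations

import numpy as np

# Hypothetical helpers; lam=0.94 is a placeholder decay, not an Engine setting.


def ewma_covariance(returns: np.ndarray, lam: float = 0.94) -> tuple[np.ndarray, float]:
    """EWMA covariance of a (T, N) return matrix, newest row last.

    Returns the covariance and the effective sample size 1 / sum(w**2).
    """
    t = returns.shape[0]
    # Weight lam**k for an observation k steps old, normalised to sum to 1.
    w = lam ** np.arange(t - 1, -1, -1, dtype=float)
    w /= w.sum()
    demeaned = returns - w @ returns  # subtract the weighted mean
    cov = (demeaned * w[:, None]).T @ demeaned
    return cov, 1.0 / np.sum(w**2)


def rmt_clip(cov: np.ndarray, n_obs: float) -> np.ndarray:
    """Marchenko-Pastur eigenvalue clipping applied to the correlation matrix."""
    vol = np.sqrt(np.diag(cov))
    corr = cov / np.outer(vol, vol)
    n = corr.shape[0]
    # Upper edge of the Marchenko-Pastur spectrum for unit-variance noise.
    lam_max = (1.0 + np.sqrt(n / n_obs)) ** 2
    eigvals, eigvecs = np.linalg.eigh(corr)
    noise = eigvals < lam_max
    if noise.any():
        # Replace noise eigenvalues by their mean, which preserves the trace.
        eigvals[noise] = eigvals[noise].mean()
    cleaned = (eigvecs * eigvals) @ eigvecs.T
    # Restore a unit diagonal, then rescale back to covariance units.
    d = np.sqrt(np.diag(cleaned))
    cleaned /= np.outer(d, d)
    return cleaned * np.outer(vol, vol)


# Usage on synthetic data: 250 days of returns for 20 assets.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(250, 20))
cov, n_eff = ewma_covariance(returns)
clean_cov = rmt_clip(cov, n_eff)
```

Passing the effective sample size `1 / sum(w**2)` to `rmt_clip`, rather than the raw row count, is a deliberate choice: with `lam=0.94` only about 32 observations carry meaningful weight, so the raw count would place the noise edge too low and treat noise eigenvalues as signal.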
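The HIFO Lot Manager node feeds Tax Drag Calculation. A minimal sketch of highest-in, first-out lot relief is shown below, assuming a lot is just a share count plus a per-share cost basis. `Lot` and `sell_hifo` are hypothetical names, and the sketch deliberately ignores holding periods, wash-sale rules, and fees, all of which a real tax-drag calculation would need.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch; ignores holding periods, wash sales, and fees.


@dataclass
class Lot:
    shares: float
    cost_basis: float  # per-share purchase price


def sell_hifo(lots: list[Lot], shares_to_sell: float, price: float) -> float:
    """Relieve shares from the highest-cost lots first.

    Mutates ``lots`` in place (emptied lots are dropped) and returns the
    realised gain; negative values are realised losses.
    """
    if shares_to_sell > sum(lot.shares for lot in lots):
        raise ValueError("cannot sell more shares than are held")
    realised = 0.0
    remaining = shares_to_sell
    # Taking the highest cost basis first minimises the gain this sale realises.
    for lot in sorted(lots, key=lambda l: l.cost_basis, reverse=True):
        if remaining <= 0:
            break
        take = min(lot.shares, remaining)
        realised += take * (price - lot.cost_basis)
        lot.shares -= take
        remaining -= take
    lots[:] = [lot for lot in lots if lot.shares > 0]
    return realised
```

For example, selling 15 shares at 130 from lots bought at 100, 150, and 120 (10 shares each) relieves the whole 150 lot (a loss of 200) and 5 shares of the 120 lot (a gain of 50), for a net realised result of -150.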
## The "cfg" Dictionary and EngineConfig
Historically, the engine passed a mutable `cfg` dictionary throughout the codebase. It is being replaced by the `EngineConfig` dataclass (defined in `core_types.py`), which gives every mathematical parameter (e.g., `garch_enabled`, `cvar_alpha`, `max_turnover`) a declared type and keeps the configuration immutable during the optimization step.
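A frozen dataclass is one way to get both properties. The sketch below is an assumption about what such a class could look like, not the contents of `core_types.py`: the three field names come from the text above, but the defaults, the validation rules, and the `from_dict` bridge from the legacy `cfg` dictionary are hypothetical. Because Python does not enforce type annotations at runtime, the checks live in `__post_init__`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, fields, replace
from typing import Any


@dataclass(frozen=True)
class EngineConfig:
    # Hypothetical sketch: field names are from the text, defaults are placeholders.
    garch_enabled: bool = True
    cvar_alpha: float = 0.95  # assumed to be a confidence level in (0, 1)
    max_turnover: float = 0.25  # assumed to be a fraction of portfolio value

    def __post_init__(self) -> None:
        # Dataclass annotations are not enforced at runtime, so check here.
        if not isinstance(self.garch_enabled, bool):
            raise TypeError("garch_enabled must be a bool")
        if not 0.0 < self.cvar_alpha < 1.0:
            raise ValueError("cvar_alpha must lie in (0, 1)")
        if self.max_turnover < 0.0:
            raise ValueError("max_turnover must be non-negative")

    @classmethod
    def from_dict(cls, cfg: dict[str, Any]) -> EngineConfig:
        """Build a config from a legacy cfg dict, ignoring unrelated keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in cfg.items() if k in known})


# Usage: convert a legacy dict, then derive a variant without mutation.
base = EngineConfig.from_dict({"garch_enabled": False, "cvar_alpha": 0.99, "legacy_flag": 1})
tighter = replace(base, max_turnover=0.10)
try:
    base.max_turnover = 0.5  # frozen instance: assignment is rejected
except FrozenInstanceError:
    pass
```

Silently dropping unknown keys in `from_dict` eases migration while the legacy dictionary still carries non-mathematical settings; once the migration is complete, raising on unknown keys would catch typos instead.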
## Circular Dependency Resolution
The core analytical modules follow a strict unidirectional dependency chain, which prevents circular imports. In the chain below, `A ← B` means that `B` imports `A`, so no module imports anything to its right:

`utils.metrics` ← `analytics.py` ← `solver.py` ← `report_data.py` (which orchestrates the HTML builders inside `report_builders/`).
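Because the rule is purely structural, it can be checked mechanically. The sketch below is a hypothetical helper, not part of the Engine: it parses each module's source with Python's standard `ast` module and flags any import that points up the chain. It assumes the four modules are imported under the flat names `utils.metrics`, `analytics`, `solver`, and `report_data`; the function names are illustrative.

```python
from __future__ import annotations

import ast

# Hypothetical check; assumes flat module names matching the chain above.
# Lowest layer first: a module may import only modules earlier in this list.
LAYERS = ["utils.metrics", "analytics", "solver", "report_data"]


def imported_modules(source: str) -> set[str]:
    """Collect module names from the import statements in a source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def layering_violations(sources: dict[str, str]) -> list[str]:
    """Report every import that points up the chain (toward report_data)."""
    rank = {name: i for i, name in enumerate(LAYERS)}
    violations = []
    for module, source in sources.items():
        if module not in rank:
            continue
        for dep in imported_modules(source):
            if dep in rank and rank[dep] > rank[module]:
                violations.append(f"{module} imports {dep}")
    return sorted(violations)
```

Any import graph that passes this check is acyclic, since every edge points strictly down the list. The parser only sees static `import` statements, so forms such as `from utils import metrics` or `importlib` calls would need extra handling.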