
Output Module

Abstract

The output/ directory serves as the canonical artefact store for all data, reports, and serialised state produced by a single execution of the Portfolio Engine pipeline. Every pipeline run writes a deterministic set of files to this directory, enabling reproducibility audits, downstream consumption by external systems, and human review via rendered HTML reports. This document catalogues each output artefact, its format, provenance, and intended use.


1. Directory Structure

output/
├── engine.log                  # Structured execution log
├── finance_data.db             # SQLite cache of fetched market data
├── niw_prior_state.pkl         # Serialised Normal-Inverse-Wishart prior
├── portfolio_config.json       # Resolved configuration snapshot
├── portfolio_db.sqlite3        # Persistent portfolio state database
├── portfolio_report.html       # Interactive HTML report
├── portfolio_state.json        # Current position and cost-basis ledger
└── portfolio_summary.csv       # Tabular weight and allocation summary

2. Artefact Descriptions

2.1 engine.log: Execution Log

Format: Structured JSON lines (JSONL).

Content: A chronological record of every significant event during a pipeline run, including:

  • Data fetch timestamps and API call metadata (rate-limit compliance).
  • Solver invocations, constraint relaxation cascade stages, and convergence status.
  • Walk-forward cross-validation fold boundaries.
  • Econometric test results (Diebold-Mariano, Christoffersen, PSR, DSR).
  • Warnings, errors, and fallback decisions.

Purpose. The log provides a full audit trail for reproducibility. In institutional settings, regulatory mandates (e.g., MiFID II, SEC Rule 17a-4) require that algorithmic trading decisions be reconstructable from contemporaneous records. This log is designed to supply those records; retention and storage controls remain the responsibility of the deployment.
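Because the log is JSONL, each line can be parsed independently with the standard library. The following is a minimal sketch of an audit query that counts records per severity level. The field names `ts`, `level`, and `event` are assumptions made for illustration; the actual record schema is not specified in this document.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_log_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_level(lines: Iterable[str]) -> Counter:
    """Count records per value of the (assumed) "level" field."""
    return Counter(rec.get("level", "UNKNOWN") for rec in iter_log_records(lines))


# Hypothetical sample records; real field names and values may differ.
sample = [
    '{"ts": "2024-01-02T09:30:00Z", "level": "INFO", "event": "data_fetch"}',
    '{"ts": "2024-01-02T09:30:05Z", "level": "WARNING", "event": "constraint_relaxed"}',
    "",
    '{"ts": "2024-01-02T09:30:09Z", "level": "INFO", "event": "solver_converged"}',
]
print(count_by_level(sample))
```

An open file handle can be passed in place of `sample`, since iterating over a text file yields one line at a time.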

2.2 finance_data.db: Market Data Cache

Format: SQLite 3 database.

Schema: Contains tables for daily OHLCV prices, risk-free rate series, and Fama-French factor returns, keyed by (ticker, date).

Purpose. Caches all data fetched from external APIs (e.g., Yahoo Finance, FRED, Kenneth French Data Library) to avoid redundant network calls. On subsequent runs the engine reads from this cache whenever the cached data is newer than a configurable staleness threshold. This reduces the risk of hitting API rate limits and shortens pipeline run time.
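A staleness check of this kind might look like the sketch below. The table name `prices`, its columns, the ISO-8601 date strings, and the `max_age_days` parameter are all assumptions for illustration; the engine's actual schema and configuration key are not given here.

```python
import sqlite3
from datetime import date, timedelta


def is_cache_fresh(conn: sqlite3.Connection, ticker: str,
                   today: date, max_age_days: int) -> bool:
    """Return True if the newest cached row for `ticker` is recent enough."""
    row = conn.execute(
        "SELECT MAX(date) FROM prices WHERE ticker = ?", (ticker,)
    ).fetchone()
    if row[0] is None:
        return False  # Nothing cached for this ticker: a fetch is required.
    latest = date.fromisoformat(row[0])
    return (today - latest) <= timedelta(days=max_age_days)


# Hypothetical in-memory cache keyed by (ticker, date).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (ticker TEXT, date TEXT, close REAL, "
    "PRIMARY KEY (ticker, date))"
)
conn.execute("INSERT INTO prices VALUES ('XYZ', '2024-01-02', 100.0)")
conn.commit()
```

Storing dates as ISO-8601 text keeps `MAX(date)` correct, because lexicographic and chronological order coincide for that format.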

2.3 niw_prior_state.pkl: Bayesian Prior State

Format: Python pickle (protocol 5).

Content: Serialised hyperparameters of the Normal-Inverse-Wishart (NIW) conjugate prior used by the Bayesian shrinkage models. Contains the prior mean vector (μ₀), scale matrix (Ψ), degrees of freedom (ν), and concentration parameter (κ).

Purpose. Enables warm-starting of Bayesian estimation across pipeline runs. Rather than re-deriving the prior from scratch, the engine loads the previously calibrated state and performs a Bayesian update with newly observed data. This implements the sequential learning paradigm described by Meucci (2010) and Kolm & Ritter (2017).
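For reference, the textbook conjugate update for an NIW prior under a multivariate normal likelihood is given below. It is stated here as the standard result and is assumed, not confirmed, to match the engine's implementation. The symbols n (number of new observations), x̄ (their sample mean), and S (their scatter matrix) are introduced for this illustration; a subscript n marks the updated hyperparameters.

```latex
\begin{align}
S        &= \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^{\top} \\
\kappa_n &= \kappa + n \\
\nu_n    &= \nu + n \\
\mu_n    &= \frac{\kappa\,\mu_0 + n\,\bar{x}}{\kappa + n} \\
\Psi_n   &= \Psi + S + \frac{\kappa\,n}{\kappa + n}\,(\bar{x} - \mu_0)(\bar{x} - \mu_0)^{\top}
\end{align}
```

Persisting the updated hyperparameters and reloading them as the prior for the next run is what makes the estimation sequential.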

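Caution. Pickle protocol 5 requires Python 3.8 or later, and a pickle may fail to load after a change in the version of Python or of any library whose objects it contains. Re-derive the prior when upgrading the runtime environment. Unpickling can execute arbitrary code, so load this file only from a trusted source.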

2.4 portfolio_config.json: Configuration Snapshot

Format: JSON.

Content: The fully resolved configuration dictionary at the time of execution, including:

  • Asset universe and sector classification map.
  • Risk parameters (risk aversion level, GARCH/CVaR flags, tax rates).
  • Solver tolerances and constraint bounds.
  • Benchmark tickers, trading-days-per-year, and currency symbol.

Purpose. Captures the exact parameterisation of a given run. Together with engine.log, this file enables bit-exact reproduction of any historical optimisation. Any parameter overrides from the CLI wizard or API are reflected here.
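One way to check that two runs used the same parameterisation is to compare a hash of the snapshot's canonical JSON form. This is a sketch of that idea only; `config_fingerprint` is a hypothetical helper that is not part of the engine, and the configuration keys shown are invented for illustration.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a SHA-256 hex digest of the config in canonical JSON form.

    Sorting keys and fixing separators makes the digest independent of
    key order and whitespace in the file on disk.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical configuration fragments with identical content.
run_a = {"risk_aversion": 2.5, "tax_enabled": True}
run_b = {"tax_enabled": True, "risk_aversion": 2.5}
print(config_fingerprint(run_a) == config_fingerprint(run_b))
```

In practice the dictionary would come from `json.load` on the snapshot file of each run.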

2.5 portfolio_db.sqlite3: Persistent Portfolio Database

Format: SQLite 3 database.

Content: Stores portfolio-level state across runs, including historical weight snapshots, transaction records, and performance attribution data. Used by the database.py module for read/write operations when a PostgreSQL connection is unavailable.

Purpose. Provides a lightweight, zero-configuration persistence layer for local development and single-node deployments. In production, this is superseded by a managed PostgreSQL instance (see deploy/ documentation).

2.6 portfolio_report.html: Interactive Report

Format: Self-contained HTML with embedded Chart.js visualisations.

Content: A comprehensive, interactive report containing:

  • Portfolio Summary: Target weights, expected returns, volatility, Sharpe ratio, beta, and Treynor ratio.
  • Backtest Results: Historical equity curves, drawdown analysis, and rolling performance metrics.
  • Walk-Forward Validation: Out-of-sample equity curves from expanding-window cross-validation.
  • Monte Carlo Simulation: Fan charts with percentile confidence bands.
  • Risk Attribution: Marginal VaR, CVaR component decomposition, and factor exposures.
  • Econometric Validation: Diebold-Mariano test, Christoffersen conditional coverage, Probabilistic Sharpe Ratio, and Deflated Sharpe Ratio.
  • Stress Testing: Scenario impact analysis and sensitivity heatmaps.
  • GARCH Diagnostics: Per-asset volatility regime classification.
  • Constraint Diagnostics: Binding constraints and relaxation cascade history.

The report is rendered by report.py, which delegates data formatting to report_data.py and HTML assembly to report_html.py using the Jinja-style template in report_template.html.

Purpose. This is the primary deliverable for human consumption. It is designed to support investment committee presentations and client reporting.

2.7 portfolio_state.json: Position Ledger

Format: JSON.

Content: Current holdings expressed as a dictionary mapping tickers to:

  • avg_cost: Volume-weighted average cost basis per share.
  • shares: Current share count.
  • purchase_date: Date of initial acquisition.
  • last_updated: Most recent modification date.

Purpose. Enables tax-lot accounting for tax-loss harvesting and capital gains optimisation. The solver reads this file to compute unrealised gains and tax drag when tax_enabled = True.
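The unrealised-gain calculation can be sketched from the ledger fields listed above. The tickers, the current prices, and the `unrealised_gains` helper are hypothetical; the solver's actual tax logic is not described in this document.

```python
import json


def unrealised_gains(state: dict, prices: dict) -> dict:
    """Map each held ticker to (current price - avg_cost) * shares.

    Tickers without a current price are skipped instead of guessed.
    """
    gains = {}
    for ticker, position in state.items():
        if ticker in prices:
            gains[ticker] = (prices[ticker] - position["avg_cost"]) * position["shares"]
    return gains


# Hypothetical ledger in the documented shape.
state = json.loads("""
{
  "XYZ": {"avg_cost": 100.0, "shares": 10,
          "purchase_date": "2023-01-15", "last_updated": "2024-01-02"},
  "ABC": {"avg_cost": 50.0, "shares": 20,
          "purchase_date": "2023-06-01", "last_updated": "2024-01-02"}
}
""")
current_prices = {"XYZ": 110.0, "ABC": 45.0}  # assumed market prices
print(unrealised_gains(state, current_prices))
```

A negative value marks a position held at a loss, which is the kind of position a tax-loss harvesting rule would consider.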

2.8 portfolio_summary.csv: Tabular Export

Format: Comma-separated values (CSV).

Content: Flat table with one row per asset, columns for weight, expected return, volatility, beta, bid-ask spread, dollar allocation, and share count.

Purpose. Machine-readable export for downstream integration with order management systems (OMS), risk management platforms, or spreadsheet-based review workflows.
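A downstream consumer might load the export and sanity-check it before acting on it, as in the sketch below. The header names `ticker`, `weight`, and `shares` are assumptions; the exact column labels of the file are not listed in this document.

```python
import csv
import io
import math
from typing import TextIO


def load_weights(fh: TextIO) -> dict:
    """Read ticker -> weight from a CSV with (assumed) 'ticker' and 'weight' columns."""
    reader = csv.DictReader(fh)
    return {row["ticker"]: float(row["weight"]) for row in reader}


def weights_sum_to_one(weights: dict, tol: float = 1e-6) -> bool:
    """Check that the portfolio weights sum to 1 within a tolerance."""
    return math.isclose(sum(weights.values()), 1.0, abs_tol=tol)


# Hypothetical two-asset export.
sample_csv = io.StringIO("ticker,weight,shares\nXYZ,0.6,12\nABC,0.4,30\n")
weights = load_weights(sample_csv)
print(weights, weights_sum_to_one(weights))
```

The sum-to-one check assumes a fully invested portfolio; if the export can include a cash residual, the tolerance or the check itself would need adjusting.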


3. Data Flow

┌──────────────┐     ┌──────────────┐     ┌──────────────────┐
│  External    │     │  core_engine │     │  output/         │
│  APIs        │────▶│  Pipeline    │────▶│  (artefacts)     │
│  (Yahoo, FF) │     │  Orchestrator│     │                  │
└──────────────┘     └──────┬───────┘     └──────────────────┘
                            │
                     ┌──────┴───────┐
                     │  solver.py   │
                     │  analytics.py│
                     │  validation  │
                     │  report.py   │
                     └──────────────┘

All artefacts are written atomically where possible (write-then-rename) to prevent partial outputs in the event of pipeline failure.
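The write-then-rename pattern can be sketched with the standard library as follows. `atomic_write_json` is a hypothetical helper written for illustration and is not taken from the engine's source.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, payload: dict) -> None:
    """Write JSON to a temporary file, then rename it over `path`.

    The temporary file is created in the destination directory so that
    the final rename stays on one filesystem, where os.replace is atomic.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # Push the bytes to disk before renaming.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the partial temporary file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader of `path` therefore sees either the previous complete file or the new complete file, never a partially written one.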


References

  • Kolm, P. N., & Ritter, G. (2017). On the Bayesian interpretation of Black–Litterman. European Journal of Operational Research, 258(2), 564–572.
  • Meucci, A. (2010). The Black-Litterman approach: Original model and extensions. The Encyclopedia of Quantitative Finance.