
Option-Implied PDF Visualizer - Complete Project Explanation

Executive Summary

This project extracts market expectations about future stock prices from options markets and presents them as intuitive 3D visualizations with AI-powered interpretations.

Status: ✅ Phase 8 Complete (100% - Production Ready)
Last Updated: 2025-12-08
Repository Type: Solo project with AI assistance (Claude Code)
Live Interfaces: Streamlit (port 8501) + React SPA (port 5173 + FastAPI backend 8000)


Table of Contents

  1. Non-Technical Explanation
  2. Technical Explanation
  3. Architecture Overview
  4. Mathematical Foundation
  5. Implementation Details
  6. Key Algorithms
  7. Data Flow
  8. Testing Strategy
  9. Future Enhancements

Non-Technical Explanation

What Problem Does This Solve?

When traders buy and sell options, they're essentially placing bets on where they think a stock price will go. These bets contain valuable information about the market's collective expectations. This tool extracts that hidden information and makes it visible.

What Does It Do?

Imagine you could see a 3D landscape showing:

  • X-axis: Different possible stock prices (strikes)
  • Y-axis: Time into the future (days to expiration)
  • Z-axis: How likely each price is (probability)

The tool creates this landscape and then uses AI to explain what it means in plain English.

Why Is This Useful?

For Traders: Understand where the market expects prices to move and how much uncertainty exists.

For Risk Managers: Quantify tail risk and see probability distributions.

For Researchers: Study historical probability distributions and prediction accuracy.

For Students: Learn derivatives pricing and market microstructure.

Real-World Example

Imagine SPY is trading at $450. The tool might show:

  • 68% chance price stays between $436-$467 in 30 days
  • 22% chance of +5% move (bullish tilt)
  • 18% chance of -5% move
  • Negative skewness (-0.15) suggests slight downside bias
  • The AI explains: "Market is pricing in moderate uncertainty with slight bearish lean, similar to pre-Fed-announcement patterns in October 2023."

Technical Explanation

Core Concept: Risk-Neutral Probability Density

Options markets implicitly encode a risk-neutral probability distribution for future asset prices. The Breeden-Litzenberger (1978) formula allows us to extract this distribution by taking the second derivative of call option prices with respect to strike:

f(K) = e^(rT) × ∂²C/∂K²

Where:

  • f(K) = risk-neutral probability density at strike K
  • C = call option price as a function of strike
  • r = risk-free rate
  • T = time to expiration
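  • e^(rT) = compounding factor (the inverse of the discount factor e^(-rT)), which undoes the discounting built into the call price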

Why This Matters

Traditional Approach: Implied volatility gives a single number (expected magnitude of moves)

This Approach: Full probability distribution showing:

  • Mean and variance (expected price and uncertainty)
  • Skewness (directional bias)
  • Kurtosis (fat tails / crash risk)
  • Specific probabilities for any price level
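As a rough illustration of how these quantities fall out of a discretized density, the sketch below computes them with trapezoid-rule integrals. The names `pdf_moments` and `prob_above` are hypothetical and not taken from the project; the project's own `PDFStatistics` class may compute these differently.

```python
import numpy as np


def _integrate(y, x):
    """Trapezoid-rule integral of y over x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def pdf_moments(strikes, pdf):
    """Mean, std, skewness and excess kurtosis of a density sampled on `strikes`."""
    strikes = np.asarray(strikes, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    pdf = pdf / _integrate(pdf, strikes)  # renormalize defensively
    mean = _integrate(strikes * pdf, strikes)
    std = _integrate((strikes - mean) ** 2 * pdf, strikes) ** 0.5
    z = (strikes - mean) / std
    return {
        "mean": mean,
        "std": std,
        "skewness": _integrate(z ** 3 * pdf, strikes),
        "excess_kurtosis": _integrate(z ** 4 * pdf, strikes) - 3.0,
    }


def prob_above(strikes, pdf, level):
    """P(S > level), read off a cumulative trapezoid CDF by linear interpolation."""
    strikes = np.asarray(strikes, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(strikes)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf = cdf / cdf[-1]
    return float(1.0 - np.interp(level, strikes, cdf))


if __name__ == "__main__":
    # Illustrative input: a bell-shaped density centred on 450 (not market data).
    grid = np.linspace(350.0, 550.0, 2001)
    density = np.exp(-0.5 * ((grid - 450.0) / 15.0) ** 2)
    print(pdf_moments(grid, density))
    print(prob_above(grid, density, 470.0))
```

A single implied-volatility number maps to the `std` entry only; the other outputs are what the full distribution adds.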

Technical Stack

Backend:

  • Python 3.11+ (type hints, modern syntax)
  • NumPy/SciPy (numerical computation)
  • Pandas (data manipulation)

Data Sources:

  • OpenBB Terminal (primary option chain data)
  • yfinance (backup data source)
  • FRED API (risk-free rate)

Models:

  • SABR (Stochastic Alpha Beta Rho) volatility model
  • Cubic spline interpolation (fallback)
  • Cosine similarity for pattern matching

AI:

  • Ollama (local LLM inference)
  • Qwen3-7B (7 billion parameter language model)
  • Intelligent fallback for offline operation

Visualization:

  • Plotly (interactive 3D graphics)
  • Dark theme with professional styling

Database (Phase 5):

  • SQLite (time series storage)
  • ChromaDB (vector search for patterns)

Frontend (Phase 6):

  • Streamlit (Python web framework)

Deployment (Phase 7):

  • Docker containerization
  • HuggingFace Spaces hosting

Architecture Overview

Layer 1: Data Acquisition

┌─────────────────────────────────────────┐
│         DataManager (Facade)            │
│  ┌─────────────────────────────────┐    │
│  │  OpenBB Client (Primary)        │    │
│  │  YFinance Client (Backup)       │    │
│  │  FRED Client (Risk-Free Rate)   │    │
│  └─────────────────────────────────┘    │
│            ↓                            │
│  ┌─────────────────────────────────┐    │
│  │  Cache Layer (15min TTL)        │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Design Pattern: Facade pattern with automatic fallback
Resilience: Dual data sources, file-based caching
Performance: Minimizes API calls via intelligent caching
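The facade-plus-fallback idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project's `DataManager`: the class names `TTLCache` and `FallbackDataManager` are hypothetical, the cache is in-memory rather than file-based, and each data source is modeled as a plain callable that takes a ticker.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple


class TTLCache:
    """In-memory stand-in for the file-based cache; entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[key]  # entry is older than the TTL
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock(), value)


class FallbackDataManager:
    """Facade: serve from cache, else try each source in order until one succeeds."""

    def __init__(self, sources: Sequence[Callable[[str], Any]], cache: TTLCache):
        self._sources = list(sources)
        self._cache = cache

    def get_options(self, ticker: str):
        cached = self._cache.get(ticker)
        if cached is not None:
            return cached
        errors = []
        for fetch in self._sources:
            try:
                data = fetch(ticker)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(exc)
                continue
            self._cache.set(ticker, data)
            return data
        raise RuntimeError(f"all data sources failed for {ticker}: {errors}")
```

Callers only ever see `get_options`; which source answered, and whether the cache was hit, stays hidden behind the facade.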

Layer 2: Mathematical Core

┌─────────────────────────────────────────┐
│     BreedenlitzenbergPDF                │
│  ┌─────────────────────────────────┐    │
│  │  1. SABR Calibration            │    │
│  │  2. IV Interpolation            │    │
│  │  3. Call Price Calculation      │    │
│  │  4. Numerical Differentiation   │    │
│  │  5. PDF Normalization           │    │
│  └─────────────────────────────────┘    │
│            ↓                            │
│  ┌─────────────────────────────────┐    │
│  │  PDFStatistics Calculator       │    │
│  │  (mean, std, skew, kurtosis)    │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Design Pattern: Pipeline pattern
Numerical Methods: Savitzky-Golay smoothing, gradient-based derivatives
Robustness: Edge case handling, non-negativity constraints

Layer 3: AI Interpretation

┌─────────────────────────────────────────┐
│        PDFInterpreter                   │
│  ┌─────────────────────────────────┐    │
│  │  PDFPatternMatcher              │    │
│  │  (cosine similarity)            │    │
│  └─────────────────────────────────┘    │
│            ↓                            │
│  ┌─────────────────────────────────┐    │
│  │  Ollama Client                  │    │
│  │  (with fallback)                │    │
│  └─────────────────────────────────┘    │
│            ↓                            │
│  ┌─────────────────────────────────┐    │
│  │  Prompt Templates               │    │
│  │  (4 analysis modes)             │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Design Pattern: Strategy pattern (4 interpretation modes)
AI Architecture: Local LLM with graceful degradation
Pattern Matching: 70% shape similarity + 30% statistical similarity
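Graceful degradation here means the caller gets an interpretation whether or not the LLM is reachable. A minimal sketch follows, assuming the LLM is passed in as a plain callable; the function names, the prompt wording, and the numeric thresholds are all hypothetical, and no real Ollama API is called.

```python
from typing import Callable, Dict, Optional

# The four mode names come from the project description; how each one changes
# the prompt is not specified there, so this sketch only tags the prompt.
MODES = ("standard", "conservative", "aggressive", "educational")


def rule_based_interpretation(stats: Dict[str, float]) -> str:
    """Deterministic fallback: turn a few statistics into a plain-English sentence."""
    skew = stats["skewness"]
    if skew < -0.1:      # illustrative threshold
        bias = "a downside bias"
    elif skew > 0.1:     # illustrative threshold
        bias = "an upside bias"
    else:
        bias = "no strong directional bias"
    tails = "fat tails" if stats["excess_kurtosis"] > 1.0 else "tails close to normal"
    return (f"The market implies a move of about {stats['implied_move']:.1%} "
            f"with {bias} and {tails}.")


def interpret(stats: Dict[str, float],
              llm: Optional[Callable[[str], str]] = None,
              mode: str = "standard") -> str:
    """Ask the LLM if one is supplied; fall back to rules if it is missing or fails."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    prompt = f"[{mode}] Explain this option-implied distribution: {stats}"
    if llm is not None:
        try:
            return llm(prompt)
        except Exception:
            pass  # LLM unreachable or errored: degrade to the rule-based path
    return rule_based_interpretation(stats)
```

Passing the LLM as a callable keeps the fallback path trivially testable: a stub that raises stands in for an unavailable server.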

Layer 4: Visualization

┌─────────────────────────────────────────┐
│      Plotly Visualization Suite         │
│  ┌─────────────────────────────────┐    │
│  │  2D PDF Plots                   │    │
│  │  PDF Comparison Plots           │    │
│  │  CDF Plots                      │    │
│  └─────────────────────────────────┘    │
│            +                            │
│  ┌─────────────────────────────────┐    │
│  │  3D Surface (Strike×Time×Prob)  │    │
│  │  Heatmap (2D alternative)       │    │
│  │  Wireframe (skeleton view)      │    │
│  └─────────────────────────────────┘    │
│            +                            │
│  ┌─────────────────────────────────┐    │
│  │  Probability Tables             │    │
│  │  (color-coded, interactive)     │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Design Pattern: Factory pattern for plot creation
Theming: Dark theme with consistent styling
Interactivity: Full Plotly features (hover, zoom, rotate)


Mathematical Foundation

1. Breeden-Litzenberger Formula (Core Algorithm)

Derivation:

The price of a European call option can be expressed as:

C(K) = e^(-rT) × ∫[K to ∞] (S - K) × f(S) dS

Taking the first derivative:

∂C/∂K = -e^(-rT) × ∫[K to ∞] f(S) dS = -e^(-rT) × P(S > K)

Taking the second derivative:

∂²C/∂K² = e^(-rT) × f(K)

Rearranging:

f(K) = e^(rT) × ∂²C/∂K²

Implementation Challenges:

  1. Need smooth call price function → Use SABR interpolation
  2. Numerical differentiation is noisy → Apply Savitzky-Golay filter
  3. Can produce negative densities → Enforce non-negativity
  4. Must integrate to 1 → Normalize using trapezoid rule
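Steps 2 to 4 above can be sketched with NumPy alone. This is an assumption-laden illustration: the project may well call `scipy.signal.savgol_filter` instead, the function names `savgol_coeffs` and `clean_density` are hypothetical, the default window and polynomial order are arbitrary, and the strike grid is assumed to be uniformly spaced (a requirement of the Savitzky-Golay filter in this form).

```python
import numpy as np


def savgol_coeffs(window: int, polyorder: int) -> np.ndarray:
    """Least-squares smoothing weights for the centre point of an odd-length window."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are 1, x, x^2, ... evaluated at the window offsets.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted polynomial's value at offset 0.
    return np.linalg.pinv(vander)[0]


def clean_density(strikes, raw_pdf, window: int = 11, polyorder: int = 3) -> np.ndarray:
    """Smooth a noisy density, clip negatives, and rescale so it integrates to 1."""
    strikes = np.asarray(strikes, dtype=float)
    raw_pdf = np.asarray(raw_pdf, dtype=float)

    # Step 2: Savitzky-Golay smoothing (edges handled by repeating the end values).
    half = window // 2
    padded = np.pad(raw_pdf, half, mode="edge")
    kernel = savgol_coeffs(window, polyorder)[::-1]
    smooth = np.convolve(padded, kernel, mode="valid")

    # Step 3: a probability density cannot be negative.
    smooth = np.maximum(smooth, 0.0)

    # Step 4: trapezoid-rule normalization.
    area = np.sum(0.5 * (smooth[1:] + smooth[:-1]) * np.diff(strikes))
    if area <= 0:
        raise ValueError("density has no positive mass after clipping")
    return smooth / area
```

Clipping before normalizing matters: normalizing first and clipping afterwards would leave the result integrating to more than 1.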

2. SABR Volatility Model

Model Equations:

dF = α × F^β × dW₁
dα = ν × α × dW₂
dW₁ × dW₂ = ρ dt

Parameters:

  • α = initial volatility level (the starting value of the stochastic volatility process)
  • β = elasticity (CEV exponent; typically 0.5 for equities)
  • ρ = correlation between price and volatility
  • ν = vol-of-vol

Calibration: Minimize sum of squared errors between market IV and model IV using Nelder-Mead optimization.

Why SABR?: It captures the volatility smile/skew, which constant-volatility Black-Scholes cannot, and it is widely used in industry for smile interpolation.
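The model IV that calibration compares against market IV is usually obtained from a closed-form approximation rather than by simulating the SDEs. The sketch below uses the Hagan et al. (2002) lognormal approximation; whether the project's `_sabr_formula` uses exactly this expansion is an assumption, and `sabr_implied_vol` is a hypothetical name whose argument order simply mirrors the call in the calibration code.

```python
import numpy as np


def sabr_implied_vol(strikes, forward, alpha, rho, nu, beta, tau):
    """Hagan et al. (2002) lognormal implied-volatility approximation for SABR."""
    k = np.asarray(strikes, dtype=float)
    log_fk = np.log(forward / k)
    fk_pow = (forward * k) ** ((1.0 - beta) / 2.0)

    z = (nu / alpha) * fk_pow * log_fk
    x = np.log((np.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
    # z / x(z) tends to 1 at the money (z -> 0); guard the 0/0 case explicitly.
    small = np.abs(z) < 1e-8
    z_over_x = np.where(small, 1.0, z / np.where(small, 1.0, x))

    denom = fk_pow * (
        1.0
        + (1.0 - beta) ** 2 / 24.0 * log_fk ** 2
        + (1.0 - beta) ** 4 / 1920.0 * log_fk ** 4
    )
    # First-order correction in time to expiry.
    correction = 1.0 + tau * (
        (1.0 - beta) ** 2 / 24.0 * alpha ** 2 / fk_pow ** 2
        + rho * beta * nu * alpha / (4.0 * fk_pow)
        + (2.0 - 3.0 * rho ** 2) / 24.0 * nu ** 2
    )
    return alpha / denom * z_over_x * correction


if __name__ == "__main__":
    # Illustrative parameters only: a negative rho tilts the smile toward low strikes.
    smile = sabr_implied_vol(np.array([80.0, 100.0, 120.0]), 100.0, 0.2, -0.5, 0.5, 1.0, 0.25)
    print(smile)
```

With ρ < 0 the approximation assigns higher implied volatility to low strikes than to high strikes, which is the equity skew the calibration tries to match.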

3. Pattern Matching Algorithm

Similarity Score:

similarity = 0.7 × shape_similarity + 0.3 × stats_similarity

Shape Similarity (Cosine):

cos(θ) = (A · B) / (||A|| × ||B||)

Where A and B are PDF vectors.

Stats Similarity:

sim = 1 - mean_abs_diff([skew, kurtosis, implied_move])

Why This Works: Combines global shape (cosine) with specific moments (stats) to find truly similar distributions.
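A minimal sketch of the two ingredients and their weighted combination is shown below. Several details are assumptions rather than the project's behavior: the PDFs are resampled onto the overlapping strike range with linear interpolation, the statistics are compared unscaled, the stats score is clipped to [0, 1], and the function names are hypothetical. Strikes are assumed to be sorted in increasing order.

```python
import numpy as np


def shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b, n_points: int = 200) -> float:
    """Cosine similarity of two PDFs after resampling onto a shared strike grid."""
    lo = max(np.min(strikes_a), np.min(strikes_b))
    hi = min(np.max(strikes_a), np.max(strikes_b))
    if hi <= lo:
        return 0.0  # the strike ranges do not overlap
    grid = np.linspace(lo, hi, n_points)
    a = np.interp(grid, strikes_a, pdf_a)
    b = np.interp(grid, strikes_b, pdf_b)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0


def stats_similarity(stats_a, stats_b,
                     keys=("skewness", "kurtosis", "implied_move")) -> float:
    """1 minus the mean absolute difference of the chosen statistics, clipped to [0, 1]."""
    diffs = [abs(stats_a[key] - stats_b[key]) for key in keys]
    return float(np.clip(1.0 - np.mean(diffs), 0.0, 1.0))


def combined_similarity(pdf_a, strikes_a, pdf_b, strikes_b, stats_a, stats_b) -> float:
    """Weighted score: 70% shape, 30% statistics."""
    shape = shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b)
    stats = stats_similarity(stats_a, stats_b)
    return 0.7 * shape + 0.3 * stats
```

Because a PDF is non-negative everywhere, the cosine term always lies in [0, 1], so the combined score does too under the clipping assumption.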


Implementation Details

Phase 1: Foundation (100% Complete)

Files Created:

  • src/data/openbb_client.py (169 lines)
  • src/data/yfinance_client.py (142 lines)
  • src/data/fred_client.py (89 lines)
  • src/data/cache.py (98 lines)
  • src/data/data_manager.py (186 lines)

Key Features:

  • Automatic fallback between data sources
  • File-based caching with TTL
  • Comprehensive error handling
  • Type hints throughout

Testing: 15 tests covering normal operation and edge cases

Phase 2: Core Math (100% Complete)

Files Created:

  • src/core/breeden_litz.py (287 lines) ⭐ CORE ALGORITHM
  • src/core/sabr.py (201 lines)
  • src/core/statistics.py (234 lines)

Key Features:

  • SABR calibration with fallback to cubic spline
  • Breeden-Litzenberger with numerical smoothing
  • Complete PDF statistics (13 metrics)
  • CDF and probability queries

Testing: 22 tests covering calculation accuracy and edge cases

Phase 3: Visualization (100% Complete)

Files Created:

  • src/visualization/themes.py (73 lines)
  • src/visualization/pdf_2d.py (312 lines)
  • src/visualization/surface_3d.py (264 lines)
  • src/visualization/probability_table.py (189 lines)

Key Features:

  • Dark theme configuration system
  • 4 types of 2D plots
  • 3 types of 3D visualizations
  • 4 types of probability tables
  • Full interactivity

Testing: 18 tests covering plot generation and formatting

Phase 4: AI Interpretation (100% Complete)

Files Created:

  • src/ai/prompts.py (198 lines)
  • src/ai/interpreter.py (245 lines)
  • src/core/patterns.py (276 lines)

Key Features:

  • 4 interpretation modes (standard, conservative, aggressive, educational)
  • Pattern matching with cosine similarity
  • Ollama client with graceful fallback
  • Rule-based interpretation system

Testing: 21 tests covering AI components and pattern matching

Phases 5-7: Pending

Phase 5: SQLite schema, SQLAlchemy models, prediction tracking
Phase 6: Streamlit UI with 4 pages
Phase 7: Docker, testing, deployment to HuggingFace Spaces


Key Algorithms

Algorithm 1: PDF Extraction

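import numpy as np

def _breeden_litzenberger(self, strikes, call_prices, r, T):
    """
    Extract risk-neutral PDF from call prices.

    f(K) = e^(rT) × ∂²C/∂K²
    """
    strikes = np.asarray(strikes, dtype=float)
    call_prices = np.asarray(call_prices, dtype=float)

    # Numerical first and second derivatives of C with respect to K
    # (np.gradient accepts non-uniform strike spacing)
    dC_dK = np.gradient(call_prices, strikes)
    d2C_dK2 = np.gradient(dC_dK, strikes)

    # Apply Breeden-Litzenberger formula
    pdf = np.exp(r * T) * d2C_dK2

    # Enforce non-negativity
    pdf = np.maximum(pdf, 0)

    # Normalize to integrate to 1
    # (np.trapz was renamed np.trapezoid in NumPy 2.0; support both)
    trapezoid = getattr(np, "trapezoid", None) or np.trapz
    area = trapezoid(pdf, strikes)
    if area <= 0:
        raise ValueError("PDF has no positive mass; check the input call prices")
    pdf = pdf / area

    return pdf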

Complexity: O(n) where n = number of strikes
Accuracy: Depends on strike density and IV smoothness
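One way to sanity-check an extraction routine like this is to feed it prices whose density is known in closed form. Under Black-Scholes with a flat volatility, the risk-neutral density is lognormal, so the recovered PDF should match it closely. The sketch below is a standalone check with hypothetical function names and illustrative inputs (spot 450, 30 days, 20% volatility); it does not call the project's classes.

```python
import math

import numpy as np


def black_scholes_call(spot, strikes, r, sigma, T):
    """European call prices under Black-Scholes (no dividends)."""
    strikes = np.asarray(strikes, dtype=float)
    d1 = (np.log(spot / strikes) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    return spot * norm_cdf(d1) - strikes * math.exp(-r * T) * norm_cdf(d2)


def implied_density(strikes, call_prices, r, T):
    """Breeden-Litzenberger: f(K) = e^(rT) * d2C/dK2, clipped and normalized."""
    first = np.gradient(call_prices, strikes)
    second = np.gradient(first, strikes)
    pdf = np.maximum(math.exp(r * T) * second, 0.0)
    area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(strikes))
    return pdf / area


def lognormal_density(s, spot, r, sigma, T):
    """Exact risk-neutral density of the terminal price under Black-Scholes."""
    mu = math.log(spot) + (r - 0.5 * sigma ** 2) * T
    sd = sigma * math.sqrt(T)
    return np.exp(-0.5 * ((np.log(s) - mu) / sd) ** 2) / (s * sd * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    spot, r, sigma, T = 450.0, 0.05, 0.20, 30 / 365
    strikes = np.linspace(300.0, 600.0, 601)
    estimated = implied_density(strikes, black_scholes_call(spot, strikes, r, sigma, T), r, T)
    exact = lognormal_density(strikes, spot, r, sigma, T)
    print("max abs error:", float(np.max(np.abs(estimated - exact))))
```

On a dense, noise-free grid the finite-difference error is small; real quotes are sparser and noisier, which is why the pipeline interpolates IV first and smooths afterwards.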

Algorithm 2: SABR Calibration

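import numpy as np
from scipy.optimize import minimize

def calibrate(self, strikes, implied_vols, forward, tau):
    """
    Calibrate SABR to market IV smile.

    Beta is held fixed (self.beta); alpha, rho and nu are fitted.
    """
    def objective(params):
        alpha, rho, nu = params
        model_vols = self._sabr_formula(strikes, forward, alpha, rho, nu, self.beta, tau)
        # Sum of squared errors between model IV and market IV
        return np.sum((model_vols - implied_vols) ** 2)

    initial_guess = [0.2, -0.3, 0.4]
    bounds = [(0.001, 2.0), (-0.999, 0.999), (0.001, 2.0)]

    # Nelder-Mead honours `bounds` only in SciPy >= 1.7
    result = minimize(
        objective,
        initial_guess,
        method='Nelder-Mead',
        bounds=bounds,
        options={'maxiter': 1000}
    )

    # Callers should check result.success before trusting the fit
    # (the pipeline falls back to a cubic spline when calibration fails)
    self.alpha, self.rho, self.nu = result.x
    return result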

Complexity: O(m × n) where m = iterations, n = strikes
Convergence: Typically <100 iterations for equity options

Algorithm 3: Pattern Matching

def _calculate_similarity(self, current_pdf, current_strikes, hist_pdf, hist_strikes, current_stats, hist_stats):
    """
    Calculate combined similarity score.
    """
    # Shape similarity (cosine)
    shape_sim = self._pdf_shape_similarity(
        current_pdf, current_strikes,
        hist_pdf, hist_strikes
    )

    # Statistical similarity
    stats_sim = self._stats_similarity(current_stats, hist_stats)

    # Weighted combination
    return 0.7 * shape_sim + 0.3 * stats_sim

Complexity: O(n) for interpolation + O(n) for dot product
Accuracy: Validated against synthetic test cases


Data Flow

Complete Pipeline (End-to-End)

1. User Request
   ↓
2. DataManager.get_options("SPY")
   ├─> OpenBB Client (try primary)
   └─> YFinance Client (fallback if needed)
   ↓
3. FREDClient.get_risk_free_rate()
   ↓
4. SABRModel.calibrate(strikes, IVs)
   ├─> Optimize α, ρ, ν parameters
   └─> Fallback to CubicSpline if fails
   ↓
5. SABRModel.interpolate_iv(fine_strikes)
   ↓
6. BreedenlitzenbergPDF.calculate_pdf(strikes, IVs, spot, r, T)
   ├─> Calculate call prices
   ├─> Apply Breeden-Litzenberger formula
   ├─> Smooth with Savitzky-Golay
   └─> Normalize to integrate to 1
   ↓
7. PDFStatistics.calculate_all_stats()
   ├─> Mean, std, skewness, kurtosis
   ├─> Implied move, tail probabilities
   └─> Confidence intervals
   ↓
8. PDFPatternMatcher.find_similar_patterns()
   ├─> Load historical PDFs
   ├─> Calculate similarity scores
   └─> Return top matches
   ↓
9. PDFInterpreter.interpret_single_pdf()
   ├─> Format prompt with stats & patterns
   ├─> Try Ollama.generate()
   └─> Fallback to rule-based if Ollama unavailable
   ↓
10. Visualization Functions
    ├─> create_3d_surface()
    ├─> plot_pdf_2d()
    ├─> create_probability_table()
    └─> Return interactive Plotly figures
    ↓
11. Return Results to User
    ├─> PDF values & strikes
    ├─> All statistics
    ├─> Pattern matches
    ├─> AI interpretation
    └─> Plotly figures

Data Flow Timing (Approximate)

  • Data fetch: ~1-3 seconds (or instant if cached)
  • SABR calibration: ~0.1-0.5 seconds
  • PDF calculation: ~0.05 seconds
  • Statistics: ~0.01 seconds
  • Pattern matching: ~0.1-1 second (depends on history size)
  • AI interpretation: ~2-5 seconds (or ~0.01s for fallback)
  • Visualization: ~0.1-0.5 seconds

Total: 3-10 seconds for complete analysis


Testing Strategy

Unit Tests

Coverage: High coverage across all modules

Approach:

  • Test normal operation
  • Test edge cases (empty data, extreme values)
  • Test error conditions
  • Test fallback mechanisms

Example (from test_core_math.py):

def test_pdf_normalization():
    """Ensure PDF integrates to 1.0."""
    pdf_calc = BreedenlitzenbergPDF()
    pdf = pdf_calc.calculate_pdf(...)
    integral = trapz(pdf, strikes)
    assert abs(integral - 1.0) < 1e-6

Integration Tests

Example (from test_ai_components.py):

def test_integration_ai_workflow():
    """Test complete AI workflow end-to-end."""
    # 1. Create PDF
    # 2. Calculate statistics
    # 3. Find patterns
    # 4. Generate interpretation
    # All steps must complete successfully

Fallback Tests

Critical: Ensure system works when external dependencies fail

Tests:

  • OpenBB fails → YFinance succeeds
  • SABR fails → Cubic spline succeeds
  • Ollama unavailable → Rule-based interpretation succeeds

Future Enhancements

Phase 5: Database & History

Schema Design:

CREATE TABLE pdf_snapshots (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    ticker TEXT,
    days_to_expiry INTEGER,
    spot_price REAL,
    strikes BLOB,
    pdf_values BLOB,
    stats JSON,
    interpretation TEXT,
    model_used TEXT
);

CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    forecast_date DATETIME,
    target_date DATETIME,
    predicted_prob REAL,
    condition TEXT,
    target_level REAL,
    actual_outcome BOOLEAN,
    actual_price REAL,
    evaluation_date DATETIME
);

CREATE TABLE pattern_matches (
    id INTEGER PRIMARY KEY,
    current_snapshot_id INTEGER,
    historical_snapshot_id INTEGER,
    similarity_score REAL,
    shape_similarity REAL,
    stats_similarity REAL
);
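To make the BLOB and JSON columns concrete, the sketch below writes and reads one `pdf_snapshots` row with Python's standard `sqlite3` module. The helper names (`open_db`, `pack_floats`, `save_snapshot`, `load_snapshot`) are hypothetical, the float arrays are stored as raw native-endian float64 bytes (one possible encoding, not necessarily the project's), and an in-memory database stands in for the real file.

```python
import json
import sqlite3
from array import array
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE pdf_snapshots (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    ticker TEXT,
    days_to_expiry INTEGER,
    spot_price REAL,
    strikes BLOB,
    pdf_values BLOB,
    stats JSON,
    interpretation TEXT,
    model_used TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the snapshot table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def pack_floats(values) -> bytes:
    """Serialize floats as raw float64 bytes (native byte order) for a BLOB column."""
    return array("d", values).tobytes()


def unpack_floats(blob: bytes) -> list:
    """Inverse of pack_floats."""
    out = array("d")
    out.frombytes(blob)
    return out.tolist()


def save_snapshot(conn, ticker, days_to_expiry, spot_price, strikes, pdf_values, stats) -> int:
    """Insert one snapshot and return its row id."""
    cur = conn.execute(
        "INSERT INTO pdf_snapshots "
        "(timestamp, ticker, days_to_expiry, spot_price, strikes, pdf_values, stats) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            ticker,
            days_to_expiry,
            spot_price,
            pack_floats(strikes),
            pack_floats(pdf_values),
            json.dumps(stats),
        ),
    )
    conn.commit()
    return cur.lastrowid


def load_snapshot(conn, snapshot_id: int):
    """Return (ticker, strikes, pdf_values, stats) for one snapshot."""
    row = conn.execute(
        "SELECT ticker, strikes, pdf_values, stats FROM pdf_snapshots WHERE id = ?",
        (snapshot_id,),
    ).fetchone()
    ticker, strikes, pdf_values, stats = row
    return ticker, unpack_floats(strikes), unpack_floats(pdf_values), json.loads(stats)
```

Raw float64 bytes are compact and round-trip exactly, but they are not portable across machines with different byte order; a self-describing format would trade size for portability.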

ChromaDB Integration:

  • Store PDF embeddings for vector search
  • Fast similarity search across thousands of historical PDFs

Phase 6: Streamlit App

Pages:

  1. Live Analysis: Real-time PDF extraction and visualization
  2. Historical: Browse past PDFs and patterns
  3. Predictions: Track accuracy of market expectations
  4. About: Documentation and explanation

Features:

  • Ticker selection
  • Expiration date selection
  • Analysis mode selection
  • Export to CSV/PNG
  • Dark/light theme toggle

Phase 7: Deployment

Docker:

FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["streamlit", "run", "app/streamlit_app.py"]

HuggingFace Spaces:

  • Free hosting for ML demos
  • Automatic builds from Git
  • Environment variables for API keys

Technical Challenges & Solutions

Challenge 1: Noisy Numerical Derivatives

Problem: ∂²C/∂K² amplifies noise in option prices
Solution:

  1. Use SABR to interpolate IV (creates smooth curve)
  2. Apply Savitzky-Golay filter (polynomial smoothing)
  3. Use dense strike grid (200+ points)

Challenge 2: Data Source Reliability

Problem: APIs can be down or rate-limited
Solution:

  1. Dual data sources (OpenBB + yfinance)
  2. Automatic fallback in DataManager
  3. File-based caching (15min TTL)

Challenge 3: SABR Calibration Failures

Problem: Sometimes fails to converge
Solution:

  1. Fallback to cubic spline interpolation
  2. Graceful degradation (still produces PDF)
  3. Log warnings for debugging

Challenge 4: Ollama Availability

Problem: User may not have Ollama installed
Solution:

  1. Check availability at runtime
  2. Provide rule-based interpretation fallback
  3. The fallback output has been tested and remains usable without the LLM

Performance Optimization

Current Performance

Bottlenecks:

  1. API calls (1-3 seconds) → Mitigated by caching
  2. SABR calibration (~0.3 seconds) → Acceptable
  3. Pattern matching (~0.5 seconds) → Will improve with indexing

Future Optimizations:

  1. Database indexing: Speed up historical queries
  2. Caching layer: Redis for distributed caching
  3. Parallel processing: Multiple expirations in parallel
  4. Incremental updates: Only recalculate changed data
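The parallel-processing idea (item 3) could look roughly like the sketch below. It is not part of the project: `analyze_expirations` is a hypothetical helper, and `analyze_one` stands in for whatever function runs the full pipeline for a single expiration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def analyze_expirations(expirations: Iterable[int],
                        analyze_one: Callable[[int], Any],
                        max_workers: int = 4) -> Dict[int, Any]:
    """Run `analyze_one` for each expiration concurrently and key results by expiration."""
    expirations = list(expirations)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each result with its expiration.
        results = list(pool.map(analyze_one, expirations))
    return dict(zip(expirations, results))


if __name__ == "__main__":
    # Placeholder workload; a real call would fetch data and extract a PDF.
    print(analyze_expirations([7, 30, 60], lambda days: f"analysis for {days}d"))
```

Threads suit this workload when most of the time goes to API calls or NumPy routines that release the GIL; a pure-Python CPU-bound step would need a process pool instead.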

Conclusion

This project demonstrates:

  • Advanced quantitative finance: Option-implied probabilities
  • Robust software engineering: Error handling, fallbacks, testing
  • Modern ML/AI: Local LLM integration with graceful degradation
  • Interactive visualization: 3D graphics with full interactivity
  • Production-ready code: Type hints, documentation, comprehensive tests

Current Status: 57% complete (4/7 phases)
Next Milestone: Phase 5 - Database & History
Timeline: Ready for deployment after Phase 7


Last Updated: 2025-12-01
Author: Built with Claude Code
License: MIT