
Option-Implied PDF Visualizer - Complete Project Explanation

Executive Summary

This project extracts market expectations about future stock prices from options markets and presents them as intuitive 3D visualizations with AI-powered interpretations.

Status: ✅ Phase 8 Complete (100% - Production Ready)
Last Updated: 2025-12-08
Repository Type: Solo project with AI assistance (Claude Code)
Live Interfaces: Streamlit (port 8501) + React SPA (port 5173 + FastAPI backend 8000)

Table of Contents

Non-Technical Explanation
Technical Explanation
Architecture Overview
Mathematical Foundation
Implementation Details
Key Algorithms
Data Flow
Testing Strategy
Future Enhancements

Non-Technical Explanation

What Problem Does This Solve?

When traders buy and sell options, they're essentially placing bets on where they think a stock price will go. These bets contain valuable information about the market's collective expectations. This tool extracts that hidden information and makes it visible.

What Does It Do?

Imagine you could see a 3D landscape showing:

X-axis: Different possible stock prices (strikes)
Y-axis: Time into the future (days to expiration)
Z-axis: How likely each price is (probability)

The tool creates this landscape and then uses AI to explain what it means in plain English.

Why Is This Useful?

For Traders: Understand where the market expects prices to move and how much uncertainty exists.

For Risk Managers: Quantify tail risk and see probability distributions.

For Researchers: Study historical probability distributions and prediction accuracy.

For Students: Learn derivatives pricing and market microstructure.

Real-World Example

Imagine SPY is trading at $450. The tool might show:

68% chance price stays between $436-$467 in 30 days
22% chance of +5% move (bullish tilt)
18% chance of -5% move
Negative skewness (-0.15) suggests slight downside bias
The AI explains: "Market is pricing in moderate uncertainty with slight bearish lean, similar to pre-Fed-announcement patterns in October 2023."

Technical Explanation

Core Concept: Risk-Neutral Probability Density

Options markets implicitly encode a risk-neutral probability distribution for future asset prices. The Breeden-Litzenberger (1978) formula allows us to extract this distribution by taking the second derivative of call option prices with respect to strike:

f(K) = e^(rT) × ∂²C/∂K²

Where:

f(K) = risk-neutral probability density at strike K
C = call option price as a function of strike
r = risk-free rate
T = time to expiration
e^(rT) = compounding factor (the inverse of the discount factor e^(-rT))
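As a sanity check on the formula, the following minimal sketch applies it to Black-Scholes call prices, where the answer is known: the recovered density should integrate to 1 and its mean should equal the forward price S × e^(rT). The inputs (spot 450, 20% volatility, 30 days) and the helper name `bs_call` are illustrative assumptions, not values or functions from the project.

```python
import numpy as np
from math import erf, exp, log, sqrt

def bs_call(S, K, r, sigma, T):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF
    return S * N(d1) - K * exp(-r * T) * N(d2)

# Illustrative inputs (assumed, not taken from the project)
S, r, sigma, T = 450.0, 0.05, 0.20, 30 / 365
strikes = np.linspace(300.0, 600.0, 601)  # uniform grid, spacing 0.5
calls = np.array([bs_call(S, K, r, sigma, T) for K in strikes])

# f(K) = e^(rT) * d2C/dK2, using central second differences on interior points
dK = strikes[1] - strikes[0]
second_diff = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / dK**2
density = exp(r * T) * second_diff
grid = strikes[1:-1]

total_prob = float(np.sum(density) * dK)         # close to 1
mean_price = float(np.sum(grid * density) * dK)  # close to the forward S * e^(rT)
print(total_prob, mean_price, S * exp(r * T))
```

With real market quotes the call prices are noisy and sparse, which is why the project interpolates and smooths before differentiating.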

Why This Matters

Traditional Approach: Implied volatility gives a single number (expected magnitude of moves)

This Approach: Full probability distribution showing:

Mean and variance (expected price and uncertainty)
Skewness (directional bias)
Kurtosis (fat tails / crash risk)
Specific probabilities for any price level
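To make the list above concrete, here is a small sketch of how these quantities can be read off a density sampled on a price grid. The function names `integrate` and `pdf_summary` and the bell-shaped test density are assumptions for illustration; the project's `PDFStatistics` class may compute them differently.

```python
import numpy as np

def integrate(y, x):
    """Trapezoid rule, written out so it works on any NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def pdf_summary(prices, pdf, level):
    """Moments of a gridded density plus P(price <= level)."""
    pdf = pdf / integrate(pdf, prices)            # renormalize defensively
    mean = integrate(prices * pdf, prices)
    std = integrate((prices - mean) ** 2 * pdf, prices) ** 0.5
    z = (prices - mean) / std
    skew = integrate(z**3 * pdf, prices)
    excess_kurtosis = integrate(z**4 * pdf, prices) - 3.0
    below = prices <= level
    prob_below = integrate(pdf[below], prices[below])
    return {"mean": mean, "std": std, "skew": skew,
            "excess_kurtosis": excess_kurtosis, "prob_below": prob_below}

# Illustrative density: a normal curve centred at 450 with standard deviation 15
prices = np.linspace(350.0, 550.0, 2001)
pdf = np.exp(-0.5 * ((prices - 450.0) / 15.0) ** 2)
summary = pdf_summary(prices, pdf, level=450.0)
print(summary)
```

For this symmetric test density the skewness and excess kurtosis come out near zero; a market-implied density would typically show nonzero values for both.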

Technical Stack

Backend:

Python 3.11+ (type hints, modern syntax)
NumPy/SciPy (numerical computation)
Pandas (data manipulation)

Data Sources:

OpenBB Terminal (primary option chain data)
yfinance (backup data source)
FRED API (risk-free rate)

Models:

SABR (Stochastic Alpha Beta Rho) volatility model
Cubic spline interpolation (fallback)
Cosine similarity for pattern matching

AI:

Ollama (local LLM inference)
Qwen3-7B (7 billion parameter language model)
Intelligent fallback for offline operation

Visualization:

Plotly (interactive 3D graphics)
Dark theme with professional styling

Database (Phase 5):

SQLite (time series storage)
ChromaDB (vector search for patterns)

Frontend (Phase 6):

Streamlit (Python web framework)

Deployment (Phase 7):

Docker containerization
HuggingFace Spaces hosting

Architecture Overview

Layer 1: Data Acquisition

┌─────────────────────────────────────────┐
│         DataManager (Facade)            │
│  ┌─────────────────────────────────┐   │
│  │  OpenBB Client (Primary)        │   │
│  │  YFinance Client (Backup)       │   │
│  │  FRED Client (Risk-Free Rate)   │   │
│  └─────────────────────────────────┘   │
│            ↓                            │
│  ┌─────────────────────────────────┐   │
│  │  Cache Layer (15min TTL)        │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Design Pattern: Facade pattern with automatic fallback
Resilience: Dual data sources, file-based caching
Performance: Minimizes API calls via intelligent caching
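The fallback behaviour of the facade can be sketched in a few lines. This is an illustration under assumptions: the class name `FallbackFetcher` and the two stand-in fetch functions are hypothetical, and the real `DataManager` wraps actual OpenBB and yfinance clients.

```python
from typing import Callable, Sequence

class FallbackFetcher:
    """Try each data source in order and return the first successful result."""

    def __init__(self, sources: Sequence[tuple[str, Callable[[str], dict]]]):
        self.sources = list(sources)

    def get_options(self, ticker: str) -> dict:
        errors = []
        for name, fetch in self.sources:
            try:
                data = fetch(ticker)
                data["source"] = name  # record which source answered
                return data
            except Exception as exc:   # any failure triggers the next source
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all data sources failed: " + "; ".join(errors))

# Hypothetical stand-ins for the primary and backup clients
def failing_primary(ticker: str) -> dict:
    raise ConnectionError("primary unavailable")

def working_backup(ticker: str) -> dict:
    return {"ticker": ticker, "strikes": [440, 450, 460]}

manager = FallbackFetcher([("primary", failing_primary), ("backup", working_backup)])
result = manager.get_options("SPY")
print(result["source"])
```

Callers see a single `get_options` method and never need to know which source answered, which is the point of the facade.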

Layer 2: Mathematical Core

┌─────────────────────────────────────────┐
│     BreedenlitzenbergPDF                │
│  ┌─────────────────────────────────┐   │
│  │  1. SABR Calibration            │   │
│  │  2. IV Interpolation            │   │
│  │  3. Call Price Calculation      │   │
│  │  4. Numerical Differentiation   │   │
│  │  5. PDF Normalization           │   │
│  └─────────────────────────────────┘   │
│            ↓                            │
│  ┌─────────────────────────────────┐   │
│  │  PDFStatistics Calculator       │   │
│  │  (mean, std, skew, kurtosis)    │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Design Pattern: Pipeline pattern
Numerical Methods: Savitzky-Golay smoothing, gradient-based derivatives
Robustness: Edge case handling, non-negativity constraints

Layer 3: AI Interpretation

┌─────────────────────────────────────────┐
│        PDFInterpreter                   │
│  ┌─────────────────────────────────┐   │
│  │  PDFPatternMatcher              │   │
│  │  (cosine similarity)            │   │
│  └─────────────────────────────────┘   │
│            ↓                            │
│  ┌─────────────────────────────────┐   │
│  │  Ollama Client                  │   │
│  │  (with fallback)                │   │
│  └─────────────────────────────────┘   │
│            ↓                            │
│  ┌─────────────────────────────────┐   │
│  │  Prompt Templates               │   │
│  │  (4 analysis modes)             │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Design Pattern: Strategy pattern (4 interpretation modes)
AI Architecture: Local LLM with graceful degradation
Pattern Matching: 70% shape similarity + 30% statistical similarity

Layer 4: Visualization

┌─────────────────────────────────────────┐
│      Plotly Visualization Suite         │
│  ┌─────────────────────────────────┐   │
│  │  2D PDF Plots                   │   │
│  │  PDF Comparison Plots           │   │
│  │  CDF Plots                      │   │
│  └─────────────────────────────────┘   │
│            +                            │
│  ┌─────────────────────────────────┐   │
│  │  3D Surface (Strike×Time×Prob)  │   │
│  │  Heatmap (2D alternative)       │   │
│  │  Wireframe (skeleton view)      │   │
│  └─────────────────────────────────┘   │
│            +                            │
│  ┌─────────────────────────────────┐   │
│  │  Probability Tables             │   │
│  │  (color-coded, interactive)     │   │
│  └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

Design Pattern: Factory pattern for plot creation
Theming: Dark theme with consistent styling
Interactivity: Full Plotly features (hover, zoom, rotate)

Mathematical Foundation

1. Breeden-Litzenberger Formula (Core Algorithm)

Derivation:

The price of a European call option can be expressed as:

C(K) = e^(-rT) × ∫[K to ∞] (S - K) × f(S) dS

Taking the first derivative:

∂C/∂K = -e^(-rT) × ∫[K to ∞] f(S) dS = -e^(-rT) × P(S > K)

Taking the second derivative:

∂²C/∂K² = e^(-rT) × f(K)

Rearranging:

f(K) = e^(rT) × ∂²C/∂K²
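The two differentiation steps rely on the Leibniz rule for an integral with a variable lower limit. Written out, the boundary term vanishes in the first step because the payoff (S - K) is zero at S = K:

```latex
\begin{aligned}
\frac{\partial C}{\partial K}
  &= e^{-rT}\left[-(K-K)\,f(K) + \int_K^{\infty} \frac{\partial}{\partial K}(S-K)\,f(S)\,dS\right]
   = -e^{-rT}\int_K^{\infty} f(S)\,dS, \\
\frac{\partial^2 C}{\partial K^2}
  &= -e^{-rT}\,\frac{\partial}{\partial K}\int_K^{\infty} f(S)\,dS
   = -e^{-rT}\,\bigl(-f(K)\bigr)
   = e^{-rT} f(K).
\end{aligned}
```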

Implementation Challenges:

Need smooth call price function → Use SABR interpolation
Numerical differentiation is noisy → Apply Savitzky-Golay filter
Can produce negative densities → Enforce non-negativity
Must integrate to 1 → Normalize using trapezoid rule

2. SABR Volatility Model

Model Equations:

dF = α × F^β × dW₁
dα = ν × α × dW₂
dW₁ × dW₂ = ρ dt

Parameters:

α = initial volatility level
β = elasticity (typically 0.5 for equities)
ρ = correlation between price and volatility
ν = volatility of volatility (vol-of-vol)

Calibration: Minimize sum of squared errors between market IV and model IV using Nelder-Mead optimization.

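Why SABR?: With a handful of parameters it reproduces the volatility smile/skew that Black-Scholes, with its single constant volatility, cannot. The model originated in interest-rate markets and is also commonly applied to equity option smiles.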
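For reference, model implied volatilities are usually obtained from the closed-form approximation of Hagan et al. (2002). The sketch below implements that standard lognormal formula; whether the project's `_sabr_formula` uses exactly this expression is an assumption, and the function name `hagan_lognormal_vol` and the parameter values are illustrative.

```python
import numpy as np

def hagan_lognormal_vol(strikes, forward, alpha, beta, rho, nu, tau):
    """Hagan et al. (2002) approximation for SABR Black implied volatility."""
    K = np.asarray(strikes, dtype=float)
    log_fk = np.log(forward / K)
    fk_mid = (forward * K) ** ((1.0 - beta) / 2.0)

    z = (nu / alpha) * fk_mid * log_fk
    x = np.log((np.sqrt(1.0 - 2.0 * rho * z + z**2) + z - rho) / (1.0 - rho))
    # z / x(z) tends to 1 as z -> 0 (at the money), so guard the division
    small = np.abs(z) < 1e-8
    ratio = np.where(small, 1.0, z / np.where(small, 1.0, x))

    one_minus_beta = 1.0 - beta
    denom = fk_mid * (1.0 + one_minus_beta**2 / 24.0 * log_fk**2
                      + one_minus_beta**4 / 1920.0 * log_fk**4)
    correction = 1.0 + (one_minus_beta**2 / 24.0 * alpha**2 / fk_mid**2
                        + rho * beta * nu * alpha / (4.0 * fk_mid)
                        + (2.0 - 3.0 * rho**2) / 24.0 * nu**2) * tau
    return alpha / denom * ratio * correction

# Illustrative parameters (assumed): negative rho produces a downward skew
strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
smile = hagan_lognormal_vol(strikes, forward=100.0, alpha=2.0, beta=0.5,
                            rho=-0.5, nu=0.5, tau=0.25)
# With beta = 1 and nu = 0 the model collapses to a flat volatility equal to alpha
flat = hagan_lognormal_vol(strikes, forward=100.0, alpha=0.2, beta=1.0,
                           rho=0.0, nu=0.0, tau=0.25)
print(smile, flat)
```

The flat case is a useful regression check for any implementation: switching off the stochastic volatility must recover a single constant volatility.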

3. Pattern Matching Algorithm

Similarity Score:

similarity = 0.7 × shape_similarity + 0.3 × stats_similarity

Shape Similarity (Cosine):

cos(θ) = (A · B) / (||A|| × ||B||)

Where A and B are PDF vectors.

Stats Similarity:

sim = 1 - mean_abs_diff([skew, kurtosis, implied_move])

Why This Works: Combines global shape (cosine) with specific moments (stats) to find truly similar distributions.
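A minimal sketch of the combined score follows. It assumes the two PDFs are resampled onto a shared strike grid before the cosine is taken and that the statistics difference is clipped into [0, 1]; both choices, and the function names, are assumptions made for illustration rather than the project's confirmed behaviour.

```python
import numpy as np

def shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b, n_points=200):
    """Cosine similarity after interpolating both PDFs onto a common grid."""
    lo = max(strikes_a[0], strikes_b[0])
    hi = min(strikes_a[-1], strikes_b[-1])
    grid = np.linspace(lo, hi, n_points)
    a = np.interp(grid, strikes_a, pdf_a)
    b = np.interp(grid, strikes_b, pdf_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def stats_similarity(stats_a, stats_b, keys=("skew", "kurtosis", "implied_move")):
    """1 minus the mean absolute difference, clipped to [0, 1] (assumed)."""
    diffs = [abs(stats_a[k] - stats_b[k]) for k in keys]
    return float(np.clip(1.0 - np.mean(diffs), 0.0, 1.0))

def combined_similarity(pdf_a, strikes_a, pdf_b, strikes_b, stats_a, stats_b):
    """Weighted combination: 70% shape, 30% statistics."""
    shape = shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b)
    return 0.7 * shape + 0.3 * stats_similarity(stats_a, stats_b)

# Illustrative inputs: the same bell curve, then a copy shifted to a higher strike
strikes = np.linspace(400.0, 500.0, 101)
pdf = np.exp(-0.5 * ((strikes - 450.0) / 15.0) ** 2)
shifted = np.exp(-0.5 * ((strikes - 470.0) / 15.0) ** 2)
stats = {"skew": -0.15, "kurtosis": 0.4, "implied_move": 0.03}

same = combined_similarity(pdf, strikes, pdf, strikes, stats, stats)
different = combined_similarity(pdf, strikes, shifted, strikes, stats, stats)
print(same, different)
```

Note that the three statistics live on different scales, so an unscaled mean absolute difference lets the largest one dominate; a production version might standardize them first.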

Implementation Details

Phase 1: Foundation (100% Complete)

Files Created:

src/data/openbb_client.py (169 lines)
src/data/yfinance_client.py (142 lines)
src/data/fred_client.py (89 lines)
src/data/cache.py (98 lines)
src/data/data_manager.py (186 lines)

Key Features:

Automatic fallback between data sources
File-based caching with TTL
Comprehensive error handling
Type hints throughout

Testing: 15 tests covering normal operation and edge cases

Phase 2: Core Math (100% Complete)

Files Created:

src/core/breeden_litz.py (287 lines) ⭐ CORE ALGORITHM
src/core/sabr.py (201 lines)
src/core/statistics.py (234 lines)

Key Features:

SABR calibration with fallback to cubic spline
Breeden-Litzenberger with numerical smoothing
Complete PDF statistics (13 metrics)
CDF and probability queries

Testing: 22 tests covering calculation accuracy and edge cases

Phase 3: Visualization (100% Complete)

Files Created:

src/visualization/themes.py (73 lines)
src/visualization/pdf_2d.py (312 lines)
src/visualization/surface_3d.py (264 lines)
src/visualization/probability_table.py (189 lines)

Key Features:

Dark theme configuration system
4 types of 2D plots
3 types of 3D visualizations
4 types of probability tables
Full interactivity

Testing: 18 tests covering plot generation and formatting

Phase 4: AI Interpretation (100% Complete)

Files Created:

src/ai/prompts.py (198 lines)
src/ai/interpreter.py (245 lines)
src/core/patterns.py (276 lines)

Key Features:

4 interpretation modes (standard, conservative, aggressive, educational)
Pattern matching with cosine similarity
Ollama client with graceful fallback
Rule-based interpretation system

Testing: 21 tests covering AI components and pattern matching

Phases 5-7: Pending

Phase 5: SQLite schema, SQLAlchemy models, prediction tracking
Phase 6: Streamlit UI with 4 pages
Phase 7: Docker, testing, deployment to HuggingFace Spaces

Key Algorithms

Algorithm 1: PDF Extraction

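import numpy as np

def _breeden_litzenberger(self, strikes, call_prices, r, T):
    """
    Extract risk-neutral PDF from call prices.

    f(K) = e^(rT) × ∂²C/∂K²
    """
    # First and second numerical derivatives of call price w.r.t. strike
    # (np.gradient handles non-uniform strike spacing)
    dC_dK = np.gradient(call_prices, strikes)
    d2C_dK2 = np.gradient(dC_dK, strikes)

    # Apply Breeden-Litzenberger formula
    pdf = np.exp(r * T) * d2C_dK2

    # Enforce non-negativity
    pdf = np.maximum(pdf, 0)

    # Normalize to integrate to 1 (trapezoid rule, written out so it does
    # not depend on np.trapz, which NumPy 2.0 renamed to np.trapezoid)
    area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(strikes))
    if area <= 0:
        raise ValueError("PDF has no positive mass; check the input call prices")

    return pdf / area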

Complexity: O(n) where n = number of strikes
Accuracy: Depends on strike density and IV smoothness

Algorithm 2: SABR Calibration

import numpy as np
from scipy.optimize import minimize

def calibrate(self, strikes, implied_vols, forward, tau):
    """
    Calibrate SABR to market IV smile.

    Fits alpha, rho and nu by least squares; beta stays fixed at self.beta.
    """
    def objective(params):
        alpha, rho, nu = params
        model_vols = self._sabr_formula(strikes, forward, alpha, rho, nu, self.beta, tau)
        return np.sum((model_vols - implied_vols) ** 2)

    initial_guess = [0.2, -0.3, 0.4]
    # Bounds for alpha, rho, nu (Nelder-Mead accepts bounds in SciPy 1.7+)
    bounds = [(0.001, 2.0), (-0.999, 0.999), (0.001, 2.0)]

    result = minimize(
        objective,
        initial_guess,
        method='Nelder-Mead',
        bounds=bounds,
        options={'maxiter': 1000}
    )

    # Callers should check result.success before trusting the fit
    self.alpha, self.rho, self.nu = result.x
    return result

Complexity: O(m × n) where m = iterations, n = strikes
Convergence: Typically <100 iterations for equity options

Algorithm 3: Pattern Matching

def _calculate_similarity(self, current_pdf, current_strikes, hist_pdf, hist_strikes, current_stats, hist_stats):
    """
    Calculate combined similarity score.
    """
    # Shape similarity (cosine)
    shape_sim = self._pdf_shape_similarity(
        current_pdf, current_strikes,
        hist_pdf, hist_strikes
    )

    # Statistical similarity
    stats_sim = self._stats_similarity(current_stats, hist_stats)

    # Weighted combination
    return 0.7 * shape_sim + 0.3 * stats_sim

Complexity: O(n) for interpolation + O(n) for dot product
Accuracy: Validated against synthetic test cases

Data Flow

Complete Pipeline (End-to-End)

1. User Request
   ↓
2. DataManager.get_options("SPY")
   ├─> OpenBB Client (try primary)
   └─> YFinance Client (fallback if needed)
   ↓
3. FREDClient.get_risk_free_rate()
   ↓
4. SABRModel.calibrate(strikes, IVs)
   ├─> Optimize α, ρ, ν parameters
   └─> Fallback to CubicSpline if fails
   ↓
5. SABRModel.interpolate_iv(fine_strikes)
   ↓
6. BreedenlitzenbergPDF.calculate_pdf(strikes, IVs, spot, r, T)
   ├─> Calculate call prices
   ├─> Apply Breeden-Litzenberger formula
   ├─> Smooth with Savitzky-Golay
   └─> Normalize to integrate to 1
   ↓
7. PDFStatistics.calculate_all_stats()
   ├─> Mean, std, skewness, kurtosis
   ├─> Implied move, tail probabilities
   └─> Confidence intervals
   ↓
8. PDFPatternMatcher.find_similar_patterns()
   ├─> Load historical PDFs
   ├─> Calculate similarity scores
   └─> Return top matches
   ↓
9. PDFInterpreter.interpret_single_pdf()
   ├─> Format prompt with stats & patterns
   ├─> Try Ollama.generate()
   └─> Fallback to rule-based if Ollama unavailable
   ↓
10. Visualization Functions
    ├─> create_3d_surface()
    ├─> plot_pdf_2d()
    ├─> create_probability_table()
    └─> Return interactive Plotly figures
    ↓
11. Return Results to User
    ├─> PDF values & strikes
    ├─> All statistics
    ├─> Pattern matches
    ├─> AI interpretation
    └─> Plotly figures

Data Flow Timing (Approximate)

Data fetch: ~1-3 seconds (or instant if cached)
SABR calibration: ~0.1-0.5 seconds
PDF calculation: ~0.05 seconds
Statistics: ~0.01 seconds
Pattern matching: ~0.1-1 second (depends on history size)
AI interpretation: ~2-5 seconds (or ~0.01s for fallback)
Visualization: ~0.1-0.5 seconds

Total: 3-10 seconds for complete analysis

Testing Strategy

Unit Tests

Coverage: 76 tests across Phases 1-4 (15 data, 22 core math, 18 visualization, 21 AI)

Approach:

Test normal operation
Test edge cases (empty data, extreme values)
Test error conditions
Test fallback mechanisms

Example (from test_core_math.py):

def test_pdf_normalization():
    """Ensure PDF integrates to 1.0."""
    pdf_calc = BreedenlitzenbergPDF()
    pdf = pdf_calc.calculate_pdf(...)
    integral = trapz(pdf, strikes)
    assert abs(integral - 1.0) < 1e-6

Integration Tests

Example (from test_ai_components.py):

def test_integration_ai_workflow():
    """Test complete AI workflow end-to-end."""
    # 1. Create PDF
    # 2. Calculate statistics
    # 3. Find patterns
    # 4. Generate interpretation
    # All steps must complete successfully

Fallback Tests

Critical: Ensure system works when external dependencies fail

Tests:

OpenBB fails → YFinance succeeds
SABR fails → Cubic spline succeeds
Ollama unavailable → Rule-based interpretation succeeds

Future Enhancements

Phase 5: Database & History

Schema Design:

CREATE TABLE pdf_snapshots (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    ticker TEXT,
    days_to_expiry INTEGER,
    spot_price REAL,
    strikes BLOB,
    pdf_values BLOB,
    stats JSON,
    interpretation TEXT,
    model_used TEXT
);

CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    forecast_date DATETIME,
    target_date DATETIME,
    predicted_prob REAL,
    condition TEXT,
    target_level REAL,
    actual_outcome BOOLEAN,
    actual_price REAL,
    evaluation_date DATETIME
);

CREATE TABLE pattern_matches (
    id INTEGER PRIMARY KEY,
    current_snapshot_id INTEGER,
    historical_snapshot_id INTEGER,
    similarity_score REAL,
    shape_similarity REAL,
    stats_similarity REAL
);
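One way to use the `BLOB` and `JSON` columns from Python is sketched below: NumPy arrays are stored as raw float64 bytes and the statistics as a JSON string. The in-memory database, the column subset, and the sample values are assumptions for illustration only.

```python
import json
import sqlite3

import numpy as np

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute("""
    CREATE TABLE pdf_snapshots (
        id INTEGER PRIMARY KEY,
        timestamp DATETIME,
        ticker TEXT,
        days_to_expiry INTEGER,
        spot_price REAL,
        strikes BLOB,
        pdf_values BLOB,
        stats JSON
    )
""")

# Illustrative snapshot (values are made up for the example)
strikes = np.linspace(400.0, 500.0, 5)
pdf_values = np.array([0.0, 0.01, 0.02, 0.01, 0.0])

conn.execute(
    "INSERT INTO pdf_snapshots "
    "(timestamp, ticker, days_to_expiry, spot_price, strikes, pdf_values, stats) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("2025-12-01T00:00:00", "SPY", 30, 450.0,
     strikes.astype(np.float64).tobytes(),      # array -> raw bytes
     pdf_values.astype(np.float64).tobytes(),
     json.dumps({"skew": -0.15})),
)

row = conn.execute(
    "SELECT strikes, pdf_values, stats FROM pdf_snapshots WHERE ticker = ?",
    ("SPY",),
).fetchone()
loaded_strikes = np.frombuffer(row[0], dtype=np.float64)  # raw bytes -> array
loaded_pdf = np.frombuffer(row[1], dtype=np.float64)
loaded_stats = json.loads(row[2])
print(loaded_strikes, loaded_pdf, loaded_stats)
```

Raw bytes carry no dtype or shape information, so the reader has to know that the arrays are one-dimensional float64; storing that alongside the blob is a reasonable safeguard.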

ChromaDB Integration:

Store PDF embeddings for vector search
Fast similarity search across thousands of historical PDFs

Phase 6: Streamlit App

Pages:

Live Analysis: Real-time PDF extraction and visualization
Historical: Browse past PDFs and patterns
Predictions: Track accuracy of market expectations
About: Documentation and explanation

Features:

Ticker selection
Expiration date selection
Analysis mode selection
Export to CSV/PNG
Dark/light theme toggle

Phase 7: Deployment

Docker:

FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["streamlit", "run", "app/streamlit_app.py"]

HuggingFace Spaces:

Free hosting for ML demos
Automatic builds from Git
Environment variables for API keys

Technical Challenges & Solutions

Challenge 1: Noisy Numerical Derivatives

Problem: ∂²C/∂K² amplifies noise in option prices
Solution:

Use SABR to interpolate IV (creates smooth curve)
Apply Savitzky-Golay filter (polynomial smoothing)
Use dense strike grid (200+ points)
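The effect of smoothing can be demonstrated on synthetic data. The sketch below uses call prices under a normally distributed terminal price, where the true density is known, adds small random noise, and compares a raw double gradient with `scipy.signal.savgol_filter` using `deriv=2`. The test setup, window length, and polynomial order are assumptions; discounting is omitted (r = 0) to keep the example short, and the project may apply the filter at a different stage.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic market: terminal price ~ Normal(mu, s), so C''(K) is the normal density
mu, s = 450.0, 25.0
strikes = np.linspace(350.0, 550.0, 401)  # dense uniform grid, spacing 0.5
dK = strikes[1] - strikes[0]
d = (mu - strikes) / s
clean_calls = (mu - strikes) * norm.cdf(d) + s * norm.pdf(d)
noisy_calls = clean_calls + rng.normal(0.0, 0.01, size=strikes.size)
true_density = norm.pdf(strikes, loc=mu, scale=s)

# Raw approach: differentiate the noisy prices twice
raw = np.gradient(np.gradient(noisy_calls, strikes), strikes)

# Savitzky-Golay: fit local cubics and take their second derivative directly
smooth = savgol_filter(noisy_calls, window_length=31, polyorder=3, deriv=2, delta=dK)
smooth = np.maximum(smooth, 0.0)  # enforce non-negativity

raw_error = float(np.mean(np.abs(raw - true_density)))
smooth_error = float(np.mean(np.abs(smooth - true_density)))
print(raw_error, smooth_error)
```

A wider window suppresses more noise but flattens the peak of the density, so the window length is a bias-variance trade-off that depends on strike spacing.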

Challenge 2: Data Source Reliability

Problem: APIs can be down or rate-limited
Solution:

Dual data sources (OpenBB + yfinance)
Automatic fallback in DataManager
File-based caching (15min TTL)
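A file-based cache with a time-to-live can be as small as the sketch below. The class name `FileCache`, the JSON file layout, and the injectable `now` argument (used here to make expiry testable) are assumptions, not the contents of the project's `cache.py`.

```python
import json
import tempfile
import time
from pathlib import Path

class FileCache:
    """Store JSON-serializable values on disk and expire them after a TTL."""

    def __init__(self, directory, ttl_seconds=15 * 60):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, key):
        return self.directory / f"{key}.json"

    def set(self, key, value, now=None):
        stored_at = time.time() if now is None else now
        self._path(key).write_text(json.dumps({"stored_at": stored_at, "value": value}))

    def get(self, key, now=None):
        path = self._path(key)
        if not path.exists():
            return None                      # cache miss
        entry = json.loads(path.read_text())
        current = time.time() if now is None else now
        if current - entry["stored_at"] > self.ttl_seconds:
            return None                      # entry is older than the TTL
        return entry["value"]

cache = FileCache(tempfile.mkdtemp())
cache.set("SPY_chain", {"strikes": [440, 450, 460]}, now=0.0)
fresh = cache.get("SPY_chain", now=60.0)     # one minute later: still valid
stale = cache.get("SPY_chain", now=16 * 60)  # sixteen minutes later: expired
print(fresh, stale)
```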

Challenge 3: SABR Calibration Failures

Problem: Sometimes fails to converge
Solution:

Fallback to cubic spline interpolation
Graceful degradation (still produces PDF)
Log warnings for debugging

Challenge 4: Ollama Availability

Problem: User may not have Ollama installed
Solution:

Check availability at runtime
Provide rule-based interpretation fallback
Fallback output remains usable without the LLM (covered by the fallback tests)
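The degradation path can be sketched as follows. The LLM call is passed in as a plain callable so the example makes no assumptions about the Ollama client API; the function names, the statistic keys, and the thresholds in the rules are hypothetical.

```python
from typing import Callable, Optional

def rule_based_interpretation(stats: dict) -> str:
    """Plain-English summary built from fixed rules (thresholds are assumed)."""
    skew = stats.get("skewness", 0.0)
    kurt = stats.get("excess_kurtosis", 0.0)
    if skew < -0.1:
        bias = "a downside bias"
    elif skew > 0.1:
        bias = "an upside bias"
    else:
        bias = "no strong directional bias"
    tails = "elevated tail risk" if kurt > 1.0 else "ordinary tail risk"
    return f"The implied distribution shows {bias} and {tails}."

def interpret(stats: dict, llm_generate: Optional[Callable[[str], str]] = None) -> str:
    """Use the LLM when it works, otherwise fall back to the rules."""
    if llm_generate is not None:
        try:
            return llm_generate(f"Explain these option-implied PDF statistics: {stats}")
        except Exception:
            pass  # LLM unreachable or failed: fall through to the rules
    return rule_based_interpretation(stats)

# Hypothetical stand-in for an LLM client that cannot be reached
def unavailable_llm(prompt: str) -> str:
    raise ConnectionError("LLM service not reachable")

text = interpret({"skewness": -0.15, "excess_kurtosis": 0.4}, llm_generate=unavailable_llm)
print(text)
```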

Performance Optimization

Current Performance

Bottlenecks:

API calls (1-3 seconds) → Mitigated by caching
SABR calibration (~0.3 seconds) → Acceptable
Pattern matching (~0.5 seconds) → Will improve with indexing

Future Optimizations:

Database indexing: Speed up historical queries
Caching layer: Redis for distributed caching
Parallel processing: Multiple expirations in parallel
Incremental updates: Only recalculate changed data
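Of these, parallel processing of expirations is the simplest to prototype. The sketch below runs a placeholder per-expiration function across a thread pool; `analyze_expiration` is a hypothetical stand-in for the real fetch, calibrate, and extract pipeline.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_expiration(days_to_expiry: int) -> dict:
    """Hypothetical stand-in for the per-expiration pipeline."""
    return {"days_to_expiry": days_to_expiry, "status": "ok"}

expirations = [7, 30, 60, 90]

# pool.map preserves input order, so results line up with expirations
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze_expiration, expirations))

print(results)
```

Threads suit the I/O-bound data fetch; if calibration turns out to dominate, a `ProcessPoolExecutor` with the same interface may be the better fit.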

Conclusion

This project demonstrates:

Advanced quantitative finance: Option-implied probabilities
Robust software engineering: Error handling, fallbacks, testing
Modern ML/AI: Local LLM integration with graceful degradation
Interactive visualization: 3D graphics with full interactivity
Production-ready code: Type hints, documentation, comprehensive tests

Current Status: 57% complete (4/7 phases)
Next Milestone: Phase 5 - Database & History
Timeline: Ready for deployment after Phase 7

Last Updated: 2025-12-01
Author: Built with Claude Code
License: MIT
