Spaces:
Sleeping
Option-Implied PDF Visualizer - Complete Project Explanation
Executive Summary
This project extracts market expectations about future stock prices from options markets and presents them as intuitive 3D visualizations with AI-powered interpretations.
Status: β Phase 8 Complete (100% - Production Ready) Last Updated: 2025-12-08 Repository Type: Solo project with AI assistance (Claude Code) Live Interfaces: Streamlit (port 8501) + React SPA (port 5173 + FastAPI backend 8000)
Table of Contents
- Non-Technical Explanation
- Technical Explanation
- Architecture Overview
- Mathematical Foundation
- Implementation Details
- Key Algorithms
- Data Flow
- Testing Strategy
- Future Enhancements
Non-Technical Explanation
What Problem Does This Solve?
When traders buy and sell options, they're essentially placing bets on where they think a stock price will go. These bets contain valuable information about the market's collective expectations. This tool extracts that hidden information and makes it visible.
What Does It Do?
Imagine you could see a 3D landscape showing:
- X-axis: Different possible stock prices (strikes)
- Y-axis: Time into the future (days to expiration)
- Z-axis: How likely each price is (probability)
The tool creates this landscape and then uses AI to explain what it means in plain English.
Why Is This Useful?
For Traders: Understand where the market expects prices to move and how much uncertainty exists.
For Risk Managers: Quantify tail risk and see probability distributions.
For Researchers: Study historical probability distributions and prediction accuracy.
For Students: Learn derivatives pricing and market microstructure.
Real-World Example
Imagine SPY is trading at $450. The tool might show:
- 68% chance price stays between $436-$467 in 30 days
- 22% chance of +5% move (bullish tilt)
- 18% chance of -5% move
- Negative skewness (-0.15) suggests slight downside bias
- The AI explains: "Market is pricing in moderate uncertainty with slight bearish lean, similar to pre-Fed-announcement patterns in October 2023."
Technical Explanation
Core Concept: Risk-Neutral Probability Density
Options markets implicitly encode a risk-neutral probability distribution for future asset prices. The Breeden-Litzenberger (1978) formula allows us to extract this distribution by taking the second derivative of call option prices with respect to strike:
f(K) = e^(rT) Γ βΒ²C/βKΒ²
Where:
f(K)= risk-neutral probability density at strike KC= call option price as a function of striker= risk-free rateT= time to expiratione^(rT)= discount factor
Why This Matters
Traditional Approach: Implied volatility gives a single number (expected magnitude of moves)
This Approach: Full probability distribution showing:
- Mean and variance (expected price and uncertainty)
- Skewness (directional bias)
- Kurtosis (fat tails / crash risk)
- Specific probabilities for any price level
Technical Stack
Backend:
- Python 3.11+ (type hints, modern syntax)
- NumPy/SciPy (numerical computation)
- Pandas (data manipulation)
Data Sources:
- OpenBB Terminal (primary option chain data)
- yfinance (backup data source)
- FRED API (risk-free rate)
Models:
- SABR (Stochastic Alpha Beta Rho) volatility model
- Cubic spline interpolation (fallback)
- Cosine similarity for pattern matching
AI:
- Ollama (local LLM inference)
- Qwen3-7B (7 billion parameter language model)
- Intelligent fallback for offline operation
Visualization:
- Plotly (interactive 3D graphics)
- Dark theme with professional styling
Database (Phase 5):
- SQLite (time series storage)
- ChromaDB (vector search for patterns)
Frontend (Phase 6):
- Streamlit (Python web framework)
Deployment (Phase 7):
- Docker containerization
- HuggingFace Spaces hosting
Architecture Overview
Layer 1: Data Acquisition
βββββββββββββββββββββββββββββββββββββββββββ
β DataManager (Facade) β
β βββββββββββββββββββββββββββββββββββ β
β β OpenBB Client (Primary) β β
β β YFinance Client (Backup) β β
β β FRED Client (Risk-Free Rate) β β
β βββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββββββββββββββββββββ β
β β Cache Layer (15min TTL) β β
β βββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββ
Design Pattern: Facade pattern with automatic fallback Resilience: Dual data sources, file-based caching Performance: Minimizes API calls via intelligent caching
Layer 2: Mathematical Core
βββββββββββββββββββββββββββββββββββββββββββ
β BreedenlitzenbergPDF β
β βββββββββββββββββββββββββββββββββββ β
β β 1. SABR Calibration β β
β β 2. IV Interpolation β β
β β 3. Call Price Calculation β β
β β 4. Numerical Differentiation β β
β β 5. PDF Normalization β β
β βββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββββββββββββββββββββ β
β β PDFStatistics Calculator β β
β β (mean, std, skew, kurtosis) β β
β βββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββ
Design Pattern: Pipeline pattern Numerical Methods: Savitzky-Golay smoothing, gradient-based derivatives Robustness: Edge case handling, non-negativity constraints
Layer 3: AI Interpretation
βββββββββββββββββββββββββββββββββββββββββββ
β PDFInterpreter β
β βββββββββββββββββββββββββββββββββββ β
β β PDFPatternMatcher β β
β β (cosine similarity) β β
β βββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββββββββββββββββββββ β
β β Ollama Client β β
β β (with fallback) β β
β βββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββββββββββββββββββββ β
β β Prompt Templates β β
β β (4 analysis modes) β β
β βββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββ
Design Pattern: Strategy pattern (4 interpretation modes) AI Architecture: Local LLM with graceful degradation Pattern Matching: 70% shape similarity + 30% statistical similarity
Layer 4: Visualization
βββββββββββββββββββββββββββββββββββββββββββ
β Plotly Visualization Suite β
β βββββββββββββββββββββββββββββββββββ β
β β 2D PDF Plots β β
β β PDF Comparison Plots β β
β β CDF Plots β β
β βββββββββββββββββββββββββββββββββββ β
β + β
β βββββββββββββββββββββββββββββββββββ β
β β 3D Surface (StrikeΓTimeΓProb) β β
β β Heatmap (2D alternative) β β
β β Wireframe (skeleton view) β β
β βββββββββββββββββββββββββββββββββββ β
β + β
β βββββββββββββββββββββββββββββββββββ β
β β Probability Tables β β
β β (color-coded, interactive) β β
β βββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββ
Design Pattern: Factory pattern for plot creation Theming: Dark theme with consistent styling Interactivity: Full Plotly features (hover, zoom, rotate)
Mathematical Foundation
1. Breeden-Litzenberger Formula (Core Algorithm)
Derivation:
The price of a European call option can be expressed as:
C(K) = e^(-rT) Γ β«[K to β] (S - K) Γ f(S) dS
Taking the first derivative:
βC/βK = -e^(-rT) Γ β«[K to β] f(S) dS = -e^(-rT) Γ P(S > K)
Taking the second derivative:
βΒ²C/βKΒ² = e^(-rT) Γ f(K)
Rearranging:
f(K) = e^(rT) Γ βΒ²C/βKΒ²
Implementation Challenges:
- Need smooth call price function β Use SABR interpolation
- Numerical differentiation is noisy β Apply Savitzky-Golay filter
- Can produce negative densities β Enforce non-negativity
- Must integrate to 1 β Normalize using trapezoid rule
2. SABR Volatility Model
Model Equations:
dF = Ξ± Γ F^Ξ² Γ dWβ
dΞ± = Ξ½ Γ Ξ± Γ dWβ
dWβ Γ dWβ = Ο dt
Parameters:
Ξ±= volatility of volatilityΞ²= elasticity (typically 0.5 for equities)Ο= correlation between price and volatilityΞ½= vol-of-vol
Calibration: Minimize sum of squared errors between market IV and model IV using Nelder-Mead optimization.
Why SABR?: Captures volatility smile/skew better than Black-Scholes, industry standard for equity options.
3. Pattern Matching Algorithm
Similarity Score:
similarity = 0.7 Γ shape_similarity + 0.3 Γ stats_similarity
Shape Similarity (Cosine):
cos(ΞΈ) = (A Β· B) / (||A|| Γ ||B||)
Where A and B are PDF vectors.
Stats Similarity:
sim = 1 - mean_abs_diff([skew, kurtosis, implied_move])
Why This Works: Combines global shape (cosine) with specific moments (stats) to find truly similar distributions.
Implementation Details
Phase 1: Foundation (100% Complete)
Files Created:
src/data/openbb_client.py(169 lines)src/data/yfinance_client.py(142 lines)src/data/fred_client.py(89 lines)src/data/cache.py(98 lines)src/data/data_manager.py(186 lines)
Key Features:
- Automatic fallback between data sources
- File-based caching with TTL
- Comprehensive error handling
- Type hints throughout
Testing: 15 tests covering normal operation and edge cases
Phase 2: Core Math (100% Complete)
Files Created:
src/core/breeden_litz.py(287 lines) β CORE ALGORITHMsrc/core/sabr.py(201 lines)src/core/statistics.py(234 lines)
Key Features:
- SABR calibration with fallback to cubic spline
- Breeden-Litzenberger with numerical smoothing
- Complete PDF statistics (13 metrics)
- CDF and probability queries
Testing: 22 tests covering calculation accuracy and edge cases
Phase 3: Visualization (100% Complete)
Files Created:
src/visualization/themes.py(73 lines)src/visualization/pdf_2d.py(312 lines)src/visualization/surface_3d.py(264 lines)src/visualization/probability_table.py(189 lines)
Key Features:
- Dark theme configuration system
- 4 types of 2D plots
- 3 types of 3D visualizations
- 4 types of probability tables
- Full interactivity
Testing: 18 tests covering plot generation and formatting
Phase 4: AI Interpretation (100% Complete)
Files Created:
src/ai/prompts.py(198 lines)src/ai/interpreter.py(245 lines)src/core/patterns.py(276 lines)
Key Features:
- 4 interpretation modes (standard, conservative, aggressive, educational)
- Pattern matching with cosine similarity
- Ollama client with graceful fallback
- Rule-based interpretation system
Testing: 21 tests covering AI components and pattern matching
Phases 5-7: Pending
Phase 5: SQLite schema, SQLAlchemy models, prediction tracking Phase 6: Streamlit UI with 4 pages Phase 7: Docker, testing, deployment to HuggingFace Spaces
Key Algorithms
Algorithm 1: PDF Extraction
def _breeden_litzenberger(self, strikes, call_prices, r, T):
"""
Extract risk-neutral PDF from call prices.
f(K) = e^(rT) Γ βΒ²C/βKΒ²
"""
# Calculate gradients (numerical derivatives)
dK = np.gradient(strikes)
dC_dK = np.gradient(call_prices, strikes)
d2C_dK2 = np.gradient(dC_dK, strikes)
# Apply Breeden-Litzenberger formula
pdf = np.exp(r * T) * d2C_dK2
# Enforce non-negativity
pdf = np.maximum(pdf, 0)
# Normalize to integrate to 1
pdf = pdf / np.trapz(pdf, strikes)
return pdf
Complexity: O(n) where n = number of strikes Accuracy: Depends on strike density and IV smoothness
Algorithm 2: SABR Calibration
def calibrate(self, strikes, implied_vols, forward, tau):
"""
Calibrate SABR to market IV smile.
"""
def objective(params):
alpha, rho, nu = params
model_vols = self._sabr_formula(strikes, forward, alpha, rho, nu, self.beta, tau)
return np.sum((model_vols - implied_vols) ** 2)
initial_guess = [0.2, -0.3, 0.4]
bounds = [(0.001, 2.0), (-0.999, 0.999), (0.001, 2.0)]
result = minimize(
objective,
initial_guess,
method='Nelder-Mead',
bounds=bounds,
options={'maxiter': 1000}
)
self.alpha, self.rho, self.nu = result.x
return result
Complexity: O(m Γ n) where m = iterations, n = strikes Convergence: Typically <100 iterations for equity options
Algorithm 3: Pattern Matching
def _calculate_similarity(self, current_pdf, current_strikes, hist_pdf, hist_strikes, current_stats, hist_stats):
"""
Calculate combined similarity score.
"""
# Shape similarity (cosine)
shape_sim = self._pdf_shape_similarity(
current_pdf, current_strikes,
hist_pdf, hist_strikes
)
# Statistical similarity
stats_sim = self._stats_similarity(current_stats, hist_stats)
# Weighted combination
return 0.7 * shape_sim + 0.3 * stats_sim
Complexity: O(n) for interpolation + O(n) for dot product Accuracy: Validated against synthetic test cases
Data Flow
Complete Pipeline (End-to-End)
1. User Request
β
2. DataManager.get_options("SPY")
ββ> OpenBB Client (try primary)
ββ> YFinance Client (fallback if needed)
β
3. FREDClient.get_risk_free_rate()
β
4. SABRModel.calibrate(strikes, IVs)
ββ> Optimize Ξ±, Ο, Ξ½ parameters
ββ> Fallback to CubicSpline if fails
β
5. SABRModel.interpolate_iv(fine_strikes)
β
6. BreedenlitzenbergPDF.calculate_pdf(strikes, IVs, spot, r, T)
ββ> Calculate call prices
ββ> Apply Breeden-Litzenberger formula
ββ> Smooth with Savitzky-Golay
ββ> Normalize to integrate to 1
β
7. PDFStatistics.calculate_all_stats()
ββ> Mean, std, skewness, kurtosis
ββ> Implied move, tail probabilities
ββ> Confidence intervals
β
8. PDFPatternMatcher.find_similar_patterns()
ββ> Load historical PDFs
ββ> Calculate similarity scores
ββ> Return top matches
β
9. PDFInterpreter.interpret_single_pdf()
ββ> Format prompt with stats & patterns
ββ> Try Ollama.generate()
ββ> Fallback to rule-based if Ollama unavailable
β
10. Visualization Functions
ββ> create_3d_surface()
ββ> plot_pdf_2d()
ββ> create_probability_table()
ββ> Return interactive Plotly figures
β
11. Return Results to User
ββ> PDF values & strikes
ββ> All statistics
ββ> Pattern matches
ββ> AI interpretation
ββ> Plotly figures
Data Flow Timing (Approximate)
- Data fetch: ~1-3 seconds (or instant if cached)
- SABR calibration: ~0.1-0.5 seconds
- PDF calculation: ~0.05 seconds
- Statistics: ~0.01 seconds
- Pattern matching: ~0.1-1 second (depends on history size)
- AI interpretation: ~2-5 seconds (or ~0.01s for fallback)
- Visualization: ~0.1-0.5 seconds
Total: 3-10 seconds for complete analysis
Testing Strategy
Unit Tests
Coverage: High coverage across all modules
Approach:
- Test normal operation
- Test edge cases (empty data, extreme values)
- Test error conditions
- Test fallback mechanisms
Example (from test_core_math.py):
def test_pdf_normalization():
"""Ensure PDF integrates to 1.0."""
pdf_calc = BreedenlitzenbergPDF()
pdf = pdf_calc.calculate_pdf(...)
integral = trapz(pdf, strikes)
assert abs(integral - 1.0) < 1e-6
Integration Tests
Example (from test_ai_components.py):
def test_integration_ai_workflow():
"""Test complete AI workflow end-to-end."""
# 1. Create PDF
# 2. Calculate statistics
# 3. Find patterns
# 4. Generate interpretation
# All steps must complete successfully
Fallback Tests
Critical: Ensure system works when external dependencies fail
Tests:
- OpenBB fails β YFinance succeeds
- SABR fails β Cubic spline succeeds
- Ollama unavailable β Rule-based interpretation succeeds
Future Enhancements
Phase 5: Database & History
Schema Design:
CREATE TABLE pdf_snapshots (
id INTEGER PRIMARY KEY,
timestamp DATETIME,
ticker TEXT,
days_to_expiry INTEGER,
spot_price REAL,
strikes BLOB,
pdf_values BLOB,
stats JSON,
interpretation TEXT,
model_used TEXT
);
CREATE TABLE predictions (
id INTEGER PRIMARY KEY,
forecast_date DATETIME,
target_date DATETIME,
predicted_prob REAL,
condition TEXT,
target_level REAL,
actual_outcome BOOLEAN,
actual_price REAL,
evaluation_date DATETIME
);
CREATE TABLE pattern_matches (
id INTEGER PRIMARY KEY,
current_snapshot_id INTEGER,
historical_snapshot_id INTEGER,
similarity_score REAL,
shape_similarity REAL,
stats_similarity REAL
);
ChromaDB Integration:
- Store PDF embeddings for vector search
- Fast similarity search across thousands of historical PDFs
Phase 6: Streamlit App
Pages:
- Live Analysis: Real-time PDF extraction and visualization
- Historical: Browse past PDFs and patterns
- Predictions: Track accuracy of market expectations
- About: Documentation and explanation
Features:
- Ticker selection
- Expiration date selection
- Analysis mode selection
- Export to CSV/PNG
- Dark/light theme toggle
Phase 7: Deployment
Docker:
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["streamlit", "run", "app/streamlit_app.py"]
HuggingFace Spaces:
- Free hosting for ML demos
- Automatic builds from Git
- Environment variables for API keys
Technical Challenges & Solutions
Challenge 1: Noisy Numerical Derivatives
Problem: βΒ²C/βKΒ² amplifies noise in option prices Solution:
- Use SABR to interpolate IV (creates smooth curve)
- Apply Savitzky-Golay filter (polynomial smoothing)
- Use dense strike grid (200+ points)
Challenge 2: Data Source Reliability
Problem: APIs can be down or rate-limited Solution:
- Dual data sources (OpenBB + yfinance)
- Automatic fallback in DataManager
- File-based caching (15min TTL)
Challenge 3: SABR Calibration Failures
Problem: Sometimes fails to converge Solution:
- Fallback to cubic spline interpolation
- Graceful degradation (still produces PDF)
- Log warnings for debugging
Challenge 4: Ollama Availability
Problem: User may not have Ollama installed Solution:
- Check availability at runtime
- Provide rule-based interpretation fallback
- Fallback quality is surprisingly good (tested)
Performance Optimization
Current Performance
Bottlenecks:
- API calls (1-3 seconds) β Mitigated by caching
- SABR calibration (~0.3 seconds) β Acceptable
- Pattern matching (~0.5 seconds) β Will improve with indexing
Future Optimizations:
- Database indexing: Speed up historical queries
- Caching layer: Redis for distributed caching
- Parallel processing: Multiple expirations in parallel
- Incremental updates: Only recalculate changed data
Conclusion
This project demonstrates:
- Advanced quantitative finance: Option-implied probabilities
- Robust software engineering: Error handling, fallbacks, testing
- Modern ML/AI: Local LLM integration with graceful degradation
- Interactive visualization: 3D graphics with full interactivity
- Production-ready code: Type hints, documentation, comprehensive tests
Current Status: 57% complete (4/7 phases) Next Milestone: Phase 5 - Database & History Timeline: Ready for deployment after Phase 7
Last Updated: 2025-12-01 Author: Built with Claude Code License: MIT