# Option-Implied PDF Visualizer - Complete Project Explanation
## Executive Summary
This project extracts market expectations about future stock prices from options markets and presents them as intuitive 3D visualizations with AI-powered interpretations.
**Status**: ✅ Phase 8 Complete (100% - Production Ready)
**Last Updated**: 2025-12-08
**Repository Type**: Solo project with AI assistance (Claude Code)
**Live Interfaces**: Streamlit (port 8501) + React SPA (port 5173 + FastAPI backend 8000)
---
## Table of Contents
1. [Non-Technical Explanation](#non-technical-explanation)
2. [Technical Explanation](#technical-explanation)
3. [Architecture Overview](#architecture-overview)
4. [Mathematical Foundation](#mathematical-foundation)
5. [Implementation Details](#implementation-details)
6. [Key Algorithms](#key-algorithms)
7. [Data Flow](#data-flow)
8. [Testing Strategy](#testing-strategy)
9. [Future Enhancements](#future-enhancements)
---
## Non-Technical Explanation
### What Problem Does This Solve?
When traders buy and sell options, they're essentially placing bets on where they think a stock price will go. These bets contain valuable information about the market's collective expectations. This tool extracts that hidden information and makes it visible.
### What Does It Do?
Imagine you could see a 3D landscape showing:
- **X-axis**: Different possible stock prices (strikes)
- **Y-axis**: Time into the future (days to expiration)
- **Z-axis**: How likely each price is (probability)
The tool creates this landscape and then uses AI to explain what it means in plain English.
### Why Is This Useful?
**For Traders**: Understand where the market expects prices to move and how much uncertainty exists.
**For Risk Managers**: Quantify tail risk and see probability distributions.
**For Researchers**: Study historical probability distributions and prediction accuracy.
**For Students**: Learn derivatives pricing and market microstructure.
### Real-World Example
Imagine SPY is trading at $450. The tool might show:
- 68% chance price stays between $436-$467 in 30 days
- 22% chance of +5% move (bullish tilt)
- 18% chance of -5% move
- Negative skewness (-0.15) suggests slight downside bias
- The AI explains: "Market is pricing in moderate uncertainty with slight bearish lean, similar to pre-Fed-announcement patterns in October 2023."
---
## Technical Explanation
### Core Concept: Risk-Neutral Probability Density
Options markets implicitly encode a **risk-neutral probability distribution** for future asset prices. The Breeden-Litzenberger (1978) formula allows us to extract this distribution by taking the second derivative of call option prices with respect to strike:
```
f(K) = e^(rT) × ∂²C/∂K²
```
Where:
- `f(K)` = risk-neutral probability density at strike K
- `C` = call option price as a function of strike
- `r` = risk-free rate
- `T` = time to expiration
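- `e^(rT)` = growth factor that undoes the discounting embedded in the call price (the inverse of the discount factor `e^(-rT)`)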
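As a sanity check on the formula, the sketch below prices calls with Black-Scholes and confirms that the second strike derivative recovers the lognormal density that Black-Scholes assumes. The inputs (`S0`, `sigma`, the strike grid) and the helper `bs_call` are illustrative assumptions, not values or functions from this project.

```python
import math
import numpy as np

def bs_call(S0, K, r, T, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S0 * cdf(d1) - K * math.exp(-r * T) * cdf(d2)

# Illustrative inputs (assumptions for this sketch)
S0, r, T, sigma = 450.0, 0.05, 30 / 365, 0.20
strikes = np.linspace(300.0, 600.0, 1201)  # dense, uniform strike grid
calls = np.array([bs_call(S0, K, r, T, sigma) for K in strikes])

# Breeden-Litzenberger: f(K) = e^(rT) * d2C/dK2, via central differences
h = strikes[1] - strikes[0]
d2C_dK2 = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / h**2
pdf_bl = math.exp(r * T) * d2C_dK2
K_mid = strikes[1:-1]

# Analytic risk-neutral (lognormal) density implied by Black-Scholes
mu = math.log(S0) + (r - 0.5 * sigma**2) * T
s = sigma * math.sqrt(T)
pdf_exact = np.exp(-((np.log(K_mid) - mu) ** 2) / (2 * s**2)) / (
    K_mid * s * math.sqrt(2 * math.pi)
)

max_abs_err = float(np.max(np.abs(pdf_bl - pdf_exact)))
total_mass = float(np.sum(pdf_bl) * h)
print(f"max abs error: {max_abs_err:.2e}, total mass: {total_mass:.6f}")
```

With real market quotes the prices are noisy, so the agreement is much rougher than in this idealized setting.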
### Why This Matters
**Traditional Approach**: Implied volatility gives a single number (expected magnitude of moves)
**This Approach**: Full probability distribution showing:
- Mean and variance (expected price and uncertainty)
- Skewness (directional bias)
- Kurtosis (fat tails / crash risk)
- Specific probabilities for any price level
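A minimal sketch of how these quantities can be read off a density sampled on a strike grid. The function name `pdf_summary` and the returned keys are hypothetical and do not correspond to the project's `PDFStatistics` API.

```python
import numpy as np

def pdf_summary(strikes, pdf, level):
    """Moments and one tail probability from a density on a strike grid."""
    strikes = np.asarray(strikes, dtype=float)
    pdf = np.asarray(pdf, dtype=float)

    # Trapezoid-rule weight attached to each grid point
    dk = np.diff(strikes)
    weights = np.zeros_like(strikes)
    weights[:-1] += dk / 2.0
    weights[1:] += dk / 2.0

    # Probability mass per grid point, renormalized to sum to 1
    mass = pdf * weights
    mass = mass / mass.sum()

    mean = float(np.sum(mass * strikes))
    std = float(np.sqrt(np.sum(mass * (strikes - mean) ** 2)))
    z = (strikes - mean) / std
    return {
        "mean": mean,
        "std": std,
        "skewness": float(np.sum(mass * z**3)),
        "excess_kurtosis": float(np.sum(mass * z**4) - 3.0),
        "prob_above": float(mass[strikes > level].sum()),
    }

# Illustrative check on a normal density centred at 450 with std 20
grid = np.linspace(300.0, 600.0, 1501)
density = np.exp(-0.5 * ((grid - 450.0) / 20.0) ** 2) / (20.0 * np.sqrt(2 * np.pi))
summary = pdf_summary(grid, density, level=470.0)
print(summary)
```

For a normal density the skewness and excess kurtosis come out near zero, so non-zero values on a market-implied density are what signal directional bias and fat tails.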
### Technical Stack
**Backend**:
- Python 3.11+ (type hints, modern syntax)
- NumPy/SciPy (numerical computation)
- Pandas (data manipulation)
**Data Sources**:
- OpenBB Terminal (primary option chain data)
- yfinance (backup data source)
- FRED API (risk-free rate)
**Models**:
- SABR (Stochastic Alpha Beta Rho) volatility model
- Cubic spline interpolation (fallback)
- Cosine similarity for pattern matching
**AI**:
- Ollama (local LLM inference)
- Qwen3-7B (7 billion parameter language model)
- Intelligent fallback for offline operation
**Visualization**:
- Plotly (interactive 3D graphics)
- Dark theme with professional styling
**Database** (Phase 5):
- SQLite (time series storage)
- ChromaDB (vector search for patterns)
**Frontend** (Phase 6):
- Streamlit (Python web framework)
**Deployment** (Phase 7):
- Docker containerization
- HuggingFace Spaces hosting
---
## Architecture Overview
### Layer 1: Data Acquisition
```
┌─────────────────────────────────────────┐
│          DataManager (Facade)           │
│  ┌─────────────────────────────────┐    │
│  │  OpenBB Client (Primary)        │    │
│  │  YFinance Client (Backup)       │    │
│  │  FRED Client (Risk-Free Rate)   │    │
│  └─────────────────────────────────┘    │
│                   ↓                     │
│  ┌─────────────────────────────────┐    │
│  │  Cache Layer (15min TTL)        │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
```
**Design Pattern**: Facade pattern with automatic fallback
**Resilience**: Dual data sources, file-based caching
**Performance**: Minimizes API calls via intelligent caching
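The fallback-plus-cache behaviour can be sketched as follows. This is a simplified illustration, not the project's `DataManager`: the class name `FallbackFetcher` is hypothetical, and it caches in memory, whereas the project describes a file-based cache.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

class FallbackFetcher:
    """Try data sources in order and cache the first success for ttl_seconds."""

    def __init__(
        self,
        sources: List[Tuple[str, Callable[[str], Any]]],
        ttl_seconds: float = 900.0,  # 15 minutes, matching the TTL above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sources = list(sources)
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, ticker: str) -> Any:
        now = self._clock()
        hit = self._cache.get(ticker)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cache entry: no API call

        errors = []
        for name, fetch in self._sources:
            try:
                data = fetch(ticker)
            except Exception as exc:  # any failure moves on to the next source
                errors.append(f"{name}: {exc}")
                continue
            self._cache[ticker] = (now, data)
            return data
        raise RuntimeError("all sources failed: " + "; ".join(errors))

# Usage with stand-in sources (hypothetical, for illustration only)
call_counts = {"primary": 0, "backup": 0}

def flaky_primary(ticker: str) -> dict:
    call_counts["primary"] += 1
    raise ConnectionError("rate limited")

def backup(ticker: str) -> dict:
    call_counts["backup"] += 1
    return {"ticker": ticker, "source": "backup"}

fetcher = FallbackFetcher([("primary", flaky_primary), ("backup", backup)])
first = fetcher.get("SPY")   # primary fails, backup answers
second = fetcher.get("SPY")  # served from cache
```

Injecting the clock makes the TTL logic testable without sleeping.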
### Layer 2: Mathematical Core
```
┌─────────────────────────────────────────┐
│          BreedenlitzenbergPDF           │
│  ┌─────────────────────────────────┐    │
│  │  1. SABR Calibration            │    │
│  │  2. IV Interpolation            │    │
│  │  3. Call Price Calculation      │    │
│  │  4. Numerical Differentiation   │    │
│  │  5. PDF Normalization           │    │
│  └─────────────────────────────────┘    │
│                   ↓                     │
│  ┌─────────────────────────────────┐    │
│  │  PDFStatistics Calculator       │    │
│  │  (mean, std, skew, kurtosis)    │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
```
**Design Pattern**: Pipeline pattern
**Numerical Methods**: Savitzky-Golay smoothing, gradient-based derivatives
**Robustness**: Edge case handling, non-negativity constraints
### Layer 3: AI Interpretation
```
┌─────────────────────────────────────────┐
│             PDFInterpreter              │
│  ┌─────────────────────────────────┐    │
│  │  PDFPatternMatcher              │    │
│  │  (cosine similarity)            │    │
│  └─────────────────────────────────┘    │
│                   ↓                     │
│  ┌─────────────────────────────────┐    │
│  │  Ollama Client                  │    │
│  │  (with fallback)                │    │
│  └─────────────────────────────────┘    │
│                   ↓                     │
│  ┌─────────────────────────────────┐    │
│  │  Prompt Templates               │    │
│  │  (4 analysis modes)             │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
```
**Design Pattern**: Strategy pattern (4 interpretation modes)
**AI Architecture**: Local LLM with graceful degradation
**Pattern Matching**: 70% shape similarity + 30% statistical similarity
### Layer 4: Visualization
```
┌─────────────────────────────────────────┐
│       Plotly Visualization Suite        │
│  ┌─────────────────────────────────┐    │
│  │  2D PDF Plots                   │    │
│  │  PDF Comparison Plots           │    │
│  │  CDF Plots                      │    │
│  └─────────────────────────────────┘    │
│                   +                     │
│  ┌─────────────────────────────────┐    │
│  │  3D Surface (Strike×Time×Prob)  │    │
│  │  Heatmap (2D alternative)       │    │
│  │  Wireframe (skeleton view)      │    │
│  └─────────────────────────────────┘    │
│                   +                     │
│  ┌─────────────────────────────────┐    │
│  │  Probability Tables             │    │
│  │  (color-coded, interactive)     │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
```
**Design Pattern**: Factory pattern for plot creation
**Theming**: Dark theme with consistent styling
**Interactivity**: Full Plotly features (hover, zoom, rotate)
---
## Mathematical Foundation
### 1. Breeden-Litzenberger Formula (Core Algorithm)
**Derivation**:
The price of a European call option can be expressed as:
```
C(K) = e^(-rT) × ∫[K to ∞] (S - K) × f(S) dS
```
Taking the first derivative:
```
∂C/∂K = -e^(-rT) × ∫[K to ∞] f(S) dS = -e^(-rT) × P(S > K)
```
Taking the second derivative:
```
∂²C/∂K² = e^(-rT) × f(K)
```
Rearranging:
```
f(K) = e^(rT) × ∂²C/∂K²
```
**Implementation Challenges**:
1. Need smooth call price function → Use SABR interpolation
2. Numerical differentiation is noisy → Apply Savitzky-Golay filter
3. Can produce negative densities → Enforce non-negativity
4. Must integrate to 1 → Normalize using trapezoid rule
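Steps 2 to 4 can be sketched with SciPy's `savgol_filter`. The function name `clean_density`, the window length, and the polynomial order are assumptions chosen for illustration; the project's actual settings are not stated here.

```python
import numpy as np
from scipy.signal import savgol_filter

def clean_density(strikes, raw_pdf, window=21, polyorder=3):
    """Smooth a noisy density, clip negatives, and renormalize to unit area."""
    strikes = np.asarray(strikes, dtype=float)
    # Savitzky-Golay: local polynomial fit over a sliding window (odd length)
    smoothed = savgol_filter(raw_pdf, window_length=window, polyorder=polyorder)
    # A density cannot be negative
    clipped = np.clip(smoothed, 0.0, None)
    # Trapezoid-rule area under the clipped curve
    area = float(np.sum(0.5 * (clipped[1:] + clipped[:-1]) * np.diff(strikes)))
    if area <= 0.0:
        raise ValueError("density has no positive mass after clipping")
    return clipped / area

# Illustrative input: a Gaussian density plus synthetic noise
rng = np.random.default_rng(0)
strikes = np.linspace(350.0, 550.0, 201)
true_pdf = np.exp(-0.5 * ((strikes - 450.0) / 20.0) ** 2) / (20.0 * np.sqrt(2 * np.pi))
raw_pdf = true_pdf + rng.normal(0.0, 0.001, size=strikes.size)

pdf = clean_density(strikes, raw_pdf)
area = float(np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(strikes)))
```

Clipping before normalizing matters: normalizing first and clipping afterwards would leave the area slightly above 1.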
### 2. SABR Volatility Model
**Model Equations**:
```
dF = α × F^β × dW₁
dα = ν × α × dW₂
dW₁ × dW₂ = ρ dt
```
**Parameters**:
- `α` = initial volatility level
- `β` = elasticity, the CEV exponent (typically 0.5 for equities)
- `ρ` = correlation between price and volatility
- `ν` = volatility of volatility (vol-of-vol)
**Calibration**: Minimize sum of squared errors between market IV and model IV using Nelder-Mead optimization.
**Why SABR?**: Captures the volatility smile/skew that constant-volatility Black-Scholes cannot, and is a widely used industry model for smile interpolation.
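For reference, the model IV used in such a calibration is commonly computed with the Hagan et al. (2002) lognormal approximation. The sketch below is one way to write it; whether the project's `_sabr_formula` uses exactly this form is an assumption, and the function name and parameter order here are hypothetical.

```python
import numpy as np

def sabr_lognormal_vol(K, F, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) approximation of Black implied vol under SABR."""
    K = np.asarray(K, dtype=float)
    log_fk = np.log(F / K)
    fk_pow = (F * K) ** ((1.0 - beta) / 2.0)

    z = (nu / alpha) * fk_pow * log_fk
    x = np.log((np.sqrt(1.0 - 2.0 * rho * z + z * z) + z - rho) / (1.0 - rho))
    # z / x(z) tends to 1 at the money (z -> 0); guard the 0/0 case
    small = np.abs(z) < 1e-8
    ratio = np.where(small, 1.0, z / np.where(small, 1.0, x))

    denom = fk_pow * (
        1.0
        + (1.0 - beta) ** 2 / 24.0 * log_fk**2
        + (1.0 - beta) ** 4 / 1920.0 * log_fk**4
    )
    correction = 1.0 + T * (
        (1.0 - beta) ** 2 / 24.0 * alpha**2 / fk_pow**2
        + rho * beta * nu * alpha / (4.0 * fk_pow)
        + (2.0 - 3.0 * rho**2) / 24.0 * nu**2
    )
    return alpha / denom * ratio * correction

# Illustrative parameters (assumptions): beta = 1 keeps alpha on the IV scale
vols = sabr_lognormal_vol(
    np.array([90.0, 100.0, 110.0]), F=100.0, T=0.25,
    alpha=0.2, beta=1.0, rho=-0.3, nu=0.4,
)
print(vols)
```

With negative `rho` the low-strike vol exceeds the high-strike vol, which is the downward skew typically seen in equity index options.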
### 3. Pattern Matching Algorithm
**Similarity Score**:
```
similarity = 0.7 × shape_similarity + 0.3 × stats_similarity
```
**Shape Similarity** (Cosine):
```
cos(θ) = (A · B) / (||A|| × ||B||)
```
Where A and B are PDF vectors.
**Stats Similarity**:
```
sim = 1 - mean_abs_diff([skew, kurtosis, implied_move])
```
**Why This Works**: Combines global shape (cosine) with specific moments (stats) to find truly similar distributions.
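A minimal sketch of the combined score. The two PDFs are first interpolated onto a shared strike grid so the vectors are comparable. Clipping the stats score to [0, 1], the grid size, and the dictionary keys are assumptions of this sketch, not confirmed details of `PDFPatternMatcher`.

```python
import numpy as np

def shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b, n_points=200):
    """Cosine similarity of two PDFs resampled on their overlapping strike range."""
    lo = max(strikes_a[0], strikes_b[0])
    hi = min(strikes_a[-1], strikes_b[-1])
    if hi <= lo:
        return 0.0  # no overlap, nothing to compare
    grid = np.linspace(lo, hi, n_points)
    a = np.interp(grid, strikes_a, pdf_a)
    b = np.interp(grid, strikes_b, pdf_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def stats_similarity(stats_a, stats_b, keys=("skew", "kurtosis", "implied_move")):
    """1 minus the mean absolute difference, clipped to [0, 1] (assumption)."""
    diffs = [abs(stats_a[k] - stats_b[k]) for k in keys]
    return float(np.clip(1.0 - np.mean(diffs), 0.0, 1.0))

def combined_similarity(pdf_a, strikes_a, stats_a, pdf_b, strikes_b, stats_b):
    shape = shape_similarity(pdf_a, strikes_a, pdf_b, strikes_b)
    stats = stats_similarity(stats_a, stats_b)
    return 0.7 * shape + 0.3 * stats

# Illustrative inputs: one Gaussian density and a shifted, wider one
k = np.linspace(350.0, 550.0, 201)
gauss = lambda mu, sd: np.exp(-0.5 * ((k - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
stats_1 = {"skew": -0.15, "kurtosis": 0.4, "implied_move": 0.03}
stats_2 = {"skew": -0.45, "kurtosis": 0.9, "implied_move": 0.05}

same = combined_similarity(gauss(450, 20), k, stats_1, gauss(450, 20), k, stats_1)
different = combined_similarity(gauss(450, 20), k, stats_1, gauss(470, 35), k, stats_2)
```

Because the stats are compared as raw differences, metrics on very different scales may need rescaling before averaging.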
---
## Implementation Details
### Phase 1: Foundation (100% Complete)
**Files Created**:
- `src/data/openbb_client.py` (169 lines)
- `src/data/yfinance_client.py` (142 lines)
- `src/data/fred_client.py` (89 lines)
- `src/data/cache.py` (98 lines)
- `src/data/data_manager.py` (186 lines)
**Key Features**:
- Automatic fallback between data sources
- File-based caching with TTL
- Comprehensive error handling
- Type hints throughout
**Testing**: 15 tests covering normal operation and edge cases
### Phase 2: Core Math (100% Complete)
**Files Created**:
- `src/core/breeden_litz.py` (287 lines) ⭐ **CORE ALGORITHM**
- `src/core/sabr.py` (201 lines)
- `src/core/statistics.py` (234 lines)
**Key Features**:
- SABR calibration with fallback to cubic spline
- Breeden-Litzenberger with numerical smoothing
- Complete PDF statistics (13 metrics)
- CDF and probability queries
**Testing**: 22 tests covering calculation accuracy and edge cases
### Phase 3: Visualization (100% Complete)
**Files Created**:
- `src/visualization/themes.py` (73 lines)
- `src/visualization/pdf_2d.py` (312 lines)
- `src/visualization/surface_3d.py` (264 lines)
- `src/visualization/probability_table.py` (189 lines)
**Key Features**:
- Dark theme configuration system
- 4 types of 2D plots
- 3 types of 3D visualizations
- 4 types of probability tables
- Full interactivity
**Testing**: 18 tests covering plot generation and formatting
### Phase 4: AI Interpretation (100% Complete)
**Files Created**:
- `src/ai/prompts.py` (198 lines)
- `src/ai/interpreter.py` (245 lines)
- `src/core/patterns.py` (276 lines)
**Key Features**:
- 4 interpretation modes (standard, conservative, aggressive, educational)
- Pattern matching with cosine similarity
- Ollama client with graceful fallback
- Rule-based interpretation system
**Testing**: 21 tests covering AI components and pattern matching
### Phases 5-7: Pending
**Phase 5**: SQLite schema, SQLAlchemy models, prediction tracking
**Phase 6**: Streamlit UI with 4 pages
**Phase 7**: Docker, testing, deployment to HuggingFace Spaces
---
## Key Algorithms
### Algorithm 1: PDF Extraction
```python
import numpy as np

def _breeden_litzenberger(self, strikes, call_prices, r, T):
    """
    Extract risk-neutral PDF from call prices.
    f(K) = e^(rT) × ∂²C/∂K²
    """
    # First and second numerical derivatives of C with respect to K
    # (np.gradient handles non-uniform strike spacing)
    dC_dK = np.gradient(call_prices, strikes)
    d2C_dK2 = np.gradient(dC_dK, strikes)
    # Apply Breeden-Litzenberger formula
    pdf = np.exp(r * T) * d2C_dK2
    # Enforce non-negativity
    pdf = np.maximum(pdf, 0)
    # Normalize to integrate to 1 (np.trapz is named np.trapezoid in NumPy 2.x)
    pdf = pdf / np.trapz(pdf, strikes)
    return pdf
```
**Complexity**: O(n) where n = number of strikes
**Accuracy**: Depends on strike density and IV smoothness
### Algorithm 2: SABR Calibration
```python
import numpy as np
from scipy.optimize import minimize

def calibrate(self, strikes, implied_vols, forward, tau):
    """
    Calibrate SABR to market IV smile.
    """
    def objective(params):
        # Sum of squared errors between model and market IVs
        alpha, rho, nu = params
        model_vols = self._sabr_formula(strikes, forward, alpha, rho, nu, self.beta, tau)
        return np.sum((model_vols - implied_vols) ** 2)

    initial_guess = [0.2, -0.3, 0.4]
    # Bounds with Nelder-Mead require SciPy >= 1.7
    bounds = [(0.001, 2.0), (-0.999, 0.999), (0.001, 2.0)]
    result = minimize(
        objective,
        initial_guess,
        method='Nelder-Mead',
        bounds=bounds,
        options={'maxiter': 1000}
    )
    self.alpha, self.rho, self.nu = result.x
    return result
```
**Complexity**: O(m × n) where m = iterations, n = strikes
**Convergence**: Typically <100 iterations for equity options
### Algorithm 3: Pattern Matching
```python
def _calculate_similarity(self, current_pdf, current_strikes, hist_pdf, hist_strikes, current_stats, hist_stats):
    """
    Calculate combined similarity score.
    """
    # Shape similarity (cosine)
    shape_sim = self._pdf_shape_similarity(
        current_pdf, current_strikes,
        hist_pdf, hist_strikes
    )
    # Statistical similarity
    stats_sim = self._stats_similarity(current_stats, hist_stats)
    # Weighted combination
    return 0.7 * shape_sim + 0.3 * stats_sim
```
**Complexity**: O(n) for interpolation + O(n) for dot product
**Accuracy**: Validated against synthetic test cases
---
## Data Flow
### Complete Pipeline (End-to-End)
```
1. User Request
        ↓
2. DataManager.get_options("SPY")
   ├─> OpenBB Client (try primary)
   └─> YFinance Client (fallback if needed)
        ↓
3. FREDClient.get_risk_free_rate()
        ↓
4. SABRModel.calibrate(strikes, IVs)
   ├─> Optimize α, ρ, ν parameters
   └─> Fallback to CubicSpline if fails
        ↓
5. SABRModel.interpolate_iv(fine_strikes)
        ↓
6. BreedenlitzenbergPDF.calculate_pdf(strikes, IVs, spot, r, T)
   ├─> Calculate call prices
   ├─> Apply Breeden-Litzenberger formula
   ├─> Smooth with Savitzky-Golay
   └─> Normalize to integrate to 1
        ↓
7. PDFStatistics.calculate_all_stats()
   ├─> Mean, std, skewness, kurtosis
   ├─> Implied move, tail probabilities
   └─> Confidence intervals
        ↓
8. PDFPatternMatcher.find_similar_patterns()
   ├─> Load historical PDFs
   ├─> Calculate similarity scores
   └─> Return top matches
        ↓
9. PDFInterpreter.interpret_single_pdf()
   ├─> Format prompt with stats & patterns
   ├─> Try Ollama.generate()
   └─> Fallback to rule-based if Ollama unavailable
        ↓
10. Visualization Functions
   ├─> create_3d_surface()
   ├─> plot_pdf_2d()
   ├─> create_probability_table()
   └─> Return interactive Plotly figures
        ↓
11. Return Results to User
   ├─> PDF values & strikes
   ├─> All statistics
   ├─> Pattern matches
   ├─> AI interpretation
   └─> Plotly figures
```
### Data Flow Timing (Approximate)
- Data fetch: ~1-3 seconds (or instant if cached)
- SABR calibration: ~0.1-0.5 seconds
- PDF calculation: ~0.05 seconds
- Statistics: ~0.01 seconds
- Pattern matching: ~0.1-1 second (depends on history size)
- AI interpretation: ~2-5 seconds (or ~0.01s for fallback)
- Visualization: ~0.1-0.5 seconds
**Total**: 3-10 seconds for complete analysis
---
## Testing Strategy
### Unit Tests
**Coverage**: High coverage across all modules
**Approach**:
- Test normal operation
- Test edge cases (empty data, extreme values)
- Test error conditions
- Test fallback mechanisms
**Example** (from `test_core_math.py`):
```python
def test_pdf_normalization():
    """Ensure PDF integrates to 1.0."""
    pdf_calc = BreedenlitzenbergPDF()
    pdf = pdf_calc.calculate_pdf(...)
    # np.trapz is named np.trapezoid in NumPy 2.x
    integral = np.trapz(pdf, strikes)
    assert abs(integral - 1.0) < 1e-6
```
### Integration Tests
**Example** (from `test_ai_components.py`):
```python
def test_integration_ai_workflow():
    """Test complete AI workflow end-to-end."""
    # 1. Create PDF
    # 2. Calculate statistics
    # 3. Find patterns
    # 4. Generate interpretation
    # All steps must complete successfully
```
### Fallback Tests
**Critical**: Ensure system works when external dependencies fail
**Tests**:
- OpenBB fails → YFinance succeeds
- SABR fails → Cubic spline succeeds
- Ollama unavailable → Rule-based interpretation succeeds
---
## Future Enhancements
### Phase 5: Database & History
**Schema Design**:
```sql
CREATE TABLE pdf_snapshots (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    ticker TEXT,
    days_to_expiry INTEGER,
    spot_price REAL,
    strikes BLOB,
    pdf_values BLOB,
    stats JSON,
    interpretation TEXT,
    model_used TEXT
);

CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    forecast_date DATETIME,
    target_date DATETIME,
    predicted_prob REAL,
    condition TEXT,
    target_level REAL,
    actual_outcome BOOLEAN,
    actual_price REAL,
    evaluation_date DATETIME
);

CREATE TABLE pattern_matches (
    id INTEGER PRIMARY KEY,
    current_snapshot_id INTEGER,
    historical_snapshot_id INTEGER,
    similarity_score REAL,
    shape_similarity REAL,
    stats_similarity REAL
);
```
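One way to store the array columns is to serialize NumPy arrays to raw bytes for the `BLOB` fields and the statistics to a JSON string. The sketch below round-trips a snapshot through an in-memory SQLite database; the sample values and the fixed little-endian `float64` encoding are assumptions made for illustration.

```python
import json
import sqlite3
import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pdf_snapshots (
        id INTEGER PRIMARY KEY,
        timestamp DATETIME,
        ticker TEXT,
        days_to_expiry INTEGER,
        spot_price REAL,
        strikes BLOB,
        pdf_values BLOB,
        stats JSON,
        interpretation TEXT,
        model_used TEXT
    )
    """
)

# Illustrative snapshot (made-up numbers)
strikes = np.linspace(400.0, 500.0, 5)
pdf_values = np.array([0.001, 0.008, 0.012, 0.008, 0.001])
stats = {"skew": -0.15, "kurtosis": 0.4}

conn.execute(
    "INSERT INTO pdf_snapshots (timestamp, ticker, days_to_expiry, spot_price, "
    "strikes, pdf_values, stats, interpretation, model_used) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (
        "2025-12-01T15:30:00", "SPY", 30, 450.0,
        strikes.astype("<f8").tobytes(),     # fixed dtype so decoding is unambiguous
        pdf_values.astype("<f8").tobytes(),
        json.dumps(stats),
        "example interpretation", "sabr",
    ),
)

row = conn.execute(
    "SELECT strikes, pdf_values, stats FROM pdf_snapshots WHERE ticker = ?",
    ("SPY",),
).fetchone()
strikes_back = np.frombuffer(row[0], dtype="<f8")
pdf_back = np.frombuffer(row[1], dtype="<f8")
stats_back = json.loads(row[2])
```

Raw bytes carry no shape or dtype metadata, so the reader has to know the encoding; that is why the dtype is pinned on both sides.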
**ChromaDB Integration**:
- Store PDF embeddings for vector search
- Fast similarity search across thousands of historical PDFs
### Phase 6: Streamlit App
**Pages**:
1. **Live Analysis**: Real-time PDF extraction and visualization
2. **Historical**: Browse past PDFs and patterns
3. **Predictions**: Track accuracy of market expectations
4. **About**: Documentation and explanation
**Features**:
- Ticker selection
- Expiration date selection
- Analysis mode selection
- Export to CSV/PNG
- Dark/light theme toggle
### Phase 7: Deployment
**Docker**:
```dockerfile
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["streamlit", "run", "app/streamlit_app.py"]
```
**HuggingFace Spaces**:
- Free hosting for ML demos
- Automatic builds from Git
- Environment variables for API keys
---
## Technical Challenges & Solutions
### Challenge 1: Noisy Numerical Derivatives
**Problem**: ∂²C/∂K² amplifies noise in option prices
**Solution**:
1. Use SABR to interpolate IV (creates smooth curve)
2. Apply Savitzky-Golay filter (polynomial smoothing)
3. Use dense strike grid (200+ points)
### Challenge 2: Data Source Reliability
**Problem**: APIs can be down or rate-limited
**Solution**:
1. Dual data sources (OpenBB + yfinance)
2. Automatic fallback in DataManager
3. File-based caching (15min TTL)
### Challenge 3: SABR Calibration Failures
**Problem**: Sometimes fails to converge
**Solution**:
1. Fallback to cubic spline interpolation
2. Graceful degradation (still produces PDF)
3. Log warnings for debugging
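The degradation path might look like the sketch below, which uses SciPy's `CubicSpline` as the fallback interpolator. The function `interpolate_iv` and its `calibrate` argument (a callable that returns a strike-to-IV function or raises) are hypothetical stand-ins for the project's SABR code.

```python
import logging
import numpy as np
from scipy.interpolate import CubicSpline

logger = logging.getLogger(__name__)

def interpolate_iv(strikes, ivs, fine_strikes, calibrate=None):
    """Return (interpolated IVs, method name), preferring the calibrated model."""
    if calibrate is not None:
        try:
            model = calibrate(strikes, ivs)  # hypothetical: returns K -> IV
            return np.asarray(model(fine_strikes)), "sabr"
        except Exception as exc:
            # Log and degrade instead of failing the whole analysis
            logger.warning("SABR calibration failed (%s); using cubic spline", exc)
    spline = CubicSpline(strikes, ivs)  # strikes must be strictly increasing
    return spline(fine_strikes), "cubic_spline"

def failing_calibration(strikes, ivs):
    raise RuntimeError("optimizer did not converge")

# Illustrative smile (made-up quotes)
market_strikes = np.array([420.0, 435.0, 450.0, 465.0, 480.0])
market_ivs = np.array([0.24, 0.21, 0.19, 0.185, 0.19])
fine = np.linspace(420.0, 480.0, 121)

fine_ivs, method = interpolate_iv(market_strikes, market_ivs, fine, failing_calibration)
```

Returning the method name alongside the values lets downstream code record which model produced a given PDF.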
### Challenge 4: Ollama Availability
**Problem**: User may not have Ollama installed
**Solution**:
1. Check availability at runtime
2. Provide rule-based interpretation fallback
3. Fallback quality is surprisingly good (tested)
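A rule-based fallback can be as simple as mapping a few statistics to sentences. In the sketch below the thresholds, the dictionary keys, and the `llm` callable are all assumptions; they illustrate the pattern, not the project's actual rules or its Ollama client.

```python
from typing import Callable, Mapping, Optional

def rule_based_interpretation(stats: Mapping[str, float]) -> str:
    """Turn summary statistics into a short plain-English reading."""
    parts = []
    skew = stats["skewness"]
    if skew < -0.1:  # thresholds are illustrative assumptions
        parts.append("The distribution leans to the downside.")
    elif skew > 0.1:
        parts.append("The distribution leans to the upside.")
    else:
        parts.append("The distribution is roughly symmetric.")

    if stats["excess_kurtosis"] > 1.0:
        parts.append("Fat tails suggest elevated risk of a large move.")

    parts.append(f"The implied move is about {stats['implied_move_pct']:.1f}%.")
    return " ".join(parts)

def interpret(
    stats: Mapping[str, float],
    llm: Optional[Callable[[Mapping[str, float]], str]] = None,
) -> str:
    """Use the LLM when it works, otherwise fall back to the rules."""
    if llm is not None:
        try:
            return llm(stats)
        except Exception:
            pass  # server missing or unreachable: degrade gracefully
    return rule_based_interpretation(stats)

def unavailable_llm(stats):
    raise ConnectionError("Ollama server not reachable")

example_stats = {"skewness": -0.15, "excess_kurtosis": 1.4, "implied_move_pct": 3.2}
text = interpret(example_stats, llm=unavailable_llm)
print(text)
```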
---
## Performance Optimization
### Current Performance
**Bottlenecks**:
1. API calls (1-3 seconds) → Mitigated by caching
2. SABR calibration (~0.3 seconds) → Acceptable
3. Pattern matching (~0.5 seconds) → Will improve with indexing
**Future Optimizations**:
1. **Database indexing**: Speed up historical queries
2. **Caching layer**: Redis for distributed caching
3. **Parallel processing**: Multiple expirations in parallel
4. **Incremental updates**: Only recalculate changed data
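As an illustration of the parallel-processing idea, the sketch below fans out one analysis per expiration with a thread pool. `analyze_expirations` and `fake_analysis` are hypothetical names; the real per-expiration pipeline would take the place of the stand-in function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, TypeVar

R = TypeVar("R")

def analyze_expirations(
    expirations: Sequence[str],
    analyze_one: Callable[[str], R],
    max_workers: int = 4,
) -> Dict[str, R]:
    """Run analyze_one for each expiration concurrently, keyed by expiration."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip lines results up correctly
        results = list(pool.map(analyze_one, expirations))
    return dict(zip(expirations, results))

def fake_analysis(expiration: str) -> dict:
    """Stand-in for the fetch + calibrate + PDF pipeline of one expiration."""
    return {"expiration": expiration, "n_strikes": 200}

surface = analyze_expirations(
    ["2025-12-19", "2026-01-16", "2026-02-20"], fake_analysis
)
```

Threads mainly help while the work is waiting on network I/O; for CPU-bound steps a `ProcessPoolExecutor` may be the better fit.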
---
## Conclusion
This project demonstrates:
- **Advanced quantitative finance**: Option-implied probabilities
- **Robust software engineering**: Error handling, fallbacks, testing
- **Modern ML/AI**: Local LLM integration with graceful degradation
- **Interactive visualization**: 3D graphics with full interactivity
- **Production-ready code**: Type hints, documentation, comprehensive tests
**Current Status**: 57% complete (4/7 phases)
**Next Milestone**: Phase 5 - Database & History
**Timeline**: Ready for deployment after Phase 7
---
**Last Updated**: 2025-12-01
**Author**: Built with Claude Code
**License**: MIT