# Option-Implied PDF Visualizer - Complete Project Explanation

## Executive Summary

This project extracts market expectations about future stock prices from options markets and presents them as intuitive 3D visualizations with AI-powered interpretations.

**Status**: ✅ Phase 8 Complete (100% - Production Ready)
**Last Updated**: 2025-12-08
**Repository Type**: Solo project with AI assistance (Claude Code)
**Live Interfaces**: Streamlit (port 8501) + React SPA (port 5173 + FastAPI backend 8000)

---

## Table of Contents

1. [Non-Technical Explanation](#non-technical-explanation)
2. [Technical Explanation](#technical-explanation)
3. [Architecture Overview](#architecture-overview)
4. [Mathematical Foundation](#mathematical-foundation)
5. [Implementation Details](#implementation-details)
6. [Key Algorithms](#key-algorithms)
7. [Data Flow](#data-flow)
8. [Testing Strategy](#testing-strategy)
9. [Future Enhancements](#future-enhancements)

---

## Non-Technical Explanation

### What Problem Does This Solve?

When traders buy and sell options, they're essentially placing bets on where they think a stock price will go. These bets contain valuable information about the market's collective expectations. This tool extracts that hidden information and makes it visible.

### What Does It Do?

Imagine you could see a 3D landscape showing:
- **X-axis**: Different possible stock prices (strikes)
- **Y-axis**: Time into the future (days to expiration)
- **Z-axis**: How likely each price is (probability)

The tool creates this landscape and then uses AI to explain what it means in plain English.

### Why Is This Useful?

**For Traders**: Understand where the market expects prices to move and how much uncertainty exists.

**For Risk Managers**: Quantify tail risk and see probability distributions.

**For Researchers**: Study historical probability distributions and prediction accuracy.

**For Students**: Learn derivatives pricing and market microstructure.

### Real-World Example

Imagine SPY is trading at $450. The tool might show:
- 68% chance price stays between $436-$467 in 30 days
- 22% chance of +5% move (bullish tilt)
- 18% chance of -5% move
- Negative skewness (-0.15) suggests slight downside bias
- The AI explains: "Market is pricing in moderate uncertainty with slight bearish lean, similar to pre-Fed-announcement patterns in October 2023."

---

## Technical Explanation

### Core Concept: Risk-Neutral Probability Density

Options markets implicitly encode a **risk-neutral probability distribution** for future asset prices. The Breeden-Litzenberger (1978) formula allows us to extract this distribution by taking the second derivative of call option prices with respect to strike:

```
f(K) = e^(rT) × ∂²C/∂K²
```

Where:
- `f(K)` = risk-neutral probability density at strike K
- `C` = call option price as a function of strike
- `r` = risk-free rate
- `T` = time to expiration
- `e^(rT)` = compounding factor that undoes the discounting `e^(-rT)` embedded in the call price
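
As a sanity check of the formula, the sketch below prices calls with Black-Scholes, takes a central second difference in strike, and compares the result with the analytic lognormal density. Black-Scholes is used here only because its risk-neutral density is known in closed form; the function name and all parameter values are illustrative assumptions, not the project's code.

```python
import numpy as np
from scipy.stats import norm


def bs_call(S, K, r, T, sigma):
    """Black-Scholes price of a European call (no dividends)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)


# Illustrative inputs (assumed values)
S, r, T, sigma = 450.0, 0.05, 30 / 365, 0.20
strikes = np.linspace(350.0, 550.0, 401)  # uniform 0.5 spacing
calls = bs_call(S, strikes, r, T, sigma)

# Central second difference approximates d2C/dK2 at interior strikes
h = strikes[1] - strikes[0]
d2C_dK2 = (calls[2:] - 2.0 * calls[1:-1] + calls[:-2]) / h**2
pdf_numeric = np.exp(r * T) * d2C_dK2
K_mid = strikes[1:-1]

# Analytic risk-neutral density under Black-Scholes (lognormal)
mu = np.log(S) + (r - 0.5 * sigma**2) * T
s = sigma * np.sqrt(T)
pdf_exact = np.exp(-((np.log(K_mid) - mu) ** 2) / (2 * s**2)) / (
    K_mid * s * np.sqrt(2 * np.pi)
)

max_abs_error = np.max(np.abs(pdf_numeric - pdf_exact))
```

With clean model prices the two densities agree closely; real quotes are noisier, which is why smoothing matters in practice.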

### Why This Matters

**Traditional Approach**: Implied volatility gives a single number (expected magnitude of moves)

**This Approach**: Full probability distribution showing:
- Mean and variance (expected price and uncertainty)
- Skewness (directional bias)
- Kurtosis (fat tails / crash risk)
- Specific probabilities for any price level
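
A minimal sketch of how these quantities can be read off a density sampled on a price grid, assuming trapezoid-rule integration. The function names and the synthetic normal density are illustrative, not the project's `PDFStatistics` API.

```python
import numpy as np

# np.trapezoid exists in NumPy 2.0+; older releases only provide np.trapz.
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz


def pdf_summary(prices, pdf):
    """Mean, standard deviation, skewness and excess kurtosis of a gridded PDF."""
    pdf = pdf / trapezoid(pdf, prices)  # re-normalize defensively
    mean = trapezoid(prices * pdf, prices)
    std = np.sqrt(trapezoid((prices - mean) ** 2 * pdf, prices))
    z = (prices - mean) / std
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(trapezoid(z**3 * pdf, prices)),
        "excess_kurtosis": float(trapezoid(z**4 * pdf, prices) - 3.0),
    }


def prob_between(prices, pdf, lo, hi):
    """Probability mass between two price levels."""
    mask = (prices >= lo) & (prices <= hi)
    return float(trapezoid(pdf[mask], prices[mask]))


# Synthetic example: a normal density centered at 450 with std 15
prices = np.linspace(350.0, 550.0, 2001)
pdf = np.exp(-0.5 * ((prices - 450.0) / 15.0) ** 2) / (15.0 * np.sqrt(2 * np.pi))

summary = pdf_summary(prices, pdf)
p_one_sigma = prob_between(prices, pdf, 435.0, 465.0)
```

For this symmetric test density the skewness and excess kurtosis come out near zero; a market-implied density would typically show non-zero values for both.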

### Technical Stack

**Backend**:
- Python 3.11+ (type hints, modern syntax)
- NumPy/SciPy (numerical computation)
- Pandas (data manipulation)

**Data Sources**:
- OpenBB Terminal (primary option chain data)
- yfinance (backup data source)
- FRED API (risk-free rate)

**Models**:
- SABR (Stochastic Alpha Beta Rho) volatility model
- Cubic spline interpolation (fallback)
- Cosine similarity for pattern matching

**AI**:
- Ollama (local LLM inference)
- Qwen3-7B (7 billion parameter language model)
- Intelligent fallback for offline operation

**Visualization**:
- Plotly (interactive 3D graphics)
- Dark theme with professional styling

**Database** (Phase 5):
- SQLite (time series storage)
- ChromaDB (vector search for patterns)

**Frontend** (Phase 6):
- Streamlit (Python web framework)

**Deployment** (Phase 7):
- Docker containerization
- HuggingFace Spaces hosting

---

## Architecture Overview

### Layer 1: Data Acquisition

```
┌────────────────────────────────────────┐
│         DataManager (Facade)           │
│  ┌─────────────────────────────────┐   │
│  │  OpenBB Client (Primary)        │   │
│  │  YFinance Client (Backup)       │   │
│  │  FRED Client (Risk-Free Rate)   │   │
│  └─────────────────────────────────┘   │
│            ↓                           │
│  ┌─────────────────────────────────┐   │
│  │  Cache Layer (15min TTL)        │   │
│  └─────────────────────────────────┘   │
└────────────────────────────────────────┘
```

**Design Pattern**: Facade pattern with automatic fallback
**Resilience**: Dual data sources, file-based caching
**Performance**: Minimizes API calls via intelligent caching
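
The fallback-plus-cache idea can be sketched as follows. This is a simplified in-memory illustration, not the project's file-based `DataManager`: the class names, the `(name, callable)` source list, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[key]  # entry is stale
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock(), value)


class FallbackDataManager:
    """Tries each source in order and caches the first successful result."""

    def __init__(self, sources: List[Tuple[str, Callable[[str], Any]]], cache: TTLCache):
        self._sources = sources
        self._cache = cache

    def get_options(self, ticker: str) -> Any:
        cached = self._cache.get(ticker)
        if cached is not None:
            return cached
        errors = []
        for name, fetch in self._sources:
            try:
                data = fetch(ticker)
            except Exception as exc:  # any failure triggers the next source
                errors.append(f"{name}: {exc}")
                continue
            self._cache.set(ticker, data)
            return data
        raise RuntimeError("all data sources failed: " + "; ".join(errors))


# Stand-in sources for demonstration only
def failing_primary(ticker):
    raise ConnectionError("primary source unavailable")


def working_backup(ticker):
    return {"ticker": ticker, "source": "backup"}


manager = FallbackDataManager(
    [("primary", failing_primary), ("backup", working_backup)],
    TTLCache(ttl_seconds=15 * 60),
)
result = manager.get_options("SPY")
```

Passing the clock in as a parameter keeps expiry testable without sleeping.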

### Layer 2: Mathematical Core

```
┌────────────────────────────────────────┐
│     BreedenlitzenbergPDF               │
│  ┌─────────────────────────────────┐   │
│  │  1. SABR Calibration            │   │
│  │  2. IV Interpolation            │   │
│  │  3. Call Price Calculation      │   │
│  │  4. Numerical Differentiation   │   │
│  │  5. PDF Normalization           │   │
│  └─────────────────────────────────┘   │
│            ↓                           │
│  ┌─────────────────────────────────┐   │
│  │  PDFStatistics Calculator       │   │
│  │  (mean, std, skew, kurtosis)    │   │
│  └─────────────────────────────────┘   │
└────────────────────────────────────────┘
```

**Design Pattern**: Pipeline pattern
**Numerical Methods**: Savitzky-Golay smoothing, gradient-based derivatives
**Robustness**: Edge case handling, non-negativity constraints

### Layer 3: AI Interpretation

```
┌────────────────────────────────────────┐
│        PDFInterpreter                  │
│  ┌─────────────────────────────────┐   │
│  │  PDFPatternMatcher              │   │
│  │  (cosine similarity)            │   │
│  └─────────────────────────────────┘   │
│            ↓                           │
│  ┌─────────────────────────────────┐   │
│  │  Ollama Client                  │   │
│  │  (with fallback)                │   │
│  └─────────────────────────────────┘   │
│            ↓                           │
│  ┌─────────────────────────────────┐   │
│  │  Prompt Templates               │   │
│  │  (4 analysis modes)             │   │
│  └─────────────────────────────────┘   │
└────────────────────────────────────────┘
```

**Design Pattern**: Strategy pattern (4 interpretation modes)
**AI Architecture**: Local LLM with graceful degradation
**Pattern Matching**: 70% shape similarity + 30% statistical similarity

### Layer 4: Visualization

```
┌────────────────────────────────────────┐
│      Plotly Visualization Suite        │
│  ┌─────────────────────────────────┐   │
│  │  2D PDF Plots                   │   │
│  │  PDF Comparison Plots           │   │
│  │  CDF Plots                      │   │
│  └─────────────────────────────────┘   │
│            +                           │
│  ┌─────────────────────────────────┐   │
│  │  3D Surface (Strike×Time×Prob)  │   │
│  │  Heatmap (2D alternative)       │   │
│  │  Wireframe (skeleton view)      │   │
│  └─────────────────────────────────┘   │
│            +                           │
│  ┌─────────────────────────────────┐   │
│  │  Probability Tables             │   │
│  │  (color-coded, interactive)     │   │
│  └─────────────────────────────────┘   │
└────────────────────────────────────────┘
```

**Design Pattern**: Factory pattern for plot creation
**Theming**: Dark theme with consistent styling
**Interactivity**: Full Plotly features (hover, zoom, rotate)
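
Before a strike × time × probability surface can be drawn, the per-expiration PDFs have to share one strike grid. The sketch below shows one way to assemble that grid with linear interpolation; the function name and the input layout (a dict from days-to-expiry to `(strikes, pdf)`) are assumptions for illustration, and the synthetic Gaussians stand in for real PDFs.

```python
import numpy as np


def build_surface(pdfs_by_expiry, n_strikes=100):
    """Stack per-expiration PDFs into a 2D array on a common strike grid.

    pdfs_by_expiry maps days_to_expiry -> (strikes, pdf), strikes ascending.
    Returns (strike_grid, days, z) where z[i, j] is the density for
    days[i] at strike_grid[j].
    """
    days = sorted(pdfs_by_expiry)
    # Restrict to the strike range covered by every expiration
    lo = max(pdfs_by_expiry[d][0][0] for d in days)
    hi = min(pdfs_by_expiry[d][0][-1] for d in days)
    strike_grid = np.linspace(lo, hi, n_strikes)
    z = np.vstack([np.interp(strike_grid, *pdfs_by_expiry[d]) for d in days])
    return strike_grid, np.array(days), z


def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


# Synthetic stand-ins for two expirations with different strike ranges
grid_30 = np.linspace(380.0, 520.0, 141)
grid_60 = np.linspace(350.0, 550.0, 201)
strike_grid, days, z = build_surface(
    {
        60: (grid_60, gaussian(grid_60, 452.0, 28.0)),
        30: (grid_30, gaussian(grid_30, 451.0, 20.0)),
    }
)
```

The row-per-expiration layout matches what Plotly's `go.Surface(x=..., y=..., z=...)` expects, with strikes on `x` and days on `y`.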

---

## Mathematical Foundation

### 1. Breeden-Litzenberger Formula (Core Algorithm)

**Derivation**:

The price of a European call option can be expressed as:
```
C(K) = e^(-rT) × ∫[K to ∞] (S - K) × f(S) dS
```

Taking the first derivative:
```
∂C/∂K = -e^(-rT) × ∫[K to ∞] f(S) dS = -e^(-rT) × P(S > K)
```

Taking the second derivative:
```
∂²C/∂K² = e^(-rT) × f(K)
```

Rearranging:
```
f(K) = e^(rT) × ∂²C/∂K²
```

**Implementation Challenges**:
1. Need smooth call price function → Use SABR interpolation
2. Numerical differentiation is noisy → Apply Savitzky-Golay filter
3. Can produce negative densities → Enforce non-negativity
4. Must integrate to 1 → Normalize using trapezoid rule
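
The four steps above can be sketched end to end on synthetic data. This is an illustration under stated assumptions rather than the project's implementation: Black-Scholes prices plus small seeded noise stand in for market quotes, and the window length and polynomial order passed to `savgol_filter` are arbitrary choices.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.stats import norm

# np.trapezoid exists in NumPy 2.0+; older releases only provide np.trapz.
trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz


def bs_call(S, K, r, T, sigma):
    """Black-Scholes call price, used here only to generate test data."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)


def smoothed_pdf(strikes, call_prices, r, T, window=31, polyorder=3):
    # Differentiate twice with respect to strike
    dC_dK = np.gradient(call_prices, strikes)
    d2C_dK2 = np.gradient(dC_dK, strikes)
    raw_pdf = np.exp(r * T) * d2C_dK2
    # Savitzky-Golay: local polynomial fit that damps differentiation noise
    smooth = savgol_filter(raw_pdf, window_length=window, polyorder=polyorder)
    # Enforce non-negativity, then normalize with the trapezoid rule
    clipped = np.maximum(smooth, 0.0)
    return clipped / trapezoid(clipped, strikes)


rng = np.random.default_rng(0)
strikes = np.linspace(350.0, 550.0, 401)
clean = bs_call(450.0, strikes, 0.05, 30 / 365, 0.20)
noisy = clean + rng.normal(0.0, 1e-4, size=strikes.size)  # synthetic quote noise

pdf = smoothed_pdf(strikes, noisy, r=0.05, T=30 / 365)
```

Larger windows smooth more aggressively but can flatten genuine features such as a second mode, so the window is worth tuning against the strike spacing.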

### 2. SABR Volatility Model

**Model Equations**:
```
dF = α × F^β × dW₁
dα = ν × α × dW₂
dW₁ × dW₂ = ρ dt
```

**Parameters**:
- `α` = initial volatility level
- `β` = elasticity (typically 0.5 for equities)
- `ρ` = correlation between price and volatility
- `ν` = vol-of-vol

**Calibration**: Minimize sum of squared errors between market IV and model IV using Nelder-Mead optimization.

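**Why SABR?**: It captures the volatility smile/skew, which constant-volatility Black-Scholes cannot, and it is widely used in practice (it originated in interest-rate markets and is also applied to equity options).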
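
For reference, calibration needs a function that maps SABR parameters to implied volatilities. A common choice is the lognormal approximation of Hagan et al. (2002), sketched below as a standalone function. Whether the project's `_sabr_formula` uses exactly this form is an assumption; the function name, argument order, and sample parameters are illustrative.

```python
import numpy as np


def sabr_implied_vol(K, F, T, alpha, beta, rho, nu):
    """Hagan et al. (2002) lognormal implied-volatility approximation."""
    K = np.asarray(K, dtype=float)
    log_fk = np.log(F / K)
    fk_pow = (F * K) ** ((1.0 - beta) / 2.0)

    z = (nu / alpha) * fk_pow * log_fk
    x = np.log((np.sqrt(1.0 - 2.0 * rho * z + z**2) + z - rho) / (1.0 - rho))
    # z / x(z) tends to 1 at the money (z -> 0); avoid evaluating 0 / 0 there
    small = np.abs(z) < 1e-8
    z_over_x = np.where(small, 1.0, z / np.where(small, 1.0, x))

    denom = fk_pow * (
        1.0
        + (1.0 - beta) ** 2 / 24.0 * log_fk**2
        + (1.0 - beta) ** 4 / 1920.0 * log_fk**4
    )
    correction = 1.0 + T * (
        (1.0 - beta) ** 2 / 24.0 * alpha**2 / fk_pow**2
        + rho * beta * nu * alpha / (4.0 * fk_pow)
        + (2.0 - 3.0 * rho**2) / 24.0 * nu**2
    )
    return alpha / denom * z_over_x * correction


# Illustrative parameters: forward 100, one year, negative correlation
smile = sabr_implied_vol([80.0, 100.0, 120.0], F=100.0, T=1.0,
                         alpha=2.0, beta=0.5, rho=-0.5, nu=0.4)

# With beta = 1 and negligible vol-of-vol the smile flattens to alpha
flat = sabr_implied_vol([80.0, 100.0, 120.0], F=100.0, T=1.0,
                        alpha=0.2, beta=1.0, rho=0.0, nu=1e-6)
```

With `rho < 0` the implied volatility is higher at low strikes than at high strikes, which is the downside skew typically seen in equity index options.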

### 3. Pattern Matching Algorithm

**Similarity Score**:
```
similarity = 0.7 × shape_similarity + 0.3 × stats_similarity
```

**Shape Similarity** (Cosine):
```
cos(θ) = (A · B) / (||A|| × ||B||)
```
Where A and B are PDF vectors.

**Stats Similarity**:
```
sim = 1 - mean_abs_diff([skew, kurtosis, implied_move])
```

**Why This Works**: Combines global shape (cosine) with specific moments (stats) to find truly similar distributions.
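
A minimal sketch of the combined score, assuming both PDFs are resampled onto their overlapping strike range before the cosine is taken and that the stats term is clipped to [0, 1]. The resampling, the clipping, and the dictionary keys are assumptions for illustration; only the 0.7/0.3 weighting and the two formulas come from the description above.

```python
import numpy as np


def shape_similarity(pdf_a, grid_a, pdf_b, grid_b, n_points=200):
    """Cosine similarity of two PDFs resampled onto a shared grid."""
    lo = max(grid_a[0], grid_b[0])
    hi = min(grid_a[-1], grid_b[-1])
    if hi <= lo:
        return 0.0  # no overlapping strike range
    common = np.linspace(lo, hi, n_points)
    a = np.interp(common, grid_a, pdf_a)
    b = np.interp(common, grid_b, pdf_b)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0


def stats_similarity(stats_a, stats_b, keys=("skewness", "kurtosis", "implied_move")):
    """1 minus the mean absolute difference of selected statistics."""
    diffs = [abs(stats_a[k] - stats_b[k]) for k in keys]
    return float(np.clip(1.0 - np.mean(diffs), 0.0, 1.0))


def combined_similarity(pdf_a, grid_a, stats_a, pdf_b, grid_b, stats_b):
    shape = shape_similarity(pdf_a, grid_a, pdf_b, grid_b)
    stats = stats_similarity(stats_a, stats_b)
    return 0.7 * shape + 0.3 * stats


def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


grid = np.linspace(350.0, 550.0, 401)
current = gaussian(grid, 450.0, 15.0)
shifted = gaussian(grid, 470.0, 15.0)
stats_now = {"skewness": -0.15, "kurtosis": 0.4, "implied_move": 0.03}
stats_old = {"skewness": 0.10, "kurtosis": 0.9, "implied_move": 0.05}

same_score = combined_similarity(current, grid, stats_now, current, grid, stats_now)
diff_score = combined_similarity(current, grid, stats_now, shifted, grid, stats_old)
```

Because the statistics are on different scales, the unweighted mean absolute difference lets the largest-magnitude statistic dominate; rescaling each one first is a possible refinement.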

---

## Implementation Details

### Phase 1: Foundation (100% Complete)

**Files Created**:
- `src/data/openbb_client.py` (169 lines)
- `src/data/yfinance_client.py` (142 lines)
- `src/data/fred_client.py` (89 lines)
- `src/data/cache.py` (98 lines)
- `src/data/data_manager.py` (186 lines)

**Key Features**:
- Automatic fallback between data sources
- File-based caching with TTL
- Comprehensive error handling
- Type hints throughout

**Testing**: 15 tests covering normal operation and edge cases

### Phase 2: Core Math (100% Complete)

**Files Created**:
- `src/core/breeden_litz.py` (287 lines) ⭐ **CORE ALGORITHM**
- `src/core/sabr.py` (201 lines)
- `src/core/statistics.py` (234 lines)

**Key Features**:
- SABR calibration with fallback to cubic spline
- Breeden-Litzenberger with numerical smoothing
- Complete PDF statistics (13 metrics)
- CDF and probability queries

**Testing**: 22 tests covering calculation accuracy and edge cases

### Phase 3: Visualization (100% Complete)

**Files Created**:
- `src/visualization/themes.py` (73 lines)
- `src/visualization/pdf_2d.py` (312 lines)
- `src/visualization/surface_3d.py` (264 lines)
- `src/visualization/probability_table.py` (189 lines)

**Key Features**:
- Dark theme configuration system
- 4 types of 2D plots
- 3 types of 3D visualizations
- 4 types of probability tables
- Full interactivity

**Testing**: 18 tests covering plot generation and formatting

### Phase 4: AI Interpretation (100% Complete)

**Files Created**:
- `src/ai/prompts.py` (198 lines)
- `src/ai/interpreter.py` (245 lines)
- `src/core/patterns.py` (276 lines)

**Key Features**:
- 4 interpretation modes (standard, conservative, aggressive, educational)
- Pattern matching with cosine similarity
- Ollama client with graceful fallback
- Rule-based interpretation system

**Testing**: 21 tests covering AI components and pattern matching

### Phases 5-7: Pending

**Phase 5**: SQLite schema, SQLAlchemy models, prediction tracking
**Phase 6**: Streamlit UI with 4 pages
**Phase 7**: Docker, testing, deployment to HuggingFace Spaces

---

## Key Algorithms

### Algorithm 1: PDF Extraction

```python
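import numpy as np

def _breeden_litzenberger(self, strikes, call_prices, r, T):
    """
    Extract risk-neutral PDF from call prices.

    f(K) = e^(rT) × ∂²C/∂K²
    """
    # Numerical first and second derivatives of C with respect to strike.
    # Passing `strikes` lets np.gradient handle non-uniform spacing.
    dC_dK = np.gradient(call_prices, strikes)
    d2C_dK2 = np.gradient(dC_dK, strikes)

    # Apply Breeden-Litzenberger formula
    pdf = np.exp(r * T) * d2C_dK2

    # Enforce non-negativity
    pdf = np.maximum(pdf, 0)

    # Normalize to integrate to 1.
    # np.trapezoid replaces the deprecated np.trapz in NumPy 2.0+.
    trapezoid = np.trapezoid if hasattr(np, "trapezoid") else np.trapz
    area = trapezoid(pdf, strikes)
    if area <= 0:
        raise ValueError("PDF has no positive mass; check the input call prices")
    pdf = pdf / area

    return pdf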
```

**Complexity**: O(n) where n = number of strikes
**Accuracy**: Depends on strike density and IV smoothness

### Algorithm 2: SABR Calibration

```python
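import numpy as np
from scipy.optimize import minimize

def calibrate(self, strikes, implied_vols, forward, tau):
    """
    Calibrate SABR to market IV smile.
    """
    def objective(params):
        # Sum of squared errors between model and market implied vols
        alpha, rho, nu = params
        model_vols = self._sabr_formula(strikes, forward, alpha, rho, nu, self.beta, tau)
        return np.sum((model_vols - implied_vols) ** 2)

    initial_guess = [0.2, -0.3, 0.4]
    # Keep alpha and nu positive and rho inside (-1, 1).
    # Nelder-Mead only honors `bounds` in SciPy 1.7 and later.
    bounds = [(0.001, 2.0), (-0.999, 0.999), (0.001, 2.0)]

    result = minimize(
        objective,
        initial_guess,
        method='Nelder-Mead',
        bounds=bounds,
        options={'maxiter': 1000}
    )

    # Store parameters only from a converged fit; callers can check
    # result.success and fall back to the cubic spline otherwise.
    if result.success:
        self.alpha, self.rho, self.nu = result.x
    return result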
```

**Complexity**: O(m × n) where m = iterations, n = strikes
**Convergence**: Typically <100 iterations for equity options

### Algorithm 3: Pattern Matching

```python
def _calculate_similarity(self, current_pdf, current_strikes, hist_pdf, hist_strikes, current_stats, hist_stats):
    """
    Calculate combined similarity score.
    """
    # Shape similarity (cosine)
    shape_sim = self._pdf_shape_similarity(
        current_pdf, current_strikes,
        hist_pdf, hist_strikes
    )

    # Statistical similarity
    stats_sim = self._stats_similarity(current_stats, hist_stats)

    # Weighted combination
    return 0.7 * shape_sim + 0.3 * stats_sim
```

**Complexity**: O(n) for interpolation + O(n) for dot product
**Accuracy**: Validated against synthetic test cases

---

## Data Flow

### Complete Pipeline (End-to-End)

```
1. User Request
   ↓
2. DataManager.get_options("SPY")
   ├─> OpenBB Client (try primary)
   └─> YFinance Client (fallback if needed)
   ↓
3. FREDClient.get_risk_free_rate()
   ↓
4. SABRModel.calibrate(strikes, IVs)
   ├─> Optimize α, ρ, ν parameters
   └─> Fallback to CubicSpline if fails
   ↓
5. SABRModel.interpolate_iv(fine_strikes)
   ↓
6. BreedenlitzenbergPDF.calculate_pdf(strikes, IVs, spot, r, T)
   ├─> Calculate call prices
   ├─> Apply Breeden-Litzenberger formula
   ├─> Smooth with Savitzky-Golay
   └─> Normalize to integrate to 1
   ↓
7. PDFStatistics.calculate_all_stats()
   ├─> Mean, std, skewness, kurtosis
   ├─> Implied move, tail probabilities
   └─> Confidence intervals
   ↓
8. PDFPatternMatcher.find_similar_patterns()
   ├─> Load historical PDFs
   ├─> Calculate similarity scores
   └─> Return top matches
   ↓
9. PDFInterpreter.interpret_single_pdf()
   ├─> Format prompt with stats & patterns
   ├─> Try Ollama.generate()
   └─> Fallback to rule-based if Ollama unavailable
   ↓
10. Visualization Functions
    ├─> create_3d_surface()
    ├─> plot_pdf_2d()
    ├─> create_probability_table()
    └─> Return interactive Plotly figures
    ↓
11. Return Results to User
    ├─> PDF values & strikes
    ├─> All statistics
    ├─> Pattern matches
    ├─> AI interpretation
    └─> Plotly figures
```

### Data Flow Timing (Approximate)

- Data fetch: ~1-3 seconds (or instant if cached)
- SABR calibration: ~0.1-0.5 seconds
- PDF calculation: ~0.05 seconds
- Statistics: ~0.01 seconds
- Pattern matching: ~0.1-1 second (depends on history size)
- AI interpretation: ~2-5 seconds (or ~0.01s for fallback)
- Visualization: ~0.1-0.5 seconds

**Total**: 3-10 seconds for complete analysis

---

## Testing Strategy

### Unit Tests

**Coverage**: High coverage across all modules

**Approach**:
- Test normal operation
- Test edge cases (empty data, extreme values)
- Test error conditions
- Test fallback mechanisms

**Example** (from `test_core_math.py`):
```python
def test_pdf_normalization():
    """Ensure PDF integrates to 1.0."""
    pdf_calc = BreedenlitzenbergPDF()
    pdf = pdf_calc.calculate_pdf(...)
    integral = trapz(pdf, strikes)
    assert abs(integral - 1.0) < 1e-6
```

### Integration Tests

**Example** (from `test_ai_components.py`):
```python
def test_integration_ai_workflow():
    """Test complete AI workflow end-to-end."""
    # 1. Create PDF
    # 2. Calculate statistics
    # 3. Find patterns
    # 4. Generate interpretation
    # All steps must complete successfully
```

### Fallback Tests

**Critical**: Ensure system works when external dependencies fail

**Tests**:
- OpenBB fails → YFinance succeeds
- SABR fails → Cubic spline succeeds
- Ollama unavailable → Rule-based interpretation succeeds

---

## Future Enhancements

### Phase 5: Database & History

**Schema Design**:
```sql
CREATE TABLE pdf_snapshots (
    id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    ticker TEXT,
    days_to_expiry INTEGER,
    spot_price REAL,
    strikes BLOB,
    pdf_values BLOB,
    stats JSON,
    interpretation TEXT,
    model_used TEXT
);

CREATE TABLE predictions (
    id INTEGER PRIMARY KEY,
    forecast_date DATETIME,
    target_date DATETIME,
    predicted_prob REAL,
    condition TEXT,
    target_level REAL,
    actual_outcome BOOLEAN,
    actual_price REAL,
    evaluation_date DATETIME
);

CREATE TABLE pattern_matches (
    id INTEGER PRIMARY KEY,
    current_snapshot_id INTEGER,
    historical_snapshot_id INTEGER,
    similarity_score REAL,
    shape_similarity REAL,
    stats_similarity REAL
);
```
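
A minimal sketch of writing and reading one snapshot with Python's built-in `sqlite3`, assuming the arrays are stored as raw float64 bytes in the `BLOB` columns and the statistics as a JSON string. The helper names and the trimmed-down table are illustrative; with NumPy arrays, `ndarray.tobytes()` and `np.frombuffer` would play the same role as the stdlib `array` module used here.

```python
import json
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """
    CREATE TABLE pdf_snapshots (
        id INTEGER PRIMARY KEY,
        timestamp DATETIME,
        ticker TEXT,
        days_to_expiry INTEGER,
        spot_price REAL,
        strikes BLOB,
        pdf_values BLOB,
        stats JSON
    )
    """
)


def save_snapshot(conn, timestamp, ticker, days_to_expiry, spot, strikes, pdf_values, stats):
    """Insert one snapshot and return its row id."""
    cur = conn.execute(
        "INSERT INTO pdf_snapshots "
        "(timestamp, ticker, days_to_expiry, spot_price, strikes, pdf_values, stats) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            timestamp,
            ticker,
            days_to_expiry,
            spot,
            array("d", strikes).tobytes(),     # float64 values as raw bytes
            array("d", pdf_values).tobytes(),
            json.dumps(stats),
        ),
    )
    conn.commit()
    return cur.lastrowid


def load_snapshot(conn, snapshot_id):
    """Read a snapshot back and decode the BLOB and JSON columns."""
    ticker, strikes_blob, pdf_blob, stats_json = conn.execute(
        "SELECT ticker, strikes, pdf_values, stats FROM pdf_snapshots WHERE id = ?",
        (snapshot_id,),
    ).fetchone()
    strikes = array("d")
    strikes.frombytes(strikes_blob)
    pdf_values = array("d")
    pdf_values.frombytes(pdf_blob)
    return ticker, list(strikes), list(pdf_values), json.loads(stats_json)


row_id = save_snapshot(
    conn, "2025-12-01T16:00:00", "SPY", 30, 450.0,
    [440.0, 450.0, 460.0], [0.01, 0.02, 0.01], {"skewness": -0.15},
)
loaded = load_snapshot(conn, row_id)
```

Raw bytes are compact but not self-describing, so the dtype and byte order need to be fixed by convention.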

**ChromaDB Integration**:
- Store PDF embeddings for vector search
- Fast similarity search across thousands of historical PDFs

### Phase 6: Streamlit App

**Pages**:
1. **Live Analysis**: Real-time PDF extraction and visualization
2. **Historical**: Browse past PDFs and patterns
3. **Predictions**: Track accuracy of market expectations
4. **About**: Documentation and explanation

**Features**:
- Ticker selection
- Expiration date selection
- Analysis mode selection
- Export to CSV/PNG
- Dark/light theme toggle

### Phase 7: Deployment

**Docker**:
```dockerfile
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["streamlit", "run", "app/streamlit_app.py"]
```

**HuggingFace Spaces**:
- Free hosting for ML demos
- Automatic builds from Git
- Environment variables for API keys

---

## Technical Challenges & Solutions

### Challenge 1: Noisy Numerical Derivatives

**Problem**: ∂²C/∂K² amplifies noise in option prices
**Solution**:
1. Use SABR to interpolate IV (creates smooth curve)
2. Apply Savitzky-Golay filter (polynomial smoothing)
3. Use dense strike grid (200+ points)

### Challenge 2: Data Source Reliability

**Problem**: APIs can be down or rate-limited
**Solution**:
1. Dual data sources (OpenBB + yfinance)
2. Automatic fallback in DataManager
3. File-based caching (15min TTL)

### Challenge 3: SABR Calibration Failures

**Problem**: Sometimes fails to converge
**Solution**:
1. Fallback to cubic spline interpolation
2. Graceful degradation (still produces PDF)
3. Log warnings for debugging
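
The fallback can be sketched as a thin wrapper that tries a calibration routine and drops to SciPy's `CubicSpline` when it raises. The wrapper name and the `calibrate` callback contract (it returns a callable mapping strikes to IVs) are assumptions made for illustration; only the SABR-then-spline order comes from the text above.

```python
import logging

import numpy as np
from scipy.interpolate import CubicSpline

logger = logging.getLogger(__name__)


def interpolate_iv(strikes, implied_vols, fine_strikes, calibrate=None):
    """Return (iv_curve, method_used), preferring a calibrated smile."""
    if calibrate is not None:
        try:
            smile = calibrate(strikes, implied_vols)  # hypothetical callback
            return np.asarray(smile(fine_strikes)), "sabr"
        except Exception as exc:
            logger.warning("SABR calibration failed, using cubic spline: %s", exc)
    # CubicSpline requires strictly increasing strikes
    spline = CubicSpline(strikes, implied_vols)
    return spline(fine_strikes), "cubic_spline"


def failing_calibration(strikes, implied_vols):
    """Stand-in that simulates a non-converging optimizer."""
    raise RuntimeError("optimizer did not converge")


strikes = np.array([400.0, 425.0, 450.0, 475.0, 500.0])
ivs = np.array([0.26, 0.22, 0.20, 0.19, 0.195])
fine = np.linspace(400.0, 500.0, 201)

iv_curve, method = interpolate_iv(strikes, ivs, fine, calibrate=failing_calibration)
```

Returning the method name alongside the curve lets downstream code record which model produced a given PDF.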

### Challenge 4: Ollama Availability

**Problem**: User may not have Ollama installed
**Solution**:
1. Check availability at runtime
2. Provide rule-based interpretation fallback
3. Fallback quality is surprisingly good (tested)
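
A sketch of the degradation path, assuming the LLM call is passed in as a plain callable so that any failure routes to a rule-based summary. The thresholds, wording, and statistic keys below are invented for illustration and are not the project's actual rules.

```python
from typing import Callable, Dict, Optional, Tuple


def rule_based_interpretation(stats: Dict[str, float]) -> str:
    """Template-based summary; thresholds are illustrative assumptions."""
    skew = stats["skewness"]
    if skew < -0.1:
        lean = "a bearish lean"
    elif skew > 0.1:
        lean = "a bullish lean"
    else:
        lean = "no strong directional lean"
    tails = "fat tails" if stats["excess_kurtosis"] > 1.0 else "ordinary tails"
    return (
        f"The market-implied distribution shows {lean} with {tails}; "
        f"the implied move is about {stats['implied_move_pct']:.1f}%."
    )


def interpret(
    stats: Dict[str, float],
    llm_generate: Optional[Callable[[Dict[str, float]], str]] = None,
) -> Tuple[str, str]:
    """Return (text, mode), where mode is 'llm' or 'rule_based'."""
    if llm_generate is not None:
        try:
            text = llm_generate(stats)
            if text:
                return text, "llm"
        except Exception:
            pass  # LLM unavailable: fall through to the rule-based path
    return rule_based_interpretation(stats), "rule_based"


def offline_llm(stats):
    """Stand-in for a local LLM client that cannot be reached."""
    raise ConnectionError("LLM server not reachable")


stats = {"skewness": -0.15, "excess_kurtosis": 0.4, "implied_move_pct": 3.2}
text, mode = interpret(stats, llm_generate=offline_llm)
```

Reporting the mode with the text makes it easy to show users whether an interpretation came from the model or from the fallback.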

---

## Performance Optimization

### Current Performance

**Bottlenecks**:
1. API calls (1-3 seconds) → Mitigated by caching
2. SABR calibration (~0.3 seconds) → Acceptable
3. Pattern matching (~0.5 seconds) → Will improve with indexing

**Future Optimizations**:
1. **Database indexing**: Speed up historical queries
2. **Caching layer**: Redis for distributed caching
3. **Parallel processing**: Multiple expirations in parallel
4. **Incremental updates**: Only recalculate changed data
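
As one possible shape for the parallel-processing idea, the sketch below fans expirations out to a thread pool from the standard library. `analyze_expiration` is a placeholder for the real per-expiration pipeline, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_expiration(days_to_expiry):
    """Placeholder for fetch -> calibrate -> PDF -> stats for one expiration."""
    return {"days_to_expiry": days_to_expiry, "status": "ok"}


def analyze_all(expirations, worker=analyze_expiration, max_workers=4):
    """Run the worker for every expiration concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs
        results = list(pool.map(worker, expirations))
    return dict(zip(expirations, results))


results = analyze_all([7, 30, 60, 90])
```

Threads mainly help with the I/O-bound data fetch; for the CPU-bound calibration step a `ProcessPoolExecutor` may be the better fit.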

---

## Conclusion

This project demonstrates:
- **Advanced quantitative finance**: Option-implied probabilities
- **Robust software engineering**: Error handling, fallbacks, testing
- **Modern ML/AI**: Local LLM integration with graceful degradation
- **Interactive visualization**: 3D graphics with full interactivity
- **Production-ready code**: Type hints, documentation, comprehensive tests

**Current Status**: 57% complete (4/7 phases)
**Next Milestone**: Phase 5 - Database & History
**Timeline**: Ready for deployment after Phase 7

---

**Last Updated**: 2025-12-01
**Author**: Built with Claude Code
**License**: MIT