Update CLAUDE.md with v6 results
CLAUDE.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
| 1 |
# CLAUDE.md — PatchTST + Wavelet S&P 500 Forecasting: Complete Agent Reference
|
| 2 |
|
| 3 |
> **Single-source-of-truth** for an AI agent working on this project.
|
| 4 |
-
> Updated: 2026-05-02 after completing v1→v2→v3→v4→v5.
|
| 5 |
|
| 6 |
---
|
| 7 |
|
|
@@ -9,7 +9,7 @@
|
|
| 9 |
|
| 10 |
**Goal**: Investigated PatchTST Transformer with wavelet denoising on S&P 500 data for next-day price forecasting.
|
| 11 |
|
| 12 |
-
**Final Conclusion**: S&P 500 daily direction is NOT predictable from price/technical indicators with honest methodology. The v3 walk-forward of 53.3% was entirely due to wavelet look-ahead bias. Without it: ≤50%.
|
| 13 |
|
| 14 |
**Repo**: `tbukuai/patchtst-wavelet-sp500-research` on HF Hub.
|
| 15 |
|
|
@@ -29,43 +29,17 @@ All model versions live in one repo with branches:
|
|
| 29 |
| `v3` | 13 | Global | 60.8%/53.3% WF | Best (biased) | `revision="v3"` |
|
| 30 |
| `v4` | 25 | Causal | 60.1%/~43% WF | Honest, overfits | `revision="v4"` |
|
| 31 |
| `v5` | 8 | None | 46.1% | Honest baseline | `revision="v5"` |
|
|
|
|
| 32 |
|
| 33 |
-
###
|
| 34 |
-
|
| 35 |
-
- Each version gets its own branch with README explaining results + findings
|
| 36 |
-
- Model files (`model.safetensors`, `config.json`, `scaler.pkl`, `metrics.json`) on each branch
|
| 37 |
-
- README on each branch documents: results, what was tested, why it worked/failed, lesson learned
|
| 38 |
-
- Use `revision="v3"` in `from_pretrained()` to load specific versions
|
| 39 |
-
- When a new version beats main, update main branch
|
| 40 |
-
|
| 41 |
-
### Creating a New Version
|
| 42 |
-
```bash
|
| 43 |
-
# 1. Create branch
|
| 44 |
-
# (via HF API or hf_repo_git create_branch)
|
| 45 |
-
|
| 46 |
-
# 2. Upload model files to branch
|
| 47 |
-
hf upload tbukuai/stock-patchtst-sp500 ./model_files --revision vN --commit-message "vN: description"
|
| 48 |
-
|
| 49 |
-
# 3. Add README to branch explaining results + findings
|
| 50 |
-
|
| 51 |
-
# 4. If it beats main, update main:
|
| 52 |
-
hf download tbukuai/stock-patchtst-sp500 --revision vN --local-dir ./best
|
| 53 |
-
hf upload tbukuai/stock-patchtst-sp500 ./best --commit-message "main: update to vN (new best)"
|
| 54 |
-
```
|
| 55 |
-
|
| 56 |
-
### Standalone Repos (legacy, may be deleted)
|
| 57 |
-
These still exist but branches are the canonical source:
|
| 58 |
-
- `tbukuai/stock-patchtst-sp500-v2`
|
| 59 |
-
- `tbukuai/stock-patchtst-sp500-v3`
|
| 60 |
-
- `tbukuai/stock-patchtst-sp500-v4`
|
| 61 |
-
- `tbukuai/stock-patchtst-sp500-v5`
|
| 62 |
|
| 63 |
---
|
| 64 |
|
| 65 |
## CRITICAL LESSONS LEARNED (READ FIRST)
|
| 66 |
|
| 67 |
### 1. DA(diff) is FAKE — always use DA(ctx)
|
| 68 |
-
- **DA(diff)** =
|
| 69 |
- **DA(ctx)** = `np.sign(pred - last_known_close) == np.sign(actual - last_known_close)` → **Real metric.**
|
| 70 |
|
| 71 |
### 2. Global wavelet denoising has LOOK-AHEAD BIAS
|
|
@@ -76,26 +50,24 @@ These still exist but branches are the canonical source:
|
|
| 76 |
### 3. MADL loss COLLAPSES
|
| 77 |
- v2: MADL at α=0.3 → predicts "up" 98.9% of the time. Use MSE.
|
| 78 |
|
| 79 |
-
### 4.
|
| 80 |
-
- v3 (13ch): 53.3% WF
|
| 81 |
- v4 (25ch): ~43% WF — **worse** with more features
|
| 82 |
-
- v5 (8ch): 46.1% single — fewer features didn't help
|
|
|
|
| 83 |
|
| 84 |
### 5. S&P 500 daily direction is NOT predictable from price history
|
| 85 |
-
- Confirmed across
|
| 86 |
- All honest methods give ≤50% DA (random)
|
| 87 |
-
-
|
| 88 |
|
| 89 |
### 6. Training practices that WORK
|
| 90 |
-
- **BATCH=256** on T4 GPU (
|
|
|
|
| 91 |
- **num_workers=2, persistent_workers=True** in DataLoaders
|
| 92 |
- **80/10/10 split**, early stopping patience=30
|
| 93 |
- **Walk-forward validation** (quarterly, 63-day windows) for honest metrics
|
| 94 |
-
- **Push model to Hub BEFORE walk-forward** to avoid data loss
|
| 95 |
- **5-minute checkpoints** to Hub for resumability on Colab
|
| 96 |
-
- **yf.Ticker().history()** not `yf.download()` (avoids MultiIndex bugs)
|
| 97 |
-
- **df.reset_index(drop=True)** before numpy array assignment (avoids alignment bugs)
|
| 98 |
-
- **Pure numpy for features** (avoid pd.Series.rolling() + DataFrame assignment issues)
|
| 99 |
|
| 100 |
---
|
| 101 |
|
|
@@ -108,102 +80,32 @@ These still exist but branches are the canonical source:
|
|
| 108 |
| v3 | **60.8%** | **53.3%** | Fixed metric, walk-forward | Best but biased by global wavelet |
|
| 109 |
| v4 | 60.1% | **~43%** | Causal wavelet, 25ch | More features = worse, bias was the signal |
|
| 110 |
| v5 | **46.1%** | — | No wavelet, 8ch | Below random — no signal exists |
|
|
|
|
| 111 |
|
| 112 |
---
|
| 113 |
|
| 114 |
## COLAB TRAINING PRACTICES
|
| 115 |
|
| 116 |
-
###
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
### GPU Optimization
|
| 122 |
-
- **BATCH=256** (not 32) — uses 8GB/15GB VRAM, 2-3× faster
|
| 123 |
-
- **num_workers=2, persistent_workers=True** — overlaps data loading
|
| 124 |
-
- **pin_memory=True** — faster CPU→GPU transfer
|
| 125 |
-
- PatchTST 600K params: ~3-4s/epoch at BATCH=256 on T4
|
| 126 |
|
| 127 |
-
### yfinance Compatibility (as of 2026)
|
| 128 |
```python
|
| 129 |
-
#
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
#
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
close = df['Close'].values.astype(float) # numpy array
|
| 141 |
-
# Compute features as numpy, assign back:
|
| 142 |
-
df['MA5'] = rolling_mean_np(close, 5) # custom numpy function
|
| 143 |
-
```
|
| 144 |
-
|
| 145 |
-
### Checkpoint Schema
|
| 146 |
-
```json
|
| 147 |
-
{
|
| 148 |
-
"step": "start|training|train_done|walkforward|wf_done|all_done",
|
| 149 |
-
"detail": "human-readable progress",
|
| 150 |
-
"metrics": {"da_ctx": 60.8, "sharpe_ctx": 4.32, "rmse": 26.84},
|
| 151 |
-
"wf_results": [{"window": 0, "da_ctx": 52.4, ...}],
|
| 152 |
-
"saved_at": "2026-05-01 03:00:00 UTC"
|
| 153 |
-
}
|
| 154 |
```
|
| 155 |
|
| 156 |
-
### Trackio
|
| 157 |
-
```python
|
| 158 |
-
trackio.init(project='patchtst-sp500-vN', name='vN-run', space_id='tbukuai/mlintern-ptv3stck')
|
| 159 |
-
# NOTE: uses `name=`, NOT `run_name=`
|
| 160 |
-
# Auth issues common — falls back to local logging, non-critical
|
| 161 |
-
```
|
| 162 |
-
|
| 163 |
-
---
|
| 164 |
-
|
| 165 |
-
## EVALUATION CODE
|
| 166 |
-
|
| 167 |
-
```python
|
| 168 |
-
def evaluate(model, test_dl, scaler, n_ch):
|
| 169 |
-
model.eval()
|
| 170 |
-
all_p, all_t, all_c = [], [], []
|
| 171 |
-
with torch.no_grad():
|
| 172 |
-
for x, y in test_dl:
|
| 173 |
-
x = x.to(device)
|
| 174 |
-
pred = model(past_values=x).prediction_outputs.cpu()
|
| 175 |
-
all_p.append(pred); all_t.append(y); all_c.append(x[:,-1,:].cpu())
|
| 176 |
-
preds=torch.cat(all_p,0).numpy(); targets=torch.cat(all_t,0).numpy(); contexts=torch.cat(all_c,0).numpy()
|
| 177 |
-
def inv(vals):
|
| 178 |
-
d=np.zeros((len(vals),n_ch)); d[:,CLOSE_IDX]=vals
|
| 179 |
-
return scaler.inverse_transform(d)[:,CLOSE_IDX]
|
| 180 |
-
pc=inv(preds[:,0,CLOSE_IDX]); tc=inv(targets[:,0,CLOSE_IDX]); cc=inv(contexts[:,CLOSE_IDX])
|
| 181 |
-
# DA(ctx) — PRIMARY
|
| 182 |
-
pd_dir=np.sign(pc-cc); td_dir=np.sign(tc-cc); mask=td_dir!=0
|
| 183 |
-
da_ctx=np.mean(pd_dir[mask]==td_dir[mask])*100 if mask.sum()>0 else 50
|
| 184 |
-
# Sharpe
|
| 185 |
-
act_ret=(tc-cc)/(cc+1e-10); sr2=np.sign(pc-cc)*act_ret
|
| 186 |
-
sharpe_ctx=np.mean(sr2)/(np.std(sr2)+1e-10)*np.sqrt(252)
|
| 187 |
-
return {'da_ctx': da_ctx, 'sharpe_ctx': sharpe_ctx, ...}
|
| 188 |
-
```
|
| 189 |
-
|
| 190 |
-
---
|
| 191 |
-
|
| 192 |
-
## PITFALLS TO AVOID
|
| 193 |
-
|
| 194 |
-
| Pitfall | What Happens | Fix |
|
| 195 |
-
|---------|-------------|-----|
|
| 196 |
-
| Using DA(diff) | Reports ~70%, misleading | Always use DA(ctx) |
|
| 197 |
-
| Global wavelet before split | Look-ahead bias (+7-10% fake DA) | Use causal or none |
|
| 198 |
-
| MADL loss at α≥0.1 | Collapses to always-predict-up | Use MSE |
|
| 199 |
-
| Too many features | Overfitting (v4: 25ch → 43% WF) | Fewer, uncorrelated features |
|
| 200 |
-
| yf.download() | MultiIndex bugs in 2026 | Use yf.Ticker().history() |
|
| 201 |
-
| pd.Series.rolling().values assignment | Silent NaN from index alignment | Use pure numpy |
|
| 202 |
-
| Single-split only | v4 showed 60% single, 43% WF | Always walk-forward |
|
| 203 |
-
| BATCH=32 on GPU | 10-20% GPU utilization | Use BATCH=256 |
|
| 204 |
-
| No checkpoints | Colab disconnect = lost work | Save every 5 min to Hub |
|
| 205 |
-
| `trackio.init(run_name=...)` | TypeError | Use `name=` |
|
| 206 |
-
|
| 207 |
---
|
| 208 |
|
| 209 |
## FILES IN THIS REPO
|
|
@@ -211,19 +113,14 @@ def evaluate(model, test_dl, scaler, n_ch):
|
|
| 211 |
| File | Purpose |
|
| 212 |
|------|---------|
|
| 213 |
| **`CLAUDE.md`** | THIS FILE |
|
| 214 |
-
| `
|
| 215 |
-
| `
|
| 216 |
-
| `RESEARCH_FINDINGS.md` | Full experimental evidence + literature review |
|
| 217 |
-
| `V4_DESIGN.md` | v4 design (25ch causal wavelet — failed) |
|
| 218 |
-
| `notebooks/multi_market_test.ipynb` | **Next: test 5 markets** |
|
| 219 |
| `notebooks/v5_train_colab.ipynb` | v5 (8ch, no wavelet) |
|
| 220 |
-
| `
|
| 221 |
-
| `
|
| 222 |
-
| `notebooks/v3_inference_colab.ipynb` | Use v3 model |
|
| 223 |
| `results/v3_results.json` | v3 final (79 WF windows) |
|
| 224 |
-
| `results/v4_live_checkpoint.json` | v4 partial (
|
| 225 |
-
| `
|
| 226 |
-
| `results/multi_market_results.json` | Multi-market test (after run) |
|
| 227 |
|
| 228 |
---
|
| 229 |
|
|
@@ -231,10 +128,10 @@ def evaluate(model, test_dl, scaler, n_ch):
|
|
| 231 |
|
| 232 |
### Confirmed by our experiments
|
| 233 |
- xLSTM-TS F1=73% (2408.12408) — **inflated by look-ahead bias** (we proved this)
|
| 234 |
-
- S&P 500 daily DA ~50% with honest methodology — **confirmed across v4, v5**
|
| 235 |
|
| 236 |
-
### What actually works (
|
| 237 |
- News sentiment: [2304.07619](https://arxiv.org/abs/2304.07619) *Journal of Finance*
|
| 238 |
- Stock ranking: [2603.16985](https://arxiv.org/abs/2603.16985) TIPS SR=1.506
|
| 239 |
- Less efficient markets: xLSTM-TS better on EWZ than S&P 500
|
| 240 |
-
- Financial foundation model: [2508.02739](https://arxiv.org/abs/2508.02739) Kronos +93% RankIC
|
|
|
|
| 1 |
# CLAUDE.md — PatchTST + Wavelet S&P 500 Forecasting: Complete Agent Reference
|
| 2 |
|
| 3 |
> **Single-source-of-truth** for an AI agent working on this project.
|
| 4 |
+
> Updated: 2026-05-02 after completing v1→v2→v3→v4→v5→v6.
|
| 5 |
|
| 6 |
---
|
| 7 |
|
|
|
|
| 9 |
|
| 10 |
**Goal**: Investigated PatchTST Transformer with wavelet denoising on S&P 500 data for next-day price forecasting.
|
| 11 |
|
| 12 |
+
**Final Conclusion**: S&P 500 daily direction is NOT predictable from price/technical indicators with honest methodology. The v3 walk-forward of 53.3% was entirely due to wavelet look-ahead bias. Without it: ≤50%. v6 with 12 custom channels (OHLCV, moving averages, RSI, MACD, VIX, volume MA) = 49.6% WF (random).
|
| 13 |
|
| 14 |
**Repo**: `tbukuai/patchtst-wavelet-sp500-research` on HF Hub.
|
| 15 |
|
|
|
|
| 29 |
| `v3` | 13 | Global | 60.8%/53.3% WF | Best (biased) | `revision="v3"` |
|
| 30 |
| `v4` | 25 | Causal | 60.1%/~43% WF | Honest, overfits | `revision="v4"` |
|
| 31 |
| `v5` | 8 | None | 46.1% | Honest baseline | `revision="v5"` |
|
| 32 |
+
| `v6` | 12 custom | None | 51.9%/49.6% WF | Random (confirmed) | `revision="v6"` |
|
| 33 |
|
| 34 |
+
### v6 Features (12 channels)
|
| 35 |
+
`['Open', 'High', 'Low', 'Close', 'Volume', 'MA5', 'MA23', 'MA53', 'RSI', 'MACD', 'VIX', 'MAVOL']`
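The moving-average channels in this list (`MA5`, `MA23`, `MA53`) can be computed without pandas. The sketch below is one possible trailing (causal) mean in pure numpy. The name `rolling_mean_np` follows the helper mentioned in the removed yfinance snippet of this diff, but its body here is an assumption and not the project's code, and reading `MA5` as a 5-day mean of `Close` is also an assumption. The sample prices are hypothetical.

```python
import numpy as np

def rolling_mean_np(values, window):
    """Trailing mean: out[t] averages values[t-window+1 .. t], never future data."""
    if window < 1:
        raise ValueError("window must be >= 1")
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)  # first window-1 entries stay undefined
    if len(values) < window:
        return out
    # Prefix sums let each window sum be a single subtraction.
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

# Hypothetical closing prices, for illustration only.
close = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
ma5 = rolling_mean_np(close, 5)
```

Because every output value depends only on the current and earlier rows, a feature built this way cannot leak future prices into the model input.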
|
| 36 |
|
| 37 |
---
|
| 38 |
|
| 39 |
## CRITICAL LESSONS LEARNED (READ FIRST)
|
| 40 |
|
| 41 |
### 1. DA(diff) is FAKE — always use DA(ctx)
|
| 42 |
+
- **DA(diff)** = Always ~70%. **USELESS.**
|
| 43 |
- **DA(ctx)** = `np.sign(pred - last_known_close) == np.sign(actual - last_known_close)` → **Real metric.**
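A minimal sketch of the DA(ctx) definition above: both the prediction and the actual close are compared against the last close known at prediction time. The function name `da_ctx` and the sample prices are hypothetical. Skipping flat days and returning 50 when no informative day exists mirrors the removed `evaluate()` code in this diff, but this is an illustration, not the project's evaluation routine.

```python
import numpy as np

def da_ctx(pred, actual, last_close):
    """Directional accuracy (%) measured relative to the last known close."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    last_close = np.asarray(last_close, dtype=float)
    pred_dir = np.sign(pred - last_close)
    true_dir = np.sign(actual - last_close)
    mask = true_dir != 0              # ignore days where the price did not move
    if not mask.any():
        return 50.0                   # no informative days: report chance level
    return float(np.mean(pred_dir[mask] == true_dir[mask]) * 100)

# Hypothetical prices, for illustration only.
last_close = [100.0, 101.0, 99.0, 100.0]
actual = [101.0, 100.0, 99.0, 102.0]
pred = [100.5, 101.5, 98.0, 101.0]
score = da_ctx(pred, actual, last_close)
```

In this toy input the third day is flat and is dropped, so the score is computed over the remaining three days.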
|
| 44 |
|
| 45 |
### 2. Global wavelet denoising has LOOK-AHEAD BIAS
|
|
|
|
| 50 |
### 3. MADL loss COLLAPSES
|
| 51 |
- v2: MADL at α=0.3 → predicts "up" 98.9% of the time. Use MSE.
|
| 52 |
|
| 53 |
+
### 4. Feature engineering doesn't help direction prediction
|
| 54 |
+
- v3 (13ch): 53.3% WF (biased)
|
| 55 |
- v4 (25ch): ~43% WF — **worse** with more features
|
| 56 |
+
- v5 (8ch): 46.1% single — fewer features didn't help
|
| 57 |
+
- v6 (12ch custom): 49.6% WF — multi-scale MAs + VIX = still random
|
| 58 |
|
| 59 |
### 5. S&P 500 daily direction is NOT predictable from price history
|
| 60 |
+
- Confirmed across 6 versions with different features and approaches
|
| 61 |
- All honest methods give ≤50% DA (random)
|
| 62 |
+
- v6 with carefully selected technical indicators = 49.6% WF = random
|
| 63 |
|
| 64 |
### 6. Training practices that WORK
|
| 65 |
+
- **BATCH=256-1024** on T4 GPU (256=8GB, 1024=10GB with fp16)
|
| 66 |
+
- **torch.cuda.amp.autocast()** + **GradScaler** for fp16 mixed precision (~2× speedup)
|
| 67 |
- **num_workers=2, persistent_workers=True** in DataLoaders
|
| 68 |
- **80/10/10 split**, early stopping patience=30
|
| 69 |
- **Walk-forward validation** (quarterly, 63-day windows) for honest metrics
|
|
|
|
| 70 |
- **5-minute checkpoints** to Hub for resumability on Colab
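The walk-forward practice listed above (quarterly, 63-day windows) can be sketched as an index generator. This is a hedged illustration only: the function name `walk_forward_windows`, the `min_train` parameter, and the choice of an expanding training window are assumptions, since this diff does not show how the project builds its windows.

```python
def walk_forward_windows(n_samples, min_train, test_size=63):
    """Return [((train_start, train_end), (test_start, test_end)), ...].

    End indices are exclusive. Each test block of `test_size` trading days
    starts right after the data used for training, so no test row is ever
    seen during training for that window.
    """
    windows = []
    test_start = min_train
    while test_start + test_size <= n_samples:
        train = (0, test_start)                      # expanding window (assumption)
        test = (test_start, test_start + test_size)  # next ~quarter of trading days
        windows.append((train, test))
        test_start += test_size
    return windows

# Hypothetical sizes: 1000 daily rows, first 500 reserved for initial training.
windows = walk_forward_windows(n_samples=1000, min_train=500)
```

A model would be retrained (or fine-tuned) on each `train` range and scored on the matching `test` range, and the per-window DA(ctx) values averaged to get the WF figure.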
|
| 71 |
|
| 72 |
---
|
| 73 |
|
|
|
|
| 80 |
| v3 | **60.8%** | **53.3%** | Fixed metric, walk-forward | Best but biased by global wavelet |
|
| 81 |
| v4 | 60.1% | **~43%** | Causal wavelet, 25ch | More features = worse, bias was the signal |
|
| 82 |
| v5 | **46.1%** | — | No wavelet, 8ch | Below random — no signal exists |
|
| 83 |
+
| **v6** | **51.9%** | **49.6%** | **12ch custom, no wavelet** | **Random. Custom features don't help.** |
|
| 84 |
|
| 85 |
---
|
| 86 |
|
| 87 |
## COLAB TRAINING PRACTICES
|
| 88 |
|
| 89 |
+
### GPU Optimization (v6 tested)
|
| 90 |
+
- **BATCH=1024** + **fp16 mixed precision** → ~5-8× speedup over BATCH=256 fp32
|
| 91 |
+
- Uses ~10GB/15GB VRAM (67% utilization vs 25% at BATCH=256)
|
| 92 |
+
- Single-split training: ~1-2 min (was 8 min)
|
| 93 |
+
- Walk-forward 30 epochs/window: ~40 sec (was ~3 min)
|
| 94 |
|
|
|
|
| 95 |
```python
|
| 96 |
+
# Setup
|
| 97 |
+
BATCH = 1024
|
| 98 |
+
scaler_amp = torch.cuda.amp.GradScaler()
|
| 99 |
+
|
| 100 |
+
# Training loop
|
| 101 |
+
with torch.cuda.amp.autocast():
|
| 102 |
+
out = model(past_values=x, future_values=y)
|
| 103 |
+
scaler_amp.scale(out.loss).backward()
|
| 104 |
+
scaler_amp.unscale_(opt)
|
| 105 |
+
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
|
| 106 |
+
scaler_amp.step(opt); scaler_amp.update()
|
| 107 |
```
|
| 108 |
|
| 109 |
---
|
| 110 |
|
| 111 |
## FILES IN THIS REPO
|
|
|
|
| 113 |
| File | Purpose |
|
| 114 |
|------|---------|
|
| 115 |
| **`CLAUDE.md`** | THIS FILE |
|
| 116 |
+
| `PROJECT_CONCLUSION.md` | Final results + conclusions (v1-v6) |
|
| 117 |
+
| `notebooks/v6_train_colab.ipynb` | v6 (12ch custom, no wavelet) |
|
| 118 |
| `notebooks/v5_train_colab.ipynb` | v5 (8ch, no wavelet) |
|
| 119 |
+
| `results/v6_results.json` | v6 final results (20/78 WF windows) |
|
| 120 |
+
| `results/v6_live_checkpoint.json` | v6 training checkpoint |
|
|
|
|
| 121 |
| `results/v3_results.json` | v3 final (79 WF windows) |
|
| 122 |
+
| `results/v4_live_checkpoint.json` | v4 partial (31 WF windows) |
|
| 123 |
+
| `train_v6.py` | v6 standalone training script |
|
|
|
|
| 124 |
|
| 125 |
---
|
| 126 |
|
|
|
|
| 128 |
|
| 129 |
### Confirmed by our experiments
|
| 130 |
- xLSTM-TS F1=73% (2408.12408) — **inflated by look-ahead bias** (we proved this)
|
| 131 |
+
- S&P 500 daily DA ~50% with honest methodology — **confirmed across v4, v5, v6**
|
| 132 |
|
| 133 |
+
### What actually works (from literature)
|
| 134 |
- News sentiment: [2304.07619](https://arxiv.org/abs/2304.07619) *Journal of Finance*
|
| 135 |
- Stock ranking: [2603.16985](https://arxiv.org/abs/2603.16985) TIPS SR=1.506
|
| 136 |
- Less efficient markets: xLSTM-TS better on EWZ than S&P 500
|
| 137 |
+
- Financial foundation model: [2508.02739](https://arxiv.org/abs/2508.02739) Kronos +93% RankIC
|