tbukuai/patchtst-wavelet-sp500-research

Safetensors

patchtst

tbukuai committed 30 days ago

Commit

b014e78

verified ·

1 Parent(s): 649d82a

Update CLAUDE.md with v6 results

Files changed (1)

CLAUDE.md +40 -143


@@ -1,7 +1,7 @@
 # CLAUDE.md — PatchTST + Wavelet S&P 500 Forecasting: Complete Agent Reference
 > **Single-source-of-truth** for an AI agent working on this project.
-> Updated: 2026-05-02 after completing v1→v2→v3→v4→v5.
 ---
@@ -9,7 +9,7 @@
 **Goal**: Investigated PatchTST Transformer with wavelet denoising on S&P 500 data for next-day price forecasting.
-**Final Conclusion**: S&P 500 daily direction is NOT predictable from price/technical indicators with honest methodology. The v3 walk-forward of 53.3% was entirely due to wavelet look-ahead bias. Without it: ≤50%.
 **Repo**: `tbukuai/patchtst-wavelet-sp500-research` on HF Hub.
@@ -29,43 +29,17 @@ All model versions live in one repo with branches:
 | `v3` | 13 | Global | 60.8%/53.3% WF | Best (biased) | `revision="v3"` |
 | `v4` | 25 | Causal | 60.1%/~43% WF | Honest, overfits | `revision="v4"` |
 | `v5` | 8 | None | 46.1% | Honest baseline | `revision="v5"` |
-### Branch Practices
-- **main** = best performing model (currently v3)
-- Each version gets its own branch with README explaining results + findings
-- Model files (`model.safetensors`, `config.json`, `scaler.pkl`, `metrics.json`) on each branch
-- README on each branch documents: results, what was tested, why it worked/failed, lesson learned
-- Use `revision="v3"` in `from_pretrained()` to load specific versions
-- When a new version beats main, update main branch
-### Creating a New Version
-```bash
-# 1. Create branch
-# (via HF API or hf_repo_git create_branch)
-# 2. Upload model files to branch
-hf upload tbukuai/stock-patchtst-sp500 ./model_files --revision vN --commit-message "vN: description"
-# 3. Add README to branch explaining results + findings
-# 4. If it beats main, update main:
-hf download tbukuai/stock-patchtst-sp500 --revision vN --local-dir ./best
-hf upload tbukuai/stock-patchtst-sp500 ./best --commit-message "main: update to vN (new best)"
-```
-### Standalone Repos (legacy, may be deleted)
-These still exist but branches are the canonical source:
-- `tbukuai/stock-patchtst-sp500-v2`
-- `tbukuai/stock-patchtst-sp500-v3`
-- `tbukuai/stock-patchtst-sp500-v4`
-- `tbukuai/stock-patchtst-sp500-v5`
 ---
 ## CRITICAL LESSONS LEARNED (READ FIRST)
 ### 1. DA(diff) is FAKE — always use DA(ctx)
-- **DA(diff)** = `np.sign(np.diff(pred)) == np.sign(np.diff(actual))` → Always ~70%. **USELESS.**
 - **DA(ctx)** = `np.sign(pred - last_known_close) == np.sign(actual - last_known_close)` → **Real metric.**
 ### 2. Global wavelet denoising has LOOK-AHEAD BIAS
@@ -76,26 +50,24 @@ These still exist but branches are the canonical source:
 ### 3. MADL loss COLLAPSES
 - v2: MADL at α=0.3 → predicts "up" 98.9% of the time. Use MSE.
-### 4. More features = MORE overfitting
-- v3 (13ch): 53.3% WF
 - v4 (25ch): ~43% WF — **worse** with more features
-- v5 (8ch): 46.1% single — fewer features didn't help either without wavelet
 ### 5. S&P 500 daily direction is NOT predictable from price history
-- Confirmed across 5 versions with different approaches
 - All honest methods give ≤50% DA (random)
-- The literature consensus is correct
 ### 6. Training practices that WORK
-- **BATCH=256** on T4 GPU (uses 8GB/15GB, 2-3× faster than BATCH=32)
 - **num_workers=2, persistent_workers=True** in DataLoaders
 - **80/10/10 split**, early stopping patience=30
 - **Walk-forward validation** (quarterly, 63-day windows) for honest metrics
-- **Push model to Hub BEFORE walk-forward** to avoid data loss
 - **5-minute checkpoints** to Hub for resumability on Colab
-- **yf.Ticker().history()** not `yf.download()` (avoids MultiIndex bugs)
-- **df.reset_index(drop=True)** before numpy array assignment (avoids alignment bugs)
-- **Pure numpy for features** (avoid pd.Series.rolling() + DataFrame assignment issues)
 ---
@@ -108,102 +80,32 @@ These still exist but branches are the canonical source:
 | v3 | **60.8%** | **53.3%** | Fixed metric, walk-forward | Best but biased by global wavelet |
 | v4 | 60.1% | **~43%** | Causal wavelet, 25ch | More features = worse, bias was the signal |
 | v5 | **46.1%** | — | No wavelet, 8ch | Below random — no signal exists |
 ---
 ## COLAB TRAINING PRACTICES
-### Resumable Notebooks
-1. **Checkpoint to Hub every 5 minutes** via `results/v{N}_live_checkpoint.json`
-2. **On restart**: load checkpoint, skip completed steps, resume from last window
-3. **Push model immediately after single-split training** — before walk-forward starts
-### GPU Optimization
-- **BATCH=256** (not 32) — uses 8GB/15GB VRAM, 2-3× faster
-- **num_workers=2, persistent_workers=True** — overlaps data loading
-- **pin_memory=True** — faster CPU→GPU transfer
-- PatchTST 600K params: ~3-4s/epoch at BATCH=256 on T4
-### yfinance Compatibility (as of 2026)
 ```python
-# USE THIS (flat DataFrame, no MultiIndex):
-sp500 = yf.Ticker('^GSPC')
-df = sp500.history(start='2000-01-01', end='2025-01-01')
-# NOT THIS (broken MultiIndex with new yfinance):
-df = yf.download('^GSPC', ...)  # returns MultiIndex, causes bugs
-```
-### Feature Engineering (numpy only, avoid pandas alignment bugs)
-```python
-df = df.reset_index(drop=True)  # integer index
-close = df['Close'].values.astype(float)  # numpy array
-# Compute features as numpy, assign back:
-df['MA5'] = rolling_mean_np(close, 5)  # custom numpy function
-```
-### Checkpoint Schema
-```json
-{
-  "step": "start|training|train_done|walkforward|wf_done|all_done",
-  "detail": "human-readable progress",
-  "metrics": {"da_ctx": 60.8, "sharpe_ctx": 4.32, "rmse": 26.84},
-  "wf_results": [{"window": 0, "da_ctx": 52.4, ...}],
-  "saved_at": "2026-05-01 03:00:00 UTC"
-}
 ```
-### Trackio
-```python
-trackio.init(project='patchtst-sp500-vN', name='vN-run', space_id='tbukuai/mlintern-ptv3stck')
-# NOTE: uses `name=`, NOT `run_name=`
-# Auth issues common — falls back to local logging, non-critical
-```
----
-## EVALUATION CODE
-```python
-def evaluate(model, test_dl, scaler, n_ch):
-    model.eval()
-    all_p, all_t, all_c = [], [], []
-    with torch.no_grad():
-        for x, y in test_dl:
-            x = x.to(device)
-            pred = model(past_values=x).prediction_outputs.cpu()
-            all_p.append(pred); all_t.append(y); all_c.append(x[:,-1,:].cpu())
-    preds=torch.cat(all_p,0).numpy(); targets=torch.cat(all_t,0).numpy(); contexts=torch.cat(all_c,0).numpy()
-    def inv(vals):
-        d=np.zeros((len(vals),n_ch)); d[:,CLOSE_IDX]=vals
-        return scaler.inverse_transform(d)[:,CLOSE_IDX]
-    pc=inv(preds[:,0,CLOSE_IDX]); tc=inv(targets[:,0,CLOSE_IDX]); cc=inv(contexts[:,CLOSE_IDX])
-    # DA(ctx) — PRIMARY
-    pd_dir=np.sign(pc-cc); td_dir=np.sign(tc-cc); mask=td_dir!=0
-    da_ctx=np.mean(pd_dir[mask]==td_dir[mask])*100 if mask.sum()>0 else 50
-    # Sharpe
-    act_ret=(tc-cc)/(cc+1e-10); sr2=np.sign(pc-cc)*act_ret
-    sharpe_ctx=np.mean(sr2)/(np.std(sr2)+1e-10)*np.sqrt(252)
-    return {'da_ctx': da_ctx, 'sharpe_ctx': sharpe_ctx, ...}
-```
----
-## PITFALLS TO AVOID
-| Pitfall | What Happens | Fix |
-|---------|-------------|-----|
-| Using DA(diff) | Reports ~70%, misleading | Always use DA(ctx) |
-| Global wavelet before split | Look-ahead bias (+7-10% fake DA) | Use causal or none |
-| MADL loss at α≥0.1 | Collapses to always-predict-up | Use MSE |
-| Too many features | Overfitting (v4: 25ch → 43% WF) | Fewer, uncorrelated features |
-| yf.download() | MultiIndex bugs in 2026 | Use yf.Ticker().history() |
-| pd.Series.rolling().values assignment | Silent NaN from index alignment | Use pure numpy |
-| Single-split only | v4 showed 60% single, 43% WF | Always walk-forward |
-| BATCH=32 on GPU | 10-20% GPU utilization | Use BATCH=256 |
-| No checkpoints | Colab disconnect = lost work | Save every 5 min to Hub |
-| `trackio.init(run_name=...)` | TypeError | Use `name=` |
 ---
 ## FILES IN THIS REPO
@@ -211,19 +113,14 @@ def evaluate(model, test_dl, scaler, n_ch):
 | File | Purpose |
 |------|---------|
 | **`CLAUDE.md`** | THIS FILE |
-| `AGENT_HANDOFF.md` | Quick-start for new agents |
-| `README.md` | Project summary with final conclusions |
-| `RESEARCH_FINDINGS.md` | Full experimental evidence + literature review |
-| `V4_DESIGN.md` | v4 design (25ch causal wavelet — failed) |
-| `notebooks/multi_market_test.ipynb` | **Next: test 5 markets** |
 | `notebooks/v5_train_colab.ipynb` | v5 (8ch, no wavelet) |
-| `notebooks/v4_train_colab.ipynb` | v4 (25ch, causal wavelet) |
-| `notebooks/v3_train_colab.ipynb` | v3 (completed, best) |
-| `notebooks/v3_inference_colab.ipynb` | Use v3 model |
 | `results/v3_results.json` | v3 final (79 WF windows) |
-| `results/v4_live_checkpoint.json` | v4 partial (29 WF windows) |
-| `results/v5_live_checkpoint.json` | v5 training metrics |
-| `results/multi_market_results.json` | Multi-market test (after run) |
 ---
@@ -231,10 +128,10 @@ def evaluate(model, test_dl, scaler, n_ch):
 ### Confirmed by our experiments
 - xLSTM-TS F1=73% (2408.12408) — **inflated by look-ahead bias** (we proved this)
-- S&P 500 daily DA ~50% with honest methodology — **confirmed across v4, v5**
-### What actually works (not our experiments, from literature)
 - News sentiment: [2304.07619](https://arxiv.org/abs/2304.07619) *Journal of Finance*
 - Stock ranking: [2603.16985](https://arxiv.org/abs/2603.16985) TIPS SR=1.506
 - Less efficient markets: xLSTM-TS better on EWZ than S&P 500
-- Financial foundation model: [2508.02739](https://arxiv.org/abs/2508.02739) Kronos +93% RankIC

 # CLAUDE.md — PatchTST + Wavelet S&P 500 Forecasting: Complete Agent Reference
 > **Single-source-of-truth** for an AI agent working on this project.
+> Updated: 2026-05-02 after completing v1→v2→v3→v4→v5→v6.
 ---
 **Goal**: Investigated PatchTST Transformer with wavelet denoising on S&P 500 data for next-day price forecasting.
+**Final Conclusion**: S&P 500 daily direction is NOT predictable from price/technical indicators with honest methodology. The v3 walk-forward of 53.3% was entirely due to wavelet look-ahead bias. Without it: ≤50%. v6 with 12 custom technical indicators = 49.6% WF (random).
 **Repo**: `tbukuai/patchtst-wavelet-sp500-research` on HF Hub.
 | `v3` | 13 | Global | 60.8%/53.3% WF | Best (biased) | `revision="v3"` |
 | `v4` | 25 | Causal | 60.1%/~43% WF | Honest, overfits | `revision="v4"` |
 | `v5` | 8 | None | 46.1% | Honest baseline | `revision="v5"` |
+| `v6` | 12 custom | None | 51.9%/49.6% WF | Random (confirmed) | `revision="v6"` |
+### v6 Features (12 channels)
+`['Open', 'High', 'Low', 'Close', 'Volume', 'MA5', 'MA23', 'MA53', 'RSI', 'MACD', 'VIX', 'MAVOL']`
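The moving-average channels above can be computed causally, so each value depends only on the current and earlier closes. The following is a minimal sketch, not the repo's implementation: the function name `trailing_ma` is hypothetical, and reading `MA5`, `MA23`, `MA53` as 5-, 23-, and 53-day windows is an assumption based on the channel names alone.

```python
import math

def trailing_ma(values, window):
    """Simple moving average over the current value and the window-1 before it.

    Positions without a full window are NaN, so no future value is ever used.
    """
    out = [math.nan] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out[i] = running / window
    return out

closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # hypothetical closing prices
ma3 = trailing_ma(closes, 3)                   # short window for illustration
```

Keeping the leading NaN values (and dropping those rows before training) avoids silently padding the start of the series with partial-window averages.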
 ---
 ## CRITICAL LESSONS LEARNED (READ FIRST)
 ### 1. DA(diff) is FAKE — always use DA(ctx)
+- **DA(diff)** = Always ~70%. **USELESS.**
 - **DA(ctx)** = `np.sign(pred - last_known_close) == np.sign(actual - last_known_close)` → **Real metric.**
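To make the difference between the two metrics concrete, here is a small pure-Python sketch. DA(ctx) follows the definition above; DA(diff) follows the definition in the previous revision of this file (signs of consecutive differences of the prediction and target series). The function names and the toy prices are hypothetical, and skipping flat days with a 50% fallback mirrors the earlier evaluation code rather than any fixed convention.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def da_ctx(pred, actual, last_close):
    """Percent of days where the predicted move from the last known close
    has the same direction as the actual move. Flat days are skipped."""
    hits = total = 0
    for p, a, c in zip(pred, actual, last_close):
        true_dir = sign(a - c)
        if true_dir == 0:
            continue
        total += 1
        hits += sign(p - c) == true_dir
    return 100.0 * hits / total if total else 50.0

def da_diff(pred, actual):
    """Percent of steps where consecutive predictions move in the same
    direction as consecutive targets. It never looks at the last known close."""
    hits = [
        sign(p1 - p0) == sign(a1 - a0)
        for p0, p1, a0, a1 in zip(pred, pred[1:], actual, actual[1:])
    ]
    return 100.0 * sum(hits) / len(hits)

# Toy data: the forecast tracks the shape of the target series
# but sits below the last known close on every day.
last_close = [100.0, 102.0, 101.0, 103.0]
actual     = [102.0, 101.0, 103.0, 104.0]
pred       = [ 99.0,  98.0,  99.0, 100.0]

ctx_score = da_ctx(pred, actual, last_close)   # 25.0
diff_score = da_diff(pred, actual)             # 100.0
```

In this toy case a forecast that would lose money on three of four days still scores perfectly on DA(diff), which is why only DA(ctx) is treated as a trading-relevant metric here.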
 ### 2. Global wavelet denoising has LOOK-AHEAD BIAS
 ### 3. MADL loss COLLAPSES
 - v2: MADL at α=0.3 → predicts "up" 98.9% of the time. Use MSE.
+### 4. Feature engineering doesn't help direction prediction
+- v3 (13ch): 53.3% WF (biased)
 - v4 (25ch): ~43% WF — **worse** with more features
+- v5 (8ch): 46.1% single — fewer features didn't help
+- v6 (12ch custom): 49.6% WF — multi-scale MAs + VIX = still random
 ### 5. S&P 500 daily direction is NOT predictable from price history
+- Confirmed across 6 versions with different features and approaches
 - All honest methods give ≤50% DA (random)
+- v6 with carefully selected technical indicators = 49.6% WF = random
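As a rough check on the "= random" reading, a one-sample z-test against a 50% coin flip can be sketched as follows. The sample size is an assumption: it takes the 20 completed v6 walk-forward windows at 63 predictions each (1,260 days) and treats 49.6% as a pooled accuracy over independent days, neither of which is confirmed by the results files.

```python
import math

def z_vs_coin_flip(accuracy, n):
    """z-score of an observed accuracy against the null hypothesis p = 0.5."""
    p0 = 0.5
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (accuracy - p0) / standard_error

def two_sided_p_value(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))

n_days = 20 * 63                     # ASSUMPTION: 20 windows x 63 days each
z = z_vs_coin_flip(0.496, n_days)    # 49.6% walk-forward DA(ctx)
p = two_sided_p_value(z)
not_distinguishable = abs(z) < 1.96  # 5% significance threshold
```

Under these assumptions the z-score is well inside the ±1.96 band, so the walk-forward result cannot be told apart from chance at the 5% level.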
 ### 6. Training practices that WORK
+- **BATCH=256-1024** on T4 GPU (256=8GB, 1024=10GB with fp16)
+- **torch.cuda.amp.autocast()** + **GradScaler** for fp16 mixed precision (~2× speedup)
 - **num_workers=2, persistent_workers=True** in DataLoaders
 - **80/10/10 split**, early stopping patience=30
 - **Walk-forward validation** (quarterly, 63-day windows) for honest metrics
 - **5-minute checkpoints** to Hub for resumability on Colab
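The walk-forward scheme in the list above can be sketched as an index generator. This is an illustration under assumptions, not the notebooks' code: the function name, the expanding training window, and the `min_train` parameter are hypothetical; only the 63-day test window comes from this file.

```python
def walk_forward_windows(n_samples, min_train, test_size=63):
    """Yield (train_indices, test_indices) pairs in chronological order.

    The training range always ends where the test range begins, so the
    model never sees data from the window it is evaluated on.
    """
    start = min_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size  # move forward one quarter (63 trading days)

# Hypothetical sizes: 1000 daily samples, first 500 reserved for training.
windows = list(walk_forward_windows(n_samples=1000, min_train=500))
first_train, first_test = windows[0]
```

Any scaler or denoising step should be fitted inside each window on `train_indices` only; fitting it once on the full series reintroduces the look-ahead bias described in lesson 2.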
 ---
 | v3 | **60.8%** | **53.3%** | Fixed metric, walk-forward | Best but biased by global wavelet |
 | v4 | 60.1% | **~43%** | Causal wavelet, 25ch | More features = worse, bias was the signal |
 | v5 | **46.1%** | — | No wavelet, 8ch | Below random — no signal exists |
+| **v6** | **51.9%** | **49.6%** | **12ch custom, no wavelet** | **Random. Custom features don't help.** |
 ---
 ## COLAB TRAINING PRACTICES
+### GPU Optimization (v6 tested)
+- **BATCH=1024** + **fp16 mixed precision** → ~5-8× speedup over BATCH=256 fp32
+- Uses ~10GB/15GB VRAM (67% utilization vs 25% at BATCH=256)
+- Single-split training: ~1-2 min (was 8 min)
+- Walk-forward 30 epochs/window: ~40 sec (was ~3 min)
 ```python
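+# Setup: large batch plus a gradient scaler for fp16 mixed precision
+BATCH = 1024
+scaler_amp = torch.cuda.amp.GradScaler()
+# Training loop body (model, opt, x, y come from the surrounding notebook)
+with torch.cuda.amp.autocast():  # forward pass runs in fp16 where safe
+    out = model(past_values=x, future_values=y)
+scaler_amp.scale(out.loss).backward()  # scale the loss to avoid fp16 underflow
+scaler_amp.unscale_(opt)  # unscale first so clipping sees the true gradient norm
+torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
+scaler_amp.step(opt)  # step is skipped if gradients contain inf/NaN
+scaler_amp.update()  # adjust the scale factor for the next iteration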
 ```
 ---
 ## FILES IN THIS REPO
 | File | Purpose |
 |------|---------|
 | **`CLAUDE.md`** | THIS FILE |
+| `PROJECT_CONCLUSION.md` | Final results + conclusions (v1-v6) |
+| `notebooks/v6_train_colab.ipynb` | v6 (12ch custom, no wavelet) |
 | `notebooks/v5_train_colab.ipynb` | v5 (8ch, no wavelet) |
+| `results/v6_results.json` | v6 final results (20/78 WF windows) |
+| `results/v6_live_checkpoint.json` | v6 training checkpoint |
 | `results/v3_results.json` | v3 final (79 WF windows) |
+| `results/v4_live_checkpoint.json` | v4 partial (31 WF windows) |
+| `train_v6.py` | v6 standalone training script |
 ---
 ### Confirmed by our experiments
 - xLSTM-TS F1=73% (2408.12408) — **inflated by look-ahead bias** (we proved this)
+- S&P 500 daily DA ~50% with honest methodology — **confirmed across v4, v5, v6**
+### What actually works (from literature)
 - News sentiment: [2304.07619](https://arxiv.org/abs/2304.07619) *Journal of Finance*
 - Stock ranking: [2603.16985](https://arxiv.org/abs/2603.16985) TIPS SR=1.506
 - Less efficient markets: xLSTM-TS better on EWZ than S&P 500
+- Financial foundation model: [2508.02739](https://arxiv.org/abs/2508.02739) Kronos +93% RankIC