anomaly-gpt
GPT-based financial anomaly detector. Uses next-token perplexity as a market surprise score, evaluated on 7 historical crises against rolling volatility, EWMA, and Isolation Forest baselines.
Model Description
Decoder-only GPT (203K parameters) trained on discretized SPY log-returns (2010–2019). Anomaly score = negative log-likelihood of the observed return token given the 20-day context:
Anomaly Score(t) = -log P_θ(r_t | r_{t-19}, ..., r_{t-1})
High score = the model's learned prior was violated = potential regime change.
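To make the score concrete, here is a minimal sketch of the per-day surprise computed from a predicted distribution over the three return tokens. The function name `surprise_score` and the example probabilities are illustrative assumptions, not the repository's API.

```python
import math

def surprise_score(probs, observed_token):
    """Negative log-likelihood of the observed token under a predicted distribution.

    probs: probabilities over the vocabulary (assumed to sum to 1).
    observed_token: integer index of the token that actually occurred.
    """
    p = probs[observed_token]
    if p <= 0.0:
        return math.inf  # the model assigned zero probability to this outcome
    return -math.log(p)

# Hypothetical predicted distribution over (down, flat, up) for day t.
predicted = [0.05, 0.70, 0.25]

expected_day = surprise_score(predicted, 1)    # "flat" occurred: low surprise
surprising_day = surprise_score(predicted, 0)  # "down" occurred: high surprise
```

Averaging these scores over a window and exponentiating the mean gives the perplexity of that window.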
Results
GPT perplexity detects 4/7 documented market crises vs 1/7 for rolling volatility.
| Event | GPT | Rolling Vol | EWMA | Isolation Forest |
|---|---|---|---|---|
| Flash Crash 2010 | β | β | β | β |
| China Devaluation 2015 | β | β | β | β |
| XIV Vol Cascade 2018 | β | β | β | β |
| COVID Crash 2020 | β | β | β | β |
| Meme Stock Squeeze 2021 | ✅ | ❌ | ❌ | ❌ |
| Fed Rate Shock 2022 | β | β | β | β |
| SVB Collapse 2023 | ✅ | ❌ | ❌ | ❌ |
| Total | 4/7 | 1/7 | 2/7 | 2/7 |
Only the GPT detects the SVB Collapse (2023) and the Meme Stock Squeeze (2021); every baseline misses both. These are sequential anomalies rather than pure magnitude events, which is the kind of pattern a language model can capture and rolling volatility cannot.
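For reference, the two volatility baselines can be sketched in a few lines. The window length (20) and the decay factor (0.94, the common RiskMetrics convention) are assumptions, as is the seeding of the EWMA; the settings and the detection threshold behind the table above are not stated here.

```python
import math

def rolling_volatility(returns, window=20):
    """Sample standard deviation of the last `window` returns, one value per day.

    Days without a full window get None.
    """
    out = []
    for i in range(len(returns)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = returns[i + 1 - window : i + 1]
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out

def ewma_volatility(returns, lam=0.94):
    """EWMA volatility: var_t = lam * var_{t-1} + (1 - lam) * r_t**2."""
    out = []
    var = returns[0] ** 2  # seed with the first squared return (an assumption)
    for r in returns:
        var = lam * var + (1.0 - lam) * r ** 2
        out.append(math.sqrt(var))
    return out

# Toy series: calm days followed by one large move.
toy_returns = [0.001, -0.002, 0.001, 0.000, -0.001, 0.002, -0.050]
roll = rolling_volatility(toy_returns, window=5)
ewma = ewma_volatility(toy_returns)
```

Both estimators respond to the size of recent moves. The rolling estimate in particular is unchanged if the returns inside a window are reordered, so it carries no information about the sequence in which moves occurred.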
Architecture
- vocab_size: 3 (down / flat / up; 3-bin return discretization)
- context_length: 20 trading days (~1 month)
- n_layer: 4, n_head: 4, n_embd: 64
- Parameters: 203K
- Weight tying: token embedding shared with output head
- Optimizer: AdamW (β₁ = 0.9, β₂ = 0.95) + cosine LR schedule with linear warmup
- Training data: SPY log-returns 2010–2019 (pre-COVID only)
- Evaluation: SPY 2010–2024
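The learning-rate schedule named above (linear warmup, then cosine decay) can be sketched as follows. All numeric values here (`max_lr`, `min_lr`, `warmup_steps`, `max_steps`) are hypothetical placeholders, since the actual training hyperparameters are not listed.

```python
import math

def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, max_steps=2000):
    """Learning rate with linear warmup followed by cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        # After the schedule ends, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 toward 0
    return min_lr + cosine * (max_lr - min_lr)

# Sample the schedule at a few points for inspection.
sampled = {s: lr_at_step(s) for s in (0, 99, 100, 1050, 2000)}
```

With PyTorch, a function like this could be applied by setting each parameter group's `lr` before every optimizer step.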
Ablation Results
3×3 grid search across vocab size and context length:
| vocab_size | avg val loss |
|---|---|
| 3 | 0.4959 (best) |
| 5 | 0.5273 |
| 9 | 0.5888 |
Coarser tokenization generalizes 15.3% better. Longer context consistently hurts, which is consistent with financial returns being near-memoryless beyond 20 days.
Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Model and scoring code live in the repo (clone it first: github.com/ariamousavifar/anomaly-gpt)
from gpt.model import GPT
from data.tokenizer import ReturnTokenizer
from data.loader import download_returns
from anomaly.scorer import AnomalyScorer

# Download the checkpoint from the Hugging Face Hub
ckpt_path = hf_hub_download(repo_id="AriaMF/anomaly-gpt", filename="final_model.pt")
ckpt = torch.load(ckpt_path, map_location="cpu")

# Rebuild the model with the published configuration and load the weights
model = GPT(vocab_size=3, context_length=20)
model.load_state_dict(ckpt["model"])
model.eval()  # inference mode

# Score returns using a rolling 20-day context
tok = ReturnTokenizer(vocab_size=3)
scorer = AnomalyScorer(model, tok, context_length=20)
returns = download_returns("SPY")
scores = scorer.rolling_score(returns)
print(scores.tail())
```
Limitations
- Trained on US equity data only (SPY)
- Daily frequency: intraday events (e.g. Flash Crash) are not detectable
- 3-bin tokenization loses within-bin magnitude information
- Results evaluated on known historical events, not a forward-looking backtest
- Not a trading signal; research platform only
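The magnitude-loss limitation can be illustrated with a minimal 3-bin tokenizer. The fixed 0.5% threshold and the names `log_return` and `tokenize_return` are assumptions for illustration; the repository's `ReturnTokenizer` may choose its bin edges differently (for example, from quantiles of the training data).

```python
import math

DOWN, FLAT, UP = 0, 1, 2

def log_return(prev_close, close):
    """Daily log-return between two consecutive closing prices."""
    return math.log(close / prev_close)

def tokenize_return(r, threshold=0.005):
    """Map a log-return to one of three tokens using a symmetric threshold."""
    if r < -threshold:
        return DOWN
    if r > threshold:
        return UP
    return FLAT

mild_drop = log_return(100.0, 99.0)    # roughly -1%
severe_drop = log_return(100.0, 90.0)  # roughly -10.5%

# Both drops map to the same token, so the model cannot tell them apart.
tokens = [tokenize_return(r) for r in (mild_drop, severe_drop, 0.001, 0.02)]
```

Under this scheme a 1% decline and a 10% decline are indistinguishable to the model, which is why the score reflects unusual sequences of direction rather than the size of any single move.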