anomaly-gpt

GPT-based financial anomaly detector. Uses next-token perplexity as a market surprise score, evaluated on 7 historical crises against rolling volatility, EWMA, and Isolation Forest baselines.

Model Description

Decoder-only GPT (203K parameters) trained on discretized SPY log-returns (2010–2019). Anomaly score = negative log-likelihood of observed return token given 20-day context:

Anomaly Score(t) = -log P_θ(r_t | r_{t-19}, ..., r_{t-1})

A high score means the observed return was unlikely under the model's learned distribution, which may indicate a regime change.
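
As a minimal sketch of the score above, assume the model's output for day t has already been converted into one probability per return token. The function name `anomaly_score` and the example probabilities are illustrative and not taken from the repository.

```python
import math

def anomaly_score(probs, observed_token):
    """Negative log-likelihood of the observed token under the predicted distribution."""
    p = probs[observed_token]
    if p <= 0.0:
        raise ValueError("observed token has zero predicted probability")
    return -math.log(p)

# Illustrative predicted distribution over (down, flat, up); not real model output.
probs = [0.10, 0.70, 0.20]
expected_day = anomaly_score(probs, 1)    # the likely "flat" token: low score
surprising_day = anomaly_score(probs, 0)  # the unlikely "down" token: high score

# An uninformed model that predicts all three tokens equally scores log(3).
uniform = anomaly_score([1 / 3, 1 / 3, 1 / 3], 2)
```

Averaging the score over a window and exponentiating the mean gives the perplexity over that window.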

Results

GPT perplexity detects 4/7 documented market crises vs 1/7 for rolling volatility.

| Event                   | GPT | Rolling Vol | EWMA | Isolation Forest |
|-------------------------|-----|-------------|------|------------------|
| Flash Crash 2010        | -   | -           | -    | -                |
| China Devaluation 2015  | -   | -           | ✓    | -                |
| XIV Vol Cascade 2018    | -   | -           | -    | -                |
| COVID Crash 2020        | ✓   | ✓           | ✓    | ✓                |
| Meme Stock Squeeze 2021 | ✓   | -           | -    | -                |
| Fed Rate Shock 2022     | ✓   | -           | -    | ✓                |
| SVB Collapse 2023       | ✓   | -           | -    | -                |
| Total                   | 4/7 | 1/7         | 2/7  | 2/7              |

Only the GPT detects the SVB Collapse (2023) and the Meme Stock Squeeze (2021); every baseline misses both. These are sequential anomalies rather than pure magnitude events, the kind of pattern a sequence model can capture and rolling volatility cannot.
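
For comparison, a magnitude-based baseline of the EWMA kind can be sketched as below. This is an assumption about its general shape, not the repository's implementation: the decay `lam = 0.94` is the conventional RiskMetrics value for daily data, and `ewma_surprise` is a hypothetical name. The score compares each squared return with the variance forecast made before that return was observed.

```python
def ewma_surprise(returns, lam=0.94):
    """Squared return divided by the EWMA variance forecast from prior days.

    Returns one score per day; the first day has no forecast and scores 0.0.
    """
    if not returns:
        return []
    scores = [0.0]
    var = returns[0] ** 2  # seed the variance with the first squared return
    for r in returns[1:]:
        scores.append(r * r / var if var > 0 else 0.0)
        # Update only after scoring, so the forecast never sees the current day.
        var = lam * var + (1.0 - lam) * r * r
    return scores

# Synthetic returns for illustration: calm days followed by one large move.
calm_then_shock = [0.01, -0.01, 0.01, -0.01, 0.01, -0.08]
scores = ewma_surprise(calm_then_shock)
```

A score of this form depends only on the sizes of the moves; the order of up and down days before the scored day enters it only through their squared magnitudes.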

Architecture

  • vocab_size: 3 (down / flat / up; 3-bin return discretization)
  • context_length: 20 trading days (~1 month)
  • n_layer: 4, n_head: 4, n_embd: 64
  • Parameters: 203K
  • Weight tying: token embedding ↔ output head
  • Optimizer: AdamW (β₁=0.9, β₂=0.95) + cosine LR schedule with linear warmup
  • Training data: SPY log-returns 2010–2019 (pre-COVID only)
  • Evaluation: SPY 2010–2024
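
The 3-bin discretization can be illustrated with a small sketch. The bin edges here are assumed to be the terciles of the training returns; the repository's `ReturnTokenizer` may choose its edges differently, and `fit_edges` and `tokenize` are hypothetical helper names.

```python
import bisect
import statistics

def fit_edges(train_returns, vocab_size=3):
    """Cut points that split the training returns into equal-count bins."""
    return statistics.quantiles(train_returns, n=vocab_size)

def tokenize(returns, edges):
    """Map each return to a bin index: 0 = down, 1 = flat, 2 = up for 3 bins."""
    return [bisect.bisect_right(edges, r) for r in returns]

# Synthetic log-returns for illustration only.
train = [-0.02, -0.01, -0.005, 0.0, 0.001, 0.004, 0.01, 0.015, 0.03]
edges = fit_edges(train)                        # two cut points for three bins
tokens = tokenize([-0.05, 0.001, 0.05], edges)  # one token per return
```

Fitting the edges on the training window only and reusing them at evaluation time keeps later data from influencing the token boundaries.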

Ablation Results

3×3 grid search across vocab size and context length:

| vocab_size | avg val loss  |
|------------|---------------|
| 3          | 0.4959 ✓ best |
| 5          | 0.5273        |
| 9          | 0.5888        |

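Coarser tokenization generalizes better: the 3-bin vocabulary reaches the lowest average validation loss, a reported 15.3% improvement. Longer context consistently hurts, which suggests that daily returns are near-memoryless beyond 20 days.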

Usage

import torch
from huggingface_hub import hf_hub_download

# Download checkpoint
ckpt_path = hf_hub_download(repo_id="AriaMF/anomaly-gpt", filename="final_model.pt")
ckpt = torch.load(ckpt_path, map_location="cpu")

# Load model (clone repo first: github.com/ariamousavifar/anomaly-gpt)
from gpt.model import GPT
model = GPT(vocab_size=3, context_length=20)
model.load_state_dict(ckpt["model"])
model.eval()

# Score returns
from data.tokenizer import ReturnTokenizer
from anomaly.scorer import AnomalyScorer
from data.loader import download_returns

tok    = ReturnTokenizer(vocab_size=3)
scorer = AnomalyScorer(model, tok, context_length=20)

returns = download_returns("SPY")
scores  = scorer.rolling_score(returns)
print(scores.tail())
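
One way to turn the scores into discrete flags, sketched here as an assumption rather than the repository's method, is to compare each score with a high quantile of the scores from a reference period (for example the 2010–2019 training window). `flag_anomalies` and the quantile level are illustrative choices, and plain lists are used so the sketch does not depend on the type that `rolling_score` returns.

```python
import math

def flag_anomalies(scores, reference_scores, q=0.99):
    """Flag scores above the q-th empirical quantile of the reference scores."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ranked = sorted(reference_scores)
    # Nearest-rank quantile: smallest value with at least a fraction q
    # of the reference data at or below it.
    k = max(0, min(len(ranked) - 1, math.ceil(q * len(ranked)) - 1))
    threshold = ranked[k]
    return threshold, [s > threshold for s in scores]

# Synthetic scores for illustration only.
reference = [float(i) for i in range(1, 11)]
threshold, flags = flag_anomalies([2.0, 9.0, 12.0], reference, q=0.9)
```

Taking the threshold from a fixed reference period avoids letting a crisis raise its own detection bar.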

Limitations

  • Trained on US equity data only (SPY)
  • Daily frequency: intraday events (e.g. Flash Crash) are not detectable
  • 3-bin tokenization loses within-bin magnitude information
  • Results evaluated on known historical events, not a forward-looking backtest
  • Not a trading signal; research platform only
