PitchPredict xLSTM
A 20-million-parameter xLSTM trained on nearly a decade of MLB pitch data to predict pitch sequences: not just pitch type, but speed, spin, trajectory, plate location, and result.
Repository: baseball-analytica/pitchpredict
Overview
Each pitch is encoded as 16 tokens covering type, speed, spin rate, spin axis, release point, initial velocity, acceleration, plate position, and result. The model processes sequences of these tokens alongside 27 context variables (pitcher/batter IDs, count, outs, bases, score, inning, etc.) to predict what comes next.
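The card does not say how continuous measurements such as speed become discrete tokens. The sketch below assumes one common approach: equal-width bins with a per-field token offset. The `FieldBins` class, the bin ranges, and the offsets are all hypothetical and may differ from the real tokenizer. It also shows the window arithmetic implied by 16 tokens per pitch and the `seq_len` of 512 from the Architecture table.

```python
from dataclasses import dataclass

TOKENS_PER_PITCH = 16
SEQ_LEN = 512
PITCHES_PER_WINDOW = SEQ_LEN // TOKENS_PER_PITCH  # 32 full pitches fit in one window


@dataclass(frozen=True)
class FieldBins:
    """Equal-width binning for one continuous pitch field (hypothetical layout)."""

    name: str
    low: float
    high: float
    num_bins: int
    offset: int  # first token ID assigned to this field (assumed)

    def encode(self, value: float) -> int:
        # Clamp into [low, high], then pick one of num_bins equal-width buckets.
        clamped = min(max(value, self.low), self.high)
        frac = (clamped - self.low) / (self.high - self.low)
        return self.offset + min(int(frac * self.num_bins), self.num_bins - 1)

    def decode(self, token: int) -> float:
        # Map a token back to the midpoint of its bucket.
        width = (self.high - self.low) / self.num_bins
        return self.low + (token - self.offset + 0.5) * width


# Hypothetical speed field: 45 one-mph buckets covering 60-105 mph.
speed = FieldBins("speed_mph", low=60.0, high=105.0, num_bins=45, offset=20)
```

With these assumed settings, `speed.encode(94.3)` returns token 54, and `speed.decode(54)` recovers 94.5 mph, the bucket midpoint. Out-of-range values clamp to the first or last bucket. Binning trades a bounded quantization error for a small, fixed vocabulary.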
The model uses pitcher sessions (all pitches from when a pitcher enters to when they leave) as its sequence unit, giving it access to cross-at-bat patterns in a pitcher's outing.
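You can build a pitcher session by grouping pitch records on game and pitcher, then ordering them within the game. This minimal sketch rests on two assumptions. First, it uses Statcast-style column names (`game_pk`, `pitcher`, `at_bat_number`, `pitch_number`). Second, it treats each pitcher as working one continuous stint per game. The repo's actual data pipeline may build sessions differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Pitch(NamedTuple):
    game_pk: int        # Statcast game identifier
    pitcher: int        # pitcher's player ID
    at_bat_number: int  # plate appearance index within the game
    pitch_number: int   # pitch index within the plate appearance
    pitch_type: str


def group_sessions(pitches: Iterable[Pitch]) -> Dict[Tuple[int, int], List[Pitch]]:
    """Group pitches into sessions keyed by (game_pk, pitcher), in pitch order."""
    sessions: Dict[Tuple[int, int], List[Pitch]] = defaultdict(list)
    for p in pitches:
        sessions[(p.game_pk, p.pitcher)].append(p)
    # Sort chronologically so patterns across at-bats stay in order.
    return {
        key: sorted(group, key=lambda p: (p.at_bat_number, p.pitch_number))
        for key, group in sessions.items()
    }
```

Keying on the pitcher rather than the plate appearance keeps a pitcher's earlier at-bats in view when the model predicts later ones.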
Performance
| Metric | Value |
|---|---|
| Test Loss | 0.8631 |
| Top-1 Accuracy | 65.81% |
| Top-5 Accuracy | 97.90% |
| ECE (Calibration Error) | 0.013 |
The model outperforms the best baseline (always predicting the most frequent token) by 37.3 percentage points. It is also well calibrated: when it assigns 80% confidence to a prediction, that prediction is correct about 80% of the time.
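Top-k accuracy and expected calibration error (ECE) are standard metrics. The card does not say exactly how they were computed here; for example, it does not give the number of calibration bins. The following is a minimal standard-library sketch under two assumptions: one probability vector per predicted token, and 10 equal-width confidence bins. The function names are illustrative and are not part of the pitchpredict package.

```python
from typing import Sequence


def top_k_accuracy(prob_rows: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of positions whose target token is among the k most probable tokens."""
    hits = 0
    for probs, target in zip(prob_rows, targets):
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], num_bins: int = 10
) -> float:
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        bins[min(int(conf * num_bins), num_bins - 1)].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(accuracy - mean_conf)
    return ece
```

For ECE, `confidences` would be the model's top-1 probability at each position and `correct` would record whether that top-1 token matched the target. An ECE near 0 means average confidence tracks average accuracy within each bin.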
Accuracy varies by what's being predicted: forward velocity reaches 95%, pitch type sits at 53%, and plate location bottoms out at 23%. The model correctly learned which aspects of pitching are mechanical, which are strategic, and which are irreducibly noisy.
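A per-field breakdown like this one can be computed by grouping token positions by their slot within a pitch. The sketch assumes each sequence is laid out as consecutive 16-token pitches starting at position 0, with no separator or special tokens. If the real tokenizer inserts such tokens, the slot arithmetic must account for them. Which slot corresponds to speed, plate location, and so on depends on the tokenizer's field order, which is not given here.

```python
from typing import List, Sequence


def per_position_accuracy(
    predictions: Sequence[int], targets: Sequence[int], tokens_per_pitch: int = 16
) -> List[float]:
    """Accuracy for each slot within a pitch (token index mod tokens_per_pitch)."""
    hits = [0] * tokens_per_pitch
    totals = [0] * tokens_per_pitch
    for i, (pred, target) in enumerate(zip(predictions, targets)):
        slot = i % tokens_per_pitch
        totals[slot] += 1
        hits[slot] += pred == target
    # Slots with no examples report NaN rather than a misleading 0.
    return [h / t if t else float("nan") for h, t in zip(hits, totals)]
```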
Architecture
| Parameter | Value |
|---|---|
| d_model | 384 |
| num_blocks | 12 |
| num_heads | 8 |
| vocab_size | 258 |
| seq_len | 512 |
| Total Parameters | ~20M |
The architecture is a custom xLSTM with a context adapter that fuses player embeddings, game state embeddings, and continuous features with the token sequence. See the repo for full implementation details.
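The card does not specify how the context adapter combines its inputs. The sketch below is one plausible design, not the repo's implementation. It takes these steps:

1. Embed pitcher and batter IDs in a shared player table.
2. Embed a discrete game-state index.
3. Concatenate those embeddings with the continuous features.
4. Project the result to `d_model`.
5. Add the projected vector to every token embedding.

The class name, the dimensions, and the additive fusion are all assumptions. The dimensions are kept tiny so the sketch runs in pure Python; the card's `d_model` is 384.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x: Sequence[float], weight: Matrix) -> List[float]:
    """y = W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


class ContextAdapter:
    """Hypothetical additive fusion of context features into token embeddings."""

    def __init__(self, num_players, player_dim, num_states, state_dim, num_cont, d_model, seed=0):
        rng = random.Random(seed)
        self.player_emb = random_matrix(num_players, player_dim, rng)  # shared by pitcher and batter
        self.state_emb = random_matrix(num_states, state_dim, rng)
        in_dim = 2 * player_dim + state_dim + num_cont
        self.proj = random_matrix(d_model, in_dim, rng)

    def context_vector(self, pitcher_id, batter_id, state_id, continuous) -> List[float]:
        # Concatenate pitcher, batter, and game-state embeddings with raw continuous features.
        feats = (
            self.player_emb[pitcher_id]
            + self.player_emb[batter_id]
            + self.state_emb[state_id]
            + list(continuous)
        )
        return linear(feats, self.proj)

    def __call__(self, token_embs: Matrix, pitcher_id, batter_id, state_id, continuous) -> Matrix:
        # Add the same projected context vector to every token position in this span.
        ctx = self.context_vector(pitcher_id, batter_id, state_id, continuous)
        return [[t + c for t, c in zip(tok, ctx)] for tok in token_embs]
```

Count, outs, and bases change from pitch to pitch, so a realistic model would recompute the context for each pitch's 16 tokens rather than once per session.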
Training
Trained on 113.9M tokens across 6.8M pitches from ~500K pitcher sessions (April 2016 to October 2025, via Statcast). Training ran on 6× RTX 4090 GPUs using distributed data-parallel (DDP) training and BF16 mixed precision. This checkpoint is from step 73,000, selected by minimum validation loss.
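Selecting a checkpoint by minimum validation loss amounts to tracking the best loss seen so far and saving whenever it improves. The sketch below is minimal; the class is hypothetical and does not describe how the repo's training loop is organized.

```python
import math
from dataclasses import dataclass


@dataclass
class BestCheckpointTracker:
    """Tracks the lowest validation loss seen so far (hypothetical helper)."""

    best_loss: float = math.inf
    best_step: int = -1

    def update(self, step: int, val_loss: float) -> bool:
        """Record a validation result; return True if this step is the new best."""
        if val_loss < self.best_loss:
            self.best_loss, self.best_step = val_loss, step
            return True
        return False
```

In a training loop, you would call `tracker.update(step, val_loss)` after each evaluation and save weights only when it returns True. Under DDP, only one process (typically rank 0) should do the saving.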
Usage
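```bash
pip install pitchpredict
```

```python
import asyncio
from pitchpredict import PitchPredictAPI

api = PitchPredictAPI()
# `request` describes the pitcher and game situation; see the pitchpredict docs for its fields.
# predict_pitcher is a coroutine: run it with asyncio.run in a script, or `await` it in a notebook.
result = asyncio.run(api.predict_pitcher(request))
```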
The checkpoint is downloaded automatically from this repo on first use. See the pitchpredict documentation for API details.
Files
- `model.safetensors`: model weights (safetensors format, fp32)
- `config.json`: model hyperparameters
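A safetensors file has three parts:

1. An 8-byte little-endian header length.
2. A JSON header that maps each tensor name to its dtype, shape, and byte offsets, plus an optional `__metadata__` entry.
3. The raw tensor data.

Because of this layout, you can inspect `model.safetensors` with only the standard library, for example to confirm the fp32 (`F32`) dtype or to compare the total element count with the ~20M parameter figure. The helper names below are illustrative. The tensor names in the actual file are not listed on this card.

```python
import json
import math
import struct
from typing import BinaryIO, Dict


def read_safetensors_header(f: BinaryIO) -> Dict[str, dict]:
    """Parse the JSON header of a safetensors stream (u64 LE length, then JSON)."""
    (header_len,) = struct.unpack("<Q", f.read(8))
    header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # free-form metadata, not a tensor
    return header


def count_parameters(header: Dict[str, dict]) -> int:
    """Total number of elements across all tensors in the header."""
    return sum(math.prod(info["shape"]) for info in header.values())
```

To use it, open the file in binary mode (`open("model.safetensors", "rb")`) and pass the handle to `read_safetensors_header`. Then call `count_parameters` on the result, or collect `{info["dtype"] for info in header.values()}` to check the stored precision. Only the header is read, so the weights are never loaded into memory.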
Authors
- Addison Kline: API, data pipeline, tokenization
- Ryan Heaton: xLSTM architecture, training
Part of the baseball-analytica project.