Create README.md
README.md
ADDED
@@ -0,0 +1,72 @@
---
language:
- en
tags:
- chess
- maia
- maia3
- chessformer
- move-prediction
- human-ai
- interpretability
---

# Maia3-23M

Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **23M-parameter variant**.

For the full model card (architecture details, training recipe, full evaluation, and ablations), see the paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026) and the [Maia3 collection](https://huggingface.co/collections/UofTCSSLab/maia3).

## Model summary

- **Family:** Maia3, a family of human move prediction models built on the **Chessformer** architecture
- **Architecture:** encoder-only transformer with board squares as tokens, augmented by an attention-based source-destination policy head and by **Geometric Attention Bias (GAB)**, a dynamic positional encoding that adapts to the geometry of chess
- **Parameters:** 23M
- **Task:** predicting the move a human player of a given skill level would make from a given position
- **Training data:** Lichess human games, January 2023 – July 2025
- **License:** CC BY 4.0 (paper); see repo for code/weights license
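
To make the architecture bullet more concrete, here is a minimal pure-Python sketch of an attention-style source-destination policy head: each of the 64 squares is a token, and a move's logit is the scaled dot product of its source square's query vector with its destination square's key vector. The function names, the a1 = 0 square indexing, and the toy vectors are illustrative assumptions, not the Chessformer implementation; GAB and details such as promotions are not modeled here and are described in the paper.

```python
import math
from typing import Dict, Sequence, Tuple

Move = Tuple[int, int]  # (source square, destination square), each in 0..63


def square_index(name: str) -> int:
    """Map an algebraic square name such as 'e2' to 0..63 (a1 = 0, h8 = 63)."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file


def policy_over_moves(
    queries: Sequence[Sequence[float]],
    keys: Sequence[Sequence[float]],
    legal_moves: Sequence[Move],
) -> Dict[Move, float]:
    """Softmax over legal moves of scaled query(source) . key(destination)."""
    dim = len(queries[0])
    logits = {
        (src, dst): sum(q * k for q, k in zip(queries[src], keys[dst])) / math.sqrt(dim)
        for src, dst in legal_moves
    }
    # Subtract the maximum logit before exponentiating for numerical stability.
    top = max(logits.values())
    weights = {move: math.exp(value - top) for move, value in logits.items()}
    total = sum(weights.values())
    return {move: weight / total for move, weight in weights.items()}


# Toy usage: hypothetical 2-dimensional per-square vectors, two candidate moves.
queries = [[0.0, 0.0] for _ in range(64)]
keys = [[0.0, 0.0] for _ in range(64)]
e2, e3, e4 = square_index("e2"), square_index("e3"), square_index("e4")
queries[e2] = [1.0, 0.0]
keys[e4] = [2.0, 0.0]
probabilities = policy_over_moves(queries, keys, [(e2, e3), (e2, e4)])
```

Restricting the softmax to legal moves is one common design choice; whether the released model masks illegal moves the same way should be checked against the training repo.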

## Intended use

Maia3 models predict human chess moves conditioned on player rating. Typical uses include:

- Research on human chess modeling and human–AI alignment
- Move-suggestion and analysis tools that emulate play at a chosen rating
- Mechanistic interpretability research: the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features

These models are not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.

## How to use

Maia3-23M is a PyTorch checkpoint trained with the code at [CSSLab/maia3](https://github.com/CSSLab/maia3). Clone that repo, set up the conda environment, and load the checkpoint following the instructions in its README.

Architecture hyperparameters for this variant are defined in `ablate_size.sh` in the training repo.
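
Before wiring the checkpoint into the training repo's model code, it can help to confirm that the downloaded file is roughly the expected size. The sketch below assumes the checkpoint is either a plain `state_dict` or a dict that nests one under a key such as `"state_dict"` or `"model"`; both key names are guesses, and the loading path documented in the CSSLab/maia3 README remains the reference.

```python
from typing import Any, Mapping


def count_parameters(state_dict: Mapping[str, Any]) -> int:
    """Total element count across tensor-like values (anything with .numel())."""
    return sum(int(v.numel()) for v in state_dict.values() if hasattr(v, "numel"))


def load_state_dict(path: str) -> Mapping[str, Any]:
    """Load a checkpoint on CPU and unwrap a nested state dict if one is present."""
    import torch  # imported lazily so count_parameters works without PyTorch

    checkpoint = torch.load(path, map_location="cpu")
    # Assumption: some training scripts nest the weights under one of these keys.
    for key in ("state_dict", "model"):
        if isinstance(checkpoint, dict) and isinstance(checkpoint.get(key), dict):
            return checkpoint[key]
    return checkpoint
```

For example, `count_parameters(load_state_dict("maia3-23m.pt"))` should come out near 23 million if these assumptions hold, where `maia3-23m.pt` is a placeholder file name. Non-trainable buffers stored in the state dict are counted too, so expect an approximate match. Recent PyTorch releases default `torch.load` to `weights_only=True`, so a checkpoint that pickles extra Python objects may need the repo's own loader instead.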

## Training

- **Data:** Lichess monthly game dumps, January 2023 – July 2025
- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
- **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
- **Config:** size ablation row corresponding to 23M parameters in `ablate_size.sh`

## Evaluation

The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
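
Move-matching accuracy is read here as top-1 agreement: the fraction of positions where the model's most likely move equals the move the human actually played. The sketch below computes it from two aligned lists of UCI move strings. The function name and the sample moves are illustrative, and the exact evaluation protocol (position filtering, rating buckets) is the one defined in the paper, not this snippet.

```python
from typing import Sequence


def move_matching_accuracy(predicted: Sequence[str], played: Sequence[str]) -> float:
    """Fraction of positions where the predicted move equals the human move."""
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not played:
        raise ValueError("need at least one position")
    hits = sum(p == h for p, h in zip(predicted, played))
    return hits / len(played)


# Toy usage with made-up moves: three of the four predictions match.
predicted_moves = ["e2e4", "g1f3", "d2d4", "e1g1"]
human_moves = ["e2e4", "g1f3", "c2c4", "e1g1"]
accuracy = move_matching_accuracy(predicted_moves, human_moves)  # 0.75
```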

## Limitations

- Trained on Lichess games only; play styles on other platforms or over-the-board may differ.
- Predicts moves conditioned on rating; does not produce engine-strength play. For maximum strength, use Chessformer integrated into Leela Chess Zero (see the paper).
- Predictions reflect patterns in human play at each rating level, including systematic blunders and stylistic biases.

## Citation

```bibtex
@inproceedings{monroe2026chessformer,
  title={Chessformer: A Unified Architecture for Chess Modeling},
  author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=2ltBRzEHyd}
}
```