+---
+language:
+- en
+tags:
+- chess
+- maia
+- maia3
+- chessformer
+- move-prediction
+- human-ai
+- interpretability
+---
+# Maia3-23M
+Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **23M-parameter variant**.
+For full details (architecture, training recipe, evaluation, and ablations), see the paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026) and the [Maia3 collection](https://huggingface.co/collections/UofTCSSLab/maia3).
+## Model summary
+- **Family:** Maia3 — human move prediction models built on the **Chessformer** architecture
+- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)** — a dynamic positional encoding that adapts to the geometry of chess — and an attention-based source-destination policy head
+- **Parameters:** 23M
+- **Task:** predicting the move a human player of a given skill level would make from a given position
+- **Training data:** Lichess human games, January 2023 – July 2025
+- **License:** CC BY 4.0 (paper); see repo for code/weights license
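The "board squares as tokens" input and the source-destination policy head described in the list above can be pictured with a small, dependency-free sketch. It is illustrative only: the function names, the one-character piece tokens, and the a1 = 0 square ordering are assumptions made for this example, and the actual Maia3 encoding is defined in the training repo. Promotions and rating conditioning are left out.

```python
# Illustrative sketch only: names, token vocabulary, and square ordering are
# assumptions for this example, not the encoding used by the Maia3 code.
FILES = "abcdefgh"


def fen_to_square_tokens(fen: str) -> list[str]:
    """Return 64 tokens, index 0 = a1 ... index 63 = h8, '.' for empty squares."""
    placement = fen.split()[0]
    tokens = ["."] * 64
    # FEN lists ranks from rank 8 down to rank 1.
    for rank_from_top, row in enumerate(placement.split("/")):
        rank = 7 - rank_from_top
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit encodes a run of empty squares
            else:
                tokens[rank * 8 + file] = ch
                file += 1
    return tokens


def square_index(name: str) -> int:
    """Map a square name such as 'e2' to an index in 0..63."""
    return (int(name[1]) - 1) * 8 + FILES.index(name[0])


def move_to_policy_index(uci: str) -> tuple[int, int]:
    """Map a UCI move such as 'e2e4' to (source, destination) square indices.

    A source-destination head scores pairs of squares, so a non-promotion move
    corresponds to one cell of a 64 x 64 grid of logits.
    """
    return square_index(uci[:2]), square_index(uci[2:4])


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
tokens = fen_to_square_tokens(START)
src, dst = move_to_policy_index("e2e4")
print(len(tokens), tokens[src], src, dst)  # 64 P 12 28
```

Because each token is tied to one square, attention weights and activations can be read off per square, which is the property the interpretability use case relies on.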
+## Intended use
+Maia3 models predict human chess moves conditioned on player rating. Typical uses include:
+- Research on human chess modeling and human–AI alignment
+- Move-suggestion and analysis tools that emulate play at a chosen rating
+- Mechanistic interpretability research — the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features
+These models are not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
+## How to use
+Maia3-23M is a PyTorch checkpoint trained with the code at [CSSLab/maia3](https://github.com/CSSLab/maia3). Clone that repo, set up the conda environment, and load the checkpoint following the instructions in its README.
+Architecture hyperparameters for this variant are defined in `ablate_size.sh` in the training repo.
+## Training
+- **Data:** Lichess monthly game dumps, January 2023 – July 2025
+- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
+- **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
+- **Config:** size ablation row corresponding to 23M parameters in `ablate_size.sh`
+## Evaluation
+The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
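As a rough illustration of the metric, move-matching accuracy can be computed as the fraction of positions where the model's top predicted move equals the move the human actually played. The sketch below assumes that top-1 definition; the function name and the toy move lists are made up for this example, and the paper defines the exact evaluation protocol.

```python
def move_matching_accuracy(predicted: list[str], played: list[str]) -> float:
    """Fraction of positions where the predicted move equals the human move."""
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not played:
        raise ValueError("at least one position is required")
    matches = sum(p == h for p, h in zip(predicted, played))
    return matches / len(played)


# Toy data in UCI notation, invented for this example (not real results).
predicted = ["e2e4", "g1f3", "d2d4", "e1g1"]
played = ["e2e4", "g1f3", "c2c4", "e1g1"]

accuracy = move_matching_accuracy(predicted, played)
print(f"{accuracy:.1%}")  # 75.0%
```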
+## Limitations
+- Trained on Lichess games only; play styles on other platforms or over-the-board may differ.
+- Predicts moves conditioned on rating; does not produce engine-strength play. For maximum strength, use Chessformer integrated into Leela Chess Zero (see the paper).
+- Predictions reflect patterns in human play at each rating level, including systematic blunders and stylistic biases.
+## Citation
+```bibtex
+@inproceedings{monroe2026chessformer,
+  title={Chessformer: A Unified Architecture for Chess Modeling},
+  author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
+  booktitle={The Fourteenth International Conference on Learning Representations},
+  year={2026},
+  url={https://openreview.net/forum?id=2ltBRzEHyd}
+}
+```