---
language:
- en
tags:
- chess
- maia
- maia3
- chessformer
- move-prediction
- human-ai
- interpretability
---
# Maia3-3M
Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **3M-parameter variant**.
For full details, including architecture, training recipe, evaluation, and ablations, see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
## Model summary
- **Family:** Maia3 — human move prediction models built on the **Chessformer** architecture
- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)** — a dynamic positional encoding that adapts to the geometry of chess — and an attention-based source-destination policy head
- **Parameters:** 3M
- **Task:** predicting the move a human player of a given skill level would make from a given position
- **Training data:** Lichess human games, January 2023 – July 2025
- **License:** CC BY 4.0 (paper); see repo for code/weights license
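
To make the square-token and source-destination ideas above concrete, here is a minimal pure-Python sketch. It is not the Maia3 implementation: the `a1 = 0` square ordering, the function names, and the scaled dot-product scoring are assumptions chosen for illustration, and promotions, GAB, and rating conditioning are left out entirely.

```python
import math

FILES = "abcdefgh"


def square_index(name: str) -> int:
    """Map a square name such as 'e4' to a token index in 0..63.

    Assumed ordering (not taken from the Maia3 code): a1 = 0, b1 = 1, ..., h8 = 63.
    """
    file_idx = FILES.index(name[0])
    rank_idx = int(name[1]) - 1
    return rank_idx * 8 + file_idx


def policy_from_embeddings(src_vecs, dst_vecs, legal_moves):
    """Score legal (source, destination) pairs and normalise with a softmax.

    src_vecs, dst_vecs: 64 vectors each, standing in for hypothetical
        per-square "source" and "destination" projections of the encoder output.
    legal_moves: list of (from_square, to_square) names, e.g. [("e2", "e4")].
    """
    dim = len(src_vecs[0])
    logits = []
    for src, dst in legal_moves:
        q = src_vecs[square_index(src)]
        k = dst_vecs[square_index(dst)]
        # Scaled dot product between the source and destination square vectors.
        logits.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(dim))
    # Softmax over legal moves only; illegal pairs are never scored.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {move: e / total for move, e in zip(legal_moves, exps)}
```

Because every move is scored from one source square and one destination square, each logit can be traced back to two specific board squares, which is the property the interpretability use case relies on.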
## Intended use
Maia3 models predict human chess moves conditioned on player rating. Typical uses include:
- Research on human chess modeling and human–AI alignment
- Move-suggestion and analysis tools that emulate play at a chosen rating
- Mechanistic interpretability research — the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features
These models are not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
## How to use
Maia3-3M is a PyTorch checkpoint trained with the code at [CSSLab/maia3](https://github.com/CSSLab/maia3). Clone that repo, set up the conda environment, and load the checkpoint following the instructions in its README.
Architecture hyperparameters for this variant are defined in `ablate_size.sh` in the training repo.
## Training
- **Data:** Lichess monthly game dumps, January 2023 – July 2025
- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
- **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
- **Config:** size ablation row corresponding to 3M parameters in `ablate_size.sh`
## Evaluation
The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art while using fewer than a quarter of its parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
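
For readers unfamiliar with the metric, move-matching accuracy is commonly computed as the fraction of positions in which the model's top-ranked move equals the move the human actually played. The sketch below assumes moves are given as UCI strings and uses a hypothetical helper name; the paper's exact evaluation protocol (for example, which positions are filtered out) may differ.

```python
def move_matching_accuracy(predicted, played):
    """Return the fraction of positions where the top prediction equals the human move.

    predicted: the model's top move per position, as UCI strings (assumed format).
    played: the move the human actually made in each position, same order.
    """
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not played:
        raise ValueError("need at least one position")
    hits = sum(p == h for p, h in zip(predicted, played))
    return hits / len(played)


# Example: three of four predictions match the human move.
example = move_matching_accuracy(
    ["e2e4", "g1f3", "d2d4", "c2c4"],
    ["e2e4", "g1f3", "e2e4", "c2c4"],
)
```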
## Citation
```bibtex
@inproceedings{monroe2026chessformer,
title={Chessformer: A Unified Architecture for Chess Modeling},
author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=2ltBRzEHyd}
}
```