	---
	language:
	- en
	tags:
	- chess
	- maia
	- maia3
	- chessformer
	- move-prediction
	- human-ai
	- interpretability
	---

	# Maia3-3M

	Part of the [Maia3](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the 3M-parameter variant.

	For the architecture, training recipe, full evaluation, and ablations, see our paper [Chessformer: A Unified Architecture for Chess Modeling](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).

	## Model summary

	- Family: Maia3 — human move prediction models built on the Chessformer architecture
	- Architecture: encoder-only transformer with board squares as tokens, augmented by Geometric Attention Bias (GAB) — a dynamic positional encoding that adapts to the geometry of chess — and an attention-based source-destination policy head
	- Parameters: 3M
	- Task: predicting the move a human player of a given skill level would make from a given position
	- Training data: Lichess human games, January 2023 – July 2025
	- License: the paper is CC BY 4.0; see the repo for the code and weights license
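
To make the "board squares as tokens" idea concrete, here is a minimal sketch that turns the piece-placement field of a FEN string into 64 tokens, one per square. The token vocabulary and the a1-to-h8 ordering are assumptions chosen for illustration; the encoding used by the Maia3 code may differ.

```python
# Illustrative only: vocabulary and square ordering are assumptions,
# not the encoding used by the Maia3 training code.
PIECES = ".PNBRQKpnbrqk"  # index 0 stands for an empty square
TOKEN_ID = {symbol: i for i, symbol in enumerate(PIECES)}


def fen_to_square_tokens(fen: str) -> list[int]:
    """Return 64 token ids ordered a1, b1, ..., h1, a2, ..., h8."""
    placement = fen.split()[0]
    ranks = placement.split("/")  # FEN lists rank 8 first
    if len(ranks) != 8:
        raise ValueError("expected 8 ranks in the FEN piece placement")
    tokens = []
    for rank in reversed(ranks):  # walk from rank 1 up to rank 8
        row = []
        for ch in rank:
            if ch.isdigit():
                # A digit encodes a run of empty squares.
                row.extend([TOKEN_ID["."]] * int(ch))
            else:
                row.append(TOKEN_ID[ch])
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        tokens.extend(row)
    return tokens


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
tokens = fen_to_square_tokens(START)
```

With one token per square, an attention weight between two token positions can be read as a relation between two board squares.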

	## Intended use

	Maia3 models predict human chess moves conditioned on player rating. Typical uses include:

	- Research on human chess modeling and human–AI alignment
	- Move-suggestion and analysis tools that emulate play at a chosen rating
	- Mechanistic interpretability research — the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features

	Maia3 models are not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
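
A move-suggestion tool usually restricts the model's output to legal moves before choosing or sampling one. The sketch below assumes the policy output is available as a 64x64 grid of logits indexed by source and destination square, already conditioned on the chosen rating. That layout, the square ordering, and the function names are hypothetical, and promotions are ignored.

```python
import math
import random


def square_index(name: str) -> int:
    """Map a square like 'e2' to 0..63, with a1 = 0 and h8 = 63 (assumed ordering)."""
    return (int(name[1]) - 1) * 8 + (ord(name[0]) - ord("a"))


def legal_move_distribution(logits, legal_moves):
    """Softmax over the logits of legal moves only.

    logits: 64x64 nested list, logits[source][destination] (hypothetical layout)
    legal_moves: UCI strings such as 'e2e4'; promotion suffixes are ignored here
    """
    scores = [
        logits[square_index(move[:2])][square_index(move[2:4])]
        for move in legal_moves
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return {move: w / total for move, w in zip(legal_moves, weights)}


def sample_move(distribution, rng=random):
    """Draw one move in proportion to its probability."""
    moves = list(distribution)
    return rng.choices(moves, weights=[distribution[m] for m in moves], k=1)[0]


# Toy logits: e2e4 scores highest, g1f3 second, everything else zero.
logits = [[0.0] * 64 for _ in range(64)]
logits[square_index("e2")][square_index("e4")] = 2.0
logits[square_index("g1")][square_index("f3")] = 1.0

probs = legal_move_distribution(logits, ["e2e4", "g1f3", "a2a3"])
best = max(probs, key=probs.get)
```

Taking `best` gives the single most likely human move, while `sample_move` produces more varied play.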

	## How to use

	Maia3-3M is a PyTorch checkpoint trained with the code at [CSSLab/maia3](https://github.com/CSSLab/maia3). Clone that repo, set up the conda environment, and load the checkpoint following the instructions in its README.

	Architecture hyperparameters for this variant are defined in `ablate_size.sh` in the training repo.
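
After downloading the checkpoint, one quick sanity check is to count its parameters and confirm the total is roughly 3M. The sketch below is not the repo's loading code. It assumes the file is either a plain state dict or a dict with a `state_dict` entry, which may not match the actual checkpoint layout, and the `path` argument is a placeholder for whatever file the repo ships.

```python
import math


def count_parameters(shapes):
    """Total number of scalars across a {name: shape} mapping."""
    return sum(math.prod(shape) for shape in shapes.values())


def checkpoint_shapes(path):
    """Read tensor shapes from a PyTorch checkpoint (requires torch).

    Assumption: the file holds either a state dict or a dict with a
    'state_dict' entry. The real layout may differ.
    """
    import torch  # imported lazily so count_parameters works without torch

    checkpoint = torch.load(path, map_location="cpu")
    state = checkpoint.get("state_dict", checkpoint)
    return {
        name: tuple(tensor.shape)
        for name, tensor in state.items()
        if hasattr(tensor, "shape")
    }


# Toy example with made-up tensor names and shapes.
toy_shapes = {"embed.weight": (13, 256), "head.bias": (64,)}
total = count_parameters(toy_shapes)
```

On a real file, `count_parameters(checkpoint_shapes(path))` would give the total. Optimizer state or other extra entries in the checkpoint, if present, would need to be excluded first.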

	## Training

	- Data: Lichess monthly game dumps, January 2023 – July 2025
	- Validation: Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
	- Code: [CSSLab/maia3](https://github.com/CSSLab/maia3)
	- Config: size ablation row corresponding to 3M parameters in `ablate_size.sh`

	## Evaluation

	The Maia3 family reaches 57.1% move-matching accuracy on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. This is the figure reported for the family; per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
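
Move-matching accuracy is commonly computed as top-1 agreement: the fraction of positions where the model's most likely move equals the move the human played. The sketch below shows that computation on made-up data. The paper's exact protocol, such as which positions are counted, is not described in this card and may differ.

```python
def move_matching_accuracy(predicted, played):
    """Fraction of positions where the predicted move equals the human move.

    predicted, played: equal-length sequences of UCI move strings.
    """
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not played:
        raise ValueError("need at least one position")
    matches = sum(p == h for p, h in zip(predicted, played))
    return matches / len(played)


# Made-up example: three of four predictions match the human move.
predicted = ["e2e4", "g1f3", "d2d4", "c2c4"]
played = ["e2e4", "g1f3", "e2e4", "c2c4"]
accuracy = move_matching_accuracy(predicted, played)
```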

	## Citation

	```bibtex
	@inproceedings{monroe2026chessformer,
	  title={Chessformer: A Unified Architecture for Chess Modeling},
	  author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
	  booktitle={The Fourteenth International Conference on Learning Representations},
	  year={2026},
	  url={https://openreview.net/forum?id=2ltBRzEHyd}
	}
	```