---
language:
- en
tags:
- chess
- maia
- maia3
- chessformer
- move-prediction
- human-ai
- interpretability
---

# Maia3-23M

Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **23M-parameter variant**.

For the full model card (architecture details, training recipe, full evaluation, and ablations), see the paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026) and the [Maia3 collection](https://huggingface.co/collections/UofTCSSLab/maia3).

## Model summary

- **Family:** Maia3, human move-prediction models built on the **Chessformer** architecture
- **Architecture:** encoder-only transformer with board squares as tokens and an attention-based source-destination policy head, augmented by **Geometric Attention Bias (GAB)**, a dynamic positional encoding that adapts to the geometry of chess
- **Parameters:** 23M
- **Task:** predicting the move a human player of a given skill level would make in a given position
- **Training data:** Lichess human games, January 2023 – July 2025
- **License:** CC BY 4.0 (paper); see the repo for the code and weights license
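
To make the source-destination policy head more concrete, here is a minimal sketch in plain Python. It assumes each of the 64 square tokens yields a query vector and a key vector, scores a move from square `s` to square `t` as the scaled dot product of `s`'s query with `t`'s key, and normalizes only over legal moves. The square ordering (`a1` = 0 through `h8` = 63), the function names, and the omission of promotion handling are illustrative assumptions, not the Maia3 implementation; see the paper and the training repo for the actual head.

```python
import math

FILES = "abcdefgh"

def square_index(name: str) -> int:
    # Hypothetical token order: a1 = 0, b1 = 1, ..., h1 = 7, a2 = 8, ..., h8 = 63.
    return FILES.index(name[0]) + 8 * (int(name[1]) - 1)

def source_destination_policy(queries, keys, legal_moves):
    """Return {(source, destination): probability} over the given legal moves.

    queries, keys: 64 equal-length vectors, one per square token (assumed inputs).
    legal_moves: (source, destination) square-index pairs; promotions are ignored here.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    # Logit for moving from square s to square t: scaled dot product of query[s] and key[t].
    logits = {
        (s, t): scale * sum(q * k for q, k in zip(queries[s], keys[t]))
        for s, t in legal_moves
    }
    # Softmax restricted to legal pairs; subtracting the max keeps exp() numerically stable.
    peak = max(logits.values())
    weights = {move: math.exp(v - peak) for move, v in logits.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}

# Toy example with 2-dimensional vectors: e2's query is aligned with e4's key.
queries = [[0.0, 0.0] for _ in range(64)]
keys = [[0.0, 0.0] for _ in range(64)]
e2, e3, e4 = square_index("e2"), square_index("e3"), square_index("e4")
queries[e2] = [1.0, 0.0]
keys[e4] = [2.0, 0.0]
probs = source_destination_policy(queries, keys, [(e2, e3), (e2, e4)])
```

Scoring only legal pairs plays the same role as masking the illegal entries of a 64 × 64 logit matrix to negative infinity before the softmax.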

## Intended use

Maia3 models predict human chess moves conditioned on player rating. Typical uses include:

- Research on human chess modeling and human–AI alignment
- Move-suggestion and analysis tools that emulate play at a chosen rating
- Mechanistic interpretability research: the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features
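
Because each token is a board square, an attention weight can be read directly as "square X attends to square Y". The helper below is a hypothetical sketch that names the squares a given query square attends to most in one head's 64 × 64 attention matrix. It assumes the token order `a1` = 0 through `h8` = 63; how you extract attention weights from the real model (for example with PyTorch forward hooks) depends on the training repo.

```python
FILES = "abcdefgh"

def square_name(index: int) -> str:
    # Hypothetical token order: 0 = a1, 1 = b1, ..., 7 = h1, 8 = a2, ..., 63 = h8.
    return FILES[index % 8] + str(index // 8 + 1)

def top_attended_squares(attention, query_square: int, k: int = 3):
    """Return the k (square name, weight) pairs that query_square attends to most.

    attention: 64 x 64 nested list for one head in one layer, where
    attention[i][j] is the weight that query token i places on key token j.
    """
    row = attention[query_square]
    ranked = sorted(range(64), key=lambda j: row[j], reverse=True)
    return [(square_name(j), row[j]) for j in ranked[:k]]

# Toy attention pattern: the e2 token (index 12) attends mostly to e4 and d4.
attention = [[0.0] * 64 for _ in range(64)]
attention[12][28] = 0.6  # e4
attention[12][27] = 0.3  # d4
attention[12][12] = 0.1  # e2 itself
print(top_attended_squares(attention, 12, k=2))  # [('e4', 0.6), ('d4', 0.3)]
```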

Maia3 is not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.

## How to use

Maia3-23M is a PyTorch checkpoint trained with the code at [CSSLab/maia3](https://github.com/CSSLab/maia3). Clone that repo, set up the conda environment, and load the checkpoint following the instructions in its README.

Architecture hyperparameters for this variant are defined in `ablate_size.sh` in the training repo.
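
To see roughly which configuration in `ablate_size.sh` lands near 23M parameters, a back-of-the-envelope count of the encoder stack can help. The sketch below counts parameters for a standard transformer encoder layer (biased Q/K/V/output projections, a two-layer feed-forward block, two LayerNorms). It is an assumption that Chessformer layers follow this layout, the hyperparameter names `n_layers`, `d_model`, and `d_ff` are illustrative, and the count ignores embeddings, GAB parameters, and output heads, so treat the result as an approximation only.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameters in one standard transformer encoder layer, including biases."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff + d_ff + d_model  # up- and down-projection with biases
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with a scale and a shift
    return attention + feed_forward + layer_norms

def encoder_stack_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Encoder-stack total only: embeddings, GAB, and output heads are not counted."""
    return n_layers * encoder_layer_params(d_model, d_ff)

# Sanity check against a well-known layout: one BERT-base layer (d_model=768, d_ff=3072).
print(encoder_layer_params(768, 3072))  # 7087872
```

Plugging in the width and depth from each size-ablation row and comparing against 23M should narrow down the matching row; any remaining gap would come mostly from the components this sketch skips or from layout differences.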

## Training

- **Data:** Lichess monthly game dumps, January 2023 – July 2025
- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
- **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
- **Config:** the size-ablation row corresponding to 23M parameters in `ablate_size.sh`
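
The training window covers 31 months of Lichess data. As a sketch of how to enumerate the corresponding files, the snippet below builds URLs following the naming scheme used on the Lichess open database for standard rated games (`lichess_db_standard_rated_YYYY-MM.pgn.zst`). It assumes the standard-variant dumps are the ones meant here; check the Lichess database site for the current layout, and note that the training repo may download or filter the data differently (for example by time control or rating).

```python
BASE_URL = "https://database.lichess.org/standard/"

def monthly_dump_urls(start, end):
    """URLs of monthly standard rated dumps from start to end inclusive; start/end are (year, month)."""
    year, month = start
    urls = []
    while (year, month) <= end:
        urls.append(f"{BASE_URL}lichess_db_standard_rated_{year:04d}-{month:02d}.pgn.zst")
        # Advance one calendar month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls

urls = monthly_dump_urls((2023, 1), (2025, 7))
print(len(urls))  # 31
```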

## Evaluation

The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of its parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
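
Move-matching accuracy is the fraction of positions in which the model's top prediction equals the move the human actually played. The sketch below computes it overall and per rating bucket, in the spirit of the skill-conditioned breakdowns mentioned above. The `(rating, predicted, actual)` record format, the UCI move strings, and the 100-point bucket width are assumptions for illustration; the paper's exact protocol (including how the annotated test set is filtered) may differ.

```python
from collections import defaultdict

def move_matching_accuracy(records, bucket_width=100):
    """Top-1 move-matching accuracy, overall and per rating bucket.

    records: iterable of (rating, predicted_move, human_move) tuples.
    Returns (overall_accuracy, {bucket_start: accuracy}).
    """
    counts = defaultdict(lambda: [0, 0])  # bucket_start -> [matches, positions]
    for rating, predicted, actual in records:
        bucket = (rating // bucket_width) * bucket_width
        counts[bucket][0] += int(predicted == actual)
        counts[bucket][1] += 1
    matches = sum(c for c, _ in counts.values())
    positions = sum(n for _, n in counts.values())
    overall = matches / positions if positions else 0.0
    per_bucket = {b: c / n for b, (c, n) in sorted(counts.items())}
    return overall, per_bucket

records = [
    (1510, "e2e4", "e2e4"),
    (1550, "d2d4", "e2e4"),
    (1820, "g1f3", "g1f3"),
    (1890, "c2c4", "c2c4"),
]
print(move_matching_accuracy(records))  # (0.75, {1500: 0.5, 1800: 1.0})
```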

## Limitations

- Trained on Lichess games only; play styles on other platforms or in over-the-board play may differ.
- Predicts moves conditioned on rating and does not produce engine-strength play. For maximum strength, use Chessformer integrated into Leela Chess Zero (see the paper).
- Predictions reflect patterns in human play at each rating level, including systematic blunders and stylistic biases.

## Citation

```bibtex
@inproceedings{monroe2026chessformer,
  title={Chessformer: A Unified Architecture for Chess Modeling},
  author={Daniel Monroe and George Eilender and Philip Chalmers and Zhenwei Tang and Ashton Anderson},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=2ltBRzEHyd}
}
```