Update README.md
README.md
CHANGED
@@ -15,24 +15,25 @@ tags:
 
 Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **79M-parameter variant**.
 
-For full details
+For full details (architecture details, training recipe, full evaluation, and ablations) see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
 
 ## Model summary
 
-- **Family:**
-- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)**
+- **Family:** Maia-3, human move prediction models built on the **Chessformer** architecture
+- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)**, a dynamic positional encoding that adapts to the geometry of chess, and an attention-based source-destination policy head
 - **Parameters:** 79M
 - **Task:** predicting the move a human player of a given skill level would make from a given position
 - **Training data:** Lichess human games, January 2023 – July 2025
-- **License:**
+- **License:** AGPLv3
 
 ## Intended use
 
-
+Maia-3 models predict human chess moves conditioned on player rating. Typical uses include:
 
 - Research on human chess modeling and human–AI alignment
+- Tools for chess education and entertainment
 - Move-suggestion and analysis tools that emulate play at a chosen rating
-- Mechanistic interpretability research
+- Mechanistic interpretability research: the square-token design makes attention patterns and activations directly attributable to board squares
 
 Not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
 
@@ -45,13 +46,12 @@ Architecture hyperparameters for this variant are defined in `ablate_size.sh` in
 ## Training
 
 - **Data:** Lichess monthly game dumps, January 2023 – July 2025
-- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
 - **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
 - **Config:** size ablation row corresponding to 79M parameters in `ablate_size.sh`
 
 ## Evaluation
 
-The
+The Maia-3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
 
 ## Citation
 