ashtonanderson committed · Commit a107d6c · verified · 1 Parent(s): 606f189

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -15,24 +15,25 @@ tags:
 
  Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **79M-parameter variant**.
 
- For full details architecture details, training recipe, full evaluation, and ablations see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
+ For full details (architecture details, training recipe, full evaluation, and ablations) see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
 
  ## Model summary
 
- - **Family:** Maia3 human move prediction models built on the **Chessformer** architecture
- - **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)** a dynamic positional encoding that adapts to the geometry of chess and an attention-based source-destination policy head
+ - **Family:** Maia-3, human move prediction models built on the **Chessformer** architecture
+ - **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)**, a dynamic positional encoding that adapts to the geometry of chess, and an attention-based source-destination policy head
  - **Parameters:** 79M
  - **Task:** predicting the move a human player of a given skill level would make from a given position
  - **Training data:** Lichess human games, January 2023 – July 2025
- - **License:** CC BY 4.0 (paper); see repo for code/weights license
+ - **License:** AGPLv3
 
  ## Intended use
 
- Maia3 models predict human chess moves conditioned on player rating. Typical uses include:
+ Maia-3 models predict human chess moves conditioned on player rating. Typical uses include:
 
  - Research on human chess modeling and human–AI alignment
+ - Tools for chess education and entertainment
  - Move-suggestion and analysis tools that emulate play at a chosen rating
- - Mechanistic interpretability research the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features
+ - Mechanistic interpretability research: the square-token design makes attention patterns and activations directly attributable to board squares
 
  Not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
 
@@ -45,13 +46,12 @@ Architecture hyperparameters for this variant are defined in `ablate_size.sh` in
  ## Training
 
  - **Data:** Lichess monthly game dumps, January 2023 – July 2025
- - **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
  - **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
  - **Config:** size ablation row corresponding to 79M parameters in `ablate_size.sh`
 
  ## Evaluation
 
- The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
+ The Maia-3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
 
  ## Citation
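
Note on the Evaluation line in this diff: move-matching accuracy is usually read as the fraction of test positions where the model's top predicted move equals the move the human actually played. The sketch below illustrates that reading only. The function name `move_matching_accuracy` and the sample UCI moves are hypothetical and do not come from the Maia3 repository, and the paper's exact evaluation protocol (position filtering, rating buckets) may differ.

```python
from typing import Sequence


def move_matching_accuracy(predicted: Sequence[str], played: Sequence[str]) -> float:
    """Fraction of positions where the top predicted move equals the human move."""
    if len(predicted) != len(played):
        raise ValueError("predicted and played must have the same length")
    if not played:
        raise ValueError("need at least one position")
    # Count exact matches between the model's top move and the human's move.
    matches = sum(p == h for p, h in zip(predicted, played))
    return matches / len(played)


# Hypothetical UCI moves for four positions (illustrative data only).
predicted_moves = ["e2e4", "g1f3", "d2d4", "e1g1"]
human_moves = ["e2e4", "g1f3", "c2c4", "e1g1"]

accuracy = move_matching_accuracy(predicted_moves, human_moves)
print(f"{accuracy:.1%}")  # 3 of the 4 moves match
```

Under this reading, the reported 57.1% would mean the top prediction agrees with the human move in roughly 57 of every 100 evaluated positions.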