UofTCSSLab/Maia3-79M

@@ -15,24 +15,25 @@ tags:
 Part of the [**Maia3**](https://huggingface.co/collections/UofTCSSLab/maia3) family of transformer models for human chess move prediction. This is the **79M-parameter variant**.
-For full details — architecture details, training recipe, full evaluation, and ablations — see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
+For full details (architecture details, training recipe, full evaluation, and ablations) see our paper [*Chessformer: A Unified Architecture for Chess Modeling*](https://openreview.net/forum?id=2ltBRzEHyd) (ICLR 2026).
 ## Model summary
-- **Family:** Maia3 — human move prediction models built on the **Chessformer** architecture
-- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)** — a dynamic positional encoding that adapts to the geometry of chess — and an attention-based source-destination policy head
+- **Family:** Maia-3, human move prediction models built on the **Chessformer** architecture
+- **Architecture:** encoder-only transformer with board squares as tokens, augmented by **Geometric Attention Bias (GAB)**, a dynamic positional encoding that adapts to the geometry of chess, and an attention-based source-destination policy head
 - **Parameters:** 79M
 - **Task:** predicting the move a human player of a given skill level would make from a given position
 - **Training data:** Lichess human games, January 2023 – July 2025
-- **License:** CC BY 4.0 (paper); see repo for code/weights license
+- **License:** AGPLv3
 ## Intended use
-Maia3 models predict human chess moves conditioned on player rating. Typical uses include:
+Maia-3 models predict human chess moves conditioned on player rating. Typical uses include:
 - Research on human chess modeling and human–AI alignment
+- Tools for chess education and entertainment
 - Move-suggestion and analysis tools that emulate play at a chosen rating
-- Mechanistic interpretability research — the square-token design makes attention patterns and activations directly attributable to board squares, and the repo ships a cross-layer transcoder (`transcoder.py`) for studying internal features
+- Mechanistic interpretability research: the square-token design makes attention patterns and activations directly attributable to board squares
 Not intended for maximum playing strength. For strong engine play built on the same architecture, see the Chessformer integration into Leela Chess Zero described in the paper.
@@ -45,13 +46,12 @@ Architecture hyperparameters for this variant are defined in `ablate_size.sh` in
 ## Training
 - **Data:** Lichess monthly game dumps, January 2023 – July 2025
-- **Validation:** Allie-style annotated test set (`data/evals/2022-test-annotated.jsonl`)
 - **Code:** [CSSLab/maia3](https://github.com/CSSLab/maia3)
 - **Config:** size ablation row corresponding to 79M parameters in `ablate_size.sh`
 ## Evaluation
-The Maia3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
+The Maia-3 family reaches **57.1% move-matching accuracy** on human moves, significantly surpassing the previous state of the art with fewer than a quarter of the parameters. Per-size accuracy curves, scaling analysis, and skill-conditioned breakdowns are reported in the paper.
 ## Citation