---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- chess
- causal-lm
- uci
- decoder-only
- llama-style
datasets:
- malcouffe/lichess-standard-rated-2025-07-uci
- malcouffe/lichess-standard-rated-2025-08-uci
- malcouffe/lichess-standard-rated-2025-09-uci
- malcouffe/lichess-standard-rated-2025-10-uci
- malcouffe/lichess-standard-rated-2025-11-uci
- malcouffe/lichess-standard-rated-2025-12-uci
- malcouffe/lichess-standard-rated-2026-01-uci
pipeline_tag: text-generation
model-index:
- name: ChessGPT
results: []
---
# ChessGPT — 432M
A decoder-only transformer trained to predict the next move in chess games using UCI notation. The model learns purely from move sequences (no board state, no evaluation) via next-token prediction on Lichess games.
## Model details
| | |
|---|---|
| **Architecture** | LLaMA-style decoder-only transformer |
| **Parameters** | 432M |
| **Context length** | 256 tokens |
| **Vocab size** | 4 211 (UCI moves + 3 special tokens) |
| **Training tokens** | 7.87B |
| **License** | Apache 2.0 |
### Architecture
- **d_model** 1 280, **n_layers** 21, **n_heads** 20 (head_dim 64), **d_ff** 3 584
- RMSNorm (pre-norm), Rotary Position Embeddings (RoPE), SwiGLU FFN
- QK-Norm before RoPE (Gemma / DeepSeek-V2 practice)
- No bias in linear layers, weight tying between embedding and output head
- Scaled residual initialization: `std / sqrt(2 * n_layers)`
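A minimal sketch of the scaled residual initialization. The `scaled_residual_std` helper name and the base std of 0.02 are assumptions (a common GPT-2-style default); the card only specifies the `std / sqrt(2 * n_layers)` scaling itself.

```python
import math

def scaled_residual_std(base_std: float, n_layers: int) -> float:
    # Residual-branch output projections (attention out-proj, FFN down-proj)
    # are initialized with a smaller std so the variance of the residual
    # stream stays roughly constant with depth. The factor 2 accounts for
    # the two residual additions per layer (attention + FFN).
    return base_std / math.sqrt(2 * n_layers)

# With this card's n_layers = 21 and an assumed base std of 0.02:
STD = scaled_residual_std(0.02, 21)
```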
## Training
### Data
7 monthly snapshots of Lichess standard rated games (July 2025 through January 2026), filtered to games where **both players are rated >= 1 800 Elo**. Games are converted to space-separated UCI move strings.
Datasets are streamed and interleaved from the Hugging Face Hub. **Sequence packing** concatenates games into fixed 256-token sequences, eliminating padding.
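The packing step can be sketched as follows. The `pack_sequences` helper is hypothetical (the actual pipeline is not published here); it assumes each game is terminated with the `<EOS>` token (id 2 per the tokenizer table below) and that a trailing partial chunk is dropped.

```python
from typing import Iterable, Iterator

def pack_sequences(token_streams: Iterable[list[int]],
                   seq_len: int = 256,
                   eos_id: int = 2) -> Iterator[list[int]]:
    """Concatenate tokenized games into fixed-length chunks with no padding.

    Games are appended to a running buffer, each followed by <EOS>; the
    buffer is sliced into seq_len-token sequences as soon as it is long
    enough. Any leftover tokens shorter than seq_len are discarded.
    """
    buf: list[int] = []
    for tokens in token_streams:
        buf.extend(tokens)
        buf.append(eos_id)
        while len(buf) >= seq_len:
            yield buf[:seq_len]
            buf = buf[seq_len:]
```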
### Hyperparameters
| | |
|---|---|
| Optimizer | AdamW (betas 0.9 / 0.95, weight decay 0.1) |
| Learning rate | 3e-4 with cosine decay to 10 % of peak |
| Warmup | 9 300 steps (linear) |
| Batch size | 256 × 256 tokens = 65 536 tokens/step |
| Gradient clipping | 1.0 |
| Precision | BF16 |
| Steps | 120 155 |
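The learning-rate schedule above can be written out explicitly. This is a sketch from the stated hyperparameters (peak 3e-4, 9 300 linear warmup steps, cosine decay to 10 % of peak over 120 155 total steps); the `lr_at` function name is an assumption.

```python
import math

PEAK_LR = 3e-4
MIN_RATIO = 0.10          # decay floor: 10% of peak
WARMUP_STEPS = 9_300
TOTAL_STEPS = 120_155

def lr_at(step: int) -> float:
    # Linear warmup from 0 to the peak LR, then cosine decay to
    # MIN_RATIO * PEAK_LR at the final step.
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_RATIO + (1.0 - MIN_RATIO) * cosine)
```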
## Tokenizer
Custom **UCI tokenizer** that maps every legal UCI move string to a unique integer:
| Range | Description | Count |
|---|---|---|
| 0 | `<PAD>` | 1 |
| 1 | `<BOS>` | 1 |
| 2 | `<EOS>` | 1 |
| 3–4 034 | Normal moves (src ≠ dst) | 4 032 |
| 4 035–4 210 | Promotion moves (file × direction × piece × color) | 176 |
| **Total** | | **4 211** |
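One natural way to enumerate the 4 032 normal-move ids is to walk every ordered (src, dst) square pair with src ≠ dst, which gives exactly 64 × 63 = 4 032 entries. This is a hypothetical reconstruction consistent with the table's counts, not the tokenizer's actual ordering:

```python
def normal_move_ids() -> dict[tuple[int, int], int]:
    # Assign ids 3..4034 to every ordered (src, dst) square pair with
    # src != dst (squares indexed 0..63). Ids 0-2 are reserved for the
    # <PAD>, <BOS>, and <EOS> special tokens.
    ids: dict[tuple[int, int], int] = {}
    next_id = 3
    for src in range(64):
        for dst in range(64):
            if src != dst:
                ids[(src, dst)] = next_id
                next_id += 1
    return ids
```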
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "malcouffe/chessgpt", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "malcouffe/chessgpt", trust_remote_code=True
)

# Encode an opening (Italian Game)
moves = "e2e4 e7e5 g1f3 b8c6 f1c4"
input_ids = tokenizer.encode(moves, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits

# Top-5 predicted next moves (raw logits, unnormalized)
top5 = logits[0, -1].topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{tokenizer.decode([idx.item()]):>8s} {score:.2f}")
```
## Limitations
- The model has no access to board state: all chess knowledge is inferred from move sequences alone.
- No RLHF or self-play refinement; this is a pure next-token-prediction model.
- Predictions can include illegal moves; filter them with `python-chess` at inference time (the [chessgpt-inference](https://github.com/malcouffe/chessgpt-inference) repo implements legal-move masking during generation).
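The masking idea can be sketched in a library-agnostic way: set the logit of every token that is not a legal move to negative infinity before picking a move. The helper names are hypothetical; in practice the `legal_ids` set would be built from `board.legal_moves` in `python-chess`, mapping each move's UCI string through the tokenizer.

```python
import math

def mask_illegal(logits: list[float], legal_ids: set[int]) -> list[float]:
    # Any token outside legal_ids gets -inf, so softmax assigns it zero
    # probability and argmax can never select it.
    return [x if i in legal_ids else -math.inf
            for i, x in enumerate(logits)]

def pick_move(logits: list[float], legal_ids: set[int]) -> int:
    # Greedy decoding over the masked distribution.
    masked = mask_illegal(logits, legal_ids)
    return max(range(len(masked)), key=masked.__getitem__)
```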
## Citation
```bibtex
@misc{chessgpt2026,
author = {Matthieu Alcouffe},
title = {ChessGPT: A 432M Decoder-Only Transformer for UCI Move Prediction},
year = {2026},
url = {https://huggingface.co/malcouffe/chessgpt}
}
```