---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- chess
- causal-lm
- uci
- decoder-only
- llama-style
datasets:
- malcouffe/lichess-standard-rated-2025-07-uci
- malcouffe/lichess-standard-rated-2025-08-uci
- malcouffe/lichess-standard-rated-2025-09-uci
- malcouffe/lichess-standard-rated-2025-10-uci
- malcouffe/lichess-standard-rated-2025-11-uci
- malcouffe/lichess-standard-rated-2025-12-uci
- malcouffe/lichess-standard-rated-2026-01-uci
pipeline_tag: text-generation
model-index:
- name: ChessGPT
  results: []
---

# ChessGPT — 432M

A decoder-only transformer trained to predict the next move in chess games using UCI notation. The model learns purely from move sequences (no board state, no engine evaluation) via next-token prediction on Lichess games.

## Model details

| | |
|---|---|
| **Architecture** | LLaMA-style decoder-only transformer |
| **Parameters** | 432M |
| **Context length** | 256 tokens |
| **Vocab size** | 4 211 (UCI moves + 3 special tokens) |
| **Training tokens** | 7.87B |
| **License** | Apache 2.0 |

### Architecture

- **d_model** 1 280, **n_layers** 21, **n_heads** 20 (head_dim 64), **d_ff** 3 584
- RMSNorm (pre-norm), Rotary Position Embeddings (RoPE), SwiGLU FFN
- QK-Norm before RoPE (as in Gemma / DeepSeek-V2)
- No bias in linear layers; weight tying between embedding and output head
- Scaled residual initialization: `std / sqrt(2 * n_layers)`

## Training

### Data

Seven monthly snapshots of Lichess standard rated games (July 2025 — January 2026), filtered so that **both players are rated >= 1 800 Elo**. Games are converted to space-separated UCI move strings. Datasets are streamed and interleaved from the HuggingFace Hub. **Sequence packing** concatenates games into fixed 256-token sequences to eliminate padding.
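The packing step can be sketched as follows. This is a simplified illustration, not the released training pipeline; the separator token id and the exact chunking policy are assumptions.

```python
def pack_sequences(games, seq_len=256, eos_id=2):
    """Concatenate tokenized games into fixed-length sequences.

    `games` is an iterable of token-id lists (one per game). Each game is
    terminated with a separator token, and the resulting stream is cut into
    non-overlapping seq_len chunks, so no padding tokens are needed.
    """
    buffer = []
    for game in games:
        buffer.extend(game)
        buffer.append(eos_id)  # game separator (the id 2 is an assumption)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]

# Toy example with fake token ids and a tiny seq_len
games = [[10, 11, 12], [20, 21], [30, 31, 32, 33]]
packed = list(pack_sequences(games, seq_len=4))
# → [[10, 11, 12, 2], [20, 21, 2, 30], [31, 32, 33, 2]]
```

Note that a game can straddle a chunk boundary; the model simply sees the separator token and learns that a new game starts there.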
### Hyperparameters

| | |
|---|---|
| Optimizer | AdamW (betas 0.9 / 0.95, weight decay 0.1) |
| Learning rate | 3e-4, cosine decay to 10 % of peak |
| Warmup | 9 300 steps (linear) |
| Batch size | 256 × 256 tokens = 65 536 tokens/step |
| Gradient clipping | 1.0 |
| Precision | BF16 |
| Steps | 120 155 |

## Tokenizer

A custom **UCI tokenizer** maps every legal UCI move string to a unique integer:

| Range | Description | Count |
|---|---|---|
| 0 — 2 | Special tokens | 3 |
| 3 — 4 034 | Normal moves (src ≠ dst) | 4 032 |
| 4 035 — 4 210 | Promotion moves (file × direction × piece × color) | 176 |
| **Total** | | **4 211** |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "malcouffe/chessgpt", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "malcouffe/chessgpt", trust_remote_code=True
)

# Encode an opening (Italian Game)
moves = "e2e4 e7e5 g1f3 b8c6 f1c4"
input_ids = tokenizer.encode(moves, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits

# Get the top-5 predicted next moves
top5 = logits[0, -1].topk(5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{tokenizer.decode([idx.item()]):>8s}  {score:.2f}")
```

## Limitations

- The model has no access to board state: all chess knowledge is inferred from move sequences alone.
- No RLHF or self-play refinement — this is a pure next-token prediction model.
- Predictions can include illegal moves; filter with `python-chess` at inference time (see the [chessgpt-inference](https://github.com/malcouffe/chessgpt-inference) repo for legal-move masking during generation).

## Citation

```bibtex
@misc{chessgpt2026,
  author = {Matthieu Alcouffe},
  title  = {ChessGPT: A 432M Decoder-Only Transformer for UCI Move Prediction},
  year   = {2026},
  url    = {https://huggingface.co/malcouffe/chessgpt}
}
```
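As an aside on the tokenizer's id layout: the 4 032 normal-move ids correspond to the ordered square pairs (src, dst) with src ≠ dst (64 × 63 = 4 032), offset by the 3 special tokens. One possible enumeration is sketched below; the counts and ranges come from the table in the Tokenizer section, but the exact ordering used by the released tokenizer is an assumption, and `square_index` / `normal_move_id` are illustrative helpers, not its API.

```python
def square_index(name: str) -> int:
    """Map a square name like 'e2' to 0-63 (a1 = 0, h8 = 63)."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)

def normal_move_id(src: int, dst: int) -> int:
    """Map an ordered square pair (src != dst) to an id in [3, 4034].

    Skipping the src == dst diagonal packs the 64 x 63 pairs densely;
    the 3 offset accounts for the special tokens. Illustrative only.
    """
    assert src != dst
    return 3 + src * 63 + (dst if dst < src else dst - 1)

# The mapping is a bijection onto the normal-move id range from the table
ids = {normal_move_id(s, d) for s in range(64) for d in range(64) if s != d}
assert len(ids) == 4032 and min(ids) == 3 and max(ids) == 4034
```

Promotion moves need their own block (176 ids here) because a promotion appends a piece letter, e.g. `e7e8q`, so the (src, dst) pair alone does not identify the move.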