---
license: mit
language:
- en
tags:
- chess
- transformer
- encoder-only
- move-prediction
- pytorch
datasets:
- lichess/fishnet-evals
pipeline_tag: other
---

# Chess-Bot-20M

A 39M-parameter encoder-only transformer trained to predict the best chess move from a board position (FEN string).

Built for the INFOMTALC 2026 Midterm Chess Tournament at Utrecht University.

## Model Details

| Property | Value |
|---|---|
| Architecture | Encoder-only Transformer |
| Parameters | 38.9M |
| Layers | 12 |
| Hidden dim | 512 |
| Attention heads | 8 |
| FFN dim | 2048 |
| Input | FEN → 80 fixed-length tokens |
| Output | 1968-class classification (UCI moves) |
| Checkpoint size | 74 MB |
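The parameter count can be sanity-checked from the table above. The sketch below assumes learned positional embeddings, biased linear layers, and two LayerNorms per pre-norm block plus a final norm; the checkpoint's exact breakdown may differ slightly.

```python
# Hypothetical back-of-the-envelope parameter count from the table above.
VOCAB, SEQ, D, FFN, LAYERS, MOVES = 51, 80, 512, 2048, 12, 1968

embeddings = VOCAB * D + SEQ * D                # token + positional embeddings (assumed learned)
attention  = 4 * (D * D + D)                    # Q, K, V, output projections with biases
ffn        = (D * FFN + FFN) + (FFN * D + D)    # two linear layers
norms      = 2 * 2 * D                          # two LayerNorms per block
final_norm = 2 * D                              # final LayerNorm (assumed)
head       = D * MOVES + MOVES                  # 1968-way move-classification head

total = embeddings + LAYERS * (attention + ffn + norms) + final_norm + head
print(f"{total / 1e6:.1f}M parameters")         # ≈ 38.9M
```

The estimate lands within rounding distance of the 38.9M figure in the table, which suggests the table's dimensions are internally consistent.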
## Training

- **Data:** 10M positions from [Lichess Stockfish evaluations](https://huggingface.co/datasets/lichess/fishnet-evals), expanded to 20M with color-flip augmentation
- **Hardware:** RunPod RTX 6000 Ada (48GB VRAM)
- **Steps:** 50,000 (~5.2 epochs)
- **Batch size:** 2048
- **Optimizer:** AdamW (lr=3e-4, weight_decay=0.01, betas=0.9/0.98)
- **Schedule:** 2000-step linear warmup + cosine decay
- **Precision:** bf16 (AMP) with torch.compile
- **Training time:** ~9.3 hours
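The schedule above (2000-step linear warmup, then cosine decay over the remaining steps) can be sketched as a plain function; decaying all the way to zero at step 50,000 is an assumption, since no floor learning rate is stated.

```python
import math

def lr_at(step, peak=3e-4, warmup=2000, total=50000):
    """Linear warmup to `peak`, then cosine decay to 0 (assumed floor)."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1000))  # halfway through warmup: 1.5e-4
print(lr_at(2000))  # peak: 3e-4
```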
### Training Progress

| Step | Loss | Val Top-1 | Val Top-3 |
|------|------|-----------|-----------|
| 1000 | 3.82 | 17.3% | 33.6% |
| 5000 | 2.37 | 35.7% | 61.2% |
| 10000 | 1.94 | 42.5% | 70.1% |
| 50000 | ~1.5 | ~50% | ~78% |
## How It Works

1. A FEN string is tokenized into 80 tokens (64 board squares + side to move + castling rights + en passant + move counters)
2. Tokens are passed through 12 transformer encoder blocks with pre-norm residual connections
3. The CLS token representation is projected to 1968 logits (one per possible UCI move)
4. At inference, illegal moves are masked to `-inf` before selecting the best legal move
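Step 4 can be sketched in plain Python; in the real player the logit vector comes from the model and the legal-index set from a move generator, so both inputs here are illustrative:

```python
def pick_legal_move(logits, legal_indices):
    """Mask illegal moves to -inf, then return the argmax index."""
    masked = [score if i in legal_indices else float("-inf")
              for i, score in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)

# The model's top choice (index 0) is illegal here, so the best *legal* move wins.
print(pick_legal_move([5.0, 1.0, 3.0], legal_indices={1, 2}))  # 2
```

Masking before the argmax guarantees the bot never outputs an illegal move, regardless of how confident the raw logits are.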
### Inference Enhancements

The tournament player (`player.py`) adds several layers on top of raw model predictions:

- **Opening book** for known mainline positions
- **Forced mate-in-1 detection** before any model call
- **Heuristic score adjustments** (check/capture/promotion bonuses)
- **Blunder detection** (avoids hanging pieces or allowing mate-in-1)
- **1-ply lookahead** on top-5 candidates (minimizes the opponent's best response)
- **Syzygy tablebase** support for endgames (5 or fewer pieces)
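The 1-ply lookahead can be summarized abstractly: for each top-5 candidate, play the move, score the opponent's best reply, and keep the candidate whose worst case is least damaging. A toy sketch, where the two scoring functions are hypothetical stand-ins for model and reply evaluation:

```python
def choose_with_lookahead(candidates, my_score, best_reply_score):
    """Prefer the candidate that leaves the opponent the weakest best reply."""
    return max(candidates, key=lambda mv: my_score(mv) - best_reply_score(mv))

my_score    = {"Qxb7": 0.9, "Nf3": 0.7}.get  # raw model confidence (toy values)
reply_score = {"Qxb7": 0.8, "Nf3": 0.1}.get  # opponent's best answer (toy values)
print(choose_with_lookahead(["Qxb7", "Nf3"], my_score, reply_score))  # Nf3
```

Even though the model prefers the first move, the lookahead rejects it because the opponent's reply is too strong, which is exactly the blunder-avoidance behavior described above.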
## Tokenizer

Custom FEN tokenizer with a 51-token vocabulary:

- `[CLS]`, `[SEP]`, `[PAD]`: special tokens
- 12 piece characters (`P N B R Q K p n b r q k`)
- `.` for empty squares
- Side to move (`w`, `b`)
- Castling flags, file/rank characters, digits

Fixed sequence length of 80 tokens per position.
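A minimal sketch of such a tokenizer is below; the exact token layout (placement of `[SEP]`s, handling of the move counters) is an assumption for illustration, not the checkpoint's actual scheme:

```python
def tokenize_fen(fen, seq_len=80):
    """Expand the board to 64 square tokens, append the remaining FEN fields
    separated by [SEP], and pad to a fixed length. Layout is illustrative."""
    board, side, castling, ep, halfmove, fullmove = fen.split()
    squares = []
    for ch in board:
        if ch == "/":
            continue  # rank separators carry no information once squares are explicit
        squares += ["."] * int(ch) if ch.isdigit() else [ch]
    tokens = (["[CLS]"] + squares + ["[SEP]", side, "[SEP]"] + list(castling)
              + ["[SEP]"] + list(ep) + ["[SEP]"] + list(halfmove)
              + ["[SEP]"] + list(fullmove))
    return (tokens + ["[PAD]"] * (seq_len - len(tokens)))[:seq_len]
```

Under this layout the starting position yields 78 real tokens plus 2 `[PAD]`s.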
## Usage

```python
from player import TransformerPlayer

player = TransformerPlayer("Chess-Bot-20M")
move = player.get_move("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(move)  # e.g. "e2e4"
```

## Repository

- **GitHub:** [kruuusher13/Chess-Bot-20M](https://github.com/kruuusher13/Chess-Bot-20M)

## Limitations

- Trained only on Stockfish evaluations, so it may not generalize to all play styles
- No search beyond the 1-ply lookahead
- ~50% top-1 accuracy means it picks the engine's best move about half the time; otherwise it picks a reasonable but suboptimal legal move