---
license: mit
language:
- en
tags:
- chess
- transformer
- encoder-only
- move-prediction
- pytorch
datasets:
- lichess/fishnet-evals
pipeline_tag: other
---

# Chess-Bot-20M

A 39M-parameter encoder-only transformer trained to predict the best chess move from a board position (FEN string).

Built for the INFOMTALC 2026 Midterm Chess Tournament at Utrecht University.

## Model Details

| Property | Value |
|---|---|
| Architecture | Encoder-only Transformer |
| Parameters | 38.9M |
| Layers | 12 |
| Hidden dim | 512 |
| Attention heads | 8 |
| FFN dim | 2048 |
| Input | FEN → 80 fixed-length tokens |
| Output | 1968-class classification (UCI moves) |
| Checkpoint size | 74 MB |
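The parameter count can be sanity-checked from the table above. The sketch below assumes learned positional embeddings, biased linear layers, and two LayerNorms per pre-norm block plus a final norm; the checkpoint's exact breakdown may differ slightly.

```python
# Hypothetical back-of-the-envelope parameter count from the table above.
VOCAB, SEQ, D, FFN, LAYERS, MOVES = 51, 80, 512, 2048, 12, 1968

embeddings = VOCAB * D + SEQ * D                # token + positional embeddings (assumed learned)
attention  = 4 * (D * D + D)                    # Q, K, V, output projections with biases
ffn        = (D * FFN + FFN) + (FFN * D + D)    # two linear layers
norms      = 2 * 2 * D                          # two LayerNorms per block
final_norm = 2 * D                              # final LayerNorm (assumed)
head       = D * MOVES + MOVES                  # 1968-way move-classification head

total = embeddings + LAYERS * (attention + ffn + norms) + final_norm + head
print(f"{total / 1e6:.1f}M parameters")         # ≈ 38.9M
```

The estimate lands within rounding distance of the 38.9M figure in the table, which suggests the table's dimensions are internally consistent.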
## Training

- **Data:** 10M positions from [Lichess Stockfish evaluations](https://huggingface.co/datasets/lichess/fishnet-evals), expanded to 20M with color-flip augmentation
- **Hardware:** RunPod RTX 6000 Ada (48GB VRAM)
- **Steps:** 50,000 (~5.2 epochs)
- **Batch size:** 2048
- **Optimizer:** AdamW (lr=3e-4, weight_decay=0.01, betas=0.9/0.98)
- **Schedule:** 2000-step linear warmup + cosine decay
- **Precision:** bf16 (AMP) with torch.compile
- **Training time:** ~9.3 hours
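The schedule above (2000-step linear warmup, then cosine decay over the remaining steps) can be sketched as a plain function; decaying all the way to zero at step 50,000 is an assumption, since no floor learning rate is stated.

```python
import math

def lr_at(step, peak=3e-4, warmup=2000, total=50000):
    """Linear warmup to `peak`, then cosine decay to 0 (assumed floor)."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1000))  # halfway through warmup: 1.5e-4
print(lr_at(2000))  # peak: 3e-4
```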
### Training Progress

| Step | Loss | Val Top-1 | Val Top-3 |
|------|------|-----------|-----------|
| 1000 | 3.82 | 17.3% | 33.6% |
| 5000 | 2.37 | 35.7% | 61.2% |
| 10000 | 1.94 | 42.5% | 70.1% |
| 50000 | ~1.5 | ~50% | ~78% |
## How It Works

1. A FEN string is tokenized into 80 tokens (64 board squares + side to move + castling rights + en passant + move counters)
2. Tokens are passed through 12 transformer encoder blocks with pre-norm residual connections
3. The CLS token representation is projected to 1968 logits (one per possible UCI move)
4. At inference, illegal moves are masked to `-inf` before selecting the best legal move
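Step 4 can be sketched in plain Python; in the real player the logit vector comes from the model and the legal-index set from a move generator, so both inputs here are illustrative:

```python
def pick_legal_move(logits, legal_indices):
    """Mask illegal moves to -inf, then return the argmax index."""
    masked = [score if i in legal_indices else float("-inf")
              for i, score in enumerate(logits)]
    return max(range(len(masked)), key=masked.__getitem__)

# The model's top choice (index 0) is illegal here, so the best *legal* move wins.
print(pick_legal_move([5.0, 1.0, 3.0], legal_indices={1, 2}))  # 2
```

Masking before the argmax guarantees the bot never outputs an illegal move, regardless of how confident the raw logits are.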
### Inference Enhancements

The tournament player (`player.py`) adds several layers on top of raw model predictions:

- **Opening book** for known mainline positions
- **Forced mate-in-1 detection** before any model call
- **Heuristic score adjustments** (check/capture/promotion bonuses)
- **Blunder detection** (avoids hanging pieces or allowing mate-in-1)
- **1-ply lookahead** on top-5 candidates (minimizes the opponent's best response)
- **Syzygy tablebase** support for endgames (5 or fewer pieces)
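The 1-ply lookahead can be summarized abstractly: for each top-5 candidate, play the move, score the opponent's best reply, and keep the candidate whose worst case is least damaging. A toy sketch, where the two scoring functions are hypothetical stand-ins for model and reply evaluation:

```python
def choose_with_lookahead(candidates, my_score, best_reply_score):
    """Prefer the candidate that leaves the opponent the weakest best reply."""
    return max(candidates, key=lambda mv: my_score(mv) - best_reply_score(mv))

my_score    = {"Qxb7": 0.9, "Nf3": 0.7}.get  # raw model confidence (toy values)
reply_score = {"Qxb7": 0.8, "Nf3": 0.1}.get  # opponent's best answer (toy values)
print(choose_with_lookahead(["Qxb7", "Nf3"], my_score, reply_score))  # Nf3
```

Even though the model prefers the first move, the lookahead rejects it because the opponent's reply is too strong, which is exactly the blunder-avoidance behavior described above.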
## Tokenizer

Custom FEN tokenizer with a 51-token vocabulary:

- `[CLS]`, `[SEP]`, `[PAD]`: special tokens
- 12 piece characters (`P N B R Q K p n b r q k`)
- `.` for empty squares
- Side to move (`w`, `b`)
- Castling flags, file/rank characters, digits

Fixed sequence length of 80 tokens per position.
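A minimal sketch of such a tokenizer is below; the exact token layout (placement of `[SEP]`s, handling of the move counters) is an assumption for illustration, not the checkpoint's actual scheme:

```python
def tokenize_fen(fen, seq_len=80):
    """Expand the board to 64 square tokens, append the remaining FEN fields
    separated by [SEP], and pad to a fixed length. Layout is illustrative."""
    board, side, castling, ep, halfmove, fullmove = fen.split()
    squares = []
    for ch in board:
        if ch == "/":
            continue  # rank separators carry no information once squares are explicit
        squares += ["."] * int(ch) if ch.isdigit() else [ch]
    tokens = (["[CLS]"] + squares + ["[SEP]", side, "[SEP]"] + list(castling)
              + ["[SEP]"] + list(ep) + ["[SEP]"] + list(halfmove)
              + ["[SEP]"] + list(fullmove))
    return (tokens + ["[PAD]"] * (seq_len - len(tokens)))[:seq_len]
```

Under this layout the starting position yields 78 real tokens plus 2 `[PAD]`s.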
## Usage

```python
from player import TransformerPlayer

player = TransformerPlayer("Chess-Bot-20M")
move = player.get_move("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(move)  # e.g. "e2e4"
```

## Repository

- **GitHub:** [kruuusher13/Chess-Bot-20M](https://github.com/kruuusher13/Chess-Bot-20M)

## Limitations

- Trained only on Stockfish evaluations, so it may not generalize to all play styles
- No search beyond the 1-ply lookahead
- ~50% top-1 accuracy means it picks the engine's best move about half the time; otherwise it picks a reasonable but suboptimal legal move