# MagnusBot – Custom Transformer Chess Move Predictor
A custom encoder-decoder Transformer trained from scratch to predict strong chess moves given a board position in FEN notation, using a dataset of 100K–400K chess positions including games and tactical puzzles.
## Model Details
### Model Description
MagnusBot is a custom sequence-to-sequence Transformer trained end-to-end for chess move prediction. Given a board state in FEN notation, it outputs the predicted best move in UCI format. The model was trained in two phases: a base training phase (25 epochs) followed by a fine-tuning phase (4 epochs) focused on tactical positions, including checkmate threats and winning combinations.
- Developed by: Jochem van Kemenade
- Model type: Custom Encoder-Decoder Transformer (trained from scratch)
- Domain: Chess notation (FEN input → UCI move output)
- License: Apache 2.0
- Architecture: Custom Transformer with `ChessTokenizer` (chess-specific vocabulary)
## Uses
### Direct Use
Given a chess board state in FEN notation, the model predicts the next best move in UCI format. It is designed for use as a chess engine component or tournament player.
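Because the model emits a raw UCI string, a consumer (such as an engine wrapper or tournament harness) may want to sanity-check the prediction before handing it to a chess engine. The helper below is a minimal sketch and not part of the released model's API; `is_valid_uci` is a hypothetical name, and it checks syntax only, not legality on the board.

```python
def is_valid_uci(move: str) -> bool:
    """Syntactic check that a predicted move looks like UCI (e.g. 'e2e4' or
    the promotion 'e7e8q'). A downstream engine must still verify that the
    move is legal in the current position."""
    if len(move) not in (4, 5):
        return False
    files, ranks = "abcdefgh", "12345678"
    if move[0] not in files or move[1] not in ranks:
        return False
    if move[2] not in files or move[3] not in ranks:
        return False
    # A 5th character, if present, must be a promotion piece.
    return len(move) == 4 or move[4] in "qrbn"

# Example: filter a model prediction before sending it to an engine.
print(is_valid_uci("e2e4"), is_valid_uci("e7e8q"), is_valid_uci("e9x1"))
# prints: True True False
```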
### Out-of-Scope Use
This model is not intended for general natural language tasks. It has been specialized for chess move prediction and will perform poorly outside that domain.
## Training Details
### Training Data
Training data was sourced from four datasets, combined and deduplicated:
- `chess_moves.csv` – local dataset of chess positions (primary source)
- `train_data.csv` – local dataset of additional chess positions
- `chess_moves_1st.csv` – local dataset of first-move positions
- `ssingh22/chess-evaluations` (tactics split, HuggingFace) – tactical puzzles filtered to positions with engine evaluations between −2000 and +2000 centipawns, balanced between white-favourable and black-favourable positions
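The evaluation filtering and white/black balancing described above can be sketched as follows. This is an illustration of the stated criteria, not the project's actual preprocessing code; the row layout `(fen, best_move, eval_cp)` is an assumption.

```python
# Hypothetical example rows: (fen, best_move, eval_cp); layout is assumed.
rows = [
    ("<fen-1>", "e2e4", 350),
    ("<fen-2>", "d7d5", -4200),  # outside the +/-2000 cp window, dropped
    ("<fen-3>", "g8f6", -600),
]

def in_window(row, limit=2000):
    """Keep positions whose engine eval is within +/-limit centipawns."""
    return -limit <= row[2] <= limit

kept = [r for r in rows if in_window(r)]
white_fav = [r for r in kept if r[2] >= 0]
black_fav = [r for r in kept if r[2] < 0]
# Balance by truncating the larger side to the smaller side's size.
n = min(len(white_fav), len(black_fav))
balanced = white_fav[:n] + black_fav[:n]
```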
Total training data: ~4M examples
Each training example is formatted as a tokenized FEN string (source) mapped to a UCI move (target).
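One plausible way to tokenize a FEN source string is character by character, with a separator between the six FEN fields. The sketch below illustrates that scheme only; the model's actual vocabulary and special tokens are defined by its `ChessTokenizer`, and the `<sep>` token here is an assumption.

```python
def tokenize_fen(fen: str):
    """Character-level tokenization of a FEN string, one token per character,
    with an assumed <sep> token between the six space-separated FEN fields
    (piece placement, side to move, castling, en passant, clocks)."""
    tokens = []
    for field in fen.split(" "):
        tokens.extend(list(field))
        tokens.append("<sep>")
    return tokens[:-1]  # drop the trailing separator

fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
toks = tokenize_fen(fen)
```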
### Fine-Tuning Data
The fine-tuning phase uses a smaller curated subset focused on winning games of at most 200 moves:
- 1% replay of the base training data to mitigate catastrophic forgetting
- 50% sample of the local CSV data

A 90/10 train/validation split is used for both phases.
### Training Procedure
Training is split into two phases:
#### Phase 1 – Base Training
- Trained from scratch on the full combined dataset
- 25 epochs, Adam optimizer
- Mixed precision training (AMP fp16 via `torch.amp`)
- Batch size and learning rate sourced from Optuna-tuned config (`opt-configs.yml`)
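An `opt-configs.yml` holding Optuna-tuned hyperparameters might look like the fragment below. The keys follow the batch size and learning rate mentioned above, but the values and exact key names are illustrative, not the actual tuned configuration.

```yaml
# Hypothetical opt-configs.yml layout; values are illustrative only.
batch_size: 256
learning_rate: 3.0e-4
```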
#### Phase 2 – Fine-Tuning on Tactical Positions
- Initialized from Phase 1 weights
- 4 epochs, learning rate reduced to 10% of base LR
- Gradient accumulation over 4 steps (effective batch size ×4)
- Mixed precision training (AMP fp16)
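The effect of accumulating gradients over 4 steps before one optimizer update, together with the 10% learning-rate reduction, can be sketched in pure Python. This is a numeric stand-in for the real `torch` training loop; `micro_grad` and the base learning rate are illustrative assumptions.

```python
# Sketch of gradient accumulation over 4 micro-batches; numbers are illustrative.
ACCUM_STEPS = 4
BASE_LR = 3e-4               # assumed Phase 1 learning rate
ft_lr = BASE_LR * 0.1        # Phase 2 uses 10% of the base LR

def micro_grad(batch):
    """Stand-in for loss.backward(): mean 'gradient' of one micro-batch."""
    return sum(batch) / len(batch)

micro_batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
accum = 0.0
for mb in micro_batches:
    # Dividing each micro-batch gradient by ACCUM_STEPS makes the accumulated
    # gradient equal the mean over the effective (4x larger) batch.
    accum += micro_grad(mb) / ACCUM_STEPS

update = ft_lr * accum       # single optimizer step after 4 micro-batches
```

Note that `accum` equals the mean over all 8 samples, which is exactly what a single batch of 8 would have produced.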
### Hardware
- Hardware: NVIDIA GeForce RTX 4070 Super (12GB VRAM)
- Training time: ~10 hours