# MagnusBot – Custom Transformer Chess Move Predictor
A custom encoder-decoder Transformer trained from scratch to predict strong chess moves given a board position in FEN notation, using a dataset of 100K–400K chess positions including games and tactical puzzles.
## Model Details
### Model Description
MagnusBot is a custom sequence-to-sequence Transformer trained end-to-end for chess move prediction. Given a board state in FEN notation, it outputs the predicted best move in UCI format. The model was trained in two phases: a base training phase (25 epochs) followed by a fine-tuning phase (4 epochs) focused on tactical positions, including checkmate threats and winning combinations.
- Developed by: Jochem van Kemenade
- Model type: Custom Encoder-Decoder Transformer (trained from scratch)
- Domain: Chess notation (FEN input → UCI move output)
- License: Apache 2.0
- Architecture: Custom Transformer with `ChessTokenizer` (chess-specific vocabulary)
## Uses
### Direct Use
Given a chess board state in FEN notation, the model predicts the next best move in UCI format. It is designed for use as a chess engine component or tournament player.
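Because the model emits a raw UCI string, a consumer (such as an engine wrapper or tournament harness) may want to sanity-check the prediction before handing it to a chess engine. The helper below is a minimal sketch and not part of the released model's API; `is_valid_uci` is a hypothetical name, and it checks syntax only, not legality on the board.

```python
def is_valid_uci(move: str) -> bool:
    """Syntactic check that a predicted move looks like UCI (e.g. 'e2e4' or
    the promotion 'e7e8q'). A downstream engine must still verify that the
    move is legal in the current position."""
    if len(move) not in (4, 5):
        return False
    files, ranks = "abcdefgh", "12345678"
    if move[0] not in files or move[1] not in ranks:
        return False
    if move[2] not in files or move[3] not in ranks:
        return False
    # A 5th character, if present, must be a promotion piece.
    return len(move) == 4 or move[4] in "qrbn"

# Example: filter a model prediction before sending it to an engine.
print(is_valid_uci("e2e4"), is_valid_uci("e7e8q"), is_valid_uci("e9x1"))
# prints: True True False
```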
### Out-of-Scope Use
This model is not intended for general natural language tasks. It has been specialized for chess move prediction and will perform poorly outside that domain.
## Training Details
### Training Data
Training data was sourced from four datasets, combined and deduplicated:
- `chess_moves.csv` – local dataset of chess positions (primary source)
- `train_data.csv` – local dataset of additional chess positions
- `chess_moves_1st.csv` – local dataset of first-move positions
- `ssingh22/chess-evaluations` (tactics split, HuggingFace) – tactical puzzles filtered to positions with engine evaluations between −2000 and +2000 centipawns, balanced between white-favourable and black-favourable positions
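The evaluation filtering and white/black balancing described above can be sketched as follows. This is an illustration of the stated criteria, not the project's actual preprocessing code; the row layout `(fen, best_move, eval_cp)` is an assumption.

```python
# Hypothetical example rows: (fen, best_move, eval_cp); layout is assumed.
rows = [
    ("<fen-1>", "e2e4", 350),
    ("<fen-2>", "d7d5", -4200),  # outside the +/-2000 cp window, dropped
    ("<fen-3>", "g8f6", -600),
]

def in_window(row, limit=2000):
    """Keep positions whose engine eval is within +/-limit centipawns."""
    return -limit <= row[2] <= limit

kept = [r for r in rows if in_window(r)]
white_fav = [r for r in kept if r[2] >= 0]
black_fav = [r for r in kept if r[2] < 0]
# Balance by truncating the larger side to the smaller side's size.
n = min(len(white_fav), len(black_fav))
balanced = white_fav[:n] + black_fav[:n]
```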
Total training data: ~4M examples
Each training example is formatted as a tokenized FEN string (source) mapped to a UCI move (target).
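One plausible way to tokenize a FEN source string is character by character, with a separator between the six FEN fields. The sketch below illustrates that scheme only; the model's actual vocabulary and special tokens are defined by its `ChessTokenizer`, and the `<sep>` token here is an assumption.

```python
def tokenize_fen(fen: str):
    """Character-level tokenization of a FEN string, one token per character,
    with an assumed <sep> token between the six space-separated FEN fields
    (piece placement, side to move, castling, en passant, clocks)."""
    tokens = []
    for field in fen.split(" "):
        tokens.extend(list(field))
        tokens.append("<sep>")
    return tokens[:-1]  # drop the trailing separator

fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
toks = tokenize_fen(fen)
```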
### Fine-Tuning Data
The fine-tuning phase uses a smaller curated subset focused on winning games of at most 200 moves:
- 1% replay of the base training data to mitigate catastrophic forgetting
- 50% sample of the local CSV data

A 90/10 train/validation split is used for both phases.
### Training Procedure
Training is split into two phases:
#### Phase 1 – Base Training
- Trained from scratch on the full combined dataset
- 25 epochs, Adam optimizer
- Mixed precision training (AMP fp16 via `torch.amp`)
- Batch size and learning rate sourced from Optuna-tuned config (`opt-configs.yml`)
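An `opt-configs.yml` holding Optuna-tuned hyperparameters might look like the fragment below. The keys follow the batch size and learning rate mentioned above, but the values and exact key names are illustrative, not the actual tuned configuration.

```yaml
# Hypothetical opt-configs.yml layout; values are illustrative only.
batch_size: 256
learning_rate: 3.0e-4
```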
#### Phase 2 – Fine-Tuning on Tactical Positions
- Initialized from Phase 1 weights
- 4 epochs, learning rate reduced to 10% of base LR
- Gradient accumulation over 4 steps (effective batch size ×4)
- Mixed precision training (AMP fp16)
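The effect of accumulating gradients over 4 steps before one optimizer update, together with the 10% learning-rate reduction, can be sketched in pure Python. This is a numeric stand-in for the real `torch` training loop; `micro_grad` and the base learning rate are illustrative assumptions.

```python
# Sketch of gradient accumulation over 4 micro-batches; numbers are illustrative.
ACCUM_STEPS = 4
BASE_LR = 3e-4               # assumed Phase 1 learning rate
ft_lr = BASE_LR * 0.1        # Phase 2 uses 10% of the base LR

def micro_grad(batch):
    """Stand-in for loss.backward(): mean 'gradient' of one micro-batch."""
    return sum(batch) / len(batch)

micro_batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
accum = 0.0
for mb in micro_batches:
    # Dividing each micro-batch gradient by ACCUM_STEPS makes the accumulated
    # gradient equal the mean over the effective (4x larger) batch.
    accum += micro_grad(mb) / ACCUM_STEPS

update = ft_lr * accum       # single optimizer step after 4 micro-batches
```

Note that `accum` equals the mean over all 8 samples, which is exactly what a single batch of 8 would have produced.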
### Hardware
- Hardware: NVIDIA GeForce RTX 4070 Super (12GB VRAM)
- Training time: ~10 hours