MagnusBot — Custom Transformer Chess Move Predictor

A custom encoder-decoder Transformer trained from scratch to predict strong chess moves from a board position given in FEN notation, using a dataset of 100K–400K chess positions drawn from full games and tactical puzzles.

Model Details

Model Description

MagnusBot is a custom sequence-to-sequence Transformer trained end-to-end for chess move prediction. Given a board state in FEN notation, it outputs the predicted best move in UCI format. The model was trained in two phases: a base training phase (25 epochs) followed by a fine-tuning phase (4 epochs) focused on tactical positions, including checkmate threats and winning combinations.

  • Developed by: Jochem van Kemenade
  • Model type: Custom Encoder-Decoder Transformer (trained from scratch)
  • Domain: Chess notation (FEN input → UCI move output)
  • License: Apache 2.0
  • Architecture: Custom Transformer with ChessTokenizer (chess-specific vocabulary)
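The card names a chess-specific ChessTokenizer but does not publish its vocabulary. A minimal character-level sketch over the FEN/UCI alphabet might look like the following (the class name is reused for illustration; the vocabulary and special tokens are assumptions):

```python
class ChessTokenizer:
    """Minimal character-level tokenizer over the FEN/UCI alphabet.
    A sketch only -- the real ChessTokenizer's vocabulary is not published."""

    def __init__(self):
        # FEN uses piece letters, digits, '/', side-to-move, castling rights,
        # file letters, spaces, and '-'; UCI moves reuse the same characters.
        alphabet = sorted(set("pnbrqkPNBRQK0123456789/abcdefgh wKQkq-"))
        self.specials = ["<pad>", "<bos>", "<eos>"]
        self.vocab = self.specials + alphabet
        self.stoi = {tok: i for i, tok in enumerate(self.vocab)}
        self.itos = {i: tok for tok, i in self.stoi.items()}

    def encode(self, text: str) -> list[int]:
        return [self.stoi["<bos>"]] + [self.stoi[c] for c in text] + [self.stoi["<eos>"]]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids if self.itos[i] not in self.specials)
```

A character-level vocabulary keeps the token space tiny (a few dozen symbols), which is a common choice for FEN-style inputs.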

Uses

Direct Use

Given a chess board state in FEN notation, the model predicts the next best move in UCI format. It is designed for use as a chess engine component or tournament player.
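The inference API itself is not documented on this card, but the I/O contract (FEN in, UCI out) can be sketched with a stand-in stub for the model and a syntactic check on its output:

```python
import re

# UCI moves: from-square + to-square + optional promotion piece,
# e.g. "e2e4" or "e7e8q".
UCI_RE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")

def is_uci_move(move: str) -> bool:
    """Check that a string is a syntactically valid UCI move."""
    return UCI_RE.fullmatch(move) is not None

def predict_move(fen: str) -> str:
    # Hypothetical stand-in for the trained model; always answers 1. e4.
    return "e2e4"

start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
assert is_uci_move(predict_move(start_fen))
```

In an engine integration, a caller would additionally verify that the predicted move is legal in the given position before playing it.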

Out-of-Scope Use

This model is not intended for general natural language tasks. It has been specialized for chess move prediction and will perform poorly outside that domain.

Training Details

Training Data

Training data was sourced from four datasets, combined and deduplicated:

  • chess_moves.csv — local dataset of chess positions (primary source)
  • train_data.csv — local dataset of additional chess positions
  • chess_moves_1st.csv — local dataset of first-move positions
  • ssingh22/chess-evaluations (tactics split, HuggingFace) — tactical puzzles filtered to positions with engine evaluations between –2000 and +2000 centipawns, balanced between white-favourable and black-favourable positions

Total training data: ~4M examples
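The centipawn filtering and balancing described for the tactics split can be sketched as follows (the field name `eval_cp` is an assumption; the card does not specify the column names):

```python
def filter_and_balance(rows: list[dict], limit: int = 2000) -> list[dict]:
    """Keep positions whose engine eval is within +/- limit centipawns,
    then balance white-favourable against black-favourable positions
    by truncating the larger side. A sketch of the step described above."""
    kept = [r for r in rows if -limit <= r["eval_cp"] <= limit]
    white = [r for r in kept if r["eval_cp"] > 0]
    black = [r for r in kept if r["eval_cp"] < 0]
    n = min(len(white), len(black))
    return white[:n] + black[:n]
```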

Each training example is formatted as a tokenized FEN string (source) mapped to a UCI move (target).
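The combine-and-deduplicate step over (FEN, UCI move) pairs mentioned above can be sketched in a few lines (exact dedup key is an assumption):

```python
def dedupe(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Deduplicate (FEN, UCI move) training pairs, keeping the first
    occurrence of each pair across the combined datasets."""
    seen: set[tuple[str, str]] = set()
    out: list[tuple[str, str]] = []
    for fen, move in pairs:
        if (fen, move) not in seen:
            seen.add((fen, move))
            out.append((fen, move))
    return out
```

Deduplicating on the (position, move) pair rather than the position alone preserves cases where the same position maps to different labelled moves across sources.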

Fine-Tuning Data

The fine-tuning phase uses a smaller curated subset, focused on winning games of at most 200 moves:

  • 1% replay of the base training data to mitigate catastrophic forgetting
  • 50% sample of the local CSV data

90/10 train/validation split for both phases.
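The fine-tuning mix described above (1% replay plus a 50% sample of the local CSVs, then a 90/10 split) can be recreated roughly as follows; function and argument names are illustrative:

```python
import random

def build_finetune_split(base_data: list, local_csv_rows: list, seed: int = 0):
    """Illustrative recreation of the fine-tuning mix: a 1% replay of the
    base training data plus a 50% sample of the local CSV data, followed
    by a 90/10 train/validation split."""
    rng = random.Random(seed)
    replay = rng.sample(base_data, len(base_data) // 100)       # 1% replay
    sampled = rng.sample(local_csv_rows, len(local_csv_rows) // 2)  # 50% sample
    mixed = replay + sampled
    rng.shuffle(mixed)
    cut = int(len(mixed) * 0.9)                                 # 90/10 split
    return mixed[:cut], mixed[cut:]
```

Replaying a slice of the base data during fine-tuning is a standard guard against catastrophic forgetting.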

Training Procedure

Training is split into two phases:

Phase 1 — Base Training

  • Trained from scratch on the full combined dataset
  • 25 epochs, Adam optimizer
  • Mixed precision training (AMP fp16 via torch.amp)
  • Batch size and learning rate sourced from Optuna-tuned config (opt-configs.yml)
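The actual contents of opt-configs.yml are not published; an Optuna-tuned config of this kind might have a shape like the following (values are purely illustrative, not the real tuned ones):

```yaml
# Hypothetical shape of opt-configs.yml -- real tuned values are not published
batch_size: 256
learning_rate: 3.0e-4
```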

Phase 2 — Fine-Tuning on Tactical Positions

  • Initialized from Phase 1 weights
  • 4 epochs, learning rate reduced to 10% of base LR
  • Gradient accumulation over 4 steps (effective batch size ×4)
  • Mixed precision training (AMP fp16)
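The gradient-accumulation scheme in Phase 2 can be illustrated with a pure-Python stand-in for the training loop (a single scalar weight replaces the model; the real loop uses torch):

```python
def sgd_with_accumulation(grads: list[float], lr: float, accum_steps: int = 4) -> float:
    """Sketch of gradient accumulation: sum accum_steps per-micro-batch
    gradients, then take one optimizer step on their mean, giving an
    effective batch size accum_steps times larger."""
    w = 0.0
    buf, count = 0.0, 0
    for g in grads:
        buf += g
        count += 1
        if count == accum_steps:
            w -= lr * (buf / accum_steps)  # one step per accum_steps micro-batches
            buf, count = 0.0, 0
    return w
```

With the Phase 2 settings, `lr` here would be 10% of the Phase 1 learning rate and `accum_steps=4`.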

Hardware

  • Hardware: NVIDIA GeForce RTX 4070 Super (12GB VRAM)
  • Training time: ~10 hours

Model tree for Jochemvkem/magnusbot
