Recurrent Transformer for Chess Move Prediction (v1)

Recurrent transformer that predicts the best chess move from a board state (FEN). Trained from scratch; uses From/To square prediction heads and a shared transformer block applied 8 times with iteration embeddings (CORnet-s / Universal Transformer style). Output is always a legal move (zero fallbacks).

Part of: INFOMTALC 2025/26 (Utrecht University, MSc Applied Data Science) — chess tournament assignment.


Model description

  • Architecture: Encoder-only transformer with recurrent weight sharing. Board is encoded as 70 tokens (turn, castling×4, 64 squares, en passant) using a spatial tokenizer (14 piece IDs). Separate embeddings per token type; one shared block (multi-head self-attention + feed-forward with GELU) applied 8 times with learned iteration embeddings. Two linear heads score each of the 64 squares as source and destination; each legal move is scored as from_logits[src] + to_logits[dst] and the best is returned.
  • Inspiration: From/To heads from sgrvinod/chess-transformers; recurrent shared block from CORnet-s (Kubilius et al., 2019) and BLT networks (Spoerer et al., 2017).
  • Parameters: ~3.2M (d_model=512, 8 heads, FFN=2048, 8 iterations).
  • Input: FEN string.
  • Output: Single UCI move string (e.g. e2e4). No sampling; deterministic given the position.
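The From/To scoring rule above can be sketched in a few lines. This is a toy illustration, not the repo's code: from_logits/to_logits stand in for the two head outputs, and legal_moves for the (source, destination) square pairs that python-chess would enumerate. Because the argmax is taken only over legal moves, the returned move is legal by construction.

```python
def square_index(sq: str) -> int:
    """Map an algebraic square like 'e2' to 0..63 (a1=0, h8=63)."""
    return (ord(sq[0]) - ord("a")) + 8 * (int(sq[1]) - 1)

def best_legal_move(from_logits, to_logits, legal_moves):
    # Score each legal move as from_logits[src] + to_logits[dst]
    # and return the highest-scoring pair: always a legal move.
    return max(legal_moves, key=lambda m: from_logits[m[0]] + to_logits[m[1]])

# Hypothetical head outputs: the model strongly prefers e2 -> e4.
from_logits = [0.0] * 64
to_logits = [0.0] * 64
from_logits[square_index("e2")] = 5.0
to_logits[square_index("e4")] = 4.0

legal = [(square_index("e2"), square_index("e4")),
         (square_index("g1"), square_index("f3"))]
src, dst = best_legal_move(from_logits, to_logits, legal)
```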

Intended use

  • Intended: Chess move prediction for the INFOMTALC tournament and similar setups (player that receives FEN and returns UCI move). Runs on CPU or GPU; fits on free-tier Colab (T4).
  • Not intended: General-purpose chess engine, opening book, or strength comparable to Stockfish. This is a small transformer trained on engine labels, not a full engine.

Training data

  • Source: angeluriot/chess_games (games filtered to Elo ≥ 1500).
  • Labels: Best move per position from local Stockfish (depth 10).
  • Size: 750K (FEN, UCI) pairs in JSONL format.
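For reference, one record in the assumed JSONL layout, one JSON object per line pairing a FEN with the Stockfish-labelled best move. The field names here are illustrative; check the actual training files for the exact keys.

```python
import json

# One hypothetical (FEN, UCI) training record.
record = {
    "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "uci": "e2e4",  # best move from Stockfish at depth 10
}

line = json.dumps(record)   # one JSONL line
parsed = json.loads(line)   # round-trips back to the same record
```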

Training procedure

  • Loss: Cross-entropy on from-square and to-square predictions separately.
  • Optimizer: AdamW with Vaswani-style learning rate schedule (warmup 4000 steps).
  • Batch size: 512.
  • Epochs: 16 with early stopping (patience 2).
  • Hardware: Trained locally on an NVIDIA RTX 5070 Ti. Inference runs on free-tier Colab T4.
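The Vaswani-style schedule from "Attention Is All You Need" scales the learning rate by d_model^-0.5, with a linear warmup followed by inverse-square-root decay, peaking at the warmup step. A minimal sketch, assuming the standard formula with the d_model=512 and 4000 warmup steps stated above:

```python
def vaswani_lr(step: int, d_model: int = 512, warmup: int = 4000) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup for `warmup` steps, then decay proportional to
    step^-0.5; the two branches meet (the peak) at step == warmup.
    """
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

peak = vaswani_lr(4000)  # maximum learning rate of the schedule
```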

How to use

Requires the chess_exam package (for the Player base class and Game). Install it, then use the model via the tournament player class from the assignment repo:

# Install tournament framework
# git clone https://github.com/bylinina/chess_exam.git && cd chess_exam && pip install -e .

from chess_tournament import Game, RandomPlayer
from player import TransformerPlayer  # from the assignment repo that contains model.py + player.py

tp = TransformerPlayer("RecurrentTransformer")  # downloads this model from HF on first use
rp = RandomPlayer("Random")
game = Game(tp, rp, max_half_moves=200)
outcome, scores, fallbacks = game.play()
print(outcome, fallbacks)

Loading only the PyTorch state dict (no player):

import json
import torch
from huggingface_hub import hf_hub_download
from model import RecurrentTransformer  # need model.py from the assignment repo

config_path = hf_hub_download("Izzent/recurrent-transformer-chess", "config.json")
weights_path = hf_hub_download("Izzent/recurrent-transformer-chess", "model.pt")

with open(config_path) as f:
    config = json.load(f)

model = RecurrentTransformer.from_config(config)
state = torch.load(weights_path, map_location="cpu", weights_only=True)
model.load_state_dict(state)
model.eval()

# Forward pass expects a batch dict: board (B,64), turn (B,1), castling (B,4), ep (B,1)
# Use BoardTokenizer.encode(fen) to get these from a FEN string.
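If you do not have the repo's BoardTokenizer at hand, here is a toy re-implementation of the assumed 70-token layout (64 board squares plus 1 turn, 4 castling, and 1 en-passant token). It is for illustration only; the real tokenizer's piece IDs and its en-passant encoding may differ.

```python
# Index 0 = empty square, then white and black pieces; with spare/padding
# IDs this covers the 14 piece IDs mentioned in the model description.
PIECES = ".PNBRQKpnbrqk"

def encode_fen(fen: str) -> dict:
    """Encode a FEN string into the assumed per-field token lists."""
    placement, turn, castling, ep = fen.split()[:4]
    board = []
    for rank in placement.split("/"):
        for ch in rank:
            if ch.isdigit():
                board.extend([0] * int(ch))  # run of empty squares
            else:
                board.append(PIECES.index(ch))
    return {
        "board": board,                                     # 64 tokens
        "turn": [1 if turn == "w" else 0],                  # 1 token
        "castling": [int(c in castling) for c in "KQkq"],   # 4 tokens
        "ep": [0 if ep == "-" else 1],                      # 1 token (toy: presence only)
    }

enc = encode_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
```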

Files

  • config.json: Model config (d_model, nhead, d_ff, num_iterations, dropout).
  • model.pt: PyTorch state dict (weights only).

License

All rights reserved.
