Chess Transformer (Cross-Temporal Attention)
A 39M-parameter dual-head transformer (policy + value) that plays chess, trained from scratch on ~98M Lichess positions. The novel contribution is cross-temporal attention over 8 successive board states (600 input tokens), allowing the model to reason about how a position evolved, not just its current static state.
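For illustration, a minimal sketch of how the temporal input might be assembled, assuming each board is encoded as 75 token ids (64 square tokens plus auxiliary feature tokens) and that missing history is padded; the helper name `build_input` and the padding scheme are assumptions for illustration, not the repo's exact code:

```python
import torch

NUM_BOARDS = 8          # temporal window
TOKENS_PER_BOARD = 75   # 64 square tokens + auxiliary feature tokens
SEQ_LEN = NUM_BOARDS * TOKENS_PER_BOARD  # 600

def build_input(history: list, cls_id: int, pad_board: torch.Tensor) -> torch.Tensor:
    """Flatten the last 8 board encodings (each a 75-token LongTensor) into a
    single 600-token sequence and prepend a CLS token (601 tokens total).
    Plies earlier than the available history are padded with `pad_board`;
    this padding choice is an assumption, not the repo's documented behaviour."""
    boards = list(history[-NUM_BOARDS:])
    while len(boards) < NUM_BOARDS:
        boards.insert(0, pad_board)
    flat = torch.cat(boards)                          # shape (600,)
    return torch.cat([torch.tensor([cls_id]), flat])  # shape (601,)
```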
Files
- `best.pt`: supervised pretraining checkpoint (~45% top-1 move accuracy on a 1M-position Lichess test set, ~1100 Elo with 200 MCTS simulations per move)
- `rl_latest.pt`: same model after 20 iterations of AlphaZero-style PPO self-play (note: underperforms the supervised baseline due to compute scale; see project notebook)
Architecture
- 12-layer pre-norm encoder, d_model=512, 8 heads (layout sketched after this list)
- 75-token board encoding (64 squares + side / castling / en passant / material / king-safety / phase)
- 8-board temporal window flattened to 600 tokens + CLS
- Policy head: Linear(512 -> 1968) with legal-move masking
- Value head: Linear(512 -> 256) -> GELU -> Linear(256 -> 1) -> Tanh
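A minimal PyTorch sketch of the layout above, not the repo's actual module; the vocabulary size, positional-embedding scheme, and checkpoint format are illustrative guesses:

```python
import torch
import torch.nn as nn

class ChessTransformer(nn.Module):
    """Sketch of the 12-layer pre-norm encoder with policy and value heads."""
    def __init__(self, vocab_size=4096, d_model=512, n_heads=8, n_layers=12,
                 seq_len=601, n_moves=1968):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, seq_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            activation="gelu", batch_first=True, norm_first=True)  # pre-norm
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.policy_head = nn.Linear(d_model, n_moves)
        self.value_head = nn.Sequential(
            nn.Linear(d_model, 256), nn.GELU(), nn.Linear(256, 1), nn.Tanh())

    def forward(self, tokens, legal_mask):
        # tokens: (B, 601) token ids; legal_mask: (B, 1968) bool, True = legal move
        x = self.tok_emb(tokens) + self.pos_emb[:, : tokens.size(1)]
        x = self.encoder(x)
        cls = x[:, 0]                                   # CLS summarises all 8 boards
        logits = self.policy_head(cls)
        logits = logits.masked_fill(~legal_mask, float("-inf"))  # legal-move masking
        value = self.value_head(cls).squeeze(-1)        # scalar in [-1, 1]
        return logits, value
```

Masking illegal moves to -inf before the softmax keeps all probability mass on legal moves, which also keeps the MCTS priors well-defined.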
Training
- Hardware: single NVIDIA RTX 5070 (12 GB, Blackwell sm_120)
- Supervised pretraining: 3 epochs over 98M positions, mixed-precision fp16, AdamW, cosine schedule; **5 days of continuous training**
- PPO self-play (RL): 20 iterations × 50 games × 200 MCTS sims, GAE + clipped surrogate (sketched below), Stockfish-shaped reward; ~3 days on top of the supervised checkpoint
- Total: ~8 days of GPU time end-to-end on a single consumer card
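For reference, a hedged sketch of the two core RL pieces named above, GAE and the PPO clipped surrogate; the discount, clip, and value-loss coefficients are illustrative defaults, and the Stockfish reward shaping is not reproduced here:

```python
import torch

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one self-play game.
    rewards, values: 1-D tensors of length T; the terminal bootstrap value
    is assumed to be 0."""
    T = rewards.size(0)
    adv = torch.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_v - values[t]
        last = delta + gamma * lam * last
        adv[t] = last
    return adv, adv + values   # advantages, returns

def ppo_loss(new_logp, old_logp, adv, returns, value_pred, clip=0.2, vf_coef=0.5):
    """Clipped surrogate policy loss plus value loss (coefficients are illustrative)."""
    ratio = (new_logp - old_logp).exp()
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip, 1 + clip) * adv
    policy_loss = -torch.min(unclipped, clipped).mean()
    value_loss = (value_pred - returns).pow(2).mean()
    return policy_loss + vf_coef * value_loss
```

In this setup `rewards` would be mostly zeros with a shaped terminal/Stockfish signal, and `old_logp` comes from the policy snapshot that generated the self-play games.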
Usage
See github.com/Angelotxx271/Individual_project_DAI for training/inference code, or try the interactive demo.
Citation
Individual project for Designing AI (2025-26).