---
language:
  - en
license: mit
tags:
  - chess
  - transformer
  - gqa
  - hybrid
  - reinforcement-learning
  - game
library_name: custom
pipeline_tag: text-generation
datasets:
  - MostLime/chess-elite-uci
---

# LCM: Liquid Chess Model

A 29.2M parameter hybrid transformer trained to play chess, built from scratch. LCM uses a novel combination of GQA attention and LIV convolution blocks from Liquid AI's LFM2 architecture, trained with dual NTP + TOP objectives on ~8 million chess games.

Play against it online here.


## Architecture

LCM is a hybrid transformer with two interleaved block types, distributed evenly across 16 layers using a Bresenham algorithm:

- **6 GQA blocks**: Grouped Query Attention (8 query heads, 2 KV heads) with RoPE positional embeddings and a SwiGLU FFN
- **10 LIV blocks**: Local Input-dependent Value causal convolution (kernel size 4), efficient for local sequential patterns
- **LRM**: Learnable Rate Multipliers on every block, stabilizing training dynamics
- **Weight tying**: the embedding and NTP head share weights
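The LRM bullet can be read as a learnable scalar gating each block's residual contribution. A minimal sketch under that interpretation (LCM's actual parameterization may differ; see model.py for the real code):

```python
import torch
import torch.nn as nn

class LRMBlock(nn.Module):
    """Wraps a sub-block with a Learnable Rate Multiplier (illustrative).

    The scalar `rate` scales the block's residual branch, letting the
    optimizer dial each layer's contribution up or down during training.
    """

    def __init__(self, block: nn.Module, init: float = 1.0):
        super().__init__()
        self.block = block
        self.rate = nn.Parameter(torch.tensor(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection scaled by the learnable multiplier.
        return x + self.rate * self.block(x)
```

Because `rate` is a 1D parameter, it would fall into the AdamW group under the optimizer split described below.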

Layer pattern: `GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA`
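A Bresenham-style even spread of 6 GQA blocks across 16 layers can be approximated by linear interpolation of the GQA positions; the helper name and exact spreading rule below are assumptions, but the result reproduces the stated pattern:

```python
def layer_pattern(n_layers: int, n_gqa: int) -> list[str]:
    # Spread the GQA blocks evenly from the first to the last layer;
    # every remaining layer becomes a LIV convolution block.
    gqa_at = {round(i * (n_layers - 1) / (n_gqa - 1)) for i in range(n_gqa)}
    return ["GQA" if i in gqa_at else "LIV" for i in range(n_layers)]

print(" ".join(layer_pattern(16, 6)))
# GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA
```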

| Parameter | Value |
| --- | --- |
| Parameters | 29.2M |
| d_model | 512 |
| Layers | 16 (6 GQA + 10 LIV) |
| Attention heads | 8Q / 2KV |
| Context length | 255 tokens |
| Vocab size | 1,977 |
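These hyperparameters could be mirrored in a small config object; the field names below are assumptions (the real definitions live in config.py):

```python
from dataclasses import dataclass

@dataclass
class LCMConfig:
    """Hypothetical mirror of the hyperparameter table above."""
    d_model: int = 512
    n_layers: int = 16
    n_gqa_blocks: int = 6
    n_liv_blocks: int = 10
    n_heads: int = 8          # query heads
    n_kv_heads: int = 2       # shared key/value heads (GQA)
    conv_kernel_size: int = 4 # LIV causal convolution kernel
    context_length: int = 255
    vocab_size: int = 1977
```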

## Training

LCM was trained on a combined dataset of ~7.9M chess games:

- **Chess Elite UCI**: 7.8M games with an average Lichess rating of 2600 per player
- ~100k additional over-the-board and non-Lichess games from private sources

**Tokenization:** Each game is encoded as a sequence of UCI move strings (`e2e4`, `g1f3`, etc.), prepended with a POV token (`<W>` or `<B>`) indicating the side to predict for.
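As a minimal illustration of this encoding (the helper name and the toy vocabulary are assumptions; the real token set lives in vocab.json):

```python
def encode_game(moves: list[str], pov: str, vocab: dict[str, int]) -> list[int]:
    # Prepend the POV token, then map each UCI move to its vocab id.
    pov_token = "<W>" if pov == "white" else "<B>"
    return [vocab[pov_token]] + [vocab[m] for m in moves]

# Toy vocabulary for illustration only; the real one has 1,977 entries.
vocab = {"<W>": 0, "<B>": 1, "e2e4": 2, "e7e5": 3, "g1f3": 4}
print(encode_game(["e2e4", "e7e5", "g1f3"], "white", vocab))  # [0, 2, 3, 4]
```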

**Training objectives:**

- **NTP** (Next Token Prediction, weight 0.30): predicts the next move given the sequence so far, applied only to the winning side's moves to avoid teaching losing play.
- **TOP** (Token Order Prediction, weight 0.70): predicts the relative order of upcoming tokens in a future window. Introduced in Zuhri et al., 2026, it provides a richer training signal than NTP alone.
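The dual objective can be sketched as a weighted sum, with the NTP term masked to the winning side's moves. Tensor shapes and the TOP term below are illustrative assumptions, not the repository's actual loss code:

```python
import torch
import torch.nn.functional as F

def combined_loss(ntp_logits: torch.Tensor,   # (batch, seq, vocab)
                  targets: torch.Tensor,      # (batch, seq)
                  winner_mask: torch.Tensor,  # (batch, seq), 1 = winner's move
                  top_loss: torch.Tensor,     # precomputed TOP loss (scalar)
                  w_ntp: float = 0.30,
                  w_top: float = 0.70) -> torch.Tensor:
    # Per-token cross-entropy; cross_entropy expects (batch, vocab, seq).
    per_tok = F.cross_entropy(ntp_logits.transpose(1, 2), targets,
                              reduction="none")
    # Zero out the losing side's moves, then average over the rest.
    ntp = (per_tok * winner_mask).sum() / winner_mask.sum().clamp(min=1)
    return w_ntp * ntp + w_top * top_loss
```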

**Optimizer:** Muon for 2D parameters, AdamW for 1D parameters, and a separate AdamW group with LRM-specific weight decay.


## Limitations & Future Work

LCM is an initial exploration of LFM2-style hybrid architectures for chess, and of TOP as a way to teach the model to anticipate future moves. Known limitations:

- **Tactical blindness**: misses simple immediate threats and captures in some positions. Hypothesized cause: elite training data (~2600 Elo) rarely contains hanging pieces or one-move tactics, so the model never learned to detect them.
- **Implicit board state**: the model reconstructs the position purely from move history rather than from an explicit board representation, so it cannot be used for puzzles or other contexts that lack the full game history.
- **No search**: LCM selects moves in a single forward pass with no tree search or lookahead.

Planned v2 improvements:

- Replace LIV blocks with pure GQA to test whether the hybrid architecture helps or hurts
- Pretrain on Leela Chess Zero (lc0) self-play data for a cleaner, stronger training signal
- Explore explicit board-state input (FEN or bitboard tokens) alongside move history
- More in-depth ablations on hyperparameters such as `conv_kernel_size`

## Quick Start

```bash
git clone https://huggingface.co/MostLime/lcm-chess
cd lcm-chess
pip install -r requirements.txt
python generate.py
```

Play as black:

```bash
python generate.py --side black
```

Custom checkpoint:

```bash
python generate.py --checkpoint model.safetensors --temperature 0.8
```
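The `--temperature` flag controls how sharply the model's move distribution is sampled. A minimal sketch of temperature sampling over per-move logits (illustrative only, not the actual generate.py implementation):

```python
import math
import random

def sample_move(logits: dict[str, float], temperature: float = 0.8) -> str:
    # Softmax with temperature, computed stably by subtracting the max.
    scaled = {m: l / temperature for m, l in logits.items()}
    mx = max(scaled.values())
    weights = {m: math.exp(s - mx) for m, s in scaled.items()}
    # Sample a move proportional to its softmax weight; lower temperature
    # concentrates probability on the highest-logit move.
    moves = list(weights)
    return random.choices(moves, weights=[weights[m] for m in moves])[0]
```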

**Requirements:** Python 3.10+, PyTorch 2.0+, `chess`, `safetensors`


## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `vocab.json` | UCI move vocabulary (1,977 tokens) |
| `config.py` | Architecture hyperparameters |
| `model.py` | Model implementation |
| `generate.py` | Interactive terminal chess interface |
| `requirements.txt` | Python dependencies |



## Author

Built by MostLime