---
language:
  - en
license: mit
tags:
  - chess
  - transformer
  - gqa
  - hybrid
  - reinforcement-learning
  - game
library_name: custom
pipeline_tag: text-generation
datasets:
  - MostLime/chess-elite-uci
---

# LCM: Liquid Chess Model

A 29.2M parameter hybrid transformer trained to play chess, built from scratch. LCM uses a novel combination of GQA attention and LIV convolution blocks from Liquid AI's LFM2 architecture, trained with dual NTP + TOP objectives on ~8 million chess games.

Play against it online here.


## Architecture

LCM is a hybrid transformer with two interleaved block types, distributed evenly across 16 layers using a Bresenham algorithm:

- **6 GQA blocks**: Grouped Query Attention (8 query heads, 2 KV heads) with RoPE positional embeddings and a SwiGLU FFN
- **10 LIV blocks**: Local Input-dependent Value causal convolution (kernel size 4), efficient for local sequential patterns
- **LRM**: Learnable Rate Multipliers on every block, stabilizing training dynamics
- **Weight tying**: the embedding and NTP head share weights
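The LRM bullet can be read as a learnable scalar gating each block's residual contribution. A minimal sketch under that interpretation (LCM's actual parameterization may differ; see model.py for the real code):

```python
import torch
import torch.nn as nn

class LRMBlock(nn.Module):
    """Wraps a sub-block with a Learnable Rate Multiplier (illustrative).

    The scalar `rate` scales the block's residual branch, letting the
    optimizer dial each layer's contribution up or down during training.
    """

    def __init__(self, block: nn.Module, init: float = 1.0):
        super().__init__()
        self.block = block
        self.rate = nn.Parameter(torch.tensor(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection scaled by the learnable multiplier.
        return x + self.rate * self.block(x)
```

Because `rate` is a 1D parameter, it would fall into the AdamW group under the optimizer split described below.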

Layer pattern: `GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA`
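A Bresenham-style even spread of 6 GQA blocks across 16 layers can be approximated by linear interpolation of the GQA positions; the helper name and exact spreading rule below are assumptions, but the result reproduces the stated pattern:

```python
def layer_pattern(n_layers: int, n_gqa: int) -> list[str]:
    # Spread the GQA blocks evenly from the first to the last layer;
    # every remaining layer becomes a LIV convolution block.
    gqa_at = {round(i * (n_layers - 1) / (n_gqa - 1)) for i in range(n_gqa)}
    return ["GQA" if i in gqa_at else "LIV" for i in range(n_layers)]

print(" ".join(layer_pattern(16, 6)))
# GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA
```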

| Parameter | Value |
| --- | --- |
| Parameters | 29.2M |
| d_model | 512 |
| Layers | 16 (6 GQA + 10 LIV) |
| Attention heads | 8Q / 2KV |
| Context length | 255 tokens |
| Vocab size | 1,977 |
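These hyperparameters could be mirrored in a small config object; the field names below are assumptions (the real definitions live in config.py):

```python
from dataclasses import dataclass

@dataclass
class LCMConfig:
    """Hypothetical mirror of the hyperparameter table above."""
    d_model: int = 512
    n_layers: int = 16
    n_gqa_blocks: int = 6
    n_liv_blocks: int = 10
    n_heads: int = 8          # query heads
    n_kv_heads: int = 2       # shared key/value heads (GQA)
    conv_kernel_size: int = 4 # LIV causal convolution kernel
    context_length: int = 255
    vocab_size: int = 1977
```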

## Training

LCM was trained on a combined dataset of ~7.9M chess games:

- **Chess Elite UCI**: 7.8M games with an average Lichess rating of 2600 per player
- ~100k additional over-the-board and non-Lichess games from private sources

**Tokenization:** Each game is encoded as a sequence of UCI move strings (`e2e4`, `g1f3`, etc.), prepended with a POV token (`<W>` or `<B>`) indicating the side to predict for.
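As a minimal illustration of this encoding (the helper name and the toy vocabulary are assumptions; the real token set lives in vocab.json):

```python
def encode_game(moves: list[str], pov: str, vocab: dict[str, int]) -> list[int]:
    # Prepend the POV token, then map each UCI move to its vocab id.
    pov_token = "<W>" if pov == "white" else "<B>"
    return [vocab[pov_token]] + [vocab[m] for m in moves]

# Toy vocabulary for illustration only; the real one has 1,977 entries.
vocab = {"<W>": 0, "<B>": 1, "e2e4": 2, "e7e5": 3, "g1f3": 4}
print(encode_game(["e2e4", "e7e5", "g1f3"], "white", vocab))  # [0, 2, 3, 4]
```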

**Training objectives:**

- **NTP** (Next Token Prediction, weight 0.30): predicts the next move given the sequence so far, applied only to the winning side's moves to avoid teaching losing play.
- **TOP** (Token Order Prediction, weight 0.70): predicts the relative order of upcoming tokens in a future window. Introduced in Zuhri et al., 2026, it provides a richer training signal than NTP alone.
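The dual objective can be sketched as a weighted sum, with the NTP term masked to the winning side's moves. Tensor shapes and the TOP term below are illustrative assumptions, not the repository's actual loss code:

```python
import torch
import torch.nn.functional as F

def combined_loss(ntp_logits: torch.Tensor,   # (batch, seq, vocab)
                  targets: torch.Tensor,      # (batch, seq)
                  winner_mask: torch.Tensor,  # (batch, seq), 1 = winner's move
                  top_loss: torch.Tensor,     # precomputed TOP loss (scalar)
                  w_ntp: float = 0.30,
                  w_top: float = 0.70) -> torch.Tensor:
    # Per-token cross-entropy; cross_entropy expects (batch, vocab, seq).
    per_tok = F.cross_entropy(ntp_logits.transpose(1, 2), targets,
                              reduction="none")
    # Zero out the losing side's moves, then average over the rest.
    ntp = (per_tok * winner_mask).sum() / winner_mask.sum().clamp(min=1)
    return w_ntp * ntp + w_top * top_loss
```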

**Optimizer:** Muon for 2D parameters, AdamW for 1D parameters, and a separate AdamW group with LRM-specific weight decay.


## Limitations & Future Work

LCM is an initial exploration of LFM2-style hybrid architectures for chess, and of TOP as a way to teach the model to anticipate future moves. Known limitations:

- **Tactical blindness**: misses simple immediate threats and captures in some positions. Hypothesized cause: elite training data (~2600 Elo) rarely contains hanging pieces or one-move tactics, so the model never learned to detect them.
- **Implicit board state**: the model reconstructs the position purely from move history rather than from an explicit board representation, so it cannot be used for puzzles or other contexts that lack the full game history.
- **No search**: LCM selects moves in a single forward pass with no tree search or lookahead.

Planned v2 improvements:

- Replace LIV blocks with pure GQA to test whether the hybrid architecture helps or hurts
- Pretrain on Leela Chess Zero (lc0) self-play data for a cleaner, stronger training signal
- Explore explicit board-state input (FEN or bitboard tokens) alongside move history
- More in-depth ablations on hyperparameters such as `conv_kernel_size`

## Quick Start

```bash
git clone https://huggingface.co/MostLime/lcm-chess
cd lcm-chess
pip install -r requirements.txt
python generate.py
```

Play as black:

```bash
python generate.py --side black
```

Custom checkpoint:

```bash
python generate.py --checkpoint model.safetensors --temperature 0.8
```
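The `--temperature` flag controls how sharply the model's move distribution is sampled. A minimal sketch of temperature sampling over per-move logits (illustrative only, not the actual generate.py implementation):

```python
import math
import random

def sample_move(logits: dict[str, float], temperature: float = 0.8) -> str:
    # Softmax with temperature, computed stably by subtracting the max.
    scaled = {m: l / temperature for m, l in logits.items()}
    mx = max(scaled.values())
    weights = {m: math.exp(s - mx) for m, s in scaled.items()}
    # Sample a move proportional to its softmax weight; lower temperature
    # concentrates probability on the highest-logit move.
    moves = list(weights)
    return random.choices(moves, weights=[weights[m] for m in moves])[0]
```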

**Requirements:** Python 3.10+, PyTorch 2.0+, `chess`, `safetensors`


## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `vocab.json` | UCI move vocabulary (1,977 tokens) |
| `config.py` | Architecture hyperparameters |
| `model.py` | Model implementation |
| `generate.py` | Interactive terminal chess interface |
| `requirements.txt` | Python dependencies |



## Author

Built by MostLime