MostLime committed
Commit 3d42e39 · verified · 1 Parent(s): b2c1dad

Update README.md

Files changed (1):
  1. README.md +127 -3
README.md CHANGED
@@ -1,3 +1,127 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - chess
+ - transformer
+ - gqa
+ - hybrid
+ - reinforcement-learning
+ - game
+ library_name: custom
+ pipeline_tag: text-generation
+ datasets:
+ - MostLime/chess-elite-uci
+ ---
+
+ # LCM — Liquid Chess Model
+
+ A 29.2M-parameter hybrid transformer trained to play chess, built from scratch. LCM uses a novel combination of GQA attention and LIV convolution blocks from Liquid AI's LFM2 architecture, trained with dual NTP + TOP objectives on ~8 million chess games.
+
+ ---
+
+ ## Architecture
+
+ LCM is a hybrid transformer with two interleaved block types, distributed evenly across 16 layers using a Bresenham algorithm:
+
+ - **6 GQA blocks** — Grouped Query Attention (8 query heads, 2 KV heads) with RoPE positional embeddings and a SwiGLU FFN
+ - **10 LIV blocks** — Local Input-dependent Value causal convolution (kernel size 4), efficient for local sequential patterns
+ - **LRM** — Learnable Rate Multipliers on every block, stabilizing training dynamics
+ - **Weight tying** — the embedding and the NTP head share weights
+
+ Layer pattern: `GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA`
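The even spacing above can be sketched in a few lines. This is an illustrative reconstruction (endpoint-inclusive rounded spacing, which reproduces the stated pattern), not the repo's actual routine, and the function name is hypothetical:

```python
# Illustrative sketch (assumed, not the repo's code): spread n_gqa GQA blocks
# evenly across n_layers via endpoint-inclusive rounding; the remaining slots
# become LIV convolution blocks.
def layer_pattern(n_layers=16, n_gqa=6):
    gqa_at = {round(i * (n_layers - 1) / (n_gqa - 1)) for i in range(n_gqa)}
    return ["GQA" if i in gqa_at else "LIV" for i in range(n_layers)]

print(" ".join(layer_pattern()))
# GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA
```

With the defaults this places GQA blocks at layers 0, 3, 6, 9, 12, and 15, matching the pattern listed above.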
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Parameters | 29.2M |
+ | d_model | 512 |
+ | Layers | 16 (6 GQA + 10 LIV) |
+ | Attention heads | 8Q / 2KV |
+ | Context length | 255 tokens |
+ | Vocab size | 1,977 |
+
+ ---
+
+ ## Training
+
+ LCM was trained on a combined dataset of ~7.9M chess games:
+
+ - [Chess Elite UCI](https://huggingface.co/datasets/MostLime/chess-elite-uci) — 7.8M games with an average Lichess rating of 2600 per player
+ - ~100k additional over-the-board (OTB) and non-Lichess games from private sources
+
+ **Tokenization:** Each game is encoded as a sequence of UCI move strings (`e2e4`, `g1f3`, etc.), prepended with a POV token (`<W>` or `<B>`) indicating the side to predict for.
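A minimal sketch of that encoding (the `encode_game` helper and the toy vocab below are illustrative; the real 1,977-token mapping ships in `vocab.json`):

```python
# Hypothetical encoding sketch: prepend the POV token, then map each UCI
# move string to its vocabulary id. The toy vocab here is illustrative;
# the released model's actual mapping lives in vocab.json.
def encode_game(moves, pov, vocab):
    return [vocab[t] for t in [f"<{pov}>"] + moves]

vocab = {"<W>": 0, "<B>": 1, "e2e4": 2, "e7e5": 3, "g1f3": 4}
print(encode_game(["e2e4", "e7e5", "g1f3"], "W", vocab))
# [0, 2, 3, 4]
```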
+
+ **Training objectives:**
+ - **NTP (Next Token Prediction, weight=0.30):** Predicts the next move given the sequence so far, applied only to the winning side's moves to avoid teaching losing play.
+ - **TOP (Token Order Prediction, weight=0.70):** Predicts the relative order of upcoming tokens within a future window, introduced in [Zuhri et al., 2025](https://arxiv.org/abs/2508.19228); it provides a richer training signal than NTP alone.
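The weighted combination can be sketched in plain Python. Names and data layout here are assumptions, not the repo's code; the key point is that the NTP term is averaged only over winning-side moves, per the description above:

```python
import math

# Sketch of the dual objective (illustrative, not the repo's implementation).
# probs: per-position dicts mapping candidate moves to predicted probability;
# targets: the actual next moves; win_mask: 1 where the winning side moves.
def dual_loss(probs, targets, win_mask, top_loss, w_ntp=0.30, w_top=0.70):
    nll = [-math.log(p[t]) for p, t, m in zip(probs, targets, win_mask) if m]
    ntp_loss = sum(nll) / max(len(nll), 1)  # NTP only on winning-side moves
    return w_ntp * ntp_loss + w_top * top_loss
```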
+
+ **Optimizer:** Muon for 2D parameters, AdamW for 1D parameters, and a separate AdamW group with LRM-specific weight decay
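The Muon/AdamW split reduces to a dimensionality check on each parameter. A sketch under that assumption (parameters represented here by name and shape; the helper is hypothetical):

```python
# Hypothetical grouping sketch: 2D weight matrices go to Muon, everything
# else (biases, norms, LRM scalars) to AdamW, per the split above.
def split_param_groups(named_shapes):
    muon = [n for n, s in named_shapes if len(s) == 2]
    adamw = [n for n, s in named_shapes if len(s) != 2]
    return muon, adamw

params = [("ffn.w1", (512, 2048)), ("ffn.bias", (2048,)), ("block.lrm", (1,))]
print(split_param_groups(params))
# (['ffn.w1'], ['ffn.bias', 'block.lrm'])
```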
+
+ ---
+
+ ## Limitations & Future Work
+
+ LCM is an initial exploration of LFM2-style hybrid architectures for chess, and of using TOP to teach the model to anticipate future moves. Known limitations:
+
+ - **Tactical blindness** — misses simple immediate threats and captures in some positions. Hypothesized cause: elite training data (2300+ Elo) rarely contains hanging pieces or one-move tactics, so the model never learned to detect them.
+ - **Implicit board state** — the model reconstructs the position purely from move history rather than from an explicit board representation, so it cannot be used for puzzles or other settings that lack the full game history.
+ - **No search** — LCM selects moves in a single forward pass, with no tree search or lookahead.
+
+ **Planned v2 improvements:**
+ - Replace the LIV blocks with pure GQA to test whether the hybrid architecture helps or hurts
+ - Pretrain on Leela Chess Zero (lc0) self-play data for a cleaner, stronger training signal
+ - Explore explicit board-state input (FEN or bitboard tokens) alongside move history
+ - More in-depth ablations on hyperparameters such as `conv_kernel_size`
+
+ ---
+
+ ## Quick Start
+
+ ```bash
+ git clone https://huggingface.co/MostLime/lcm-chess
+ cd lcm-chess
+ pip install -r requirements.txt
+ python generate.py
+ ```
+
+ Play as black:
+ ```bash
+ python generate.py --side black
+ ```
+
+ Custom checkpoint:
+ ```bash
+ python generate.py --checkpoint model.safetensors --temperature 0.8
+ ```
+
+ **Requirements:** Python 3.10+, PyTorch 2.0+, `chess`, `safetensors`
99
+
100
+ ---
101
+
102
+ ## Files
103
+
104
+ | File | Description |
105
+ |------|-------------|
106
+ | `model.safetensors` | Model weights |
107
+ | `vocab.json` | UCI move vocabulary (1,977 tokens) |
108
+ | `config.py` | Architecture hyperparameters |
109
+ | `model.py` | Model implementation |
110
+ | `generate.py` | Interactive terminal chess interface |
111
+ | `requirements.txt` | Python dependencies |
+
+ ---
+
+ ## References
+
+ - **LFM2 / LIV blocks:** [Liquid Foundation Models](https://www.liquid.ai/liquid-foundation-models) — Liquid AI, 2024
+ - **TOP objective:** [Predicting the Order of Upcoming Tokens Improves Language Modeling](https://arxiv.org/abs/2508.19228) — Zuhri et al., 2025
+ - **LRM:** [Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers](https://arxiv.org/abs/2601.04890) — Velikanov et al., 2026
+ - **Muon optimizer:** [Muon: An optimizer for the hidden layers of neural networks](https://github.com/KellerJordan/muon) — Jordan, 2024
+ - **Training data:** [chess-elite-uci](https://database.nikonoel.fr/)
+
+ ---
+
+ ## Author
+
+ Built by [MostLime](https://github.com/MostLime)