---
language:
- en
license: mit
tags:
- chess
- transformer
- gqa
- hybrid
- reinforcement-learning
- game
library_name: custom
pipeline_tag: text-generation
datasets:
- MostLime/chess-elite-uci
---

# LCM – Liquid Chess Model

A 29.2M parameter hybrid transformer trained to play chess, built from scratch. LCM uses a novel combination of GQA attention and LIV convolution blocks from Liquid AI's LFM2 architecture, trained with dual NTP + TOP objectives on ~8 million chess games.

Play against it online [here](https://huggingface.co/spaces/MostLime/lcm-chess-playground).

---

## Architecture

LCM is a hybrid transformer with two interleaved block types, distributed evenly across 16 layers using a Bresenham algorithm:

- **6 GQA blocks** – Grouped Query Attention (8 query heads, 2 KV heads) with RoPE positional embeddings and SwiGLU FFN
- **10 LIV blocks** – Local Input-dependent Value causal convolution (kernel size 4), efficient for local sequential patterns
- **LRM** – Learnable Rate Multipliers on every block, stabilizing training dynamics
- **Weight tying** – Embedding and NTP head share weights

Layer pattern: `GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA`
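
The spacing can be reproduced with a simple even-placement rule. A minimal sketch, assuming only the layer counts above (an illustration, not the repo's actual code):

```python
def layer_pattern(n_layers=16, n_gqa=6):
    # Spread the GQA blocks evenly across the stack, Bresenham-style:
    # place them at rounded fractions of the layer index range.
    gqa_positions = {round(i * (n_layers - 1) / (n_gqa - 1)) for i in range(n_gqa)}
    return ["GQA" if i in gqa_positions else "LIV" for i in range(n_layers)]

print(" ".join(layer_pattern()))
# GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA
```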

| Parameter | Value |
|-----------|-------|
| Parameters | 29.2M |
| d_model | 512 |
| Layers | 16 (6 GQA + 10 LIV) |
| Attention heads | 8Q / 2KV |
| Context length | 255 tokens |
| Vocab size | 1,977 |

---

## Training

LCM was trained on a combined dataset of ~7.9M chess games:

- [Chess Elite UCI](https://huggingface.co/datasets/MostLime/chess-elite-uci) – 7.8M games with an average Lichess rating of 2600 per player
- ~100k additional over-the-board (OTB) and non-Lichess games from private sources

**Tokenization:** Each game is encoded as a sequence of UCI move strings (`e2e4`, `g1f3`, etc.), prepended with a POV token (`<W>` or `<B>`) indicating the side to predict for.
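
As an illustration of this encoding, here is a minimal sketch using the `chess` package (not the repo's tokenizer; the real token-to-id mapping lives in `vocab.json`):

```python
import chess

def encode_game(moves_uci, pov_white=True):
    """Turn a list of UCI move strings into the token sequence described above."""
    board = chess.Board()
    tokens = ["<W>" if pov_white else "<B>"]   # POV token comes first
    for uci in moves_uci:
        move = chess.Move.from_uci(uci)
        assert move in board.legal_moves        # sanity-check the game record
        board.push(move)
        tokens.append(uci)                      # moves stay as plain UCI strings
    return tokens

print(encode_game(["e2e4", "e7e5", "g1f3"]))
# ['<W>', 'e2e4', 'e7e5', 'g1f3']
```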

**Training objectives:**
- **NTP (Next Token Prediction, weight=0.30):** Predicts the next move given the sequence so far, applied only to the winning side's moves to avoid teaching losing play.
- **TOP (Token Order Prediction, weight=0.70):** Predicts the relative order of upcoming tokens within a future window, introduced in [Zuhri et al., 2025](https://arxiv.org/abs/2508.19228); it provides a richer training signal than NTP alone (see the loss-mixing sketch below).
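
A hedged sketch of how the two losses could be mixed with these weights (the TOP head itself follows Zuhri et al. and is not reproduced here; the winner-side mask and tensor shapes are assumptions, not the repo's training code):

```python
import torch
import torch.nn.functional as F

NTP_WEIGHT, TOP_WEIGHT = 0.30, 0.70

def combined_loss(ntp_logits, targets, winner_mask, top_loss):
    """ntp_logits: (B, T, V); targets: (B, T) move ids;
    winner_mask: (B, T) float, 1.0 only on the winning side's moves."""
    per_token = F.cross_entropy(
        ntp_logits.flatten(0, 1), targets.flatten(), reduction="none"
    ).view_as(winner_mask)
    ntp = (per_token * winner_mask).sum() / winner_mask.sum().clamp(min=1.0)
    return NTP_WEIGHT * ntp + TOP_WEIGHT * top_loss
```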

**Optimizer:** Muon for 2-D parameters, AdamW for 1-D parameters, plus a separate AdamW group with LRM-specific weight decay (see the grouping sketch below)
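
A minimal sketch of that parameter split only (the Muon step itself comes from Keller Jordan's repository linked in the references and is not reproduced; the `"lrm"` name substring is an assumption about how the multipliers are registered):

```python
def split_param_groups(model):
    """model: a torch.nn.Module. Returns the three groups described above."""
    muon_group, adamw_group, lrm_group = [], [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if "lrm" in name.lower():
            lrm_group.append(p)      # LRM scalars -> AdamW with their own weight decay
        elif p.ndim >= 2:
            muon_group.append(p)     # 2-D matrices -> Muon
        else:
            adamw_group.append(p)    # 1-D params (norms, biases) -> AdamW
    return muon_group, adamw_group, lrm_group
```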

---

## Limitations & Future Work

LCM is an initial exploration of LFM2-style hybrid architectures for chess, and of using TOP to teach the model to anticipate future moves. Known limitations:

- **Tactical blindness** – misses simple immediate threats and captures in some positions. Hypothesized cause: elite training data (~2600 Elo) rarely contains hanging pieces or one-move tactics, so the model never learned to detect them.
- **Implicit board state** – the model reconstructs the position purely from move history rather than from an explicit board representation, so it cannot be used for puzzles or any other setting where the full game history is unavailable.
- **No search** – LCM selects moves in a single forward pass, with no tree search or lookahead (sketched below).
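
For reference, this is roughly what single-pass selection looks like (illustrative only; `generate.py` is the actual interface, and the model/vocab call signatures here are assumptions):

```python
import torch
import chess

@torch.no_grad()
def pick_move(model, vocab, tokens, board, temperature=0.8):
    """vocab: token string -> id, as loaded from vocab.json (assumed layout)."""
    id_to_token = {i: t for t, i in vocab.items()}
    ids = torch.tensor([[vocab[t] for t in tokens]])    # (1, T)
    logits = model(ids)[0, -1] / temperature            # next-move logits (assumed output shape)
    mask = torch.full_like(logits, float("-inf"))
    for move in board.legal_moves:                      # restrict sampling to legal moves
        if move.uci() in vocab:
            mask[vocab[move.uci()]] = 0.0
    probs = torch.softmax(logits + mask, dim=-1)
    return id_to_token[torch.multinomial(probs, 1).item()]
```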

**Planned v2 improvements:**
- Replace LIV blocks with pure GQA to test whether the hybrid architecture helps or hurts
- Pretrain on Leela Chess Zero (lc0) self-play data for cleaner, stronger training signal
- Explore explicit board state input (FEN or bitboard tokens) alongside move history
- More in-depth ablations on hyperparameters like `conv_kernel_size`

---

## Quick Start

```bash
git clone https://huggingface.co/MostLime/lcm-chess
cd lcm-chess
pip install -r requirements.txt
python generate.py
```

Play as black:
```bash
python generate.py --side black
```

Custom checkpoint:
```bash
python generate.py --checkpoint model.safetensors --temperature 0.8
```

**Requirements:** Python 3.10+, PyTorch 2.0+, `chess`, `safetensors`

---

## Files

| File | Description |
|------|-------------|
| `model.safetensors` | Model weights |
| `vocab.json` | UCI move vocabulary (1,977 tokens) |
| `config.py` | Architecture hyperparameters |
| `model.py` | Model implementation |
| `generate.py` | Interactive terminal chess interface |
| `requirements.txt` | Python dependencies |

---

## References

- **LFM2 / LIV blocks:** [Liquid Foundation Models](https://www.liquid.ai/liquid-foundation-models) – Liquid AI, 2024
- **TOP objective:** [Predicting the Order of Upcoming Tokens Improves Language Modeling](https://arxiv.org/abs/2508.19228) – Zuhri et al., 2025
- **LRM:** [Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers](https://arxiv.org/abs/2601.04890) – Velikanov et al., 2026
- **Muon optimizer:** [Muon: An optimizer for the hidden layers of neural networks](https://github.com/KellerJordan/muon) – Jordan, 2024
- **Training data:** [chess-elite-uci](https://huggingface.co/datasets/MostLime/chess-elite-uci)

---

## Author

Built by [MostLime](https://github.com/MostLime)