MostLime committed
Commit 3d42e39 · verified · 1 Parent(s): b2c1dad

Update README.md

Files changed (1):
  1. README.md +127 -3
README.md CHANGED
@@ -1,3 +1,127 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - chess
+ - transformer
+ - gqa
+ - hybrid
+ - reinforcement-learning
+ - game
+ library_name: custom
+ pipeline_tag: text-generation
+ datasets:
+ - MostLime/chess-elite-uci
+ ---
+
+ # LCM — Liquid Chess Model
+
+ A 29.2M-parameter hybrid transformer trained to play chess, built from scratch. LCM uses a novel combination of GQA attention and LIV convolution blocks from Liquid AI's LFM2 architecture, trained with dual NTP + TOP objectives on ~8 million chess games.
+
+ ---
+
+ ## Architecture
+
+ LCM is a hybrid transformer with two interleaved block types, distributed evenly across 16 layers using a Bresenham algorithm:
+
+ - **6 GQA blocks** — Grouped Query Attention (8 query heads, 2 KV heads) with RoPE positional embeddings and a SwiGLU FFN
+ - **10 LIV blocks** — Local Input-dependent Value causal convolution (kernel size 4), efficient for local sequential patterns
+ - **LRM** — Learnable Rate Multipliers on every block, stabilizing training dynamics
+ - **Weight tying** — the embedding and the NTP head share weights
+
+ Layer pattern: `GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA`
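The even spacing above can be sketched in a few lines. This is an illustrative reconstruction (endpoint-inclusive rounded spacing, which reproduces the stated pattern), not the repo's actual routine, and the function name is hypothetical:

```python
# Illustrative sketch (assumed, not the repo's code): spread n_gqa GQA blocks
# evenly across n_layers via endpoint-inclusive rounding; the remaining slots
# become LIV convolution blocks.
def layer_pattern(n_layers=16, n_gqa=6):
    gqa_at = {round(i * (n_layers - 1) / (n_gqa - 1)) for i in range(n_gqa)}
    return ["GQA" if i in gqa_at else "LIV" for i in range(n_layers)]

print(" ".join(layer_pattern()))
# GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA LIV LIV GQA
```

With the defaults this places GQA blocks at layers 0, 3, 6, 9, 12, and 15, matching the pattern listed above.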
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Parameters | 29.2M |
+ | d_model | 512 |
+ | Layers | 16 (6 GQA + 10 LIV) |
+ | Attention heads | 8Q / 2KV |
+ | Context length | 255 tokens |
+ | Vocab size | 1,977 |
+
+ ---
+
+ ## Training
+
+ LCM was trained on a combined dataset of ~7.9M chess games:
+
+ - [Chess Elite UCI](https://huggingface.co/datasets/MostLime/chess-elite-uci) — 7.8M games with an average Lichess rating of 2600 per player
+ - ~100k additional over-the-board (OTB) and non-Lichess games from private sources
+
+ **Tokenization:** Each game is encoded as a sequence of UCI move strings (`e2e4`, `g1f3`, etc.), prepended with a POV token (`<W>` or `<B>`) indicating the side to predict for.
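A minimal sketch of that encoding (the `encode_game` helper and the toy vocab below are illustrative; the real 1,977-token mapping ships in `vocab.json`):

```python
# Hypothetical encoding sketch: prepend the POV token, then map each UCI
# move string to its vocabulary id. The toy vocab here is illustrative;
# the released model's actual mapping lives in vocab.json.
def encode_game(moves, pov, vocab):
    return [vocab[t] for t in [f"<{pov}>"] + moves]

vocab = {"<W>": 0, "<B>": 1, "e2e4": 2, "e7e5": 3, "g1f3": 4}
print(encode_game(["e2e4", "e7e5", "g1f3"], "W", vocab))
# [0, 2, 3, 4]
```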
+
+ **Training objectives:**
+ - **NTP (Next Token Prediction, weight=0.30):** Predicts the next move given the sequence so far, applied only to the winning side's moves to avoid teaching losing play.
+ - **TOP (Token Order Prediction, weight=0.70):** Predicts the relative order of upcoming tokens within a future window, introduced in [Zuhri et al., 2025](https://arxiv.org/abs/2508.19228); it provides a richer training signal than NTP alone.
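The weighted combination can be sketched in plain Python. Names and data layout here are assumptions, not the repo's code; the key point is that the NTP term is averaged only over winning-side moves, per the description above:

```python
import math

# Sketch of the dual objective (illustrative, not the repo's implementation).
# probs: per-position dicts mapping candidate moves to predicted probability;
# targets: the actual next moves; win_mask: 1 where the winning side moves.
def dual_loss(probs, targets, win_mask, top_loss, w_ntp=0.30, w_top=0.70):
    nll = [-math.log(p[t]) for p, t, m in zip(probs, targets, win_mask) if m]
    ntp_loss = sum(nll) / max(len(nll), 1)  # NTP only on winning-side moves
    return w_ntp * ntp_loss + w_top * top_loss
```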
+
+ **Optimizer:** Muon for 2D parameters, AdamW for 1D parameters, and a separate AdamW group with LRM-specific weight decay
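The Muon/AdamW split reduces to a dimensionality check on each parameter. A sketch under that assumption (parameters represented here by name and shape; the helper is hypothetical):

```python
# Hypothetical grouping sketch: 2D weight matrices go to Muon, everything
# else (biases, norms, LRM scalars) to AdamW, per the split above.
def split_param_groups(named_shapes):
    muon = [n for n, s in named_shapes if len(s) == 2]
    adamw = [n for n, s in named_shapes if len(s) != 2]
    return muon, adamw

params = [("ffn.w1", (512, 2048)), ("ffn.bias", (2048,)), ("block.lrm", (1,))]
print(split_param_groups(params))
# (['ffn.w1'], ['ffn.bias', 'block.lrm'])
```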
+
+ ---
+
+ ## Limitations & Future Work
+
+ LCM is an initial exploration of LFM2-style hybrid architectures for chess, and of using TOP to teach the model to anticipate future moves. Known limitations:
+
+ - **Tactical blindness** — misses simple immediate threats and captures in some positions. Hypothesized cause: elite training data (2300+ Elo) rarely contains hanging pieces or one-move tactics, so the model never learned to detect them.
+ - **Implicit board state** — the model reconstructs the position purely from move history rather than from an explicit board representation, so it cannot be used for puzzles or other settings that lack the full game history.
+ - **No search** — LCM selects moves in a single forward pass, with no tree search or lookahead.
+
+ **Planned v2 improvements:**
+ - Replace the LIV blocks with pure GQA to test whether the hybrid architecture helps or hurts
+ - Pretrain on Leela Chess Zero (lc0) self-play data for a cleaner, stronger training signal
+ - Explore explicit board-state input (FEN or bitboard tokens) alongside move history
+ - More in-depth ablations on hyperparameters such as `conv_kernel_size`
+
+ ---
+
+ ## Quick Start
+
+ ```bash
+ git clone https://huggingface.co/MostLime/lcm-chess
+ cd lcm-chess
+ pip install -r requirements.txt
+ python generate.py
+ ```
+
+ Play as black:
+ ```bash
+ python generate.py --side black
+ ```
+
+ Custom checkpoint:
+ ```bash
+ python generate.py --checkpoint model.safetensors --temperature 0.8
+ ```
+
+ **Requirements:** Python 3.10+, PyTorch 2.0+, `chess`, `safetensors`
99
+
100
+ ---
101
+
102
+ ## Files
103
+
104
+ | File | Description |
105
+ |------|-------------|
106
+ | `model.safetensors` | Model weights |
107
+ | `vocab.json` | UCI move vocabulary (1,977 tokens) |
108
+ | `config.py` | Architecture hyperparameters |
109
+ | `model.py` | Model implementation |
110
+ | `generate.py` | Interactive terminal chess interface |
111
+ | `requirements.txt` | Python dependencies |
+
+ ---
+
+ ## References
+
+ - **LFM2 / LIV blocks:** [Liquid Foundation Models](https://www.liquid.ai/liquid-foundation-models) — Liquid AI, 2024
+ - **TOP objective:** [Predicting the Order of Upcoming Tokens Improves Language Modeling](https://arxiv.org/abs/2508.19228) — Zuhri et al., 2025
+ - **LRM:** [Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers](https://arxiv.org/abs/2601.04890) — Velikanov et al., 2026
+ - **Muon optimizer:** [Muon: An optimizer for the hidden layers of neural networks](https://github.com/KellerJordan/muon) — Jordan, 2024
+ - **Training data:** [chess-elite-uci](https://database.nikonoel.fr/)
+
+ ---
+
+ ## Author
+
+ Built by [MostLime](https://github.com/MostLime)