InterwebAlchemy committed · Commit 472946d · verified · Parent: e5af006

Upload folder using huggingface_hub

README.md CHANGED
---
language:
- en
license: mit
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
---

# kn1ght-bullet

A 4.3M-parameter GPT trained to play chess by next-token prediction on PGN notation. It is intended for chess tutoring applications, using constrained decoding at inference time.

**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and makes constrained decoding straightforward — legal-move masking is a one-step operation for the large majority of positions.
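The 4.3M figure can be checked against the architecture directly. This sketch assumes the standard GPT-2 parameter layout with tied input/output embeddings (inferred from `model_type: gpt2` in `config.json`, not stated explicitly in the card):

```python
# Parameter count implied by the config: 4 layers, 256-dim embeddings,
# 4,096-token vocab, 256 positions, 1,024-dim MLP inner size.
n_vocab, n_ctx, d, n_layer, d_ff = 4096, 256, 256, 4, 1024

embeddings = n_vocab * d + n_ctx * d  # token (wte) + position (wpe) embeddings
per_layer = (
    2 * d                # ln_1 (scale + bias)
    + d * 3 * d + 3 * d  # fused QKV projection + bias
    + d * d + d          # attention output projection + bias
    + 2 * d              # ln_2
    + d * d_ff + d_ff    # MLP up-projection + bias
    + d_ff * d + d       # MLP down-projection + bias
)
total = embeddings + n_layer * per_layer + 2 * d  # plus final layer norm
print(f"{total:,}")  # 4,273,664, which rounds to the advertised 4.3M
```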
---

## Training Pipeline

Training proceeded in three phases from the [InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens) dataset (~3.5M games, average Elo ~2240, spanning 1783–2006).

**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.

**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to legally valid games, mix in HuggingFace anchor games to prevent forgetting, and fine-tune. Repeated five times, this grew the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).
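One round of that loop can be sketched as below; `generate` and `is_legal` are placeholder hooks (a real legality check would replay each generated game move-by-move with a chess library), and `samples_per_prompt` is an assumed knob, not a documented setting:

```python
def sft_round(generate, prompts, is_legal, anchors, samples_per_prompt=10):
    """One legality-filtered SFT round: sample continuations from opening
    prompts, keep only fully legal games, then mix in anchor games from the
    original dataset to prevent forgetting."""
    samples = [generate(p) for p in prompts for _ in range(samples_per_prompt)]
    kept = [game for game in samples if is_legal(game)]
    pass_rate = len(kept) / len(samples)
    # The returned mixture becomes the fine-tuning set for this round.
    return kept + list(anchors), pass_rate
```

Across five rounds, the rising pass rate (9.1% → 67.5% above) is the signal that fine-tuning on the model's own filtered outputs is improving legality.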
**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Validation reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.
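For reference, the per-pair objective optimised in this phase is the standard DPO loss. This sketch takes sequence log-probs and assumes β = 0.1; the actual value used in training is not stated:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))): rewards the
    policy for ranking the Stockfish-preferred move above the rejected one,
    relative to the frozen pre-DPO reference model."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```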
---

## Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]` game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, and chat models get a short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).
### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is than Stockfish's best move at depth 15. Lower is better.

| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B-parameter frontier model) in mean CPL while being roughly 40,000× smaller. Performance is strongest in the Sicilian and Ruy Lopez variations that are well represented in the training data, and weaker in less common openings such as the Benoni and Colle System.
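Concretely, the CPL and blunder metrics reduce to simple arithmetic on engine evaluations. The 200cp blunder cutoff below is a common convention and an assumption here, not the evaluation's documented threshold:

```python
def centipawn_loss(best_cp, played_cp):
    """Engine eval (centipawns, from the mover's perspective) of Stockfish's
    best move minus the eval after the model's move, floored at zero."""
    return max(0, best_cp - played_cp)

def is_blunder(best_cp, played_cp, threshold=200):
    # Losing ~2 pawns of evaluation is a common blunder cutoff (assumed).
    return centipawn_loss(best_cp, played_cp) >= threshold

def mean_cpl(pairs):
    """Average CPL over (best_cp, played_cp) pairs, as reported in the table."""
    return sum(centipawn_loss(b, p) for b, p in pairs) / len(pairs)
```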
### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |
| DeepSeek V3            | 12%            | 62%      |

Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move, so puzzle accuracy matters less to the tutoring use case than opening-play CPL.
### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser delivery and `onnx/model.onnx` (21.6 MB) for full-precision inference.

```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");
const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```
**Constrained decoding** is strongly recommended in production. At each move step, mask the logits to only the token IDs of legal moves (from `chess.js`) before sampling. This guarantees legal play and lets the model's probability distribution over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once, not inside the generation loop.
// Note: tokenizer.encode() in transformers.js returns a plain array of token IDs.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```
### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended); depending on the export, the session may
# also expect an attention_mask input.
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)
logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```
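The same constrained-decoding idea works in Python, reduced to its core: restrict the final-position logits to the (single-token) legal moves and renormalise. Building `san_to_id` would use a legality source such as `python-chess` plus the tokenizer above; that mapping is assumed here:

```python
import math

def constrained_next_move(logits, san_to_id):
    """Pick the legal SAN move with the highest logit and return the softmax
    distribution over legal moves only (an opening-quality signal)."""
    scores = {san: logits[tid] for san, tid in san_to_id.items()}
    zmax = max(scores.values())  # subtract the max for numerical stability
    exps = {san: math.exp(s - zmax) for san, s in scores.items()}
    total = sum(exps.values())
    probs = {san: e / total for san, e in exps.items()}
    return max(probs, key=probs.get), probs
```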
---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.
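For the context-window limitation, one workable mitigation is left-truncation at inference time. Re-prepending the `[g_start]` token (id 0 per `config.json`) after truncating is an assumed policy, not something the card prescribes:

```python
def truncate_to_context(ids, bos_id=0, max_len=256):
    """Keep the most recent tokens that fit in the 256-token context window,
    re-prepending the game-start token so the prompt stays well-formed."""
    if len(ids) <= max_len:
        return ids
    return [bos_id] + ids[-(max_len - 1):]
```

A production version would cut at a move boundary rather than an arbitrary token, so the model never sees a half-encoded move.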
---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8-quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author    = {InterwebAlchemy},
  title     = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```
config.json ADDED

```json
{
  "model_type": "gpt2",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "_name_or_path": "InterwebAlchemy/kn1ght-bullet",
  "vocab_size": 4096,
  "n_embd": 256,
  "n_head": 4,
  "n_layer": 4,
  "n_positions": 256,
  "n_inner": 1024,
  "activation_function": "gelu_new",
  "resid_pdrop": 0.0,
  "embd_pdrop": 0.0,
  "attn_pdrop": 0.0,
  "layer_norm_epsilon": 1e-05,
  "initializer_range": 0.02,
  "scale_attn_weights": true,
  "reorder_and_upcast_attn": false,
  "scale_attn_by_inverse_layer_idx": false,
  "use_cache": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3
}
```
generation_config.json ADDED

```json
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3,
  "max_new_tokens": 256,
  "do_sample": true,
  "temperature": 0.8,
  "top_k": 40
}
```
onnx/model.onnx ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:0a239d08c78ed7d9984b1e5cf57b34bd940c6ab6ecf44e4ef858410740b0d3a0
size 21573148
```

onnx/model_quantized.onnx ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:36f840dbe6a77af06e578663de595804bd8df99cb5ec02151e6d49d8021b1b8d
size 5694652
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED

```json
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "bos_token": "[g_start]",
  "eos_token": "[g_end]",
  "unk_token": "[unknown]",
  "pad_token": "[pad]",
  "model_max_length": 256,
  "clean_up_tokenization_spaces": false
}
```