Upload folder using huggingface_hub

- README.md (+212, −3)
- config.json (+26, −0)
- generation_config.json (+10, −0)
- onnx/model.onnx (+3, −0)
- onnx/model_quantized.onnx (+3, −0)
- tokenizer.json
- tokenizer_config.json (+9, −0)
README.md
CHANGED
---
language:
- en
license: mit
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
---

# kn1ght-bullet

A 4.3M-parameter GPT trained to play chess by next-token prediction on PGN notation,
intended for use in chess tutoring applications via constrained decoding at inference time.

**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most
moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and
makes constrained decoding straightforward — legal-move masking is a one-step operation
for the large majority of positions.
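
Concretely, when every legal move is a single token, building a decoding mask costs one vocabulary lookup per legal move. A minimal sketch, using a hypothetical SAN → token-id map (the real ids come from this repository's `tokenizer.json`):

```python
import numpy as np

# Hypothetical SAN -> token-id map for illustration only;
# the real ids come from the model's tokenizer.json.
vocab = {"e4": 17, "d4": 18, "Nf3": 21, "c4": 25}

def legal_move_mask(legal_san, vocab, vocab_size):
    """One lookup per legal move: True where that token may be sampled."""
    mask = np.zeros(vocab_size, dtype=bool)
    for san in legal_san:
        if san in vocab:  # single-token move: masking is one step
            mask[vocab[san]] = True
    return mask

mask = legal_move_mask(["e4", "d4", "Nf3"], vocab, vocab_size=4096)
```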

---

## Training Pipeline

Training proceeded in three phases from the
[InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens)
dataset (~3.5M games, average Elo ~2240, spanning 1783–2006).

**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening
pattern recognition across a wide range of named lines.

**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to
legally valid games, mix in HuggingFace anchor games to prevent forgetting, and
fine-tune. Repeated five times, this grew the legal training set from 67 games (9.1%
pass rate) to 796 games (67.5% pass rate).
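
One round of the loop can be sketched as below; `generate`, `is_legal`, and `finetune` are hypothetical stand-ins for the actual training code, not functions from this repository:

```python
import random

def self_improvement_round(model, prompts, anchors, generate, is_legal, finetune,
                           samples_per_prompt=4):
    """One SFT round: sample continuations from opening prompts, keep only
    legally valid games, mix in anchor games to prevent forgetting, fine-tune."""
    candidates = [generate(model, p) for p in prompts for _ in range(samples_per_prompt)]
    legal = [g for g in candidates if is_legal(g)]
    mixture = legal + random.sample(anchors, min(len(anchors), len(legal)))
    return finetune(model, mixture), legal
```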

**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions)
rank legal moves by quality. Validation reward accuracy: 0.885. SFT loss remains stable
throughout, confirming the model retains PGN structure.
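
Reward accuracy in DPO is conventionally the fraction of held-out preference pairs that the policy ranks correctly. A sketch under that assumption (the exact scoring function used in training is not specified here):

```python
def reward_accuracy(pairs, score):
    """Fraction of (chosen, rejected) pairs where the policy assigns the
    chosen move a higher score (e.g. sequence log-probability)."""
    correct = sum(1 for chosen, rejected in pairs if score(chosen) > score(rejected))
    return correct / len(pairs)
```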

---

## Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]`
  game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own
  model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input
  is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models
  (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a
  short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).

### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is compared to Stockfish's
best move at depth 15. Lower is better.
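
As a reference for how the table columns below are computed, a sketch of the CPL and blunder statistics; the ≥100 cp blunder threshold is an assumption, not something stated by the eval harness:

```python
def cpl_stats(evals, blunder_threshold=100):
    """evals: list of (best_cp, played_cp) pairs from the mover's perspective,
    both scored by Stockfish at the same depth. Returns (mean CPL, blunder %)."""
    losses = [max(0, best - played) for best, played in evals]
    mean_cpl = sum(losses) / len(losses)
    blunder_pct = 100 * sum(1 for l in losses if l >= blunder_threshold) / len(losses)
    return mean_cpl, blunder_pct
```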

| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in
mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy
Lopez variations well-represented in the training data, and weaker in less-common
openings such as the Benoni and Colle System.

### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551),
presented as full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |
| DeepSeek V3            | 12%            | 62%      |

Tactical puzzle accuracy is constrained by model capacity at this scale. With
constrained decoding at inference time, the model selects the highest-ranked legal
move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.

### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never
present in the training data; feeding FEN to the model produces arbitrary output. This
is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser
delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.

```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");
const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```

**Constrained decoding** is strongly recommended in production. At each move step,
mask the logits to only the token IDs of legal moves (from `chess.js`) before
sampling. This guarantees legal play and lets the model's probability distribution
over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once, not inside the generation loop.
// In transformers.js, tokenizer.encode() returns an array of token ids.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```

### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended)
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

# Feed every input the exported graph declares
# (some exports also expect an attention_mask).
feeds = {}
for inp in session.get_inputs():
    if inp.name == "input_ids":
        feeds[inp.name] = input_ids
    elif inp.name == "attention_mask":
        feeds[inp.name] = np.ones_like(input_ids)

logits = session.run(["logits"], feeds)[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```
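
The same constrained decoding recommended above for transformers.js works in Python too. A sketch, assuming `legal_token_ids` was built from the current position's legal moves (e.g. via python-chess plus the tokenizer) and `logits` is the last-position logit vector from the session:

```python
import numpy as np

def sample_legal(logits, legal_token_ids, temperature=0.8, rng=None):
    """Mask every illegal token to -inf, then sample from the renormalised
    distribution over the remaining legal-move tokens."""
    rng = rng or np.random.default_rng()
    ids = np.fromiter(legal_token_ids, dtype=np.int64)
    masked = np.full_like(logits, -np.inf)
    masked[ids] = logits[ids]
    z = (masked - masked.max()) / temperature  # stable softmax
    probs = np.exp(z)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```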

---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and
  endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects
  learned opening patterns, not calculation.
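
The move-60 figure follows from a rough token budget; the per-move token count below is an estimate, not a measured statistic:

```python
# Rough estimate: ~4 tokens per full move ("12." + White's move + Black's
# move, plus the occasional multi-token rare move). Ratios vary by position.
tokens_per_full_move = 4
budget = 256 - 1  # reserve one token for the [g_start] prefix
print(budget // tokens_per_full_move)  # full moves before the window overflows
```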

---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author = {InterwebAlchemy},
  title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```
config.json
ADDED

```json
{
  "model_type": "gpt2",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "_name_or_path": "InterwebAlchemy/kn1ght-bullet",
  "vocab_size": 4096,
  "n_embd": 256,
  "n_head": 4,
  "n_layer": 4,
  "n_positions": 256,
  "n_inner": 1024,
  "activation_function": "gelu_new",
  "resid_pdrop": 0.0,
  "embd_pdrop": 0.0,
  "attn_pdrop": 0.0,
  "layer_norm_epsilon": 1e-05,
  "initializer_range": 0.02,
  "scale_attn_weights": true,
  "reorder_and_upcast_attn": false,
  "scale_attn_by_inverse_layer_idx": false,
  "use_cache": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3
}
```
generation_config.json
ADDED

```json
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3,
  "max_new_tokens": 256,
  "do_sample": true,
  "temperature": 0.8,
  "top_k": 40
}
```
onnx/model.onnx
ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:0a239d08c78ed7d9984b1e5cf57b34bd940c6ab6ecf44e4ef858410740b0d3a0
size 21573148
```
onnx/model_quantized.onnx
ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:36f840dbe6a77af06e578663de595804bd8df99cb5ec02151e6d49d8021b1b8d
size 5694652
```
tokenizer.json
ADDED

The diff for this file is too large to render.
tokenizer_config.json
ADDED

```json
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "bos_token": "[g_start]",
  "eos_token": "[g_end]",
  "unk_token": "[unknown]",
  "pad_token": "[pad]",
  "model_max_length": 256,
  "clean_up_tokenization_spaces": false
}
```