---
language:
- en
license: cc-by-4.0
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
datasets:
- InterwebAlchemy/pgn-dataset
- InterwebAlchemy/pgn-dataset-including-special-tokens
- InterwebAlchemy/pgn-lichess-puzzle-dataset
thumbnail: https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png
---
# kn1ght-bullet
![kn1ght-bullet](assets/kn1ght-bullet.png)
A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation.
Intended for use in chess tutoring applications via constrained decoding at inference time.
**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.
---
## Model Details
| | |
| --------------- | ------------------------------------------ |
| Architecture | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters | 4.3M |
| Context length | 256 tokens |
| Vocabulary | 4,096 BPE tokens (chess PGN–specific) |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`) |
The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most
moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and
makes constrained decoding straightforward — legal move masking is a one-step operation
for the large majority of positions.
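Because most moves are single tokens, legal-move masking reduces to zeroing out disallowed vocabulary positions. A toy sketch in pure Python, using a hypothetical vocabulary fragment (the real token IDs differ):

```python
import math

# Hypothetical fragment of the chess BPE vocab: each SAN move is one token
vocab = {"e4": 101, "e5": 102, "Nf3": 103, "Nc6": 104, "Bb5": 105}

def mask_logits(logits, legal_moves):
    """Set every vocab position that is not a legal move's token to -inf."""
    allowed = {vocab[m] for m in legal_moves if m in vocab}
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

logits = [0.0] * 200
logits[103] = 2.0   # model favours Nf3
logits[105] = 1.5   # Bb5 is the runner-up
masked = mask_logits(logits, ["Nf3", "Bb5"])
best = max(range(len(masked)), key=lambda i: masked[i])
print(best)  # 103, the token for "Nf3"
```

With a subword tokenizer this would instead require tracking multi-token prefixes per move; the single-token property is what makes the mask a one-step operation.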
---
## Training Pipeline
Training proceeded in three phases. Pre-training used
[InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens)
(~3.5M games, average Elo ~2240, spanning 1783–2006), which is derived from the base
[InterwebAlchemy/pgn-dataset](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset)
by adding `[g_start]` / `[g_end]` game boundary tokens.
**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening
pattern recognition across a wide range of named lines.
**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to
legally-valid games, mix with HuggingFace anchor games to prevent forgetting, and
fine-tune. Repeated five times, growing the legal training set from 67 games (9.1%
pass rate) to 796 games (67.5% pass rate).
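The filtering step of each round can be sketched as follows; `generate_game` and `is_legal_game` are hypothetical stand-ins for sampling from the model and replaying the PGN with a chess library (the early-round ~10% pass rate is simulated here, not measured):

```python
import random

def generate_game(prompt, seed):
    """Stand-in for sampling a PGN continuation from the model."""
    random.seed(seed)
    # Simulate that roughly 10% of early-round generations are fully legal
    return prompt + " ...", random.random() < 0.1

def is_legal_game(game_and_flag):
    """Stand-in for replaying the PGN move-by-move and checking legality."""
    _game, legal = game_and_flag
    return legal

def filter_round(prompts, samples_per_prompt=10):
    """One SFT round: generate from opening prompts, keep only legal games."""
    generated = [generate_game(p, s) for p in prompts
                 for s in range(samples_per_prompt)]
    legal = [g for g in generated if is_legal_game(g)]
    pass_rate = len(legal) / len(generated)
    return legal, pass_rate
```

In the real pipeline the surviving games are then mixed with anchor games before fine-tuning, and the growing pass rate (9.1% to 67.5%) is exactly this `pass_rate` tracked across rounds.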
**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions)
rank legal moves by quality. Validation reward accuracy reaches 0.885, and the SFT
loss remains stable throughout, confirming the model retains PGN structure.
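Assuming Phase 3 uses the standard DPO objective, the per-pair loss is a logistic loss on the margin between policy and reference log-probabilities. A minimal pure-Python sketch with hypothetical log-prob values:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given log-probs of the chosen and
    rejected moves under the policy (pi_*) and frozen reference (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical log-probs: the policy shifts mass toward the Stockfish-chosen move
loss = dpo_loss(-1.0, -3.0, -1.5, -2.5, beta=0.5)
```

The loss shrinks as the policy increases the chosen move's likelihood relative to the reference while decreasing the rejected move's, which is what drives the reward accuracy reported above.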
---
## Evaluation
Evaluated against chess-specialist models and frontier LLMs on three tasks.
- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]`
game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own
model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input
is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models
(gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a
short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).
### Phase B — Opening play (50 positions × 10 generations)
Centipawn loss (CPL) measures how much worse a model's move is compared to Stockfish's
best move at depth 15. Lower is better.
| Model | Params | Mean CPL ↓ | Legality | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite | ~8B | **2.58** | 100% | 0.0% |
| chessgpt-base-v1 | ~85M | 4.92 | 99.6% | 0.2% |
| gpt-3.5-turbo-instruct | ~175B | 5.79 | 99.4% | 0.0% |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83** | **99.8%** | **0.0%** |
| DeepSeek V3 | ~685B | 8.18 | 86.0% | 0.4% |
kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in
mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy
Lopez variations well-represented in the training data, and weaker in less-common
openings such as the Benoni and Colle System.
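The CPL metric is a simple difference of engine evaluations; the function below is a minimal illustration with hypothetical centipawn scores, not a real Stockfish call:

```python
def centipawn_loss(best_eval_cp, played_eval_cp):
    """CPL: engine eval of the best move minus eval after the played move,
    both in centipawns from the side-to-move's perspective, floored at zero."""
    return max(0, best_eval_cp - played_eval_cp)

# Hypothetical depth-15 evals: best move scores +35 cp, played move +12 cp
print(centipawn_loss(35, 12))  # 23 cp lost on this move
```

Mean CPL in the table is this quantity averaged over all generated moves, so a score of ~5.8 means the model gives up under six centipawns per move on average.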
### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)
Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551),
presented as full PGN game context up to the puzzle position.
| Model | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite | 49% | 98% |
| chessgpt-base-v1 | 34% | 97% |
| gpt-3.5-turbo-instruct | 26% | 63% |
| **kn1ght-bullet** | **10%** | **58%** |
| DeepSeek V3 | 12% | 62% |
Tactical puzzle accuracy is constrained by model capacity at this scale. With
constrained decoding at inference time, the model selects the highest-ranked legal
move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.
### Phase C — FEN puzzle accuracy
kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never
present in the training data; feeding FEN to the model produces arbitrary output. This
is a known and intentional limitation of PGN-only training.
---
## Usage
### With transformers.js (browser / Node.js)
The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser
delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.
```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");

const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```
**Constrained decoding** is strongly recommended in production. At each move step,
mask the logits to only the token IDs of legal moves (from `chess.js`) before
sampling. This guarantees legal play and lets the model's probability distribution
over legal moves act as an opening-quality signal.
```javascript
// Rebuild the allowlist after every move; the set of legal moves changes each ply.
const legalMoves = chess.moves(); // SAN strings from a chess.js instance
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  // Exclude every token outside the legal-move set from sampling.
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```
### With Python (ONNX Runtime)
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the chess-specific BPE tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Run inference via ONNX Runtime (recommended)
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
# onnxruntime expects int64 numpy arrays, not Python lists
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```
---
## Limitations
- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and
endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects
learned opening patterns, not calculation.
---
## Files
| File | Description |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx` | Full-precision ONNX (21.6 MB) |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser |
| `tokenizer.json` | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json` | Model architecture |
| `generation_config.json` | Default generation parameters |
---
## Citation
```bibtex
@misc{kn1ght-bullet,
author = {InterwebAlchemy},
title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```