---
language:
- en
license: cc-by-4.0
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
datasets:
- InterwebAlchemy/pgn-dataset
- InterwebAlchemy/pgn-dataset-including-special-tokens
- InterwebAlchemy/pgn-lichess-puzzle-dataset
thumbnail: https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png
---

# kn1ght-bullet

![kn1ght-bullet](https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png)

A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation.
Intended for use in chess tutoring applications via constrained decoding at inference time.

**bullet** refers to the model's size tier: small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, in which most
moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and
makes constrained decoding straightforward: legal-move masking is a one-step operation
for the large majority of positions.

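To illustrate why this works, the toy sketch below trains a tiny BPE on repeated sample movetext. This is not the shipped `tokenizer.json`; the corpus and vocabulary size are made up for demonstration:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Toy corpus: a few repeated PGN moves (NOT the real training data).
corpus = ["1.e4 e5 2.Nf3 Nc6 3.Bb5 a6 4.Ba4 Nf6"] * 200

tok = Tokenizer(models.BPE())
tok.pre_tokenizer = pre_tokenizers.Whitespace()
tok.train_from_iterator(corpus, trainers.BpeTrainer(vocab_size=300, show_progress=False))

# Frequent SAN moves merge into single tokens, so one sampling step
# can emit a complete move.
print(tok.encode("Nf3").tokens)
```

With the real 4,096-token vocabulary the same property holds for most SAN moves, which is what makes one-step legal-move masking possible.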
---

## Training Pipeline

Training proceeded in three phases. Pre-training used
[InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens)
(~3.5M games, average Elo ~2240, spanning 1783–2006), which is derived from the base
[InterwebAlchemy/pgn-dataset](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset)
by adding `[g_start]` / `[g_end]` game boundary tokens.

**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening
pattern recognition across a wide range of named lines.

**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to
legally valid games, mix in HuggingFace anchor games to prevent forgetting, and
fine-tune. Repeated five times, growing the legal training set from 67 games (9.1%
pass rate) to 796 games (67.5% pass rate).

**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions)
rank legal moves by quality. Validation reward accuracy: 0.885. SFT loss remains stable
throughout, confirming that the model retains PGN structure.

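For reference, the per-pair DPO objective reduces to a one-line computation over policy and reference log-probabilities (β below is an arbitrary illustrative value, not the training hyperparameter):

```python
import math

def dpo_loss(pi_chosen: float, ref_chosen: float,
             pi_rejected: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * (chosen log-ratio minus rejected log-ratio))."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no margin between chosen and rejected, the loss sits at log(2) ≈ 0.693.
print(round(dpo_loss(-1.0, -1.0, -1.0, -1.0), 3))
```
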
---

## Evaluation

kn1ght-bullet was evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]`
  game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own
  model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input
  is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models
  (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, while chat models get a
  short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).

### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is than Stockfish's
best move at depth 15. Lower is better.

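In simplified form (real scoring uses Stockfish evaluations at depth 15; the numbers below are made up):

```python
def centipawn_loss(best_eval_cp: int, played_eval_cp: int) -> int:
    """Evaluation drop, in centipawns, from the engine-best move to the played move."""
    return max(0, best_eval_cp - played_eval_cp)

# Engine-best move evaluates to +35 cp; the model's move to +12 cp.
print(centipawn_loss(35, 12))  # 23 cp lost
```
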
| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet nearly matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in
mean CPL (5.83 vs 5.79) while being roughly 40,000× smaller. Performance is strongest in
Sicilian and Ruy Lopez variations well represented in the training data, and weaker in
less common openings such as the Benoni and Colle System.

### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551),
presented as the full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| DeepSeek V3            | 12%            | 62%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |

Tactical puzzle accuracy is constrained by model capacity at this scale. With
constrained decoding at inference time, the model selects the highest-ranked legal
move, so puzzle accuracy matters less to the tutoring use case than opening-play CPL.

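That selection step amounts to an argmax over legal-move token ids only (the ids and logits below are made up for illustration):

```python
def pick_legal_move(logits: list[float], legal_ids: set[int]) -> int:
    """Return the highest-logit token id among the legal-move token ids."""
    return max(legal_ids, key=lambda token_id: logits[token_id])

# Hypothetical logits over a 6-token vocabulary; only ids 2 and 5 are legal moves.
logits = [3.0, 2.5, 0.7, 1.9, 4.2, 1.1]
print(pick_legal_move(logits, {2, 5}))  # 5, even though id 4 scores highest overall
```
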
### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never
present in the training data; feeding FEN to the model produces arbitrary output. This
is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser
delivery and `onnx/model.onnx` (21.6 MB) for full-precision inference.

```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");
const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```

**Constrained decoding** is strongly recommended in production. At each move step,
mask the logits to only the token IDs of legal moves (from `chess.js`) before
sampling. This guarantees legal play and lets the model's probability distribution
over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once, not inside the generation loop.
const legalMoves = chess.moves();
// tokenizer.encode() returns an array of token ids in transformers.js
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```

### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Run inference via ONNX (recommended); inputs must be int64 numpy arrays
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)
logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```

---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and
  endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects
  learned opening patterns, not calculation.

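One workable mitigation for long games (an assumption, not an official recommendation) is to keep only the most recent tokens when a game outgrows the window:

```python
CONTEXT_LEN = 256  # the model's maximum context length

def fit_context(token_ids: list[int]) -> list[int]:
    """Drop the oldest tokens once the sequence exceeds the context window."""
    return token_ids[-CONTEXT_LEN:]

print(len(fit_context(list(range(300)))))  # 256
```

Note that naive left-truncation can cut a move token in half or drop the `[g_start]` prefix; in practice you would truncate at a move boundary.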
---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author = {InterwebAlchemy},
  title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```