---
language:
- en
license: cc-by-4.0
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
datasets:
- InterwebAlchemy/pgn-dataset
- InterwebAlchemy/pgn-dataset-including-special-tokens
- InterwebAlchemy/pgn-lichess-puzzle-dataset
thumbnail: https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png
---

# kn1ght-bullet

![kn1ght-bullet](kn1ght-bullet.png)

A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation. Intended for use in chess tutoring applications via constrained decoding at inference time.

**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and makes constrained decoding straightforward — legal move masking is a one-step operation for the large majority of positions.

---

## Training Pipeline

Training proceeded in three phases. Pre-training used [InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens) (~3.5M games, average ELO ~2240, spanning 1783–2006), derived from the base [InterwebAlchemy/pgn-dataset](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset) by adding `[g_start]` / `[g_end]` game boundary tokens.

**Phase 1 — Pre-training**

200,000 steps on 100,000 games. The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.
**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**

A self-improvement loop: generate continuations from named opening prompts, filter to legally-valid games, mix with HuggingFace anchor games to prevent forgetting, and fine-tune. Repeated five times, growing the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).

**Phase 3 — DPO (300 steps)**

Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Val reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.

---

## Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]` game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).

### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is than Stockfish's best move at depth 15. Lower is better.
| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy Lopez variations well-represented in the training data, and weaker in less-common openings such as the Benoni and Colle System.

### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| DeepSeek V3            | 12%            | 62%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |

Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.

### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.
```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline(
  "text-generation",
  "InterwebAlchemy/kn1ght-bullet"
);

const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```

**Constrained decoding** is strongly recommended in production. At each move step, mask the logits to only the token IDs of legal moves (from `chess.js`) before sampling. This guarantees legal play and lets the model's probability distribution over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once per move, not inside the sampling loop.
// In transformers.js, tokenizer.encode(text) returns an array of token IDs.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```

### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended)
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```

---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.
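The legal-move masking described under Usage applies equally on the Python side. The helper below is a hypothetical sketch, not part of this repo: it assumes logits arrive as a NumPy array (as in the ONNX example) and that each legal move encodes to a single token ID; in practice `allowed_ids` would be built by encoding each SAN move from a legality checker such as `python-chess`.

```python
import numpy as np

def mask_to_legal(logits: np.ndarray, allowed_ids: set) -> np.ndarray:
    """Return a copy of `logits` with every token outside `allowed_ids`
    set to -inf, so sampling can only ever pick a legal-move token."""
    masked = np.full_like(logits, -np.inf)
    legal = np.array(sorted(allowed_ids))
    masked[legal] = logits[legal]
    return masked

# Toy demonstration: a 10-token vocabulary where only IDs 2 and 7
# correspond to legal moves in the current position.
logits = np.linspace(-1.0, 1.0, 10).astype(np.float32)
masked = mask_to_legal(logits, {2, 7})
print(int(masked.argmax()))  # 7 — the highest-scoring legal token
```

The masked logits can then be softmaxed and sampled as usual. Moves that span multiple BPE tokens would need per-step re-masking during generation, which this sketch omits.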
---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author = {InterwebAlchemy},
  title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```