---
language:
- en
license: cc-by-4.0
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
datasets:
- InterwebAlchemy/pgn-dataset
- InterwebAlchemy/pgn-dataset-including-special-tokens
- InterwebAlchemy/pgn-lichess-puzzle-dataset
thumbnail: https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png
---
# kn1ght-bullet
![kn1ght-bullet](assets/kn1ght-bullet.png)
A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation.
Intended for use in chess tutoring applications via constrained decoding at inference time.
**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.
---
## Model Details
| | |
| --------------- | ------------------------------------------ |
| Architecture | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters | 4.3M |
| Context length | 256 tokens |
| Vocabulary | 4,096 BPE tokens (chess PGN–specific) |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`) |
The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most
moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and
makes constrained decoding straightforward — legal move masking is a one-step operation
for the large majority of positions.
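Because most moves are single tokens, legal-move masking reduces to zeroing out disallowed vocabulary positions. A toy sketch in pure Python, using a hypothetical vocabulary fragment (the real token IDs differ):

```python
import math

# Hypothetical fragment of the chess BPE vocab: each SAN move is one token
vocab = {"e4": 101, "e5": 102, "Nf3": 103, "Nc6": 104, "Bb5": 105}

def mask_logits(logits, legal_moves):
    """Set every vocab position that is not a legal move's token to -inf."""
    allowed = {vocab[m] for m in legal_moves if m in vocab}
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

logits = [0.0] * 200
logits[103] = 2.0   # model favours Nf3
logits[105] = 1.5   # Bb5 is the runner-up
masked = mask_logits(logits, ["Nf3", "Bb5"])
best = max(range(len(masked)), key=lambda i: masked[i])
print(best)  # 103, the token for "Nf3"
```

With a subword tokenizer this would instead require tracking multi-token prefixes per move; the single-token property is what makes the mask a one-step operation.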
---
## Training Pipeline
Training proceeded in three phases. Pre-training used
[InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens)
(~3.5M games, average Elo ~2240, spanning 1783–2006), which is derived from the base
[InterwebAlchemy/pgn-dataset](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset)
by adding `[g_start]` / `[g_end]` game boundary tokens.
**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening
pattern recognition across a wide range of named lines.
**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to
legally-valid games, mix with HuggingFace anchor games to prevent forgetting, and
fine-tune. Repeated five times, growing the legal training set from 67 games (9.1%
pass rate) to 796 games (67.5% pass rate).
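The filtering step of each round can be sketched as follows; `generate_game` and `is_legal_game` are hypothetical stand-ins for sampling from the model and replaying the PGN with a chess library (the early-round ~10% pass rate is simulated here, not measured):

```python
import random

def generate_game(prompt, seed):
    """Stand-in for sampling a PGN continuation from the model."""
    random.seed(seed)
    # Simulate that roughly 10% of early-round generations are fully legal
    return prompt + " ...", random.random() < 0.1

def is_legal_game(game_and_flag):
    """Stand-in for replaying the PGN move-by-move and checking legality."""
    _game, legal = game_and_flag
    return legal

def filter_round(prompts, samples_per_prompt=10):
    """One SFT round: generate from opening prompts, keep only legal games."""
    generated = [generate_game(p, s) for p in prompts
                 for s in range(samples_per_prompt)]
    legal = [g for g in generated if is_legal_game(g)]
    pass_rate = len(legal) / len(generated)
    return legal, pass_rate
```

In the real pipeline the surviving games are then mixed with anchor games before fine-tuning, and the growing pass rate (9.1% to 67.5%) is exactly this `pass_rate` tracked across rounds.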
**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions)
rank legal moves by quality. Validation reward accuracy reaches 0.885, and the SFT
loss remains stable throughout, confirming the model retains PGN structure.
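Assuming Phase 3 uses the standard DPO objective, the per-pair loss is a logistic loss on the margin between policy and reference log-probabilities. A minimal pure-Python sketch with hypothetical log-prob values:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given log-probs of the chosen and
    rejected moves under the policy (pi_*) and frozen reference (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical log-probs: the policy shifts mass toward the Stockfish-chosen move
loss = dpo_loss(-1.0, -3.0, -1.5, -2.5, beta=0.5)
```

The loss shrinks as the policy increases the chosen move's likelihood relative to the reference while decreasing the rejected move's, which is what drives the reward accuracy reported above.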
---
## Evaluation
Evaluated against chess-specialist models and frontier LLMs on three tasks.
- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]`
game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own
model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input
is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models
(gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a
short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).
### Phase B — Opening play (50 positions × 10 generations)
Centipawn loss (CPL) measures how much worse a model's move is compared to Stockfish's
best move at depth 15. Lower is better.
| Model | Params | Mean CPL ↓ | Legality | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite | ~8B | **2.58** | 100% | 0.0% |
| chessgpt-base-v1 | ~85M | 4.92 | 99.6% | 0.2% |
| gpt-3.5-turbo-instruct | ~175B | 5.79 | 99.4% | 0.0% |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83** | **99.8%** | **0.0%** |
| DeepSeek V3 | ~685B | 8.18 | 86.0% | 0.4% |
kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in
mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy
Lopez variations well-represented in the training data, and weaker in less-common
openings such as the Benoni and Colle System.
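The CPL metric is a simple difference of engine evaluations; the function below is a minimal illustration with hypothetical centipawn scores, not a real Stockfish call:

```python
def centipawn_loss(best_eval_cp, played_eval_cp):
    """CPL: engine eval of the best move minus eval after the played move,
    both in centipawns from the side-to-move's perspective, floored at zero."""
    return max(0, best_eval_cp - played_eval_cp)

# Hypothetical depth-15 evals: best move scores +35 cp, played move +12 cp
print(centipawn_loss(35, 12))  # 23 cp lost on this move
```

Mean CPL in the table is this quantity averaged over all generated moves, so a score of ~5.8 means the model gives up under six centipawns per move on average.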
### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)
Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551),
presented as full PGN game context up to the puzzle position.
| Model | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite | 49% | 98% |
| chessgpt-base-v1 | 34% | 97% |
| gpt-3.5-turbo-instruct | 26% | 63% |
| **kn1ght-bullet** | **10%** | **58%** |
| DeepSeek V3 | 12% | 62% |
Tactical puzzle accuracy is constrained by model capacity at this scale. With
constrained decoding at inference time, the model selects the highest-ranked legal
move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.
### Phase C — FEN puzzle accuracy
kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never
present in the training data; feeding FEN to the model produces arbitrary output. This
is a known and intentional limitation of PGN-only training.
---
## Usage
### With transformers.js (browser / Node.js)
The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser
delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.
```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");

const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```
**Constrained decoding** is strongly recommended in production. At each move step,
mask the logits to only the token IDs of legal moves (from `chess.js`) before
sampling. This guarantees legal play and lets the model's probability distribution
over legal moves act as an opening-quality signal.
```javascript
// Rebuild the allowlist after every move; the set of legal moves changes each ply.
const legalMoves = chess.moves(); // SAN strings from a chess.js instance
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  // Exclude every token outside the legal-move set from sampling.
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```
### With Python (ONNX Runtime)
```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the chess-specific BPE tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Run inference via ONNX Runtime (recommended)
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
# onnxruntime expects int64 numpy arrays, not Python lists
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```
---
## Limitations
- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and
endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects
learned opening patterns, not calculation.
---
## Files
| File | Description |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx` | Full-precision ONNX (21.6 MB) |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser |
| `tokenizer.json` | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json` | Model architecture |
| `generation_config.json` | Default generation parameters |
---
## Citation
```bibtex
@misc{kn1ght-bullet,
author = {InterwebAlchemy},
title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```