---
language:
- en
license: cc-by-4.0
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
datasets:
- InterwebAlchemy/pgn-dataset
- InterwebAlchemy/pgn-dataset-including-special-tokens
- InterwebAlchemy/pgn-lichess-puzzle-dataset
thumbnail: https://huggingface.co/InterwebAlchemy/kn1ght-bullet/resolve/main/assets/kn1ght-bullet.png
---

# kn1ght-bullet

![kn1ght-bullet](kn1ght-bullet.png)

A 4.3M parameter GPT trained to play chess by next-token prediction on PGN notation. Intended for use in chess tutoring applications via constrained decoding at inference time.

**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and makes constrained decoding straightforward — legal move masking is a one-step operation for the large majority of positions.

---

## Training Pipeline

Training proceeded in three phases. Pre-training used [InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens) (~3.5M games, average ELO ~2240, spanning 1783–2006), derived from the base [InterwebAlchemy/pgn-dataset](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset) by adding `[g_start]` / `[g_end]` game boundary tokens.

**Phase 1 — Pre-training**

200,000 steps on 100,000 games. The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.
**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**

A self-improvement loop: generate continuations from named opening prompts, filter to legally-valid games, mix with HuggingFace anchor games to prevent forgetting, and fine-tune. Repeated five times, growing the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).

**Phase 3 — DPO (300 steps)**

Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Val reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.

---

## Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]` game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, chat models get a short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).

### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is than Stockfish's best move at depth 15. Lower is better.
| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B parameter frontier model) in mean CPL while being 40,000× smaller. Performance is strongest in Sicilian and Ruy Lopez variations well-represented in the training data, and weaker in less-common openings such as the Benoni and Colle System.

### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| DeepSeek V3            | 12%            | 62%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |

Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move — puzzle accuracy is less relevant to the tutoring use case than opening-play CPL.

### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser delivery; `onnx/model.onnx` (21.6 MB) for full-precision inference.
```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline(
  "text-generation",
  "InterwebAlchemy/kn1ght-bullet"
);

const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```

**Constrained decoding** is strongly recommended in production. At each move step, mask the logits to only the token IDs of legal moves (from `chess.js`) before sampling. This guarantees legal play and lets the model's probability distribution over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once per move, not inside the sampling loop.
// In transformers.js, tokenizer.encode(text) returns an array of token IDs.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```

### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended)
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)

logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```

---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.
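The legal-move masking described under Usage applies equally on the Python side. The helper below is a hypothetical sketch, not part of this repo: it assumes logits arrive as a NumPy array (as in the ONNX example) and that each legal move encodes to a single token ID; in practice `allowed_ids` would be built by encoding each SAN move from a legality checker such as `python-chess`.

```python
import numpy as np

def mask_to_legal(logits: np.ndarray, allowed_ids: set) -> np.ndarray:
    """Return a copy of `logits` with every token outside `allowed_ids`
    set to -inf, so sampling can only ever pick a legal-move token."""
    masked = np.full_like(logits, -np.inf)
    legal = np.array(sorted(allowed_ids))
    masked[legal] = logits[legal]
    return masked

# Toy demonstration: a 10-token vocabulary where only IDs 2 and 7
# correspond to legal moves in the current position.
logits = np.linspace(-1.0, 1.0, 10).astype(np.float32)
masked = mask_to_legal(logits, {2, 7})
print(int(masked.argmax()))  # 7 — the highest-scoring legal token
```

The masked logits can then be softmaxed and sampled as usual. Moves that span multiple BPE tokens would need per-step re-masking during generation, which this sketch omits.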
---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8 quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author = {InterwebAlchemy},
  title = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```