InterwebAlchemy committed · Commit 472946d · verified · Parent: e5af006

Upload folder using huggingface_hub

README.md CHANGED
---
language:
- en
license: mit
tags:
- chess
- pgn
- gpt2
- causal-lm
- game-playing
library_name: transformers
pipeline_tag: text-generation
---

# kn1ght-bullet

A 4.3M-parameter GPT trained to play chess by next-token prediction on PGN notation. It is intended for chess tutoring applications, using constrained decoding at inference time.

**bullet** refers to the model's size tier — small and fast, in the same spirit as chess time controls.

---

## Model Details

|                 |                                            |
| --------------- | ------------------------------------------ |
| Architecture    | GPT (4 layers, 4 heads, 256 embedding dim) |
| Parameters      | 4.3M                                       |
| Context length  | 256 tokens                                 |
| Vocabulary      | 4,096 BPE tokens (chess PGN–specific)      |
| Training format | PGN text (`[g_start]1.e4 e5 2.Nf3 ...`)    |

The tokenizer is a BPE vocabulary built specifically for chess PGN notation, where most moves (`e4`, `Nf3`, `O-O`, `cxd5`) encode as single tokens. This keeps inference fast and makes constrained decoding straightforward — legal-move masking is a one-step operation for the large majority of positions.
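The 4.3M figure can be checked against the architecture directly. This sketch assumes the standard GPT-2 parameter layout with tied input/output embeddings (inferred from `model_type: gpt2` in `config.json`, not stated explicitly in the card):

```python
# Parameter count implied by the config: 4 layers, 256-dim embeddings,
# 4,096-token vocab, 256 positions, 1,024-dim MLP inner size.
n_vocab, n_ctx, d, n_layer, d_ff = 4096, 256, 256, 4, 1024

embeddings = n_vocab * d + n_ctx * d  # token (wte) + position (wpe) embeddings
per_layer = (
    2 * d                # ln_1 (scale + bias)
    + d * 3 * d + 3 * d  # fused QKV projection + bias
    + d * d + d          # attention output projection + bias
    + 2 * d              # ln_2
    + d * d_ff + d_ff    # MLP up-projection + bias
    + d_ff * d + d       # MLP down-projection + bias
)
total = embeddings + n_layer * per_layer + 2 * d  # plus final layer norm
print(f"{total:,}")  # 4,273,664, which rounds to the advertised 4.3M
```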
---

## Training Pipeline

Training proceeded in three phases from the [InterwebAlchemy/pgn-dataset-including-special-tokens](https://huggingface.co/datasets/InterwebAlchemy/pgn-dataset-including-special-tokens) dataset (~3.5M games, average Elo ~2240, spanning 1783–2006).

**Phase 1 — Pre-training**
200,000 steps on 100,000 games. The model learns PGN structure and develops opening pattern recognition across a wide range of named lines.

**Phase 2 — Legality-Filtered SFT (5 rounds × 5,000 steps)**
A self-improvement loop: generate continuations from named opening prompts, filter to legally valid games, mix in HuggingFace anchor games to prevent forgetting, and fine-tune. Repeated five times, this grew the legal training set from 67 games (9.1% pass rate) to 796 games (67.5% pass rate).
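One round of that loop can be sketched as below; `generate` and `is_legal` are placeholder hooks (a real legality check would replay each generated game move-by-move with a chess library), and `samples_per_prompt` is an assumed knob, not a documented setting:

```python
def sft_round(generate, prompts, is_legal, anchors, samples_per_prompt=10):
    """One legality-filtered SFT round: sample continuations from opening
    prompts, keep only fully legal games, then mix in anchor games from the
    original dataset to prevent forgetting."""
    samples = [generate(p) for p in prompts for _ in range(samples_per_prompt)]
    kept = [game for game in samples if is_legal(game)]
    pass_rate = len(kept) / len(samples)
    # The returned mixture becomes the fine-tuning set for this round.
    return kept + list(anchors), pass_rate
```

Across five rounds, the rising pass rate (9.1% → 67.5% above) is the signal that fine-tuning on the model's own filtered outputs is improving legality.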
**Phase 3 — DPO (300 steps)**
Stockfish-generated preference pairs (771 chosen/rejected pairs from 783 positions) rank legal moves by quality. Validation reward accuracy: 0.885. SFT loss remains stable throughout, confirming the model retains PGN structure.
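For reference, the per-pair objective optimised in this phase is the standard DPO loss. This sketch takes sequence log-probs and assumes β = 0.1; the actual value used in training is not stated:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))): rewards the
    policy for ranking the Stockfish-preferred move above the rejected one,
    relative to the frozen pre-DPO reference model."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```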
---

## Evaluation

Evaluated against chess-specialist models and frontier LLMs on three tasks.

- **kn1ght models** use the custom 4,096-token chess BPE tokenizer with a `[g_start]` game-start prefix.
- **HuggingFace specialist models** (chessgpt-base-v1, chesspythia-70m) use their own model-specific tokenizers, loaded automatically via the HuggingFace pipeline. Input is raw PGN text with no special prefix.
- **Frontier LLMs** receive raw PGN prompts via the OpenRouter API; completion models (gpt-3.5-turbo-instruct, gpt-oss-20b) get a bare PGN string, and chat models get a short system prompt (`"You play chess. Reply with only the next move in SAN notation."`).
### Phase B — Opening play (50 positions × 10 generations)

Centipawn loss (CPL) measures how much worse a model's move is than Stockfish's best move at depth 15. Lower is better.

| Model                          | Params   | Mean CPL ↓ | Legality  | Blunder % |
| ------------------------------ | -------- | ---------- | --------- | --------- |
| Gemini 3.1 Flash Lite          | ~8B      | **2.58**   | 100%      | 0.0%      |
| chessgpt-base-v1               | ~85M     | 4.92       | 99.6%     | 0.2%      |
| gpt-3.5-turbo-instruct         | ~175B    | 5.79       | 99.4%     | 0.0%      |
| **kn1ght-bullet (this model)** | **4.3M** | **5.83**   | **99.8%** | **0.0%**  |
| DeepSeek V3                    | ~685B    | 8.18       | 86.0%     | 0.4%      |

kn1ght-bullet matches gpt-3.5-turbo-instruct (a ~175B-parameter frontier model) in mean CPL while being roughly 40,000× smaller. Performance is strongest in the Sicilian and Ruy Lopez variations that are well represented in the training data, and weaker in less common openings such as the Benoni and Colle System.
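Concretely, the CPL and blunder metrics reduce to simple arithmetic on engine evaluations. The 200cp blunder cutoff below is a common convention and an assumption here, not the evaluation's documented threshold:

```python
def centipawn_loss(best_cp, played_cp):
    """Engine eval (centipawns, from the mover's perspective) of Stockfish's
    best move minus the eval after the model's move, floored at zero."""
    return max(0, best_cp - played_cp)

def is_blunder(best_cp, played_cp, threshold=200):
    # Losing ~2 pawns of evaluation is a common blunder cutoff (assumed).
    return centipawn_loss(best_cp, played_cp) >= threshold

def mean_cpl(pairs):
    """Average CPL over (best_cp, played_cp) pairs, as reported in the table."""
    return sum(centipawn_loss(b, p) for b, p in pairs) / len(pairs)
```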
### Phase C' — PGN puzzle accuracy (20 puzzles × 10 generations)

Puzzles are drawn from the Lichess Open Puzzle Database (ratings 1201–1895, mean 1551), presented as full PGN game context up to the puzzle position.

| Model                  | Top-1 Accuracy | Legality |
| ---------------------- | -------------- | -------- |
| Gemini 3.1 Flash Lite  | 49%            | 98%      |
| chessgpt-base-v1       | 34%            | 97%      |
| gpt-3.5-turbo-instruct | 26%            | 63%      |
| **kn1ght-bullet**      | **10%**        | **58%**  |
| DeepSeek V3            | 12%            | 62%      |

Tactical puzzle accuracy is constrained by model capacity at this scale. With constrained decoding at inference time, the model selects the highest-ranked legal move, so puzzle accuracy matters less to the tutoring use case than opening-play CPL.
### Phase C — FEN puzzle accuracy

kn1ght-bullet scores 0% on FEN-format puzzles, as expected. FEN notation was never present in the training data; feeding FEN to the model produces arbitrary output. This is a known and intentional limitation of PGN-only training.

---

## Usage

### With transformers.js (browser / Node.js)

The primary intended runtime. Use `onnx/model_quantized.onnx` (5.7 MB) for browser delivery and `onnx/model.onnx` (21.6 MB) for full-precision inference.

```javascript
import { pipeline } from "@xenova/transformers";

const generator = await pipeline("text-generation", "InterwebAlchemy/kn1ght-bullet");
const result = await generator("[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5", {
  max_new_tokens: 10,
  do_sample: true,
  temperature: 0.8,
  top_k: 40,
});
```
**Constrained decoding** is strongly recommended in production. At each move step, mask the logits to only the token IDs of legal moves (from `chess.js`) before sampling. This guarantees legal play and lets the model's probability distribution over legal moves act as an opening-quality signal.

```javascript
// Build the per-position allowlist once, not inside the generation loop.
// Note: tokenizer.encode() in transformers.js returns a plain array of token IDs.
const legalMoves = chess.moves();
const allowedIds = new Set(legalMoves.flatMap((san) => tokenizer.encode(san)));

function maskLogits(logits) {
  for (let i = 0; i < logits.length; i++) {
    if (!allowedIds.has(i)) logits[i] = -Infinity;
  }
  return logits;
}
```
### With Python (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

# Load the tokenizer
tokenizer = Tokenizer.from_pretrained("InterwebAlchemy/kn1ght-bullet")

# Load via ONNX (recommended); depending on the export, the session may
# also expect an attention_mask input.
session = ort.InferenceSession("onnx/model.onnx")

pgn = "[g_start]1.e4 e5 2.Nf3 Nc6 3.Bb5"
input_ids = np.array([tokenizer.encode(pgn).ids], dtype=np.int64)
logits = session.run(["logits"], {"input_ids": input_ids})[0]
next_token = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token]))
```
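The same constrained-decoding idea works in Python, reduced to its core: restrict the final-position logits to the (single-token) legal moves and renormalise. Building `san_to_id` would use a legality source such as `python-chess` plus the tokenizer above; that mapping is assumed here:

```python
import math

def constrained_next_move(logits, san_to_id):
    """Pick the legal SAN move with the highest logit and return the softmax
    distribution over legal moves only (an opening-quality signal)."""
    scores = {san: logits[tid] for san, tid in san_to_id.items()}
    zmax = max(scores.values())  # subtract the max for numerical stability
    exps = {san: math.exp(s - zmax) for san, s in scores.items()}
    total = sum(exps.values())
    probs = {san: e / total for san, e in exps.items()}
    return max(probs, key=probs.get), probs
```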
---

## Limitations

- **PGN-only**: Cannot parse FEN notation. Positions must be provided as PGN move sequences.
- **Opening-focused**: Training data emphasises the opening phase. Middlegame and endgame play degrades without constrained decoding.
- **256-token context**: Long games approaching move 60+ may exceed the context window.
- **Not a chess engine**: Does not perform search or lookahead. Move quality reflects learned opening patterns, not calculation.
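For the context-window limitation, one workable mitigation is left-truncation at inference time. Re-prepending the `[g_start]` token (id 0 per `config.json`) after truncating is an assumed policy, not something the card prescribes:

```python
def truncate_to_context(ids, bos_id=0, max_len=256):
    """Keep the most recent tokens that fit in the 256-token context window,
    re-prepending the game-start token so the prompt stays well-formed."""
    if len(ids) <= max_len:
        return ids
    return [bos_id] + ids[-(max_len - 1):]
```

A production version would cut at a move boundary rather than an arbitrary token, so the model never sees a half-encoded move.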
---

## Files

| File                        | Description                                                    |
| --------------------------- | -------------------------------------------------------------- |
| `onnx/model.onnx`           | Full-precision ONNX (21.6 MB)                                  |
| `onnx/model_quantized.onnx` | Int8-quantized ONNX (5.7 MB) — recommended for browser         |
| `tokenizer.json`            | BPE tokenizer, loadable by transformers.js and HF `tokenizers` |
| `config.json`               | Model architecture                                             |
| `generation_config.json`    | Default generation parameters                                  |

---

## Citation

```bibtex
@misc{kn1ght-bullet,
  author    = {InterwebAlchemy},
  title     = {kn1ght-bullet: A 4.3M Parameter Chess Language Model},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/InterwebAlchemy/kn1ght-bullet}
}
```
config.json ADDED

```json
{
  "model_type": "gpt2",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "_name_or_path": "InterwebAlchemy/kn1ght-bullet",
  "vocab_size": 4096,
  "n_embd": 256,
  "n_head": 4,
  "n_layer": 4,
  "n_positions": 256,
  "n_inner": 1024,
  "activation_function": "gelu_new",
  "resid_pdrop": 0.0,
  "embd_pdrop": 0.0,
  "attn_pdrop": 0.0,
  "layer_norm_epsilon": 1e-05,
  "initializer_range": 0.02,
  "scale_attn_weights": true,
  "reorder_and_upcast_attn": false,
  "scale_attn_by_inverse_layer_idx": false,
  "use_cache": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3
}
```
generation_config.json ADDED

```json
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 3,
  "max_new_tokens": 256,
  "do_sample": true,
  "temperature": 0.8,
  "top_k": 40
}
```
onnx/model.onnx ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:0a239d08c78ed7d9984b1e5cf57b34bd940c6ab6ecf44e4ef858410740b0d3a0
size 21573148
```

onnx/model_quantized.onnx ADDED

```
version https://git-lfs.github.com/spec/v1
oid sha256:36f840dbe6a77af06e578663de595804bd8df99cb5ec02151e6d49d8021b1b8d
size 5694652
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED

```json
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "bos_token": "[g_start]",
  "eos_token": "[g_end]",
  "unk_token": "[unknown]",
  "pad_token": "[pad]",
  "model_max_length": 256,
  "clean_up_tokenization_spaces": false
}
```