---
library_name: pawn
license: apache-2.0
tags:
- chess
- transformer
- world-model
- causal-lm
- next-token-prediction
- representation-learning
- parameter-efficient-finetuning
- pytorch
- rust
language:
- en
pipeline_tag: other
citation: |
@software{schweich2026pawn,
author = {Schweich, Thomas},
title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
year = 2026,
url = {https://github.com/thomas-schweich/PAWN},
license = {Apache-2.0}
}
---
# PAWN: Playstyle-Agnostic World-model Network for Chess
PAWN is a small causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.
PAWN is designed as a testbed for finetuning and augmentation methods at small scale. Because the pretrained model is entirely unopinionated (trained only on uniformly random legal moves), it serves as a blank slate that can be adapted, augmented, and finetuned into arbitrary player models with unique playstyles.
Finetuning PAWN has proven significantly more parameter-efficient than training new models from scratch and requires minimal compute resources.
**[GitHub Repository](https://github.com/thomas-schweich/PAWN)**
## Model Variants
| Variant | d_model | Layers | Heads | Parameters | Link |
|---------|---------|--------|-------|------------|------|
| PAWN-Small | 256 | 8 | 4 | ~9.5M | [thomas-schweich/pawn-small](https://huggingface.co/thomas-schweich/pawn-small) |
| PAWN (Base) | 512 | 8 | 8 | ~35.8M | [thomas-schweich/pawn-base](https://huggingface.co/thomas-schweich/pawn-base) |
| PAWN-Large | 640 | 10 | 8 | ~68.4M | [thomas-schweich/pawn-large](https://huggingface.co/thomas-schweich/pawn-large) |
All variants share the same architecture (RMSNorm, SwiGLU, RoPE, factored move embeddings) and vocabulary (4,278 tokens). They differ only in width, depth, and head count.
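The listed parameter counts can be roughly reproduced from the table with a back-of-the-envelope estimate. The breakdown below is an assumption, not taken from the repository: unbiased attention projections, a 4x SwiGLU expansion with three weight matrices, factored input embeddings (with an assumed five-row promotion component), and an untied 4,278-row output head.

```python
def approx_params(d_model: int, layers: int) -> int:
    """Rough parameter estimate; norms and biases are ignored (assumption)."""
    attn = 4 * d_model * d_model          # Q, K, V, and output projections
    ffn = 3 * d_model * (4 * d_model)     # SwiGLU: gate, up, and down matrices
    emb = (64 + 64 + 5) * d_model         # factored move embeddings (assumed layout)
    head = 4278 * d_model                 # untied output projection (assumption)
    return layers * (attn + ffn) + emb + head

for name, d, n_layers in [("small", 256, 8), ("base", 512, 8), ("large", 640, 10)]:
    print(name, round(approx_params(d, n_layers) / 1e6, 1))  # ~9.5, ~35.8, ~68.4
```

Under those assumptions the estimate lands on ~9.5M, ~35.8M, and ~68.4M, matching the table.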
## Quickstart
```bash
# Clone and build
git clone https://github.com/thomas-schweich/PAWN.git && cd PAWN
# Build the Rust chess engine (required -- handles all game logic)
cd engine && uv run --with maturin maturin develop --release && cd ..
# Install Python dependencies
uv sync --extra cu128 # NVIDIA (or --extra rocm for AMD)
# Dev tools (pytest, seaborn, solara, etc.) ship with the base dependencies -- no extra flags needed beyond the GPU backend above
# Pull a pretrained checkpoint
git submodule update --init checkpoints/pawn-base
```
### Load and generate moves
```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig, WHITE_CHECKMATES
from pawn.model import PAWNCLM
# Load the model
cfg = CLMConfig.base()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("checkpoints/pawn-base/model.safetensors", device="cuda")
model.load_state_dict(weights)
# Condition on outcome and generate a game
input_ids = torch.tensor([[WHITE_CHECKMATES]], device="cuda")
pad_mask = torch.ones(1, 1, dtype=torch.bool, device="cuda")
logits, _ = model.forward_generate(input_ids, pad_mask)
next_token = logits[0, -1].argmax()
```
### Train an adapter
```bash
uv sync --extra dev
git submodule update --init checkpoints/pawn-base
uv run python scripts/train_bottleneck.py \
--checkpoint checkpoints/pawn-base \
--pgn data/lichess_1800_1900.pgn \
--bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```
## Architecture
PAWN is a decoder-only transformer trained with next-token prediction on chess move sequences. Each sequence has the format:
```
[outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
```
The token vocabulary covers all possible source-destination square pairs on the 8x8 board (4,096 grid moves), promotion moves (176 tokens for 4 piece types across 44 eligible square pairs), 5 outcome tokens, and 1 padding token.
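The stated vocabulary size follows directly from those counts; a quick sanity check:

```python
GRID_MOVES = 64 * 64   # every source-destination square pair on the 8x8 board
PROMOTIONS = 4 * 44    # 4 promotion piece types across 44 eligible square pairs
OUTCOMES = 5           # outcome-conditioning tokens
PAD = 1                # padding token

VOCAB_SIZE = GRID_MOVES + PROMOTIONS + OUTCOMES + PAD
print(VOCAB_SIZE)  # 4278
```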
Move embeddings are factored: each move token is decomposed into source square + destination square + promotion piece, with embeddings summed. This provides structural inductive bias (moves sharing a source or destination share embedding components) while reducing embedding parameters by roughly 32x.
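The ~32x reduction can be checked arithmetically. Assuming the promotion component has five rows (four promotion pieces plus a "no promotion" row -- an assumption, since the exact table layout isn't given here), factoring replaces 4,278 embedding rows with 133, independent of model width:

```python
D_MODEL = 512                       # PAWN (Base) width; the ratio is width-independent
full = 4278 * D_MODEL               # one embedding row per vocabulary token
factored = (64 + 64 + 5) * D_MODEL  # source + destination + promotion rows, summed at lookup
print(round(full / factored, 1))    # ~32.2
```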
The model uses pre-norm RMSNorm, SwiGLU feed-forward layers (4x expansion), Rotary Position Embeddings (RoPE), and a 256-token context window. All chess logic -- game simulation, move generation, tokenization, and legal move computation -- is handled by a bundled Rust engine built on [shakmaty](https://github.com/niklasf/shakmaty).
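As an illustration of the RoPE component (generic textbook RoPE, not code from this repository): each consecutive pair of query/key features is rotated by a position-dependent angle, so attention scores depend on relative position while each pair's norm is preserved.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles
    (standard RoPE; illustrative, not PAWN's actual implementation)."""
    dim = len(x)
    out = list(x)
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

v = [1.0, 0.0, 0.5, -0.5]
r = rope_rotate(v, pos=3)
# Position 0 is the identity, and each (even, odd) pair keeps its norm.
```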
For full architectural details, see [docs/ARCHITECTURE.md](https://github.com/thomas-schweich/PAWN/blob/main/docs/ARCHITECTURE.md).
## What the Model Learns
Despite training exclusively on random games, PAWN develops rich internal representations:
- **Legal move prediction**: The model produces legal moves over 98% of the time, inferring legality from the move history alone.
- **Board state tracking**: Linear probes on hidden states decode piece positions, check status, castling rights, material counts, and game phase with high accuracy -- even though the model never sees explicit board representations.
These properties make PAWN useful as a frozen backbone for downstream tasks. See the [adapter documentation](https://github.com/thomas-schweich/PAWN/blob/main/docs/ADAPTERS.md) for fine-tuning results.
## Adapter Methods
PAWN ships with six adapter implementations for fine-tuning the frozen backbone on human game data:
| Method | Parameters | Description |
|--------|-----------|-------------|
| Bottleneck | ~131K | Houlsby-style residual MLP adapters |
| RoSA | configurable | Gradient-informed sparse + LoRA ([Nikdan et al., 2024](https://arxiv.org/abs/2401.04679)) |
| Sparse | 503K--2.7M | Random binary mask on frozen weights |
| LoRA | ~65K | Low-rank attention projection adapters |
| Hybrid | ~65K | LoRA + FiLM combined |
| FiLM | ~17K | Per-channel affine modulation |
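To make the table concrete, here is a minimal dependency-free sketch of the LoRA idea (generic, not PAWN's implementation): a frozen weight matrix `W` gains a trainable low-rank update `B A`, and because `B` conventionally starts at zero, the adapted layer is an exact no-op at initialization.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B(A x): frozen full-rank W plus a trainable low-rank B A."""
    return [wy + scale * by for wy, by in zip(matvec(W, x), matvec(B, matvec(A, x)))]

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen 2x2 weight
A = [[0.1, -0.2]]             # rank-1 down-projection (trainable)
B = [[0.0], [0.0]]            # up-projection, initialized to zero
x = [1.0, 1.0]
# With B = 0 the adapter contributes nothing: lora_forward(...) == matvec(W, x)
```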
## Citation
```bibtex
@software{schweich2026pawn,
author = {Schweich, Thomas},
title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
year = 2026,
url = {https://github.com/thomas-schweich/PAWN},
license = {Apache-2.0}
}
```
## License
Apache 2.0. See [LICENSE](https://github.com/thomas-schweich/PAWN/blob/main/LICENSE).