---
library_name: pawn
license: apache-2.0
tags:
- chess
- transformer
- world-model
- causal-lm
- next-token-prediction
- representation-learning
- parameter-efficient-finetuning
- pytorch
- rust
language:
- en
pipeline_tag: other
citation: |
  @software{schweich2026pawn,
    author = {Schweich, Thomas},
    title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
    year = 2026,
    url = {https://github.com/thomas-schweich/PAWN},
    license = {Apache-2.0}
  }
---
# PAWN: Playstyle-Agnostic World-model Network for Chess
PAWN is a small causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.
PAWN is designed as a testbed for finetuning and augmentation methods at small scale. Because the pretrained model is entirely unopinionated (trained only on uniformly random legal moves), it serves as a blank slate that can be adapted, augmented, and finetuned into arbitrary player models with unique playstyles.
Finetuning PAWN has proven significantly more parameter-efficient than training new models from scratch and requires minimal compute resources.
**[GitHub Repository](https://github.com/thomas-schweich/PAWN)**
## Model Variants
| Variant | d_model | Layers | Heads | Parameters | Link |
|---------|---------|--------|-------|------------|------|
| PAWN-Small | 256 | 8 | 4 | ~9.5M | [thomas-schweich/pawn-small](https://huggingface.co/thomas-schweich/pawn-small) |
| PAWN (Base) | 512 | 8 | 8 | ~35.8M | [thomas-schweich/pawn-base](https://huggingface.co/thomas-schweich/pawn-base) |
| PAWN-Large | 640 | 10 | 8 | ~68.4M | [thomas-schweich/pawn-large](https://huggingface.co/thomas-schweich/pawn-large) |
All variants share the same architecture (RMSNorm, SwiGLU, RoPE, factored move embeddings) and vocabulary (4,278 tokens). They differ only in width, depth, and head count.
## Quickstart
```bash
# Clone and build
git clone https://github.com/thomas-schweich/PAWN.git && cd PAWN
# Build the Rust chess engine (required -- handles all game logic)
cd engine && uv run --with maturin maturin develop --release && cd ..
# Install Python dependencies
uv sync --extra cu128 # NVIDIA (or --extra rocm for AMD)
# Dev tools (pytest, seaborn, solara, etc.) are included in the base dependencies,
# so no extra flags are needed beyond the GPU backend above
# Pull a pretrained checkpoint
git submodule update --init checkpoints/pawn-base
```
### Load and generate moves
```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig, WHITE_CHECKMATES
from pawn.model import PAWNCLM
# Load the model
cfg = CLMConfig.base()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("checkpoints/pawn-base/model.safetensors", device="cuda")
model.load_state_dict(weights)
# Condition on an outcome token and predict the first move
input_ids = torch.tensor([[WHITE_CHECKMATES]], device="cuda")
pad_mask = torch.ones(1, 1, dtype=torch.bool, device="cuda")
with torch.no_grad():  # inference only, no gradients needed
    logits, _ = model.forward_generate(input_ids, pad_mask)
next_token = logits[0, -1].argmax()  # greedy pick for the first ply
```
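The snippet above scores only the first ply. A full game would come from feeding each chosen token back into the model. The loop below is a minimal sketch of that idea with a stand-in `toy_logits` function in place of the real model; `greedy_decode`, `toy_logits`, and the stop-token convention are illustrative assumptions, not part of the PAWN API.

```python
from typing import Callable, List, Optional, Sequence


def greedy_decode(
    next_logits: Callable[[Sequence[int]], Sequence[float]],
    prefix: Sequence[int],
    max_len: int,
    stop_token: Optional[int] = None,
) -> List[int]:
    """Append the highest-scoring token until max_len or stop_token is reached."""
    tokens = list(prefix)
    while len(tokens) < max_len:
        logits = next_logits(tokens)
        best = max(range(len(logits)), key=lambda i: logits[i])
        if stop_token is not None and best == stop_token:
            break
        tokens.append(best)
    return tokens


def toy_logits(tokens: Sequence[int], vocab_size: int = 6) -> List[float]:
    """Hypothetical stand-in for a model: favours (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


game = greedy_decode(toy_logits, prefix=[0], max_len=10, stop_token=5)
print(game)  # [0, 1, 2, 3, 4]
```

With the real model, `next_logits` would wrap a call such as `model.forward_generate`. Because the backbone is trained on uniformly random moves, sampling from the softmax may be a more natural choice than the greedy `argmax` used here.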
### Train an adapter
```bash
uv sync --extra cu128 # NVIDIA (or --extra rocm for AMD)
git submodule update --init checkpoints/pawn-base
uv run python scripts/train_bottleneck.py \
    --checkpoint checkpoints/pawn-base \
    --pgn data/lichess_1800_1900.pgn \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```
## Architecture
PAWN is a decoder-only transformer trained with next-token prediction on chess move sequences. Each sequence has the format:
```
[outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
```
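As a rough illustration of this layout, the sketch below right-pads one game to the 256-token context window mentioned later in this section. The symbolic token strings, the `build_sequence` helper, the "True means real token" mask convention, and the choice to raise on overlong games are all assumptions for readability; the Rust engine produces the actual integer token IDs.

```python
from typing import List, Tuple

CONTEXT_LEN = 256  # context window stated in this README


def build_sequence(
    outcome: str,
    plies: List[str],
    pad: str = "[PAD]",
    context_len: int = CONTEXT_LEN,
) -> Tuple[List[str], List[bool]]:
    """Lay out [outcome] [ply_1] ... [ply_N] and right-pad to context_len.

    Returns the padded tokens and a mask that is True on real tokens
    (assumed convention) and False on padding.
    """
    tokens = [outcome] + list(plies)
    if len(tokens) > context_len:
        raise ValueError("game does not fit in the context window")
    n_real = len(tokens)
    n_pad = context_len - n_real
    return tokens + [pad] * n_pad, [True] * n_real + [False] * n_pad


tokens, mask = build_sequence("WHITE_CHECKMATES", ["e2e4", "e7e5", "g1f3"])
print(tokens[:5], sum(mask))
```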
The token vocabulary covers all source-destination square pairs on the 8x8 board (64 x 64 = 4,096 grid moves), promotion moves (176 tokens: 4 promotion pieces for each of 44 eligible square pairs), 5 outcome tokens, and 1 padding token, for a total of 4,278 tokens.
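The counts above can be checked with a few lines of arithmetic. The sketch below enumerates promotion square pairs as pawn pushes and diagonal captures onto the last rank for both colours, which is one plausible reading of where the 44 comes from; the square numbering (`rank * 8 + file`) is an assumed convention and may not match the engine's.

```python
def promotion_square_pairs():
    """Enumerate (src, dst) pairs for a pawn stepping onto the last rank.

    Squares are numbered 0..63 as rank * 8 + file (assumed convention).
    """
    pairs = []
    # White promotes from the 7th to the 8th rank, black from the 2nd to the 1st.
    for src_rank, dst_rank in ((6, 7), (1, 0)):
        for file in range(8):
            for step in (-1, 0, 1):  # capture left, push, capture right
                if 0 <= file + step < 8:
                    pairs.append((src_rank * 8 + file, dst_rank * 8 + file + step))
    return pairs


grid_moves = 64 * 64
promo_pairs = promotion_square_pairs()
promo_tokens = len(promo_pairs) * 4  # queen, rook, bishop, knight
vocab_size = grid_moves + promo_tokens + 5 + 1  # plus outcome and padding tokens
print(len(promo_pairs), promo_tokens, vocab_size)  # 44 176 4278
```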
Move embeddings are factored: each move token is decomposed into a source square, a destination square, and a promotion piece, and the three component embeddings are summed. This provides a structural inductive bias (moves sharing a source or destination share embedding components) while reducing embedding parameters by roughly 32x.
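To see where a figure of roughly 32x could come from, the sketch below compares a flat embedding table against three small component tables. The table sizes (64 sources, 64 destinations, and 5 promotion entries for "none" plus 4 pieces) are assumptions, as is the `factored_embedding` helper; the real implementation may also handle outcome and padding tokens separately.

```python
D_MODEL = 512      # base variant width from the Model Variants table
VOCAB_SIZE = 4278
N_SRC, N_DST = 64, 64
N_PROMO = 5        # assumed: "no promotion" plus 4 piece types

flat_params = VOCAB_SIZE * D_MODEL
factored_params = (N_SRC + N_DST + N_PROMO) * D_MODEL
ratio = flat_params / factored_params
print(flat_params, factored_params, round(ratio, 1))  # 2190336 68096 32.2


def factored_embedding(src, dst, promo, src_table, dst_table, promo_table):
    """Sum the source, destination, and promotion vectors element-wise."""
    return [
        s + d + p
        for s, d, p in zip(src_table[src], dst_table[dst], promo_table[promo])
    ]


# Tiny 2-dimensional tables, just to show the shared-component structure.
src_table = {12: [1.0, 0.0]}
dst_table = {20: [0.0, 2.0], 28: [0.0, 4.0]}
promo_table = {0: [0.0, 0.0]}
single_push = factored_embedding(12, 20, 0, src_table, dst_table, promo_table)
double_push = factored_embedding(12, 28, 0, src_table, dst_table, promo_table)
print(single_push, double_push)  # [1.0, 2.0] [1.0, 4.0]
```

Both moves start from square 12, so they share the same source component and differ only in the destination term.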
The model uses pre-norm RMSNorm, SwiGLU feed-forward layers (4x expansion), Rotary Position Embeddings (RoPE), and a 256-token context window. All chess logic -- game simulation, move generation, tokenization, and legal move computation -- is handled by a bundled Rust engine built on [shakmaty](https://github.com/niklasf/shakmaty).
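For readers unfamiliar with these components, here is a dependency-free sketch of a pre-norm feed-forward sub-block using the standard RMSNorm and SwiGLU formulas. It is not the repository's code: the function names, the epsilon value, the absence of biases, and the reading of "4x expansion" as a hidden width of `4 * d_model` are all assumptions, and RoPE and attention are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # each row holds the weights of one output unit


def rms_norm(x: Vector, weight: Vector, eps: float = 1e-6) -> Vector:
    """Scale x by the reciprocal of its root-mean-square, then by a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def swiglu(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Compute W_down (SiLU(W_gate x) * (W_up x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def pre_norm_ffn_block(x, norm_weight, w_gate, w_up, w_down) -> Vector:
    """Pre-norm residual: x + FFN(RMSNorm(x))."""
    h = rms_norm(x, norm_weight)
    return [a + b for a, b in zip(x, swiglu(h, w_gate, w_up, w_down))]


d_model = 2
hidden = 4 * d_model  # assumed meaning of "4x expansion"
x = [3.0, 4.0]
normed = rms_norm(x, [1.0] * d_model)
w_gate = [[0.1] * d_model for _ in range(hidden)]
w_up = [[0.2] * d_model for _ in range(hidden)]
w_down = [[0.0] * hidden for _ in range(d_model)]  # zero output projection
out = pre_norm_ffn_block(x, [1.0] * d_model, w_gate, w_up, w_down)
print(normed, out)
```

With a zero output projection the residual path passes `x` through unchanged, which is a quick sanity check on the pre-norm wiring.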
For full architectural details, see [docs/ARCHITECTURE.md](https://github.com/thomas-schweich/PAWN/blob/main/docs/ARCHITECTURE.md).
## What the Model Learns
Despite training exclusively on random games, PAWN develops rich internal representations:
- **Legal move prediction**: The model achieves a legal move rate above 98%, accurately predicting which moves are legal from move history alone.
- **Board state tracking**: Linear probes on hidden states decode piece positions, check status, castling rights, material counts, and game phase with high accuracy -- even though the model never sees explicit board representations.
These properties make PAWN useful as a frozen backbone for downstream tasks. See the [adapter documentation](https://github.com/thomas-schweich/PAWN/blob/main/docs/ADAPTERS.md) for finetuning results.
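One simple way to compute a legal move rate like the one quoted above is the fraction of positions where the model's top prediction falls in the legal move set. The sketch below uses that top-1 definition with made-up token IDs; PAWN's own evaluation may define the metric differently (for example, as probability mass on legal moves), and `legal_move_rate` is a hypothetical helper.

```python
from typing import Sequence, Set


def legal_move_rate(predicted: Sequence[int], legal_sets: Sequence[Set[int]]) -> float:
    """Fraction of positions whose predicted token is in that position's legal set."""
    if len(predicted) != len(legal_sets):
        raise ValueError("need one legal-move set per prediction")
    if not predicted:
        return 0.0
    hits = sum(1 for token, legal in zip(predicted, legal_sets) if token in legal)
    return hits / len(predicted)


# Illustrative token IDs only; real legal sets would come from the chess engine.
predictions = [10, 42, 7, 99]
legal = [{10, 11}, {40, 41}, {7}, {99, 100}]
rate = legal_move_rate(predictions, legal)
print(rate)  # 0.75
```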
## Adapter Methods
PAWN ships with six adapter implementations for finetuning the frozen backbone on human game data:
| Method | Parameters | Description |
|--------|-----------|-------------|
| Bottleneck | ~131K | Houlsby-style residual MLP adapters |
| RoSA | configurable | Gradient-informed sparse + LoRA ([Nikdan et al., 2024](https://arxiv.org/abs/2401.04679)) |
| Sparse | 503K--2.7M | Random binary mask on frozen weights |
| LoRA | ~65K | Low-rank attention projection adapters |
| Hybrid | ~65K | LoRA + FiLM combined |
| FiLM | ~17K | Per-channel affine modulation |
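The parameter counts in the table depend on which layers and projections carry adapters, which this README does not spell out. As a rough guide, the sketch below gives the textbook per-module formulas for three of the methods. The function names and the rank of 4 are hypothetical, the bottleneck width of 32 is taken from the training command above, and the totals are per module rather than per model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Low-rank pair A (rank x d_in) and B (d_out x rank), no bias."""
    return rank * (d_in + d_out)


def bottleneck_params(d_model: int, bottleneck_dim: int, bias: bool = True) -> int:
    """Down-projection plus up-projection of a residual MLP adapter."""
    weights = 2 * d_model * bottleneck_dim
    biases = bottleneck_dim + d_model if bias else 0
    return weights + biases


def film_params(d_model: int) -> int:
    """One scale and one shift per channel."""
    return 2 * d_model


d_model = 512  # base variant width
print(lora_params(d_model, d_model, rank=4))          # 4096
print(bottleneck_params(d_model, bottleneck_dim=32))  # 33312
print(film_params(d_model))                           # 1024
```

Multiplying these by the number of adapted modules gives a model-level total, which is why the methods span such different ranges in the table.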
## Citation
```bibtex
@software{schweich2026pawn,
  author = {Schweich, Thomas},
  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year = 2026,
  url = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}
```
## License
Apache 2.0. See [LICENSE](https://github.com/thomas-schweich/PAWN/blob/main/LICENSE).