	---
	library_name: pawn
	license: apache-2.0
	tags:
	- chess
	- transformer
	- world-model
	- causal-lm
	- next-token-prediction
	- representation-learning
	- parameter-efficient-finetuning
	- pytorch
	- rust
	language:
	- en
	pipeline_tag: other
	citation: |
	  @software{schweich2026pawn,
	    author = {Schweich, Thomas},
	    title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
	    year = 2026,
	    url = {https://github.com/thomas-schweich/PAWN},
	    license = {Apache-2.0}
	  }
	---

	# PAWN: Playstyle-Agnostic World-model Network for Chess

	PAWN is a small causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.

	PAWN is designed as a testbed for finetuning and augmentation methods at small scale. Because the pretrained model is entirely unopinionated (trained only on uniformly random legal moves), it serves as a blank slate that can be adapted, augmented, and finetuned into arbitrary player models with unique playstyles.

	Finetuning PAWN has proven significantly more parameter-efficient than training new models from scratch and requires minimal compute resources.

	[GitHub Repository](https://github.com/thomas-schweich/PAWN)

	## Model Variants

	| Variant | d_model | Layers | Heads | Parameters | Link |
	|---------|---------|--------|-------|------------|------|
	| PAWN-Small | 256 | 8 | 4 | ~9.5M | [thomas-schweich/pawn-small](https://huggingface.co/thomas-schweich/pawn-small) |
	| PAWN (Base) | 512 | 8 | 8 | ~35.8M | [thomas-schweich/pawn-base](https://huggingface.co/thomas-schweich/pawn-base) |
	| PAWN-Large | 640 | 10 | 8 | ~68.4M | [thomas-schweich/pawn-large](https://huggingface.co/thomas-schweich/pawn-large) |

	All variants share the same architecture (RMSNorm, SwiGLU, RoPE, factored move embeddings) and vocabulary (4,278 tokens). They differ only in width, depth, and head count.

	## Quickstart

	```bash
	# Clone and build
	git clone https://github.com/thomas-schweich/PAWN.git && cd PAWN

	# Build the Rust chess engine (required -- handles all game logic)
	cd engine && uv run --with maturin maturin develop --release && cd ..

	# Install Python dependencies
	uv sync --extra cu128 # NVIDIA (or --extra rocm for AMD)

	# Dev tools (pytest, seaborn, solara, etc.) are included in the base dependencies,
	# so no extra flags are needed beyond the GPU backend above

	# Pull a pretrained checkpoint
	git submodule update --init checkpoints/pawn-base
	```

	### Load and generate moves

	```python
	import torch
	from safetensors.torch import load_file
	from pawn.config import CLMConfig, WHITE_CHECKMATES
	from pawn.model import PAWNCLM

	# Load the model
	cfg = CLMConfig.base()
	model = PAWNCLM(cfg).cuda().eval()
	weights = load_file("checkpoints/pawn-base/model.safetensors", device="cuda")
	model.load_state_dict(weights)

	# Condition on the outcome token and predict the first move
	input_ids = torch.tensor([[WHITE_CHECKMATES]], device="cuda")
	pad_mask = torch.ones(1, 1, dtype=torch.bool, device="cuda")

	# Inference only, so skip gradient tracking
	with torch.no_grad():
	    logits, _ = model.forward_generate(input_ids, pad_mask)
	next_token = logits[0, -1].argmax()  # greedy choice for the next ply
	```

	### Train an adapter

	```bash
	uv sync --extra dev
	git submodule update --init checkpoints/pawn-base

	uv run python scripts/train_bottleneck.py \
	    --checkpoint checkpoints/pawn-base \
	    --pgn data/lichess_1800_1900.pgn \
	    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints
	```

	## Architecture

	PAWN is a decoder-only transformer trained with next-token prediction on chess move sequences. Each sequence has the format:

	```
	[outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
	```
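
As a rough illustration of this layout, the sketch below pads a tokenized game to a fixed length and builds the matching padding mask. The token ids (`PAD_ID`, the outcome id, the ply ids) are hypothetical placeholders rather than the values produced by the bundled engine, and `build_sequence` is an illustrative helper, not part of the `pawn` package. The length of 256 matches the context window stated in this README.

```python
# Hypothetical ids for illustration only; the real ids come from the Rust engine.
PAD_ID = 4277
CONTEXT_LEN = 256  # context window stated in this README


def build_sequence(outcome_id, ply_ids, context_len=CONTEXT_LEN, pad_id=PAD_ID):
    """Return (tokens, mask) laid out as [outcome] [ply_1] ... [ply_N] [PAD] ..."""
    tokens = [outcome_id, *ply_ids]
    if len(tokens) > context_len:
        raise ValueError("game does not fit in the context window")
    n_real = len(tokens)
    n_pad = context_len - n_real
    tokens = tokens + [pad_id] * n_pad
    mask = [True] * n_real + [False] * n_pad  # True marks real tokens
    return tokens, mask


tokens, mask = build_sequence(outcome_id=4272, ply_ids=[796, 3364, 405])
print(len(tokens), sum(mask))  # 256 4
```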

	The token vocabulary covers all possible source-destination square pairs on the 8x8 board (4,096 grid moves), promotion moves (176 tokens for 4 piece types across 44 eligible square pairs), 5 outcome tokens, and 1 padding token.
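
The vocabulary arithmetic can be checked with a short enumeration. The sketch below counts promotion square pairs from the rules of chess (a pawn steps from its second-to-last rank to the last rank, straight ahead or one file sideways when capturing) and adds up the totals given above. The function and constant names are illustrative only.

```python
def promotion_pairs():
    """Enumerate (source, destination) squares on which a pawn can promote."""
    pairs = []
    for src_rank, dst_rank in ((6, 7), (1, 0)):  # 0-indexed ranks: White, Black
        for src_file in range(8):
            for dst_file in (src_file - 1, src_file, src_file + 1):
                if 0 <= dst_file < 8:  # stay on the board
                    pairs.append(((src_file, src_rank), (dst_file, dst_rank)))
    return pairs


GRID_MOVES = 64 * 64                        # every source-destination pair
PROMOTION_PAIRS = len(promotion_pairs())    # 22 per side
PROMOTION_TOKENS = PROMOTION_PAIRS * 4      # queen, rook, bishop, knight
OUTCOME_TOKENS = 5
PAD_TOKENS = 1

vocab_size = GRID_MOVES + PROMOTION_TOKENS + OUTCOME_TOKENS + PAD_TOKENS
print(PROMOTION_PAIRS, PROMOTION_TOKENS, vocab_size)  # 44 176 4278
```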

	Move embeddings are factored: each move token is decomposed into source square + destination square + promotion piece, with embeddings summed. This provides structural inductive bias (moves sharing a source or destination share embedding components) while reducing embedding parameters by roughly 32x.
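
A minimal pure-Python sketch of the idea follows. It rests on several assumptions not confirmed here: a grid token id is laid out as `src * 64 + dst`, squares are numbered rank by rank from a1 = 0, and the promotion table has five rows (no promotion plus four piece types). How outcome and padding tokens are embedded is left out. Under those assumptions the three tables hold 133 rows, against 4,278 rows for a flat table, which is where a ratio of roughly 32x would come from.

```python
import random

D_MODEL = 8  # tiny width for illustration; the released models use 256 to 640

random.seed(0)


def make_table(rows, dim=D_MODEL):
    """Small random embedding table stored as a list of row vectors."""
    return [[random.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


src_table = make_table(64)    # source square
dst_table = make_table(64)    # destination square
promo_table = make_table(5)   # assumed: 0 = no promotion, 1-4 = piece types


def embed_move(src, dst, promo=0):
    """Sum the three component embeddings for one move."""
    rows = zip(src_table[src], dst_table[dst], promo_table[promo])
    return [s + d + p for s, d, p in rows]


def decompose_grid_token(token_id):
    """Hypothetical layout: grid token id = src * 64 + dst."""
    return divmod(token_id, 64)


src, dst = decompose_grid_token(796)   # e2 -> e4 under the assumed numbering
vector = embed_move(src, dst)

factored_rows = 64 + 64 + 5            # 133 rows in total
flat_rows = 4278                       # one row per vocabulary entry
print(src, dst, len(vector), round(flat_rows / factored_rows, 1))  # 12 28 8 32.2
```

Two moves that share a source square reuse the same row of `src_table`, which is the structural bias described above.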

	The model uses pre-norm RMSNorm, SwiGLU feed-forward layers (4x expansion), Rotary Position Embeddings (RoPE), and a 256-token context window. All chess logic -- game simulation, move generation, tokenization, and legal move computation -- is handled by a bundled Rust engine built on [shakmaty](https://github.com/niklasf/shakmaty).
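
A back-of-the-envelope parameter count shows how these choices add up. The sketch assumes four `d_model x d_model` attention projections per layer, three SwiGLU matrices with a hidden width of `4 * d_model`, a factored embedding of 133 rows, an output head that is not tied to the embeddings, and no biases; norm gains are ignored. None of these details are confirmed here, but under them the estimates come out at about 9.5M, 35.8M, and 68.4M, in line with the Model Variants table.

```python
VOCAB_SIZE = 4278
FACTORED_EMBED_ROWS = 64 + 64 + 5  # assumed source + destination + promotion tables


def estimate_params(d_model, n_layers, ffn_mult=4):
    """Rough weight count; ignores norm gains and any biases."""
    attention = 4 * d_model * d_model              # Q, K, V and output projections
    swiglu = 3 * d_model * (ffn_mult * d_model)    # gate, up and down projections
    blocks = n_layers * (attention + swiglu)
    embeddings = FACTORED_EMBED_ROWS * d_model
    output_head = VOCAB_SIZE * d_model             # assumed untied from the embeddings
    return blocks + embeddings + output_head


variants = [("PAWN-Small", 256, 8), ("PAWN (Base)", 512, 8), ("PAWN-Large", 640, 10)]
for name, d_model, n_layers in variants:
    print(f"{name}: ~{estimate_params(d_model, n_layers) / 1e6:.1f}M")
```

The head count does not appear in the formula because splitting `d_model` across more heads leaves the projection sizes unchanged.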

	For full architectural details, see [docs/ARCHITECTURE.md](https://github.com/thomas-schweich/PAWN/blob/main/docs/ARCHITECTURE.md).

	## What the Model Learns

	Despite training exclusively on random games, PAWN develops rich internal representations:

	- Legal move prediction: The model achieves a legal move rate above 98%, accurately predicting which moves are legal from move history alone.
	- Board state tracking: Linear probes on hidden states decode piece positions, check status, castling rights, material counts, and game phase with high accuracy -- even though the model never sees explicit board representations.
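
The exact definition of the legal move rate is not given here. One plausible reading, sketched below, is the fraction of positions at which the model's top prediction is a legal move. The function name and the toy token ids are made up for illustration; in practice the legal-move sets would come from the Rust engine.

```python
def legal_move_rate(predicted, legal_sets):
    """Fraction of predicted moves that appear in the matching legal-move set."""
    if len(predicted) != len(legal_sets):
        raise ValueError("need one legal-move set per prediction")
    if not predicted:
        return 0.0
    hits = sum(1 for move, legal in zip(predicted, legal_sets) if move in legal)
    return hits / len(predicted)


# Toy data with made-up token ids: three positions, two legal predictions.
predicted = [796, 3364, 10]
legal_sets = [{796, 731}, {3364, 3299}, {405, 470}]
print(legal_move_rate(predicted, legal_sets))
```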

	These properties make PAWN useful as a frozen backbone for downstream tasks. See the [adapter documentation](https://github.com/thomas-schweich/PAWN/blob/main/docs/ADAPTERS.md) for finetuning results.

	## Adapter Methods

	PAWN ships with six adapter implementations for finetuning the frozen backbone on human game data:

	| Method | Parameters | Description |
	|--------|-----------|-------------|
	| Bottleneck | ~131K | Houlsby-style residual MLP adapters |
	| RoSA | configurable | Gradient-informed sparse + LoRA ([Nikdan et al., 2024](https://arxiv.org/abs/2401.04679)) |
	| Sparse | 503K--2.7M | Random binary mask on frozen weights |
	| LoRA | ~65K | Low-rank attention projection adapters |
	| Hybrid | ~65K | LoRA + FiLM combined |
	| FiLM | ~17K | Per-channel affine modulation |
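
To give a feel for how trainable parameter counts scale across three of these methods, the sketch below uses the standard formulas: a bottleneck adapter is a down-projection followed by an up-projection, LoRA adds a low-rank pair of matrices per adapted weight, and FiLM learns one scale and one shift per channel. The settings in the example (bottleneck width, rank, number of adapted sites) are hypothetical and chosen only to show the order of magnitude; the configurations behind the figures in the table are not stated here.

```python
def bottleneck_params(d_model, bottleneck_dim, n_adapters, bias=False):
    """Down-projection plus up-projection per adapter."""
    per_adapter = 2 * d_model * bottleneck_dim
    if bias:
        per_adapter += bottleneck_dim + d_model
    return n_adapters * per_adapter


def lora_params(d_in, d_out, rank, n_matrices):
    """Low-rank pair A (rank x d_in) and B (d_out x rank) per adapted matrix."""
    return n_matrices * rank * (d_in + d_out)


def film_params(d_model, n_sites):
    """One scale and one shift per channel at each modulated site."""
    return n_sites * 2 * d_model


# Hypothetical settings on a d_model = 512, 8-layer backbone.
d_model, n_layers = 512, 8
print(bottleneck_params(d_model, bottleneck_dim=8, n_adapters=2 * n_layers))
print(lora_params(d_model, d_model, rank=4, n_matrices=2 * n_layers))
print(film_params(d_model, n_sites=2 * n_layers))
```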

	## Citation

	```bibtex
	@software{schweich2026pawn,
	  author = {Schweich, Thomas},
	  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
	  year = 2026,
	  url = {https://github.com/thomas-schweich/PAWN},
	  license = {Apache-2.0}
	}
	```

	## License

	Apache 2.0. See [LICENSE](https://github.com/thomas-schweich/PAWN/blob/main/LICENSE).