---
library_name: pawn
license: apache-2.0
tags:
- chess
- transformer
- world-model
- causal-lm
- next-token-prediction
- representation-learning
- parameter-efficient-finetuning
- pytorch
- rust
language:
- en
pipeline_tag: other
citation: |
@software{schweich2026pawn,
author = {Schweich, Thomas},
title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
year = 2026,
url = {https://github.com/thomas-schweich/PAWN},
license = {Apache-2.0}
}
---
# PAWN: Playstyle-Agnostic World-model Network for Chess
PAWN is a small causal transformer trained on random chess games. It learns legal moves, board state representations, and game dynamics purely from random legal move sequences -- no strategic play, no hand-crafted features, no external game databases.
PAWN is designed as a testbed for finetuning and augmentation methods at small scale. Because the pretrained model is entirely unopinionated (trained only on uniformly random legal moves), it serves as a blank slate that can be adapted, augmented, and finetuned into arbitrary player models with unique playstyles.
Finetuning PAWN has proven significantly more parameter-efficient than training new models from scratch and requires minimal compute resources.
**[GitHub Repository](https://github.com/thomas-schweich/PAWN)**
## Model Variants
| Variant | d_model | Layers | Heads | Parameters | Link |
|---------|---------|--------|-------|------------|------|
| PAWN-Small | 256 | 8 | 4 | ~9.5M | [thomas-schweich/pawn-small](https://huggingface.co/thomas-schweich/pawn-small) |
| PAWN (Base) | 512 | 8 | 8 | ~35.8M | [thomas-schweich/pawn-base](https://huggingface.co/thomas-schweich/pawn-base) |
| PAWN-Large | 640 | 10 | 8 | ~68.4M | [thomas-schweich/pawn-large](https://huggingface.co/thomas-schweich/pawn-large) |
All variants share the same architecture (RMSNorm, SwiGLU, RoPE, factored move embeddings) and vocabulary (4,278 tokens). They differ only in width, depth, and head count.
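The listed parameter counts can be roughly reproduced from the table with a back-of-the-envelope estimate. The breakdown below is an assumption, not taken from the repository: unbiased attention projections, a 4x SwiGLU expansion with three weight matrices, factored input embeddings (with an assumed five-row promotion component), and an untied 4,278-row output head.

```python
def approx_params(d_model: int, layers: int) -> int:
    """Rough parameter estimate; norms and biases are ignored (assumption)."""
    attn = 4 * d_model * d_model          # Q, K, V, and output projections
    ffn = 3 * d_model * (4 * d_model)     # SwiGLU: gate, up, and down matrices
    emb = (64 + 64 + 5) * d_model         # factored move embeddings (assumed layout)
    head = 4278 * d_model                 # untied output projection (assumption)
    return layers * (attn + ffn) + emb + head

for name, d, n_layers in [("small", 256, 8), ("base", 512, 8), ("large", 640, 10)]:
    print(name, round(approx_params(d, n_layers) / 1e6, 1))  # ~9.5, ~35.8, ~68.4
```

Under those assumptions the estimate lands on ~9.5M, ~35.8M, and ~68.4M, matching the table.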
## Quickstart
```bash
# Clone and build
git clone https://github.com/thomas-schweich/PAWN.git && cd PAWN
# Build the Rust chess engine (required -- handles all game logic)
cd engine && uv run --with maturin maturin develop --release && cd ..
# Install Python dependencies
uv sync --extra cu128 # NVIDIA (or --extra rocm for AMD)
# Dev tools (pytest, seaborn, solara, etc.) ship with the base dependencies -- no extra flags needed beyond the GPU backend above
# Pull a pretrained checkpoint
git submodule update --init checkpoints/pawn-base
```
### Load and generate moves
```python
import torch
from safetensors.torch import load_file
from pawn.config import CLMConfig, WHITE_CHECKMATES
from pawn.model import PAWNCLM
# Load the model
cfg = CLMConfig.base()
model = PAWNCLM(cfg).cuda().eval()
weights = load_file("checkpoints/pawn-base/model.safetensors", device="cuda")
model.load_state_dict(weights)
# Condition on outcome and generate a game
input_ids = torch.tensor([[WHITE_CHECKMATES]], device="cuda")
pad_mask = torch.ones(1, 1, dtype=torch.bool, device="cuda")
logits, _ = model.forward_generate(input_ids, pad_mask)
next_token = logits[0, -1].argmax()
```
### Train an adapter
```bash
uv sync --extra dev
git submodule update --init checkpoints/pawn-base
uv run python scripts/train_bottleneck.py \
--checkpoint checkpoints/pawn-base \
--pgn data/lichess_1800_1900.pgn \
--bottleneck-dim 32 --lr 1e-4 --local-checkpoints
```
## Architecture
PAWN is a decoder-only transformer trained with next-token prediction on chess move sequences. Each sequence has the format:
```
[outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
```
The token vocabulary covers all possible source-destination square pairs on the 8x8 board (4,096 grid moves), promotion moves (176 tokens for 4 piece types across 44 eligible square pairs), 5 outcome tokens, and 1 padding token.
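The stated vocabulary size follows directly from those counts; a quick sanity check:

```python
GRID_MOVES = 64 * 64   # every source-destination square pair on the 8x8 board
PROMOTIONS = 4 * 44    # 4 promotion piece types across 44 eligible square pairs
OUTCOMES = 5           # outcome-conditioning tokens
PAD = 1                # padding token

VOCAB_SIZE = GRID_MOVES + PROMOTIONS + OUTCOMES + PAD
print(VOCAB_SIZE)  # 4278
```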
Move embeddings are factored: each move token is decomposed into source square + destination square + promotion piece, with embeddings summed. This provides structural inductive bias (moves sharing a source or destination share embedding components) while reducing embedding parameters by roughly 32x.
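The ~32x reduction can be checked arithmetically. Assuming the promotion component has five rows (four promotion pieces plus a "no promotion" row -- an assumption, since the exact table layout isn't given here), factoring replaces 4,278 embedding rows with 133, independent of model width:

```python
D_MODEL = 512                       # PAWN (Base) width; the ratio is width-independent
full = 4278 * D_MODEL               # one embedding row per vocabulary token
factored = (64 + 64 + 5) * D_MODEL  # source + destination + promotion rows, summed at lookup
print(round(full / factored, 1))    # ~32.2
```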
The model uses pre-norm RMSNorm, SwiGLU feed-forward layers (4x expansion), Rotary Position Embeddings (RoPE), and a 256-token context window. All chess logic -- game simulation, move generation, tokenization, and legal move computation -- is handled by a bundled Rust engine built on [shakmaty](https://github.com/niklasf/shakmaty).
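As an illustration of the RoPE component (generic textbook RoPE, not code from this repository): each consecutive pair of query/key features is rotated by a position-dependent angle, so attention scores depend on relative position while each pair's norm is preserved.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles
    (standard RoPE; illustrative, not PAWN's actual implementation)."""
    dim = len(x)
    out = list(x)
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

v = [1.0, 0.0, 0.5, -0.5]
r = rope_rotate(v, pos=3)
# Position 0 is the identity, and each (even, odd) pair keeps its norm.
```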
For full architectural details, see [docs/ARCHITECTURE.md](https://github.com/thomas-schweich/PAWN/blob/main/docs/ARCHITECTURE.md).
## What the Model Learns
Despite training exclusively on random games, PAWN develops rich internal representations:
- **Legal move prediction**: The model produces legal moves over 98% of the time, inferring legality from the move history alone.
- **Board state tracking**: Linear probes on hidden states decode piece positions, check status, castling rights, material counts, and game phase with high accuracy -- even though the model never sees explicit board representations.
These properties make PAWN useful as a frozen backbone for downstream tasks. See the [adapter documentation](https://github.com/thomas-schweich/PAWN/blob/main/docs/ADAPTERS.md) for fine-tuning results.
## Adapter Methods
PAWN ships with six adapter implementations for fine-tuning the frozen backbone on human game data:
| Method | Parameters | Description |
|--------|-----------|-------------|
| Bottleneck | ~131K | Houlsby-style residual MLP adapters |
| RoSA | configurable | Gradient-informed sparse + LoRA ([Nikdan et al., 2024](https://arxiv.org/abs/2401.04679)) |
| Sparse | 503K--2.7M | Random binary mask on frozen weights |
| LoRA | ~65K | Low-rank attention projection adapters |
| Hybrid | ~65K | LoRA + FiLM combined |
| FiLM | ~17K | Per-channel affine modulation |
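To make the table concrete, here is a minimal dependency-free sketch of the LoRA idea (generic, not PAWN's implementation): a frozen weight matrix `W` gains a trainable low-rank update `B A`, and because `B` conventionally starts at zero, the adapted layer is an exact no-op at initialization.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B(A x): frozen full-rank W plus a trainable low-rank B A."""
    return [wy + scale * by for wy, by in zip(matvec(W, x), matvec(B, matvec(A, x)))]

W = [[1.0, 2.0], [3.0, 4.0]]  # frozen 2x2 weight
A = [[0.1, -0.2]]             # rank-1 down-projection (trainable)
B = [[0.0], [0.0]]            # up-projection, initialized to zero
x = [1.0, 1.0]
# With B = 0 the adapter contributes nothing: lora_forward(...) == matvec(W, x)
```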
## Citation
```bibtex
@software{schweich2026pawn,
author = {Schweich, Thomas},
title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
year = 2026,
url = {https://github.com/thomas-schweich/PAWN},
license = {Apache-2.0}
}
```
## License
Apache 2.0. See [LICENSE](https://github.com/thomas-schweich/PAWN/blob/main/LICENSE).