thomas-schweich committed
Commit 3d0031b · unverified · 1 parent: 73c3924

Add links to Hugging Face models.


Updated model download links in README to include Hugging Face badges and improved clarity in vocabulary description.

Files changed (1):
  1. README.md +29 -12
README.md CHANGED
@@ -14,61 +14,78 @@ To aid in exploring how model size affects different finetuning methods, we trai
 
 | Variant | d_model | Layers | Heads | Params | Download |
 |---------|---------|--------|-------|--------|----------|
- | **PAWN** | 512 | 8 | 8 | ~35.8M | [pawn-base.pt]() |
- | **PAWN-Small** | 256 | 8 | 4 | ~9.5M | [pawn-small.pt]() |
- | **PAWN-Large** | 640 | 10 | 8 | ~68.4M | [pawn-large.pt]() |
 
 All variants share the same architecture: RMSNorm, SwiGLU FFN, RoPE, factored move embeddings, and a 4278-token vocabulary covering:
 
 - all possible (src, dst) pairs for an 8x8 grid (the chess board),
 - promotion moves (one per promotion piece type per square on 1st or 8th rank),
- - a token for each game outcome (white wins, black wins, stalemate, draw, ply limit),
 - and a padding token.
 
- Notably, the vocabulary includes impossible moves like `a1a1` and `b1a5`. PAWN naturally learns to avoid these during training.
 
- Conceptually, each token is best thought of as a move in UCI notation--they are effectively coordinates. They do not include any information on the type of peice, side to play, or any direct geometric or board state information[^1]. `e2e4` is the token that represents the king's pawn opening but only when it's the first ply in the sequence--moving a rook between from e2 to e4 in the late game would use the same token). `e7e8`
 
 
 
 ## Quickstart
 
 ```bash
 # Clone and build
 git clone https://github.com/<user>/pawn.git && cd pawn
 cd engine && uv run --with maturin maturin develop --release && cd ..
 uv sync --extra cu128 # NVIDIA GPU (or --extra rocm for AMD)
 
 # Train an adapter on a pre-trained checkpoint
 uv run python scripts/train_bottleneck.py \
   --checkpoint checkpoints/pawn-base.pt \
   --pgn data/lichess_1800_1900.pgn \
   --bottleneck-dim 32 --lr 1e-4
 
- # Or pretrain from scratch (generates random games on-the-fly)
 uv run python scripts/train.py --variant base
 ```
 
 ## Architecture
 
 PAWN is a standard decoder-only transformer trained with next-token prediction on chess move sequences. Each training example is:
 
 ```
 [outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
 ```
 
- The outcome token (white wins, black wins, stalemate, draw, ply limit) tells the model how the game ends.
 
- Ply tokens use a factored embedding: each move is decomposed into source square + destination square + promotion piece, with embeddings summed. This gives the model explicit spatial structure while keeping the vocabulary compact.
 
 The context window of all variants is 256 tokens wide. Training examples all include the outcome token followed by up to 255 ply or padding tokens.
 
- During training, examples are retroactively prepended with their actual outcome. During inference, the outcome token has a measurable impact on subsequent completions.
 
 The model's predictions are not masked to legal moves during training; it has to determine which moves are currently legal based on the sequence of moves so far.
 
 No attempt is made to provide the model with information about the pieces; in other words, it only thinks in moves. There is no equivalent of the multi-plane board representation used by e.g. AlphaZero and Lc0. Any and all state representation is learned by the model internally.
 
 ## Adapter Methods
 
 PAWN ships with five adapter implementations for fine-tuning the frozen backbone:
 
@@ -80,7 +97,7 @@ PAWN ships with five adapter implementations for fine-tuning the frozen backbone
 | **Hybrid** | ~65K | 34.1% | LoRA + FiLM combined |
 | **FiLM** | ~17K | 30.3% | Per-channel affine modulation |
 
- A 524K bottleneck adapter on PAWN achieves 42.2% accuracy, vs. 30.9% for a standalone model with the same parameter count. The frozen backbone provides ~11 percentage points of "free" accuracy.
 
 See [docs/ADAPTERS.md](docs/ADAPTERS.md) for detailed comparisons and training instructions.
 
 
@@ -106,7 +123,7 @@ pawn/
 
 ## Chess Engine
 
- PAWN includes a bundled Rust chess engine (`engine/`) that handles all game simulation, move generation, legal move computation, and PGN parsing via `shakmaty`. No Python chess libraries are used. The engine generates training data on-the-fly via `chess_engine.generate_random_games()`, which is capable of producing well over 100 million random games per hour on a modern CPU.
 
 ## Documentation
 
 
 | Variant | d_model | Layers | Heads | Params | Download |
 |---------|---------|--------|-------|--------|----------|
+ | **PAWN** | 512 | 8 | 8 | ~35.8M | [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm-dark.svg)](https://huggingface.co/thomas-schweich/pawn-base) |
+ | **PAWN-Small** | 256 | 8 | 4 | ~9.5M | [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm-dark.svg)](https://huggingface.co/thomas-schweich/pawn-small) |
+ | **PAWN-Large** | 640 | 10 | 8 | ~68.4M | [![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm-dark.svg)](https://huggingface.co/thomas-schweich/pawn-large) |
 
 All variants share the same architecture: RMSNorm, SwiGLU FFN, RoPE, factored move embeddings, and a 4278-token vocabulary covering:
 
 - all possible (src, dst) pairs for an 8x8 grid (the chess board),
 - promotion moves (one per promotion piece type per square on 1st or 8th rank),
+ - a token for each game outcome (`WHITE_CHECKMATE`, `BLACK_CHECKMATE`, `STALEMATE`, `DRAW_BY_RULE`, `PLY_LIMIT`),
 - and a padding token.
 
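As a sanity check, the 4278 total is reproduced exactly if promotion tokens are counted one per legal (src, dst, piece) promotion move rather than one per destination square. The per-category split below is a reconstruction under that assumption; only the total appears in the README:

```python
# Hypothetical breakdown of the 4278-token vocabulary (the split is an
# educated reconstruction; only the total is stated in the README).
src_dst_pairs = 64 * 64  # every (src, dst) pair on the 8x8 board
# Promotions per side: 8 straight pushes + 14 diagonal captures
# (files a/h allow one capture each, files b-g two), times 4 piece types.
promotions = 2 * (8 + 14) * 4
outcomes = 5  # WHITE_CHECKMATE, BLACK_CHECKMATE, STALEMATE, DRAW_BY_RULE, PLY_LIMIT
padding = 1

vocab_size = src_dst_pairs + promotions + outcomes + padding
print(vocab_size)  # 4278
```
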
+ Notably, the vocabulary includes impossible moves like `a1a1` and `b1a5`. PAWN naturally learns to avoid these since they don't appear in its training examples.
 
+ Conceptually, each token is best thought of as a move in UCI notation: tokens are effectively coordinates. They do not include any information on the type of piece, side to play, or any direct geometric or board state information beyond the factored nature of the embeddings (see the architecture section below for details).
+
+ For example, `e2e4` is the token that represents the king's pawn opening, but only when it's the first ply in the sequence (moving a rook from e2 to e4 in the late game would use the same token). The model learns to track which type of piece is on each square at any given moment entirely of its own accord. For that matter, it isn't even told what piece types exist or what movement patterns they follow, or indeed even the concept of a piece. All of that 'understanding' comes purely from observation.
 
 ## Quickstart
 
 ```bash
 # Clone and build
 git clone https://github.com/<user>/pawn.git && cd pawn
+
+ # Build the Rust chess engine
 cd engine && uv run --with maturin maturin develop --release && cd ..
+
+ # Install core dependencies
 uv sync --extra cu128 # NVIDIA GPU (or --extra rocm for AMD)
 
+ # Install optional extras: tests, results analysis, and the training monitoring dashboard (recommended)
+ uv sync --extra dev --extra eval --extra dashboard
+
 # Train an adapter on a pre-trained checkpoint
 uv run python scripts/train_bottleneck.py \
   --checkpoint checkpoints/pawn-base.pt \
   --pgn data/lichess_1800_1900.pgn \
   --bottleneck-dim 32 --lr 1e-4
 
+ # Or pretrain a PAWN variant from scratch (generates random games on-the-fly; no dataset required)
 uv run python scripts/train.py --variant base
+
+ # Launch the real-time monitoring dashboard (requires the optional dashboard dependency)
+ uv run python -m pawn.dashboard --log-dir logs --port 8765
 ```
 
 ## Architecture
 
+ <sub>Main article: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)</sub>
+
 PAWN is a standard decoder-only transformer trained with next-token prediction on chess move sequences. Each training example is:
 
 ```
 [outcome] [ply_1] [ply_2] ... [ply_N] [PAD] ... [PAD]
 ```
 
+ The outcome token is one of `WHITE_CHECKMATE`, `BLACK_CHECKMATE`, `STALEMATE`, `DRAW_BY_RULE`, or `PLY_LIMIT`.
+
+ Ply tokens use a factored embedding: each move is decomposed into source square + destination square + promotion piece, with the embeddings summed. This gives the model some degree of explicit spatial structure while keeping the vocabulary compact.
 
+ The summed embeddings effectively represent UCI strings like `e2e4` (a piece moves from `e2` to `e4`) or `f7f8q` (promotion to a queen on `f8`). In factored form, the vector for `e2e4` is given by `(e2xx + xxe4)`. Likewise, `f7f8q` is given by `(f7xx + xxf8 + q)`.
 
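The factored scheme can be sketched in plain Python. The embedding width and random vectors here are toys for illustration; the real implementation is the model's learned PyTorch embedding tables:

```python
import random

D = 8  # toy embedding width; the real variants use d_model of 256-640

random.seed(0)
# One vector per source square, destination square, and promotion piece:
# 64 + 64 + 4 small tables instead of 4278 independent move embeddings.
src_emb = {sq: [random.random() for _ in range(D)] for sq in range(64)}
dst_emb = {sq: [random.random() for _ in range(D)] for sq in range(64)}
promo_emb = {p: [random.random() for _ in range(D)] for p in "qrbn"}

def square(name):
    """Map an algebraic square name ('e2') to an index 0..63."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)

def embed(uci):
    """Factored move embedding: e(src) + e(dst) (+ e(promo) for promotions)."""
    vec = [s + d for s, d in zip(src_emb[square(uci[:2])], dst_emb[square(uci[2:4])])]
    if len(uci) == 5:  # promotion, e.g. 'f7f8q'
        vec = [v + p for v, p in zip(vec, promo_emb[uci[4]])]
    return vec
```

Note how `f7f8q` and `f7f8n` share the `f7` and `f8` components and differ only in the promotion-piece vector, which is exactly the spatial sharing the factorization buys.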
 The context window of all variants is 256 tokens wide. Training examples all include the outcome token followed by up to 255 ply or padding tokens.
 
+ During training, simulated games are retroactively prepended with their actual outcome. During inference, the outcome token has a measurable impact on subsequent completions.
 
 The model's predictions are not masked to legal moves during training; it has to determine which moves are currently legal based on the sequence of moves so far.
 
 No attempt is made to provide the model with information about the pieces; in other words, it only thinks in moves. There is no equivalent of the multi-plane board representation used by e.g. AlphaZero and Lc0. Any and all state representation is learned by the model internally.
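Putting the sequence format above together, example assembly can be sketched as follows (the function and token names are illustrative, not the repo's actual API):

```python
CONTEXT = 256  # context window shared by all variants

def make_example(outcome_token, ply_tokens, pad_token="PAD"):
    """Assemble [outcome] [ply_1] ... [ply_N] [PAD] ... [PAD], always 256 tokens."""
    seq = [outcome_token] + list(ply_tokens)[: CONTEXT - 1]  # at most 255 plies
    seq += [pad_token] * (CONTEXT - len(seq))                # right-pad the rest
    return seq

ex = make_example("WHITE_CHECKMATE", ["e2e4", "e7e5", "g1f3"])
# len(ex) == 256; ex[0] is the outcome token; ex[4:] is all padding
```
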
 
 ## Adapter Methods
+ <sub>Main article: [docs/ADAPTERS.md](docs/ADAPTERS.md)</sub>
 
 PAWN ships with five adapter implementations for fine-tuning the frozen backbone:
 
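For intuition, the bottleneck variant (one of the five) can be sketched in plain Python. The real adapters in the repo are PyTorch modules; the sizes and initialization below are illustrative assumptions:

```python
import random

random.seed(0)
d_model, bottleneck = 16, 4  # toy sizes; the Quickstart trains with --bottleneck-dim 32

# Down-projection gets a small random init; the up-projection is zero-initialized,
# a common choice so the adapter starts as an exact identity around the frozen layer.
W_down = [[random.gauss(0, 0.02) for _ in range(bottleneck)] for _ in range(d_model)]
W_up = [[0.0] * d_model for _ in range(bottleneck)]

def adapter(h):
    """Residual bottleneck: h + W_up(relu(W_down(h)))."""
    z = [max(0.0, sum(h[i] * W_down[i][j] for i in range(d_model)))
         for j in range(bottleneck)]
    delta = [sum(z[j] * W_up[j][k] for j in range(bottleneck))
             for k in range(d_model)]
    return [hk + dk for hk, dk in zip(h, delta)]
```

Only `W_down` and `W_up` are trained, which is why adapter parameter counts stay in the tens-of-thousands range while the backbone stays frozen.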
 
 | **Hybrid** | ~65K | 34.1% | LoRA + FiLM combined |
 | **FiLM** | ~17K | 30.3% | Per-channel affine modulation |
 
+ Preliminary results show that a 524K bottleneck adapter on PAWN achieves 42.2% accuracy when predicting moves by 1800-level players on Lichess, vs. 30.9% for a standalone model with the same architecture and parameter count. Thus the frozen backbone provides ~11 percentage points of "free" accuracy.
 
 See [docs/ADAPTERS.md](docs/ADAPTERS.md) for detailed comparisons and training instructions.
 
 
 
 ## Chess Engine
 
+ PAWN includes a bundled Rust chess engine (`engine/`) that handles all game simulation, move generation, legal move computation, and PGN parsing. The engine uses `shakmaty` extensively under the hood; no Python chess libraries are used. It generates training data on-the-fly via `chess_engine.generate_random_games()`, which can produce well over 100 million random games per hour on a modern CPU.
 
 ## Documentation