Commit ac274df (verified) by thomas-schweich
1 parent: 1da8797

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +66 -0
  2. model.pt +3 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ license: apache-2.0
+ library_name: pytorch
+ tags:
+ - chess
+ - transformer
+ - causal-lm
+ - world-model
+ datasets:
+ - random-self-play
+ model-index:
+ - name: pawn-small
+   results:
+   - task:
+       type: next-move-prediction
+     metrics:
+     - name: Val Loss
+       type: loss
+       value: 3.15
+     - name: Val Accuracy
+       type: accuracy
+       value: 6.7
+ ---
+
+ # PAWN-SMALL
+
+ A causal transformer trained on random chess games, designed as a testbed for finetuning and augmentation methods at small scales.
+
+ ## Model Details
+
+ | | |
+ |---|---|
+ | **Parameters** | 9.5M |
+ | **Architecture** | Decoder-only transformer (RMSNorm, SwiGLU, RoPE) |
+ | **d_model** | 256 |
+ | **Layers** | 8 |
+ | **Heads** | 4 |
+ | **Vocabulary** | 4,278 tokens (4,096 grid + 176 promotions + 5 outcomes + 1 PAD) |
+ | **Sequence length** | 256 |
+ | **Training steps** | 80K/100K |
+ | **Best val loss** | 3.150 (step 80,000) |
+ | **Best val accuracy** | 6.7% |
+
+ ## Usage
+
+ ```python
+ import torch
+ from pawn.config import CLMConfig
+ from pawn.model import PAWNCLM
+
+ cfg = CLMConfig.small()
+ model = PAWNCLM(cfg)
+
+ ckpt = torch.load("model.pt", map_location="cpu", weights_only=False)
+ model.load_state_dict(ckpt["model_state_dict"])
+ model.eval()
+ ```
+
+ ## Training
+
+ Trained from scratch on random self-play games generated by a Rust chess engine (shakmaty).
+ See the [PAWN repository](https://github.com/thomas-schweich/PAWN) for training code, data pipeline, and evaluation suite.
+
+ ## License
+
+ Apache 2.0
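
Note on the Vocabulary row in the README above: the stated total (4,096 + 176 + 5 + 1 = 4,278) can be reproduced with a short sketch. The reading of "grid" as all 64 × 64 from-square/to-square pairs, and of "promotions" as every pawn move onto the last rank times four promotion pieces, is an assumption made here for illustration. It is not taken from the PAWN code, and the names below are hypothetical.

```python
# Hypothetical breakdown of the 4,278-token vocabulary. The meaning of
# "grid" and "promotions" is an assumption, not taken from the PAWN code.

FILES = 8
SQUARES = 64
PROMOTION_PIECES = 4  # queen, rook, bishop, knight


def count_promotion_moves() -> int:
    """Count (from, to, piece) promotion moves for both colours."""
    per_colour = 0
    for file in range(FILES):
        # A pawn reaches the last rank by capturing left, pushing, or
        # capturing right; edge files lose one capture direction.
        for delta in (-1, 0, 1):
            if 0 <= file + delta < FILES:
                per_colour += 1
    return 2 * per_colour * PROMOTION_PIECES


grid_tokens = SQUARES * SQUARES           # assumed: from-square x to-square
promotion_tokens = count_promotion_moves()
outcome_tokens = 5                        # from the README table
pad_tokens = 1                            # from the README table
vocab_size = grid_tokens + promotion_tokens + outcome_tokens + pad_tokens

print(grid_tokens, promotion_tokens, vocab_size)  # 4096 176 4278
```

Under these assumptions the counts match the table, which suggests (but does not confirm) that this is how the vocabulary is laid out.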
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b6bc05855065923f2f8406834b6ad23c118fa63fb892e883b7641029761ac278
+ size 114390171
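
The `model.pt` entry above is a Git LFS pointer, not the checkpoint itself: `oid` is the SHA-256 of the real file's contents and `size` is its length in bytes. A minimal sketch of checking a downloaded `model.pt` against that pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of `huggingface_hub` or the PAWN repository.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b6bc05855065923f2f8406834b6ad23c118fa63fb892e883b7641029761ac278
size 114390171
"""


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, digest: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 against the pointer values."""
    path = Path(path)
    if path.stat().st_size != size:
        return False
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so a large checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest


expected_digest, expected_size = parse_lfs_pointer(POINTER)
# After downloading the checkpoint next to this script:
# matches_pointer("model.pt", expected_digest, expected_size)
```

Running the check before `torch.load(..., weights_only=False)` is a reasonable precaution, since that call unpickles arbitrary objects from the file.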