taj-gillin committed on
Commit d2f91e2 · verified · 1 Parent(s): c9b9ae5

Final trained model

Files changed (3)
  1. README.md +11 -11
  2. config.json +5 -5
  3. pytorch_model.bin +2 -2
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
  license: mit
  ---

- # chessgpt-small
+ # chessgpt-medium

  ChessGPT model trained for mechanistic interpretability research.

@@ -17,24 +17,24 @@ ChessGPT model trained for mechanistic interpretability research.
  - **Architecture**: GPT-style transformer
  - **Vocabulary**: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
  - **Context Length**: 256
- - **Layers**: 8
- - **Hidden Size**: 512
- - **Attention Heads**: 8
+ - **Layers**: 12
+ - **Hidden Size**: 768
+ - **Attention Heads**: 12

  ## Training Configuration

  - **Dataset**: Lichess/standard-chess-games
- - **Min Elo**: 1500
- - **Min Moves**: 5
- - **Batch Size**: 16
+ - **Min Elo**: 1800
+ - **Min Moves**: 10
+ - **Batch Size**: 32
  - **Learning Rate**: 3e-4
- - **Epochs**: 1
+ - **Epochs**: 10

  ## Metrics

- - **loss**: 6.6455
- - **accuracy**: 0.0562
- - **perplexity**: 769.3397
+ - **loss**: 1.1781
+ - **accuracy**: 0.7051
+ - **perplexity**: 3.2484


  ## Usage
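The loss and perplexity values in the README Metrics section appear to be related by perplexity = exp(loss), which is the usual definition when loss is a mean cross-entropy in nats. The README does not state this, so treat it as an assumption. A minimal sketch that checks the reported numbers from both revisions against that relationship (the dictionary and function names are illustrative, not part of the repository):

```python
import math

# Values copied from the README Metrics section, before and after this commit.
reported = {
    "chessgpt-small": {"loss": 6.6455, "perplexity": 769.3397},
    "chessgpt-medium": {"loss": 1.1781, "perplexity": 3.2484},
}


def perplexity_from_loss(loss: float) -> float:
    """Perplexity implied by a mean cross-entropy loss (ASSUMPTION: natural log)."""
    return math.exp(loss)


def relative_error(derived: float, stated: float) -> float:
    """Relative gap between the derived and the stated perplexity."""
    return abs(derived - stated) / stated


for name, metrics in reported.items():
    derived = perplexity_from_loss(metrics["loss"])
    err = relative_error(derived, metrics["perplexity"])
    print(f"{name}: exp(loss)={derived:.4f} "
          f"reported={metrics['perplexity']} rel_err={err:.1e}")
```

Small differences are expected because the README rounds the loss to four decimal places. For scale, a model that guesses uniformly over the 4,211-token vocabulary would have a perplexity of 4,211.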
config.json CHANGED
@@ -1,10 +1,10 @@
  {
- "name": "chessgpt-small",
+ "name": "chessgpt-medium",
  "vocab_size": 4211,
- "n_layers": 8,
- "n_heads": 8,
- "d_model": 512,
- "d_ff": 2048,
+ "n_layers": 12,
+ "n_heads": 12,
+ "d_model": 768,
+ "d_ff": 3072,
  "dropout": 0.1,
  "context_length": 256,
  "pad_token_id": 0,
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:81d0f437fea8a3129f26fff3fd292fbeaa2ddde5d8e7d435e14af30fa623710b
- size 112168211
+ oid sha256:6475a1e2dbd9d2608a109a9399e8682d57c7b070120a93d5528121875f4f4b4d
+ size 357154723
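The growth of pytorch_model.bin from 112168211 to 357154723 bytes can be roughly cross-checked against the config.json change. The sketch below estimates a parameter count from the config fields and compares it with the checkpoint size divided by 4 bytes. It rests on several assumptions that the commit does not confirm: float32 weights, learned token and position embeddings, an output head tied to the token embedding, biases on every linear layer, and two LayerNorms per block plus a final one. The function name is hypothetical.

```python
def estimate_params(vocab_size: int, n_layers: int, d_model: int,
                    d_ff: int, context_length: int) -> int:
    """Rough GPT-style parameter count (ASSUMED layout, not read from the repo)."""
    # Token embedding plus learned position embedding; output head assumed tied.
    embeddings = vocab_size * d_model + context_length * d_model
    # Q, K, V and output projections, each with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two-layer feed-forward network with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a weight and a bias vector.
    block_norms = 2 * 2 * d_model
    final_norm = 2 * d_model
    return embeddings + n_layers * (attention + feed_forward + block_norms) + final_norm


# Config values and LFS pointer sizes copied from the diffs in this commit.
models = {
    "chessgpt-small": dict(vocab_size=4211, n_layers=8, d_model=512,
                           d_ff=2048, context_length=256, file_bytes=112168211),
    "chessgpt-medium": dict(vocab_size=4211, n_layers=12, d_model=768,
                            d_ff=3072, context_length=256, file_bytes=357154723),
}

BYTES_PER_PARAM = 4  # ASSUMPTION: weights stored as float32

for name, cfg in models.items():
    cfg = dict(cfg)  # copy so the shared dict is left untouched
    file_bytes = cfg.pop("file_bytes")
    estimated = estimate_params(**cfg)
    implied = file_bytes / BYTES_PER_PARAM
    print(f"{name}: estimated={estimated:,} "
          f"implied_by_file={implied:,.0f} ratio={estimated / implied:.3f}")
```

Under these assumptions the estimate lands within a few percent of the size implied by each file. The remainder could come from non-parameter buffers, an untied output head, or serialization overhead; the commit does not say which.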