taj-gillin/chessgpt

mechanistic-interpretability


taj-gillin committed on Jan 10

Commit d2f91e2 · verified · 1 parent: c9b9ae5

Final trained model

Files changed (3)

README.md +11 -11
config.json +5 -5
pytorch_model.bin +2 -2

README.md CHANGED

@@ -7,7 +7,7 @@ tags:
 license: mit
 ---
-# chessgpt-small
+# chessgpt-medium
 ChessGPT model trained for mechanistic interpretability research.
@@ -17,24 +17,24 @@ ChessGPT model trained for mechanistic interpretability research.
 - **Architecture**: GPT-style transformer
 - **Vocabulary**: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
 - **Context Length**: 256
-- **Layers**: 8
-- **Hidden Size**: 512
-- **Attention Heads**: 8
+- **Layers**: 12
+- **Hidden Size**: 768
+- **Attention Heads**: 12
 ## Training Configuration
 - **Dataset**: Lichess/standard-chess-games
-- **Min Elo**: 1500
-- **Min Moves**: 5
-- **Batch Size**: 16
+- **Min Elo**: 1800
+- **Min Moves**: 10
+- **Batch Size**: 32
 - **Learning Rate**: 3e-4
-- **Epochs**: 1
+- **Epochs**: 10
 ## Metrics
-- **loss**: 6.6455
-- **accuracy**: 0.0562
-- **perplexity**: 769.3397
+- **loss**: 1.1781
+- **accuracy**: 0.7051
+- **perplexity**: 3.2484
 ## Usage
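The reported perplexities are consistent with perplexity = exp(loss), i.e. a mean per-token cross-entropy measured in nats. The card does not state that convention, so the quick check below assumes it. For scale, a model that spread probability uniformly over all 4,211 tokens would have perplexity 4,211 (loss ln 4211 ≈ 8.35).

```python
import math

# Loss and perplexity copied from the old and new versions of the model card.
metrics = {
    "chessgpt-small": {"loss": 6.6455, "perplexity": 769.3397},
    "chessgpt-medium": {"loss": 1.1781, "perplexity": 3.2484},
}

VOCAB_SIZE = 4211  # 4,208 UCI moves + 3 special tokens


def perplexity_from_loss(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss in nats (assumed convention)."""
    return math.exp(loss)


# Baseline: uniform probability over the whole vocabulary.
uniform_loss = math.log(VOCAB_SIZE)
print(f"uniform baseline: loss = {uniform_loss:.4f}, perplexity = {VOCAB_SIZE}")

for name, m in metrics.items():
    derived = perplexity_from_loss(m["loss"])
    # Reported values are rounded to 4 decimals, so compare with a tolerance.
    consistent = math.isclose(derived, m["perplexity"], rel_tol=1e-3)
    print(f"{name}: exp(loss) = {derived:.4f}, reported = {m['perplexity']}, consistent = {consistent}")
```

Under this reading, the medium model's perplexity of about 3.25 means it is, on average, about as uncertain as a uniform choice among three or four moves.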

config.json CHANGED

@@ -1,10 +1,10 @@
 {
-  "name": "chessgpt-small",
+  "name": "chessgpt-medium",
   "vocab_size": 4211,
-  "n_layers": 8,
-  "n_heads": 8,
-  "d_model": 512,
-  "d_ff": 2048,
+  "n_layers": 12,
+  "n_heads": 12,
+  "d_model": 768,
+  "d_ff": 3072,
   "dropout": 0.1,
   "context_length": 256,
   "pad_token_id": 0,

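A rough parameter count puts the small-to-medium change in config.json in perspective. This sketch assumes a GPT-2-style layout (learned positional embeddings, biased linear layers, two LayerNorms per block, LM head tied to the token embedding). None of that is stated in the config, so treat the results as estimates. `n_heads` does not enter the count: with standard multi-head attention it only splits `d_model` into heads (64 dimensions per head in both configs).

```python
def estimate_params(vocab_size: int, n_layers: int, d_model: int, d_ff: int,
                    context_length: int, tied_head: bool = True) -> int:
    """Rough parameter count for an assumed GPT-2-style decoder layout."""
    # Token embeddings + learned positional embeddings (assumed, not stated in config.json).
    embeddings = vocab_size * d_model + context_length * d_model
    # Q, K, V and output projections, each d_model x d_model with a bias.
    attention = 4 * d_model * d_model + 4 * d_model
    # Two-layer MLP (d_model -> d_ff -> d_model) with biases.
    mlp = 2 * d_model * d_ff + d_ff + d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    final_norm = 2 * d_model
    # A tied LM head reuses the token embedding matrix (assumed).
    lm_head = 0 if tied_head else vocab_size * d_model
    return embeddings + n_layers * (attention + mlp + layer_norms) + final_norm + lm_head


configs = {
    "chessgpt-small": dict(vocab_size=4211, n_layers=8, d_model=512, d_ff=2048, context_length=256),
    "chessgpt-medium": dict(vocab_size=4211, n_layers=12, d_model=768, d_ff=3072, context_length=256),
}
# Checkpoint sizes in bytes, from the Git LFS pointers in this commit.
checkpoint_bytes = {"chessgpt-small": 112_168_211, "chessgpt-medium": 357_154_723}

for name, cfg in configs.items():
    est = estimate_params(**cfg)
    # Assumes float32 weights (4 bytes each) and ignores serialization overhead.
    from_file = checkpoint_bytes[name] // 4
    print(f"{name}: ~{est / 1e6:.1f}M estimated, ~{from_file / 1e6:.1f}M from file size")
```

Under these assumptions the estimate is about 27.5M parameters for chessgpt-small and 88.5M for chessgpt-medium. Dividing the checkpoint sizes by 4 bytes gives about 28.0M and 89.3M, within a few percent. The remaining gap could come from non-parameter buffers, serialization overhead, or a layout that differs from the assumed one.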
pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:81d0f437fea8a3129f26fff3fd292fbeaa2ddde5d8e7d435e14af30fa623710b
-size 112168211
+oid sha256:6475a1e2dbd9d2608a109a9399e8682d57c7b070120a93d5528121875f4f4b4d
+size 357154723
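`pytorch_model.bin` is stored through Git LFS, so the diff only shows the pointer file: the spec version, the SHA-256 object id, and the size in bytes. The sketch below uses only the standard library to check that a downloaded checkpoint matches the new pointer. The function names and the local path in the usage line are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_lfs_object(path: Path, pointer_text: str) -> bool:
    """Return True if the file's size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, so a wrong file is rejected before hashing it.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks to avoid loading the whole checkpoint into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer for the new checkpoint, copied from this commit.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6475a1e2dbd9d2608a109a9399e8682d57c7b070120a93d5528121875f4f4b4d
size 357154723
"""
```

Usage: `verify_lfs_object(Path("pytorch_model.bin"), NEW_POINTER)` returns `True` only for the medium checkpoint from this commit. The old small checkpoint (112,168,211 bytes) fails the size check without being hashed.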