PhillipGuo committed
Commit 6ddc184 · verified · 1 Parent(s): 7c69c55

Update README.md

Files changed (1): README.md (+13 -7)
README.md CHANGED
 
license: apache-2.0
---
# Trained Sparse Autoencoders on Pythia 2.8B
I trained SAEs on the MLP_out activations of the Pythia 2.8B model. I trained them using https://github.com/magikarp01/facts-sae.git, a fork of https://github.com/saprmarks/dictionary_learning designed for efficient multi-GPU (not yet multi-node) training. I have checkpoints saved every 10k steps but have not uploaded them all; message me if you want more checkpoints.

The original goal was to analyze these SAEs to determine how well they contribute to performance on a [Sports Facts](https://www.lesswrong.com/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall) dataset.
I'm currently working on other projects, so I haven't yet had time to do this, but I hope some results will come out of these SAEs in the future.
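For anyone who wants to poke at the uploaded dictionaries, here is a minimal sketch of loading one and running it on a batch of MLP_out activations. The import path, constructor signature, and the `ae.pt` file name are assumptions based on the dictionary_learning codebase, not something this repo specifies; check the fork linked above for the exact loading utilities.

```python
# Sketch only: class name, constructor arguments, and checkpoint path are
# assumptions based on the dictionary_learning repo; verify against the fork.
import torch
from dictionary_learning import AutoEncoder  # assumed import path

d_model = 2560            # MLP_out width of Pythia 2.8B
dict_size = 16 * d_model  # 16x expansion -> 40960 features

ae = AutoEncoder(d_model, dict_size)                         # assumed signature
ae.load_state_dict(torch.load("ae.pt", map_location="cpu"))  # placeholder path

acts = torch.randn(8, d_model)   # stand-in for real MLP_out activations
features = ae.encode(acts)       # sparse feature activations, shape (8, 40960)
recon = ae.decode(features)      # reconstructed MLP_out activations
print((acts - recon).pow(2).mean())  # reconstruction error on this batch
```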

## SAE Setup
- **Training Dataset**: Uncopyrighted Pile, at monology/pile-uncopyrighted
- **Activation**: MLP_out, so d_model of 2560
- **Layers Trained**: 0, 1, 2, 15
- **Batch Size**: 2048 for layer 15, 2560 for layers 0, 1, 2
- **Training Tokens**: 1e9 for layers 0, 2, and 15; slightly less than 2e9 for layer 1 (roughly batch size × training steps)
- **Training Steps**: 4e5 for layers 0 and 2, 5e5 for layer 15, 7.5e5 for layer 1
- **Dictionary Size**: 16x the activation dimension, so 40960

- **Scheduler**: LambdaLR, with the learning rate warmed up linearly from step 0 to warmup_steps (sketched below)
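The warmup schedule in the last bullet is straightforward to reproduce with PyTorch's LambdaLR; a minimal sketch is below. The optimizer, base learning rate, and warmup_steps value are placeholders, since the full hyperparameters are not listed in this section.

```python
# Sketch of the linear-warmup schedule described above. The optimizer choice,
# base lr, and warmup_steps value are placeholders, not the actual training config.
import torch

sae = torch.nn.Linear(2560, 40960)  # stand-in for the SAE parameters
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)

warmup_steps = 1000  # placeholder; the real value is not given here

# lr rises linearly from 0 at step 0 to the base lr at warmup_steps, then stays flat
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, step / warmup_steps)
)

for step in range(5):    # training loop stub
    optimizer.step()     # (loss.backward() would precede this in real training)
    scheduler.step()
```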

## SAE Metrics
![2.8b Layer 0 390k Steps](results/2.8b_l0_390k.png)
![2.8b Layer 1 390k Steps](results/2.8b_l1_390k.png)
![2.8b Layer 1 740k Steps](results/2.8b_l1_740k.png)
![2.8b Layer 2 390k Steps](results/2.8b_l2_390k.png)
![2.8b Layer 15 490k Steps](results/2.8b_l15_490k.png)

## Thanks
Thanks to Nat Friedman/NFDG for letting me use H100s from the Andromeda Cluster during downtime, and thanks to Sam Marks/NDIF for the original SAE training repo and for helping me distribute the SAEs. Work done as a late part of my MATS training phase with Neel Nanda.