# Trained Sparse Autoencoders on Pythia 2.8B
I trained SAEs on the MLP_out activations of the Pythia 2.8B model. I trained them using https://github.com/magikarp01/facts-sae.git, a fork of https://github.com/saprmarks/dictionary_learning.

## SAE Setup
- **Training Dataset**: Uncopyrighted Pile, at monology/pile-uncopyrighted
- **Model**: 32-layer Pythia 2.8B
- **Activation**: MLP_out
- **Layers Trained**: 0, 1, 2, 15
- **Batch Size**: 2048 for layer 15, 2560 for layers 0, 1, 2
- **Training Tokens**: 1e9 for layers 0, 2, and 15; slightly less than 2e9 for layer 1
- **Training Steps**: 4e5 for layers 0 and 2, 5e5 for layer 15, 7.5e5 for layer 1

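As a rough illustration of this setup, the sketch below shows how MLP_out activations can be captured from one of these layers of Pythia 2.8B with a forward hook and passed through a sparse autoencoder. The `SparseAutoencoder` class, the dictionary size, and the checkpoint path are illustrative assumptions, not the actual classes or files from the training repo.

```python
# A minimal sketch (not the training code): grab MLP_out activations from one
# layer of Pythia 2.8B with a forward hook and run them through a simple SAE.
# The SparseAutoencoder class, dictionary size, and checkpoint path below are
# illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer = 15          # one of the trained layers: 0, 1, 2, 15
acts = {}

def hook(module, inputs, output):
    # the output of the MLP block is the MLP_out activation
    acts["mlp_out"] = output.detach()

handle = model.gpt_neox.layers[layer].mlp.register_forward_hook(hook)
tokens = tokenizer("The Pile is a large, diverse text corpus.", return_tensors="pt")
with torch.no_grad():
    model(**tokens)
handle.remove()

mlp_out = acts["mlp_out"]  # shape (batch, seq_len, 2560) for Pythia 2.8B

class SparseAutoencoder(nn.Module):
    """Assumed layout: ReLU encoder into an overcomplete dictionary, linear decoder."""
    def __init__(self, d_model, d_dict):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))
        return self.decoder(features), features

sae = SparseAutoencoder(d_model=2560, d_dict=8 * 2560)       # dictionary size is an assumption
# sae.load_state_dict(torch.load("path/to/checkpoint.pt"))   # hypothetical checkpoint path
reconstruction, features = sae(mlp_out)
```
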
## Training Hyperparameters
- **Learning Rate**: 3e-4
- **Sparsity Penalty**: 1e-3
- **Warmup Steps**: 5000
- **Resample Steps**: 50000
- **Optimizer**: Constrained Adam
- **Scheduler**: LambdaLR, with the learning rate warmed up linearly from 0 over the first warmup_steps steps

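Continuing the sketch above, this is roughly what the optimization setup looks like: Adam at 3e-4, a LambdaLR linear warmup over the first 5000 steps, and a mean-squared reconstruction loss plus an L1 sparsity penalty of 1e-3 on the feature activations. Plain Adam stands in here for the repo's Constrained Adam, and the exact loss normalization and the dead-neuron resampling logic are assumptions rather than copies of the training code.

```python
# A rough sketch of the optimization setup above, reusing the SparseAutoencoder
# `sae` from the previous snippet. Plain Adam stands in for Constrained Adam,
# and the exact loss normalization is an assumption.
import torch

lr = 3e-4
sparsity_penalty = 1e-3
warmup_steps = 5000
# Dead-neuron resampling (every 50000 steps in the setup above) is omitted here.

optimizer = torch.optim.Adam(sae.parameters(), lr=lr)
# LambdaLR with the learning rate warmed up linearly from 0 to lr over warmup_steps
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(step / warmup_steps, 1.0)
)

def training_step(x):
    # x: a batch of MLP_out activations, shape (batch, d_model)
    reconstruction, features = sae(x)
    recon_loss = (reconstruction - x).pow(2).mean()
    l1_loss = features.abs().sum(dim=-1).mean()
    loss = recon_loss + sparsity_penalty * l1_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```
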
## SAE Metrics
![2.8b L0 390k Steps](results/2.8b_l0_390k.png)
![2.8b L1 390k Steps](results/2.8b_l1_390k.png)
![2.8b L1 740k Steps](results/2.8b_l1_740k.png)
![2.8b L2 390k Steps](results/2.8b_l2_390k.png)
![2.8b L15 490k Steps](results/2.8b_l15_490k.png)