PhillipGuo committed
Commit 6ddc184 · verified · 1 Parent(s): 7c69c55

Update README.md

Files changed (1): README.md (+13 -7)
README.md CHANGED
 
license: apache-2.0
---
# Trained Sparse Autoencoders on Pythia 2.8B
I trained SAEs on the MLP_out activations of the Pythia 2.8B model. I trained them using https://github.com/magikarp01/facts-sae.git, a fork of https://github.com/saprmarks/dictionary_learning designed for efficient multi-GPU (not yet multi-node) training. I have checkpoints saved every 10k steps but have not uploaded them all; message me if you want more checkpoints.

The original goal was to analyze these SAEs to determine how well they contribute to performance on a [Sports Facts](https://www.lesswrong.com/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall) dataset.
I'm currently working on other projects, so I haven't yet had time to do this, but I hope some results will come out of these SAEs in the future.
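For anyone who wants to poke at the uploaded dictionaries, here is a minimal sketch of loading one and running it on a batch of MLP_out activations. The import path, constructor signature, and the `ae.pt` file name are assumptions based on the dictionary_learning codebase, not something this repo specifies; check the fork linked above for the exact loading utilities.

```python
# Sketch only: class name, constructor arguments, and checkpoint path are
# assumptions based on the dictionary_learning repo; verify against the fork.
import torch
from dictionary_learning import AutoEncoder  # assumed import path

d_model = 2560            # MLP_out width of Pythia 2.8B
dict_size = 16 * d_model  # 16x expansion -> 40960 features

ae = AutoEncoder(d_model, dict_size)                         # assumed signature
ae.load_state_dict(torch.load("ae.pt", map_location="cpu"))  # placeholder path

acts = torch.randn(8, d_model)   # stand-in for real MLP_out activations
features = ae.encode(acts)       # sparse feature activations, shape (8, 40960)
recon = ae.decode(features)      # reconstructed MLP_out activations
print((acts - recon).pow(2).mean())  # reconstruction error on this batch
```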

## SAE Setup
- **Training Dataset**: Uncopyrighted Pile, at monology/pile-uncopyrighted
- **Activation**: MLP_out, so d_model of 2560
- **Layers Trained**: 0, 1, 2, 15
- **Batch Size**: 2048 for layer 15, 2560 for layers 0, 1, 2
- **Training Tokens**: 1e9 for layers 0, 2, and 15; slightly less than 2e9 for layer 1 (roughly batch size × training steps)
- **Training Steps**: 4e5 for layers 0 and 2, 5e5 for layer 15, 7.5e5 for layer 1
- **Dictionary Size**: 16x the activation dimension, so 40960

- **Scheduler**: LambdaLR, with the learning rate warmed up linearly from step 0 to warmup_steps (sketched below)
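The warmup schedule in the last bullet is straightforward to reproduce with PyTorch's LambdaLR; a minimal sketch is below. The optimizer, base learning rate, and warmup_steps value are placeholders, since the full hyperparameters are not listed in this section.

```python
# Sketch of the linear-warmup schedule described above. The optimizer choice,
# base lr, and warmup_steps value are placeholders, not the actual training config.
import torch

sae = torch.nn.Linear(2560, 40960)  # stand-in for the SAE parameters
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-4)

warmup_steps = 1000  # placeholder; the real value is not given here

# lr rises linearly from 0 at step 0 to the base lr at warmup_steps, then stays flat
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, step / warmup_steps)
)

for step in range(5):    # training loop stub
    optimizer.step()     # (loss.backward() would precede this in real training)
    scheduler.step()
```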

## SAE Metrics
![2.8b Layer 0 390k Steps](results/2.8b_l0_390k.png)
![2.8b Layer 1 390k Steps](results/2.8b_l1_390k.png)
![2.8b Layer 1 740k Steps](results/2.8b_l1_740k.png)
![2.8b Layer 2 390k Steps](results/2.8b_l2_390k.png)
![2.8b Layer 15 490k Steps](results/2.8b_l15_490k.png)

## Thanks
Thanks to Nat Friedman/NFDG for letting me use H100s from the Andromeda Cluster during downtime, and thanks to Sam Marks/NDIF for the original SAE training repo and for helping me distribute the SAEs. Work done as a late part of my MATS training phase with Neel Nanda.