Phillip Guo committed
Commit 7c69c55 · verified · 1 Parent(s): 6a718a5

Update README.md

Files changed (1):
  1. README.md +2 -1
README.md CHANGED
@@ -7,11 +7,12 @@ I trained SAEs on the MLP_out activations of the Pythia 2.8B model. I trained
  ## SAE Setup
  - **Training Dataset**: Uncopyrighted Pile, at monology/pile-uncopyrighted
  - **Model**: 32-layer Pythia 2.8B
- - **Activation**: MLP_out
+ - **Activation**: MLP_out, so d_model of 2560
  - **Layers Trained**: 0, 1, 2, 15
  - **Batch Size**: 2048 for layer 15, 2560 for layers 0, 1, 2
  - **Training Tokens**: 1e9 for layers 15, 0, 2, slightly less than 2e9 for layer 1
  - **Training Steps**: 4e5 for layers 0, 2, 5e5 for layer 15, 7.5e5 for layer 1
+ - **Dictionary Size**: 16x the activation dimension, so 40960
 
  ## Training Hyperparameters
  - **Learning Rate**: 3e-4
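
For concreteness, here is a minimal PyTorch sketch of an SAE with the dimensions listed above (d_model of 2560 for Pythia 2.8B's MLP_out, a 16x dictionary of 40960 features, learning rate 3e-4, and the layer-15 batch size of 2048). It assumes a standard ReLU encoder/decoder autoencoder; the class and variable names are illustrative and are not taken from this repository's training code.

```python
# Illustrative sketch of the SAE shapes implied by the README,
# not the author's actual training code.
import torch
import torch.nn as nn

D_MODEL = 2560            # Pythia 2.8B MLP_out width, per the README
DICT_SIZE = 16 * D_MODEL  # 16x expansion -> 40960 dictionary features

class SparseAutoencoder(nn.Module):
    """One-layer sparse autoencoder over MLP_out activations."""

    def __init__(self, d_model: int = D_MODEL, dict_size: int = DICT_SIZE):
        super().__init__()
        self.encoder = nn.Linear(d_model, dict_size)
        self.decoder = nn.Linear(dict_size, d_model)

    def forward(self, acts: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # features: sparse (post-ReLU) codes over the 40960-entry dictionary
        features = torch.relu(self.encoder(acts))
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder()
acts = torch.randn(2048, D_MODEL)   # a batch of MLP_out activations (layer-15 batch size)
recon, features = sae(acts)
print(recon.shape, features.shape)  # torch.Size([2048, 2560]) torch.Size([2048, 40960])
optimizer = torch.optim.Adam(sae.parameters(), lr=3e-4)  # learning rate from the README
```

An actual training run would also apply a sparsity penalty (e.g. an L1 term) to the feature activations; the README does not specify that coefficient, so it is omitted here.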