Phillip Guo committed on
Update README.md
README.md
I trained SAEs on the MLP_out activations of the Pythia 2.8B model.
## SAE Setup
- **Training Dataset**: Uncopyrighted Pile, at monology/pile-uncopyrighted
- **Model**: 32-layer Pythia 2.8B
- **Activation**: MLP_out, so d_model of 2560
- **Layers Trained**: 0, 1, 2, 15
- **Batch Size**: 2048 for layer 15, 2560 for layers 0, 1, 2
- **Training Tokens**: 1e9 for layers 0, 2, and 15; slightly less than 2e9 for layer 1
- **Training Steps**: 4e5 for layers 0 and 2; 5e5 for layer 15; 7.5e5 for layer 1
- **Dictionary Size**: 16x the activation dimension, so 40960
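For orientation, the sketch below shows a sparse autoencoder with exactly these dimensions in plain PyTorch: a 2560-dimensional MLP_out activation encoded into a 40960-dimensional dictionary and decoded back. The class, its parameter names, and the ReLU encoder are illustrative assumptions, not the exact code used to train the released SAEs.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Illustrative SAE matching the dimensions above (not the exact training code)."""
    def __init__(self, d_model: int = 2560, dict_size: int = 40960):
        super().__init__()
        # Encoder maps MLP_out activations (d_model) into a 16x wider dictionary.
        self.encoder = nn.Linear(d_model, dict_size)
        # Decoder reconstructs the activation from the dictionary features.
        self.decoder = nn.Linear(dict_size, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse feature activations
        reconstruction = self.decoder(features)  # reconstructed MLP_out activation
        return reconstruction, features

# Example: a batch of 2048 activations, the batch size used for layer 15.
sae = SparseAutoencoder()
acts = torch.randn(2048, 2560)
recon, feats = sae(acts)
print(recon.shape, feats.shape)  # torch.Size([2048, 2560]) torch.Size([2048, 40960])
```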
## Training Hyperparameters
- **Learning Rate**: 3e-4
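The hyperparameter list is only partially shown in this diff. As a rough, hedged sketch of how the stated learning rate could be used, here is one Adam update on a batch of activations, reusing the `SparseAutoencoder` sketch above; the reconstruction-plus-L1 objective and the `l1_coeff` value are assumptions, not settings documented in this README.

```python
import torch

# Assumed setup: Adam at the stated learning rate; the L1 coefficient is a placeholder.
sae = SparseAutoencoder(d_model=2560, dict_size=40960)  # class from the sketch above
optimizer = torch.optim.Adam(sae.parameters(), lr=3e-4)
l1_coeff = 1e-3  # hypothetical sparsity penalty, not taken from this README

def train_step(acts: torch.Tensor) -> float:
    """One SAE update on a batch of MLP_out activations."""
    recon, feats = sae(acts)
    # Standard SAE objective: reconstruction error plus an L1 sparsity penalty on features.
    loss = (recon - acts).pow(2).mean() + l1_coeff * feats.abs().sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = train_step(torch.randn(2048, 2560))
```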