Commit c21c95e by jacobcd52 (verified)
1 Parent(s): 40bb4f1

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +41 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ # ss_d256_f1
+
+ Weight-sparse transformer trained with the procedure from Gao et al. (2025).
+
+ ## Model Details
+
+ - **Layers**: 4
+ - **Model Dimension**: 256
+ - **Context Length**: 512
+ - **Head Dimension**: 16
+ - **Vocabulary Size**: 4096
+
+ ## Sparsity
+
+ - **Weight Sparsity**: False
+ - **Target L0 Fraction**: 1
+ - **Activation Sparsity**: False
+
+ ## Training
+
+ - **Dataset**: SimpleStories/SimpleStories
+ - **Tokenizer**: SimpleStories/SimpleStories-1.25M
+ - **Total Tokens**: 2,000,000,000
+
+ ## Training Run
+
+ - **W&B Run**: [https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw](https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw)
+
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # Download model
+ model_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="pytorch_model.bin")
+ config_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="config.json")
+
+ # Load (requires the SparseGPT model class from this repo)
+ state_dict = torch.load(model_path)
+ ```
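
The README's usage snippet stops once the state dict is loaded. Below is a minimal sketch of how one might inspect the downloaded files without the `SparseGPT` class. It assumes `pytorch_model.bin` is a plain state dict of tensors and `config.json` is ordinary JSON, neither of which the README confirms. The helper names `load_checkpoint` and `nonzero_fraction` are hypothetical and not part of the repository. The fraction is computed over every tensor in the checkpoint, which may differ from how the training code defines its L0 fraction.

```python
import json


def load_checkpoint(model_path, config_path):
    """Load the raw state dict and config from local file paths (hypothetical helper)."""
    import torch  # imported here so nonzero_fraction below does not require torch

    with open(config_path) as f:
        config = json.load(f)
    # map_location="cpu" avoids needing a GPU; weights_only=True refuses to
    # unpickle arbitrary Python objects. Assumes the file is a plain state dict.
    state_dict = torch.load(model_path, map_location="cpu", weights_only=True)
    return state_dict, config


def nonzero_fraction(state_dict):
    """Fraction of nonzero entries across all tensors in a state dict."""
    total = sum(t.numel() for t in state_dict.values())
    nonzero = sum(int(t.count_nonzero()) for t in state_dict.values())
    return nonzero / total
```

Passing the `model_path` and `config_path` returned by `hf_hub_download` to `load_checkpoint`, then calling `nonzero_fraction(state_dict)`, gives a quick check against the listed Target L0 Fraction of 1. With Weight Sparsity set to False, one would expect a value close to 1.0, though this has not been verified against the actual checkpoint.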