jacobcd52 committed 9dc7b42 (verified; 1 parent: 54bd20c)

Upload README.md with huggingface_hub

# ss_dense

Dense transformer trained with the weight-sparse training procedure from Gao et al. (2025), with both weight sparsity and activation sparsity disabled (see Sparsity below).

## Model Details

- **Layers**: 4
- **Model Dimension**: 512
- **Context Length**: 512
- **Head Dimension**: 16
- **Vocabulary Size**: 4096

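The card lists only five hyperparameters. As a rough sketch, the snippet below derives a head count and approximate weight counts from them. None of the following is stated in the card; these are assumptions: attention heads tile the model dimension, each block has Q/K/V/O projections plus a two-layer MLP with a 4x hidden width, and biases, norms, positional embeddings, and any separate unembedding matrix are ignored.

```python
# Hyperparameters copied from the Model Details list.
N_LAYERS = 4
D_MODEL = 512
CONTEXT_LENGTH = 512
D_HEAD = 16
VOCAB_SIZE = 4096

# Assumption: standard multi-head attention, where the heads tile the residual width.
assert D_MODEL % D_HEAD == 0
N_HEADS = D_MODEL // D_HEAD


def approx_block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Rough weight count of one transformer block, ignoring biases and norms.

    Assumes Q, K, V, O projections of size d_model x d_model and a two-layer
    MLP with hidden width mlp_ratio * d_model (a GPT-2-style layout).
    """
    attn = 4 * d_model * d_model
    mlp = 2 * d_model * (mlp_ratio * d_model)
    return attn + mlp


embedding = VOCAB_SIZE * D_MODEL                   # token embedding matrix only
blocks = N_LAYERS * approx_block_params(D_MODEL)   # all transformer blocks
print(N_HEADS, embedding, blocks)  # 32 2097152 12582912
```

If the actual architecture uses a different MLP width or head layout, `config.json` in the repo is the place to check.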
## Sparsity

- **Weight Sparsity**: False
- **Target L0 Fraction**: 1
- **Activation Sparsity**: False

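A Target L0 Fraction of 1 (read as the fraction of weights allowed to be nonzero), together with Weight Sparsity set to False, means no weights are pruned in this checkpoint. For comparison with sparse runs, here is a minimal sketch of one common way to enforce a target L0 fraction: keep only the largest-magnitude entries of a weight tensor and zero the rest. The helper `topk_magnitude_mask` is hypothetical, and the exact sparsification rule used by Gao et al. (2025) may differ.

```python
import torch


def topk_magnitude_mask(weight: torch.Tensor, l0_fraction: float) -> torch.Tensor:
    """Hypothetical helper: zero all but the largest-|w| entries so that
    roughly l0_fraction of the weights stay nonzero.

    l0_fraction=1 keeps every weight, i.e. the dense setting of this model.
    """
    if not 0.0 < l0_fraction <= 1.0:
        raise ValueError("l0_fraction must be in (0, 1]")
    n = weight.numel()
    k = max(1, int(round(l0_fraction * n)))
    if k == n:
        return weight.clone()  # dense: nothing to prune
    # Positions of the k largest magnitudes in the flattened tensor
    idx = weight.abs().flatten().topk(k).indices
    keep = torch.zeros(n, dtype=torch.bool, device=weight.device)
    keep[idx] = True
    # Zero every entry that is not among the top-k magnitudes
    return weight.masked_fill(~keep.view_as(weight), 0.0)


w = torch.randn(64, 64)
dense = topk_magnitude_mask(w, 1.0)    # identical to w
sparse = topk_magnitude_mask(w, 0.25)  # about a quarter of entries nonzero
print((sparse != 0).float().mean().item())
```

In a sparse run, a mask like this would typically be reapplied after each optimizer step so pruned weights stay at zero; with an L0 fraction of 1 it is a no-op.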
## Training

- **Dataset**: SimpleStories/SimpleStories
- **Tokenizer**: SimpleStories/SimpleStories-1.25M
- **Total Tokens**: 2,000,000,000

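The card does not say how stories were batched during training. Purely as an illustration, assuming the common pretraining setup of concatenating tokenized stories with an end-of-sequence separator and cutting the stream into full 512-token windows, the sketch below shows that packing and how many such windows the 2,000,000,000-token budget corresponds to. `pack_sequences` and `eos_id` are hypothetical names, not part of this repo.

```python
from typing import Iterable, List

CONTEXT_LENGTH = 512          # Context Length from Model Details
TOTAL_TOKENS = 2_000_000_000  # Total Tokens from Training


def pack_sequences(docs: Iterable[List[int]], eos_id: int,
                   context_length: int = CONTEXT_LENGTH) -> List[List[int]]:
    """Hypothetical packing: concatenate tokenized documents, each followed
    by eos_id, and cut the stream into full windows of context_length tokens.
    A trailing partial window is dropped."""
    buffer: List[int] = []
    windows: List[List[int]] = []
    for doc in docs:
        buffer.extend(doc)
        buffer.append(eos_id)
        while len(buffer) >= context_length:
            windows.append(buffer[:context_length])
            buffer = buffer[context_length:]
    return windows


# Under this packing scheme, the token budget corresponds to this many windows.
num_windows = TOTAL_TOKENS // CONTEXT_LENGTH
print(f"{num_windows:,} windows of {CONTEXT_LENGTH} tokens")  # 3,906,250 windows of 512 tokens
```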
## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hub
model_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="config.json")

# Read the model hyperparameters
with open(config_path) as f:
    config = json.load(f)

# Load the raw weights onto the CPU; building the model itself
# requires the SparseGPT model class from this repo
state_dict = torch.load(model_path, map_location="cpu")
```
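Because this checkpoint was trained with weight sparsity disabled, a quick sanity check after loading `state_dict` with the snippet above is to measure the fraction of nonzero weights. The helper below is a hypothetical sketch; it skips non-tensor and non-floating-point entries. For a dense model one would expect a value close to 1.0, but that is an expectation, not a documented result.

```python
import torch


def nonzero_fraction(state_dict: dict) -> float:
    """Hypothetical helper: fraction of nonzero entries across all
    floating-point tensors in a state dict (other entries are skipped)."""
    total = 0
    nonzero = 0
    for tensor in state_dict.values():
        if torch.is_tensor(tensor) and tensor.is_floating_point():
            total += tensor.numel()
            nonzero += int(torch.count_nonzero(tensor))
    return nonzero / total if total else 0.0
```

Usage: `print(nonzero_fraction(state_dict))`.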