jacobcd52/ss_d256_f1

jacobcd52 committed on Dec 14, 2025

Commit c21c95e

1 Parent(s): 40bb4f1

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED

+# ss_d256_f1
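+Transformer trained with the weight-sparse training procedure from Gao et al. (2025). In this checkpoint weight sparsity is disabled (target L0 fraction of 1), so the weights are dense.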
+## Model Details
+- **Layers**: 4
+- **Model Dimension**: 256
+- **Context Length**: 512
+- **Head Dimension**: 16
+- **Vocabulary Size**: 4096
+## Sparsity
+- **Weight Sparsity**: False
+- **Target L0 Fraction**: 1
+- **Activation Sparsity**: False
+## Training
+- **Dataset**: SimpleStories/SimpleStories
+- **Tokenizer**: SimpleStories/SimpleStories-1.25M
+- **Total Tokens**: 2,000,000,000
+## Training Run
+- **W&B Run**: [https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw](https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw)
+## Usage
+```python
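+import json
+import torch
+from huggingface_hub import hf_hub_download
+# Download the weights and config from the Hub (cached locally after the first call)
+model_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="pytorch_model.bin")
+config_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="config.json")
+# Read the model hyperparameters
+with open(config_path) as f:
+    config = json.load(f)
+# Load the weights on CPU (building the model requires the SparseGPT model class from this repo)
+state_dict = torch.load(model_path, map_location="cpu")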
+```
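
The figures listed under Model Details and Training imply a few derived quantities that are handy when sanity-checking a loaded checkpoint. The sketch below is not part of the committed README. It computes them from the listed values under two assumptions the README does not state: that attention heads tile the model dimension (so the head count is the model dimension divided by the head dimension), and that the token embedding is a plain vocabulary-by-model-dimension table. The function name `derived_quantities` and the dictionary keys are hypothetical.

```python
# Values copied from the README's Model Details and Training sections.
N_LAYERS = 4
D_MODEL = 256
CONTEXT_LENGTH = 512
HEAD_DIM = 16
VOCAB_SIZE = 4096
TOTAL_TOKENS = 2_000_000_000


def derived_quantities() -> dict:
    """Return quantities implied by the README figures.

    Assumptions (not stated in the README):
    - attention heads tile the model dimension, so n_heads = D_MODEL / HEAD_DIM;
    - the token embedding is a VOCAB_SIZE x D_MODEL table.
    """
    return {
        "n_heads": D_MODEL // HEAD_DIM,
        "token_embedding_params": VOCAB_SIZE * D_MODEL,
        # Number of full-length sequences needed to cover the token budget.
        "full_context_sequences": TOTAL_TOKENS // CONTEXT_LENGTH,
    }


if __name__ == "__main__":
    for name, value in derived_quantities().items():
        print(f"{name}: {value:,}")
```

If the shapes in the downloaded `state_dict` disagree with these numbers, the assumptions above are the first thing to revisit, since the actual architecture is defined by the SparseGPT model class and `config.json`.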