# ss_d256_f1
Weight-sparse transformer trained with the procedure from Gao et al. (2025).
## Model Details
- **Layers**: 4
- **Model Dimension**: 256
- **Context Length**: 512
- **Head Dimension**: 16
- **Vocabulary Size**: 4096
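As a rough sanity check, the dimensions above imply a parameter count of about 5M under a standard GPT-style estimate. The formula below assumes untied embedding/unembedding matrices, a 4x MLP expansion, and no biases; the actual architecture may differ, so treat this as an approximation only.

```python
# Rough parameter estimate for a GPT-style transformer with the dimensions above.
# Assumes untied embeddings, 4x MLP expansion, no biases (assumptions, not confirmed).
d_model = 256
n_layers = 4
vocab_size = 4096

embed = vocab_size * d_model       # token embedding
unembed = vocab_size * d_model     # output projection (if untied)
per_layer = 12 * d_model ** 2      # 4*d^2 attention + 8*d^2 MLP
total = embed + unembed + n_layers * per_layer

print(f"~{total / 1e6:.1f}M parameters")  # → ~5.2M parameters
```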
## Sparsity
- **Weight Sparsity**: False
- **Target L0 Fraction**: 1
- **Activation Sparsity**: False
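The target L0 fraction is the fraction of weights allowed to be nonzero; a fraction of 1, as here, keeps every weight, so this checkpoint is effectively dense. As an illustration only (this magnitude-masking scheme is a generic sketch, not the exact training procedure from Gao et al.), a fraction below 1 would zero out all but the largest-magnitude entries:

```python
import torch

def apply_l0_fraction(weight: torch.Tensor, fraction: float) -> torch.Tensor:
    """Keep the top `fraction` of entries by magnitude, zero the rest."""
    k = max(1, int(round(fraction * weight.numel())))
    # k-th largest |w| is the (numel - k + 1)-th smallest
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight * (weight.abs() >= threshold)

w = torch.randn(256, 256)
dense = apply_l0_fraction(w, 1.0)    # fraction 1: nothing is pruned
sparse = apply_l0_fraction(w, 0.1)   # fraction 0.1: ~90% of entries zeroed
```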
## Training
- **Dataset**: SimpleStories/SimpleStories
- **Tokenizer**: SimpleStories/SimpleStories-1.25M
- **Total Tokens**: 2,000,000,000
## Training Run
- **W&B Run**: [https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw](https://wandb.ai/training-saes/my_sparsity/runs/l0bvjtlw)
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hub
model_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_d256_f1", filename="config.json")

# Load the raw state dict (instantiating the model requires the SparseGPT model class from this repo)
state_dict = torch.load(model_path, map_location="cpu")

# Inspect parameter names and shapes
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```