# ss_dense

Dense transformer trained with the procedure from Gao et al. (2025), with both weight sparsity and activation sparsity disabled.
## Model Details

- **Layers**: 4
- **Model Dimension**: 512
- **Context Length**: 512 tokens
- **Head Dimension**: 16
- **Vocabulary Size**: 4096
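The card does not list the number of attention heads or a parameter count. The sketch below shows how some of these follow from the values above. It assumes standard multi-head attention in which the heads tile the model dimension (`n_heads * d_head == d_model`) and bias-free Q/K/V/output projections. Neither assumption is confirmed by this card. MLP and positional-embedding parameters are left out because their sizes are not given.

```python
# Values from the Model Details list
n_layers = 4
d_model = 512
d_head = 16
vocab_size = 4096

# Assumption: heads tile the model dimension, as in standard multi-head attention
n_heads = d_model // d_head
assert n_heads * d_head == d_model

# Token embedding matrix: one d_model-dimensional vector per vocabulary entry
embed_params = vocab_size * d_model

# Assumption: Q, K, V and output projections are each d_model x d_model with no biases
attn_params_per_layer = 4 * d_model * (n_heads * d_head)
attn_params_total = n_layers * attn_params_per_layer

print(n_heads)            # 32
print(embed_params)       # 2097152
print(attn_params_total)  # 4194304
```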
## Sparsity

- **Weight Sparsity**: False
- **Target L0 Fraction**: 1
- **Activation Sparsity**: False
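An L0 fraction is the share of weights that are nonzero, so a target of 1 means no weights are pruned. One way to check this on a loaded state dict is sketched below. The helper `l0_fraction` is hypothetical and not part of this repo. It counts over every floating-point tensor, which may not match exactly which parameters the Gao et al. (2025) procedure sparsifies.

```python
import torch


def l0_fraction(state_dict: dict[str, torch.Tensor]) -> float:
    """Fraction of nonzero entries across all floating-point tensors (hypothetical helper)."""
    total = 0
    nonzero = 0
    for tensor in state_dict.values():
        if not torch.is_floating_point(tensor):
            continue  # skip integer buffers, which are not trainable weights
        total += tensor.numel()
        nonzero += int(torch.count_nonzero(tensor).item())
    return nonzero / total if total else 0.0


# Toy example: one fully dense matrix and one partly zeroed matrix
toy = {
    "dense.weight": torch.ones(4, 4),
    "sparse.weight": torch.tensor([[1.0, 0.0], [0.0, 2.0]]),
}
print(l0_fraction(toy))  # 18 nonzero of 20 entries -> 0.9
```

For a dense checkpoint like this one, calling `l0_fraction(state_dict)` on the weights loaded in the Usage section should give a value at or very close to 1.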
## Training

- **Dataset**: SimpleStories/SimpleStories
- **Tokenizer**: SimpleStories/SimpleStories-1.25M
- **Total Tokens**: 2,000,000,000
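To give a sense of scale, the token budget can be converted into a number of training sequences. This conversion assumes the text was packed into full-length sequences of 512 tokens. The card does not say how stories were batched, so treat the result as an estimate.

```python
total_tokens = 2_000_000_000
context_length = 512

# Assumption: training text is packed into full-length sequences of context_length tokens
num_sequences = total_tokens // context_length
leftover_tokens = total_tokens % context_length

print(num_sequences)    # 3906250
print(leftover_tokens)  # 0
```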
## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the weights and config from the Hugging Face Hub
model_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="config.json")

# Read the hyperparameters and load the raw state dict onto the CPU
with open(config_path) as f:
    config = json.load(f)
state_dict = torch.load(model_path, map_location="cpu", weights_only=True)
# Building the model itself requires the SparseGPT model class from this repo
```
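You can inspect the downloaded weights without the SparseGPT class. The hypothetical helper below prints each tensor's name, shape, and dtype and returns the total number of elements. The toy tensor names (`wte.weight`, `ln_f.weight`) are placeholders and are not taken from this checkpoint.

```python
import torch


def summarize_state_dict(state_dict: dict[str, torch.Tensor]) -> int:
    """Print name, shape and dtype of each tensor; return the total element count."""
    total = 0
    for name, tensor in state_dict.items():
        print(f"{name:40s} {tuple(tensor.shape)} {tensor.dtype}")
        total += tensor.numel()
    print(f"total elements: {total:,}")
    return total


# Toy stand-in with placeholder names; with the real checkpoint,
# call summarize_state_dict(state_dict) after running the loading code above
toy = {"wte.weight": torch.zeros(4096, 512), "ln_f.weight": torch.ones(512)}
summarize_state_dict(toy)
```

Comparing the printed shapes with `config` helps confirm the layer count and dimensions before you wire the weights into a model class.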