# ss_dense

Transformer trained with the weight-sparse training procedure from Gao et al. (2025). This checkpoint is the dense variant: weight sparsity is disabled (target L0 fraction of 1).

## Model Details

- **Layers**: 4
- **Model Dimension**: 512
- **Context Length**: 512
- **Head Dimension**: 16
- **Vocabulary Size**: 4096

## Sparsity

- **Weight Sparsity**: False
- **Target L0 Fraction**: 1
- **Activation Sparsity**: False

## Training

- **Dataset**: SimpleStories/SimpleStories
- **Tokenizer**: SimpleStories/SimpleStories-1.25M
- **Total Tokens**: 2,000,000,000

## Usage

```python
import json

import torch
from huggingface_hub import hf_hub_download

# Download the weights and the config from the Hugging Face Hub
model_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_dense", filename="config.json")

# Read the model hyperparameters
with open(config_path) as f:
    config = json.load(f)

# Load the weights onto the CPU (building the model itself requires
# the SparseGPT model class from this repo)
state_dict = torch.load(model_path, map_location="cpu")
```
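
The Sparsity settings list a Target L0 Fraction of 1, meaning no weights are forced to zero. One way to sanity-check a loaded checkpoint is to measure the fraction of non-zero entries in its tensors. The sketch below is an illustration only: the helper name `l0_fraction` is hypothetical, and pooling all floating-point tensors into one global fraction is an assumption, not necessarily how Gao et al. (2025) define the quantity. It uses a toy dictionary in place of the real `state_dict`.

```python
import torch


def l0_fraction(state_dict):
    """Fraction of non-zero entries, pooled over all floating-point tensors.

    Hypothetical helper for illustration; the pooling convention is an assumption.
    """
    nonzero = 0
    total = 0
    for tensor in state_dict.values():
        # Skip non-tensor entries and integer buffers (e.g. step counters)
        if not torch.is_tensor(tensor) or not tensor.is_floating_point():
            continue
        nonzero += int(torch.count_nonzero(tensor))
        total += tensor.numel()
    return nonzero / total if total else 0.0


# Toy stand-in for a real checkpoint: one dense matrix, one mostly-zero matrix
toy = {
    "dense.weight": torch.ones(4, 4),
    "sparse.weight": torch.tensor([[1.0, 0.0], [0.0, 0.0]]),
}
print(l0_fraction(toy))  # 17 non-zero entries out of 20
```

Passing the `state_dict` loaded in the Usage section instead of `toy` should give a value close to 1 for a dense model, although a few weights may happen to be exactly zero.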