ss_d3072_f0.0039

Weight-sparse transformer with bridges, trained with the procedure from Gao et al. (2025).

This repo contains a sparse model and bridges that couple it to a frozen dense model.

Sparse Model Details

  • Layers: 2
  • Model Dimension: 3072
  • Context Length: 512
  • Head Dimension: 16
  • Vocabulary Size: not specified (reported as None)
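
As a quick sanity check on the dimensions listed above, the head count can be derived from the model and head dimensions. This is a sketch that assumes the attention heads evenly tile the model dimension (`n_heads = d_model / d_head`), which is the usual convention but is not stated in this card.

```python
d_model = 3072  # Model Dimension from the list above
d_head = 16     # Head Dimension from the list above

# Assumption: heads evenly partition the model dimension.
n_heads, remainder = divmod(d_model, d_head)

print(f"heads per layer: {n_heads} (remainder {remainder})")
```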

Dense Model

  • Source: jacobcd52/ss_d128_f1

Bridges

  • Encoder AbsTopK Fraction: 0.25
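
One way to read the AbsTopK fraction is as the share of encoder outputs kept per activation vector, selected by absolute value. The following is a minimal, dependency-free sketch under that assumption; the helper name `abs_topk` and the per-vector application are illustrative and are not taken from this repo's code.

```python
def abs_topk(values, fraction):
    """Keep the largest-magnitude entries of `values` and zero the rest.

    Hypothetical helper: assumes the fraction applies to a single
    activation vector and that at least one entry is always kept.
    """
    k = max(1, int(len(values) * fraction))
    # Indices ordered by descending absolute value (stable for ties).
    order = sorted(range(len(values)), key=lambda i: -abs(values[i]))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


activations = [0.1, -2.0, 0.5, 1.5, -0.2, 0.05, -0.7, 0.3]
sparse = abs_topk(activations, 0.25)
print(sparse)  # [0.0, -2.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0]
```

With a fraction of 0.25 and eight entries, two survive, and the sign of each surviving entry is preserved.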

Sparsity

  • Weight Sparsity: True
  • Target L0 Fraction: 0.00390625
  • Activation Sparsity: True
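
The target L0 fraction is exactly 1/256, which the repo name rounds to 0.0039. The sketch below works out what that means for a single weight matrix, assuming (this card does not say) that the fraction is applied to a square `d_model × d_model` matrix.

```python
from fractions import Fraction

target_l0_fraction = 0.00390625
d_model = 3072

# 0.00390625 is a power of two, so the float converts to an exact ratio.
as_ratio = Fraction(target_l0_fraction)
print(as_ratio)  # 1/256

# Assumption: the fraction applies to one square d_model x d_model matrix.
total_weights = d_model * d_model
nonzero_weights = int(total_weights * target_l0_fraction)
print(total_weights, nonzero_weights)  # 9437184 36864
```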

Training

  • Dataset: SimpleStories/SimpleStories
  • Tokenizer: SimpleStories/SimpleStories-1.25M
  • Total Tokens: 2,000,000,000

Usage

import torch
from huggingface_hub import hf_hub_download

# Download sparse model and bridges
sparse_model_path = hf_hub_download(repo_id="jacobcd52/ss_d3072_f0.0039", filename="sparse_model.bin")
bridges_path = hf_hub_download(repo_id="jacobcd52/ss_d3072_f0.0039", filename="bridges.bin")
config_path = hf_hub_download(repo_id="jacobcd52/ss_d3072_f0.0039", filename="config.json")

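# Load the state dicts onto the CPU. Instantiating the models themselves
# requires the SparseGPT and BridgeSet classes from this repo.
sparse_state_dict = torch.load(sparse_model_path, map_location="cpu")
bridges_state_dict = torch.load(bridges_path, map_location="cpu")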
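
The snippet above downloads `config.json` but does not read it. A minimal sketch of loading and listing such a config follows. It uses a stand-in file so it runs offline, and the keys written to that file are placeholders, not the actual schema of this repo's config.

```python
import json
import tempfile
from pathlib import Path


def load_config(path):
    """Read a JSON config file into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Stand-in file so the sketch runs offline. These keys are placeholders,
# not the real schema of this repo's config.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(json.dumps({"n_layer": 2, "d_model": 3072}), encoding="utf-8")
    config = load_config(demo_path)

# List the top-level entries in a stable order.
for key in sorted(config):
    print(f"{key}: {config[key]}")
```

To inspect the real file, call `load_config(config_path)` with the path returned by `hf_hub_download`.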