MoE Redistribution: Trained Checkpoints

Trained 1-layer transformer checkpoints accompanying the anonymous submission "Sparsity Moves Computation" (under double-blind review).

The repository contains 5-seed checkpoint sets across all architecture and routing variants used in the paper: dense FFN, GLU, MoE (top-1 / top-2, learned and random routing), and MoE-GLU, on three tasks (add-7, modular addition, histogram counting).

Layout

checkpoints/
  <task>_<arch>_<config>_s<seed>/
    best_model.pt          # add-7, histogram (PyTorch state dict + config)
    modadd_best.pt         # modular addition

<task> ∈ {add7, modadd, hist}. <arch> ∈ {ffn, glu, moe, moe_glu}. <config> encodes width, activation, normalization, and routing variant (e.g. nonorm, narrow_nonorm, topk2_nonorm, randroute_nonorm, d170_silu_nonorm).
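The naming scheme above can be split back into its parts programmatically. The following is a minimal sketch, not code from the paper's repository: parse_run_name, checkpoint_filename, and RunName are hypothetical helpers, and the sketch assumes that every folder name ends in _s<seed> and that moe_glu should take precedence over moe when both could match.

```python
import re
from typing import NamedTuple


class RunName(NamedTuple):
    """Components of a checkpoint folder name (hypothetical helper type)."""
    task: str
    arch: str
    config: str
    seed: int


# moe_glu is listed first so the longer architecture name wins over moe/glu.
_RUN_RE = re.compile(r"^(add7|modadd|hist)_(moe_glu|ffn|glu|moe)_(.+)_s(\d+)$")


def parse_run_name(name: str) -> RunName:
    """Split '<task>_<arch>_<config>_s<seed>' into its four fields."""
    m = _RUN_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised checkpoint folder name: {name!r}")
    task, arch, config, seed = m.groups()
    return RunName(task, arch, config, int(seed))


def checkpoint_filename(task: str) -> str:
    """File name inside a run folder, following the layout listed above."""
    return "modadd_best.pt" if task == "modadd" else "best_model.pt"


if __name__ == "__main__":
    run = parse_run_name("add7_ffn_nonorm_s42")
    print(run, checkpoint_filename(run.task))
```

The config field is kept as an opaque string because the README does not specify how width, activation, normalization, and routing tokens are ordered within it.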

Each .pt file contains a Python dict with the keys model_state_dict, config, and optimizer_state_dict, plus one of accuracy, test_acc, step, or epoch, depending on the checkpoint. The config dict stores architectural hyperparameters only.
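Because the fourth key differs between checkpoints, it can help to inspect a loaded dict before indexing into it. The sketch below is illustrative only: describe_checkpoint is a hypothetical helper, and the dict in the usage example is a stand-in with made-up config values, not a real checkpoint.

```python
REQUIRED_KEYS = ("model_state_dict", "config", "optimizer_state_dict")
OPTIONAL_KEYS = ("accuracy", "test_acc", "step", "epoch")


def describe_checkpoint(ck: dict) -> dict:
    """Validate the documented keys and report which optional ones are present."""
    missing = [k for k in REQUIRED_KEYS if k not in ck]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    extras = {k: ck[k] for k in OPTIONAL_KEYS if k in ck}
    return {
        "config": ck["config"],
        "num_tensors": len(ck["model_state_dict"]),
        "extras": extras,
    }


if __name__ == "__main__":
    # Stand-in dict; the config key and all values are hypothetical.
    fake = {
        "model_state_dict": {"embed.weight": None},
        "config": {"d_model": 128},
        "optimizer_state_dict": {},
        "test_acc": 0.99,
    }
    print(describe_checkpoint(fake))
```

In practice ck would be the object returned by torch.load.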

Loading a checkpoint

from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="Sparsity-Moves-Computation/moe-redistribution-checkpoints",
    filename="add7_ffn_nonorm_s42/best_model.pt",
)
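# weights_only=False unpickles arbitrary Python objects; only use it on files you trust.
ck = torch.load(path, weights_only=False, map_location="cpu")
print(ck["config"])          # architectural hyperparameters
# The metric key varies by checkpoint, so fall back to test_acc if accuracy is absent.
print(ck.get("accuracy", ck.get("test_acc")))  # eval accuracy, or None if neither key exists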

Bulk download

huggingface-cli download \
    Sparsity-Moves-Computation/moe-redistribution-checkpoints \
    --local-dir checkpoints/
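To fetch only one task's checkpoints rather than the whole repository, the same download can be driven from Python with a filename filter. This is a sketch under assumptions: patterns_for_task and download_task are hypothetical helpers, and the pattern relies on snapshot_download's allow_patterns using fnmatch-style matching against repository file paths.

```python
from fnmatch import fnmatch

REPO_ID = "Sparsity-Moves-Computation/moe-redistribution-checkpoints"


def patterns_for_task(task: str) -> list[str]:
    """Glob patterns selecting every .pt file in folders for one task.

    The leading * tolerates an optional top-level folder such as checkpoints/.
    """
    return [f"*{task}_*/*.pt"]


def download_task(task: str, local_dir: str = "checkpoints/") -> str:
    """Download one task's checkpoints; returns the local snapshot path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for_task(task),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Local check of the pattern; no network access is needed for this part.
    pattern = patterns_for_task("add7")[0]
    print(fnmatch("add7_ffn_nonorm_s42/best_model.pt", pattern))
```

Calling download_task("add7") would then perform the filtered download.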

Reproducibility

The code repository is linked in the paper. Loading these checkpoints into that repository's OneLayerTransformer class (in model/model.py) reproduces every result in the paper.

Notes

All identifying information has been removed from filenames, folder names, and .pt config dicts.
The repository is anonymized for the duration of the review period.
