MoE Redistribution: Trained Checkpoints

Trained 1-layer transformer checkpoints accompanying the anonymous submission Sparsity Moves Computation (under double-blind review).

The repository contains 5-seed checkpoint sets across all architecture and routing variants used in the paper: dense FFN, GLU, MoE (top-1 / top-2, learned and random routing), and MoE-GLU, on three tasks (add-7, modular addition, histogram counting).

Layout

checkpoints/
  <task>_<arch>_<config>_s<seed>/
    best_model.pt          # add-7, histogram (PyTorch state dict + config)
    modadd_best.pt         # modular addition

<task> ∈ {add7, modadd, hist}. <arch> ∈ {ffn, glu, moe, moe_glu}. <config> encodes width, activation, normalization, and routing variant (e.g. nonorm, narrow_nonorm, topk2_nonorm, randroute_nonorm, d170_silu_nonorm).
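The naming scheme above can be parsed mechanically. What follows is a minimal sketch, not part of the released code. parse_run_name and checkpoint_filename are hypothetical helpers. The task and architecture names are taken from the sets listed above, and the sketch assumes every folder follows <task>_<arch>_<config>_s<seed> exactly. The file name is chosen from the layout: modadd_best.pt for modular addition, best_model.pt otherwise.

```python
import re
from typing import NamedTuple


class RunName(NamedTuple):
    task: str
    arch: str
    config: str
    seed: int


# Hypothetical helper. moe_glu is listed before moe so the longer architecture
# name is tried first; the config is everything between <arch>_ and _s<seed>.
_RUN_RE = re.compile(r"(add7|modadd|hist)_(moe_glu|ffn|glu|moe)_(.+)_s(\d+)")


def parse_run_name(name: str) -> RunName:
    """Split '<task>_<arch>_<config>_s<seed>' into its four parts."""
    m = _RUN_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    task, arch, config, seed = m.groups()
    return RunName(task, arch, config, int(seed))


def checkpoint_filename(name: str) -> str:
    """Repo-relative path of the checkpoint file inside a run folder."""
    task = parse_run_name(name).task
    fname = "modadd_best.pt" if task == "modadd" else "best_model.pt"
    return f"{name}/{fname}"
```

For example, checkpoint_filename("add7_ffn_nonorm_s42") returns "add7_ffn_nonorm_s42/best_model.pt", the same filename passed to hf_hub_download in the loading example.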

Each .pt file contains a Python dict with the keys model_state_dict, config, and optimizer_state_dict, plus one metadata key that varies between files: accuracy, test_acc, step, or epoch. The config dict stores architectural hyperparameters only.

Loading a checkpoint

from huggingface_hub import hf_hub_download
import torch

path = hf_hub_download(
    repo_id="Sparsity-Moves-Computation/moe-redistribution-checkpoints",
    filename="add7_ffn_nonorm_s42/best_model.pt",
)
ck = torch.load(path, weights_only=False, map_location="cpu")  # full unpickling: only load files you trust
print(ck["config"])          # architectural hyperparameters
print(ck["accuracy"])        # final eval accuracy
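Since the metadata key differs between files, indexing ck["accuracy"] directly may raise KeyError on a checkpoint that stores, say, test_acc instead. Here is a small defensive helper that returns the config together with whichever of the listed metadata keys are present. describe_checkpoint is hypothetical and not part of the paper's code.

```python
from typing import Any

# Metadata keys named in this card; a given file carries one of them.
METADATA_KEYS = ("accuracy", "test_acc", "step", "epoch")


def describe_checkpoint(ck: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: config plus any metadata keys found in the dict."""
    missing = {"model_state_dict", "config"} - set(ck)
    if missing:
        raise KeyError(f"checkpoint is missing {sorted(missing)}")
    info: dict[str, Any] = {"config": ck["config"]}
    for key in METADATA_KEYS:
        if key in ck:  # skip keys this particular file does not store
            info[key] = ck[key]
    return info
```

Call it on the dict returned by torch.load, e.g. print(describe_checkpoint(ck)), and it works the same for best_model.pt and modadd_best.pt files.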

Bulk download

huggingface-cli download \
    Sparsity-Moves-Computation/moe-redistribution-checkpoints \
    --local-dir checkpoints/

Reproducibility

The link to the code repository is provided in the paper. Loading these checkpoints with that repository's OneLayerTransformer class (in model/model.py) reproduces every result reported in the paper.

Notes

  • All identifying information has been removed from filenames, folder names, and .pt config dicts.
  • The repository is anonymized for the duration.