Language Modeling with Hyperspherical Flows

By Justin Deschenaux and Caglar Gulcehre.


This repo hosts the pretrained checkpoints for Language Modeling with Hyperspherical Flows (𝕊-FLM). For the abstract, training/sampling code, and reproduction scripts, see the companion code repo: jdeschena/s-flm.

Checkpoints

This repository contains checkpoints for 𝕊-FLM and the baselines we compare against (AR, MDLM, Duo, FLM, CANDI), trained on TinyGSM (250k steps, SmolLM-135M tokenizer) and OpenWebText (1M steps, GPT-2 tokenizer):

tinygsm/{ar,mdlm,duo}.ckpt
tinygsm/candi/{lr3e-4,lr1e-3}.ckpt
tinygsm/flm/{default,caps}.ckpt
tinygsm/sfm/{sphere_dit_truncated_fixed_no_renorm,
             sphere_dit_truncated_adaptive_no_renorm,
             sphere_arch_truncated_adaptive_no_renorm}.ckpt

owt/{ar,mdlm,duo,flm,sfm}.ckpt

Download a single checkpoint with the Hugging Face CLI, for example:

huggingface-cli download jdeschena/s-flm tinygsm/duo.ckpt --local-dir ./checkpoints
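If you prefer not to use the CLI, the Hub serves each file at a predictable `resolve` URL, so a checkpoint path from the list above maps directly to a download link. The sketch below shows this mapping; `checkpoint_url` is a hypothetical helper written for illustration, not part of the repo.

```python
# Build the direct download URL for a checkpoint hosted on the Hugging Face
# Hub, using the standard `resolve/<revision>/<path>` endpoint.
# `checkpoint_url` is a hypothetical helper, not part of jdeschena/s-flm.
REPO = "jdeschena/s-flm"

def checkpoint_url(filename: str, revision: str = "main") -> str:
    """Return the direct-download URL for `filename` in this repo."""
    return f"https://huggingface.co/{REPO}/resolve/{revision}/{filename}"

print(checkpoint_url("tinygsm/duo.ckpt"))
# → https://huggingface.co/jdeschena/s-flm/resolve/main/tinygsm/duo.ckpt
```

You can pass the resulting URL to `wget`, `curl`, or `urllib.request.urlretrieve`; for authenticated or cached downloads, `huggingface_hub.hf_hub_download` is the more robust option.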

Loading and sampling are handled by the code repo — see jdeschena/s-flm for the scripts.

Citation

@misc{deschenaux2026languagemodelinghypersphericalflows,
      title={Language Modeling with Hyperspherical Flows}, 
      author={Justin Deschenaux and Caglar Gulcehre},
      year={2026},
      eprint={2605.11125},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.11125}, 
}