
metadata

pipeline_tag: text-to-image

SphereAR: Hyperspherical Latents Improve Continuous-Token Autoregressive Generation

This repository contains the official PyTorch implementation of the paper Hyperspherical Latents Improve Continuous-Token Autoregressive Generation.

SphereAR is a simple yet effective approach to continuous-token autoregressive (AR) image generation. VAE latents have heterogeneous variance, and that variance is amplified during AR decoding. SphereAR addresses this by constraining all AR inputs and outputs, including those produced after Classifier-Free Guidance (CFG), to lie on a fixed-radius hypersphere (constant $\ell_2$ norm) via hyperspherical VAEs. Removing the scale component in this way stabilizes AR decoding.
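The constraint described above can be illustrated with a minimal sketch. It is not taken from the official implementation: the helper names `project_to_sphere` and `cfg_on_sphere`, the radius of 1.0, and the use of the standard CFG extrapolation formula are all assumptions made for illustration. The point it shows is that CFG can push a vector off the sphere, and rescaling restores the constant $\ell_2$ norm.

```python
import math

def project_to_sphere(v, radius=1.0, eps=1e-12):
    """Rescale v so that its l2 norm equals `radius` (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    scale = radius / max(norm, eps)  # eps guards against division by zero
    return [x * scale for x in v]

def cfg_on_sphere(cond, uncond, guidance_scale, radius=1.0):
    """Standard CFG extrapolation, then re-projection onto the sphere.

    Assumption: the usual formula uncond + s * (cond - uncond) is used here;
    see the official repository for how SphereAR actually applies CFG.
    """
    guided = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    return project_to_sphere(guided, radius)

# Two toy predictions that already lie on the unit sphere.
cond = project_to_sphere([0.6, 0.8, 0.0])
uncond = project_to_sphere([0.0, 1.0, 0.0])

token = cfg_on_sphere(cond, uncond, guidance_scale=3.0)
token_norm = math.sqrt(sum(x * x for x in token))
print(token_norm)  # norm of the guided token after re-projection
```

Without the final projection, the guided vector's norm would depend on the guidance scale, which reintroduces exactly the scale component the method is designed to remove.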

The model is a pure next-token AR generator that produces tokens in raster order. On ImageNet 256×256 generation, SphereAR-H (943M parameters) reaches an FID of 1.34, a new state of the art for AR models.
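As a rough illustration of what next-token generation in raster order means, the sketch below fills a token grid left to right, top to bottom, with each token conditioned on all earlier ones. The function names, the 2×3 grid, and the stand-in `predict_next` callable are hypothetical; the real model predicts continuous latent tokens, not integers.

```python
def raster_positions(height, width):
    """(row, col) of each token in raster order: left to right, top to bottom."""
    return [divmod(i, width) for i in range(height * width)]

def generate(predict_next, height, width):
    """Generate one token per grid cell, each conditioned on the prefix so far."""
    tokens = []
    for _ in range(height * width):
        tokens.append(predict_next(tokens))
    # Reshape the flat sequence back into rows of the grid.
    return [tokens[r * width:(r + 1) * width] for r in range(height)]

# Stand-in predictor: returns the prefix length, so the fill order is visible.
grid = generate(lambda prefix: len(prefix), height=2, width=3)
positions = raster_positions(2, 3)
print(grid)
print(positions)
```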

For more details, including implementation, training, and evaluation scripts, please refer to the official GitHub repository.

Model Checkpoints

Pre-trained model checkpoints are available:

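| Name | Params | FID (ImageNet 256×256) | Weight file |
|------|--------|------------------------|-------------|
| S-VAE | 75M | - | vae.pt |
| SphereAR-B | 208M | 1.92 | SphereAR_B.pt |
| SphereAR-L | 479M | 1.54 | SphereAR_L.pt |
| SphereAR-H | 943M | 1.34 | SphereAR_H.pt |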

Citation

If you find this work useful, please consider citing the paper:

@article{ke2025hyperspherical,
   title={Hyperspherical Latents Improve Continuous-Token Autoregressive Generation}, 
   author={Guolin Ke and Hui Xue},
   journal={arXiv preprint arXiv:2509.24335},
   year={2025}
}