Add model card for SphereAR

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ pipeline_tag: text-to-image
+ library_name: pytorch
+ ---
+
+ # SphereAR: Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
+
+ This repository contains the official PyTorch implementation of the paper [Hyperspherical Latents Improve Continuous-Token Autoregressive Generation](https://huggingface.co/papers/2509.24335).
+
+ The official code and further details can be found on the GitHub repository: [https://github.com/guolinke/SphereAR](https://github.com/guolinke/SphereAR)
+
+ ## Abstract
+ Autoregressive (AR) models are promising for image generation, yet continuous-token AR variants often trail latent diffusion and masked-generation models. The core issue is heterogeneous variance in VAE latents, which is amplified during AR decoding, especially under classifier-free guidance (CFG), and can cause variance collapse. We propose SphereAR to address this issue. Its core design is to constrain all AR inputs and outputs -- including after CFG -- to lie on a fixed-radius hypersphere (constant $\ell_2$ norm), leveraging hyperspherical VAEs. Our theoretical analysis shows that hyperspherical constraint removes the scale component (the primary cause of variance collapse), thereby stabilizing AR decoding. Empirically, on ImageNet generation, SphereAR-H (943M) sets a new state of the art for AR models, achieving FID 1.34. Even at smaller scales, SphereAR-L (479M) reaches FID 1.54 and SphereAR-B (208M) reaches 1.92, matching or surpassing much larger baselines such as MAR-H (943M, 1.55) and VAR-d30 (2B, 1.92). To our knowledge, this is the first time a pure next-token AR image generator with raster order surpasses diffusion and masked-generation models at comparable parameter scales.
+
+ <p align="center">
+ <img src="https://github.com/guolinke/SphereAR/raw/main/figures/grid.jpg" width=780>
+ </p>
+
+ ## Introduction
+ SphereAR is a simple yet effective approach to continuous-token autoregressive (AR) image generation: it makes AR scale-invariant by constraining all AR inputs and outputs (**including after CFG**) to lie on a fixed-radius hypersphere (constant L2 norm) via hyperspherical VAEs.
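The constraint described above can be illustrated with a minimal sketch in plain Python, written without dependencies so it runs anywhere. It is not the repository's code: the function names `project_to_sphere` and `cfg_on_sphere`, the `radius` and `scale` parameters, and the use of the standard CFG combination `uncond + scale * (cond - uncond)` are all assumptions made for illustration. The official implementation works on PyTorch tensors and may differ in detail.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def project_to_sphere(vec, radius=1.0, eps=1e-12):
    """Rescale `vec` so that its L2 norm equals `radius` (direction is kept)."""
    norm = max(l2_norm(vec), eps)  # eps guards against division by zero
    return [radius * x / norm for x in vec]


def cfg_on_sphere(cond, uncond, scale, radius=1.0):
    """Standard CFG combination (assumed form), then projection back onto the sphere."""
    guided = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    return project_to_sphere(guided, radius)


# Two hypothetical latent tokens that already lie on the unit sphere.
cond = project_to_sphere([3.0, 4.0, 0.0])
uncond = project_to_sphere([0.0, 1.0, 1.0])

# Without projection, guidance changes the norm of the token.
raw = [u + 4.0 * (c - u) for c, u in zip(cond, uncond)]
print("norm after raw CFG:     ", l2_norm(raw))

# With projection, the token returns to the fixed radius.
guided = cfg_on_sphere(cond, uncond, scale=4.0)
print("norm after CFG + sphere:", l2_norm(guided))
```

The raw guided token leaves the unit sphere because extrapolating between two unit vectors changes the norm. Projecting afterwards discards that scale component and keeps only the direction, which is the property the introduction refers to as scale invariance.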
+
+ The model is a **pure next-token** AR generator with **raster** order, matching standard language AR modeling (i.e., it is *not* next-scale AR like VAR and *not* next-set AR like MAR/MaskGIT).
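As a rough illustration of what raster-order next-token generation means, the sketch below visits a token grid left to right, top to bottom, and predicts each token from the tokens generated so far. The grid size, the `generate` loop, and the `predict_next` callback are hypothetical stand-ins, not the repository's API.

```python
def raster_order(height, width):
    """Yield (row, col) grid positions left to right, top to bottom."""
    for index in range(height * width):
        yield divmod(index, width)


def generate(height, width, predict_next):
    """Next-token loop: each token is predicted from all previously generated tokens."""
    tokens = []
    for position in raster_order(height, width):
        tokens.append(predict_next(tokens, position))
    return tokens


# Toy predictor (hypothetical): records how many tokens of context it was given.
context_sizes = generate(2, 3, lambda prefix, position: len(prefix))

print(list(raster_order(2, 3)))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(context_sizes)             # [0, 1, 2, 3, 4, 5]
```

Each step emits exactly one token and sees only its prefix, which is what distinguishes this scheme from next-scale or next-set generation, where several tokens are produced per step.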
+
+ On ImageNet 256×256, SphereAR-H (943M parameters) achieves an FID of **1.34**, the state of the art among AR image generators.
+
+ ## Model Checkpoints
+ The following pre-trained models are available for class-conditional image generation on ImageNet:
+
+ | Name | Params | FID (256×256) | Weights |
+ |---|:---:|:---:|:---:|
+ | S-VAE | 75M | - | [vae.pt](https://huggingface.co/guolinke/SphereAR/blob/main/vae.pt) |
+ | SphereAR-B | 208M | 1.92 | [SphereAR_B.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_B.pt) |
+ | SphereAR-L | 479M | 1.54 | [SphereAR_L.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_L.pt) |
+ | SphereAR-H | 943M | 1.34 | [SphereAR_H.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_H.pt) |
+
+ For detailed instructions on evaluation and training using these checkpoints, please refer to the [official GitHub repository](https://github.com/guolinke/SphereAR).
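The checkpoint files listed in the table can also be fetched programmatically. The sketch below is one possible way to do it, not part of the official instructions: the helper names `checkpoint_url` and `download_checkpoint` are hypothetical, the repository ID and file names are taken from the table links, and `hf_hub_download` comes from the separately installed `huggingface_hub` package. How to load the downloaded weights into a model is covered by the GitHub repository and is not shown here.

```python
REPO_ID = "guolinke/SphereAR"

# File names as they appear in the checkpoint table.
CHECKPOINTS = {
    "S-VAE": "vae.pt",
    "SphereAR-B": "SphereAR_B.pt",
    "SphereAR-L": "SphereAR_L.pt",
    "SphereAR-H": "SphereAR_H.pt",
}


def checkpoint_url(name, revision="main"):
    """Direct-download URL for a checkpoint (uses `resolve`, not the `blob` page view)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{CHECKPOINTS[name]}"


def download_checkpoint(name):
    """Download a checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the rest of this sketch works without the package installed.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[name])


print(checkpoint_url("SphereAR-B"))
# https://huggingface.co/guolinke/SphereAR/resolve/main/SphereAR_B.pt
```

Calling `download_checkpoint("SphereAR-B")` requires network access; the S-VAE weights (`vae.pt`) would presumably be needed alongside any of the AR checkpoints to decode latents into images.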
+
+ ## Citation
+ If you find this work useful, please consider citing the paper:
+
+ ```bibtex
+ @article{ke2025hyperspherical,
+ title={Hyperspherical Latents Improve Continuous-Token Autoregressive Generation},
+ author={Guolin Ke and Hui Xue},
+ journal={arXiv preprint arXiv:2509.24335},
+ year={2025}
+ }
+ ```