Add model card for SphereAR
#3 opened by nielsr (HF Staff)
README.md ADDED
@@ -0,0 +1,37 @@
---
pipeline_tag: text-to-image
---

# SphereAR: Hyperspherical Latents Improve Continuous-Token Autoregressive Generation

This repository contains the official PyTorch implementation of the paper [Hyperspherical Latents Improve Continuous-Token Autoregressive Generation](https://huggingface.co/papers/2509.24335).

SphereAR is a simple yet effective approach to continuous-token autoregressive (AR) image generation. It addresses issues such as the heterogeneous variance of VAE latents, which is amplified during AR decoding. To do so, it uses hyperspherical VAEs to constrain all AR inputs and outputs, including those produced after Classifier-Free Guidance (CFG), to lie on a fixed-radius hypersphere (constant $\ell_2$ norm). Removing the scale component in this way stabilizes AR decoding.
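
The constant-norm constraint described above can be illustrated with a small NumPy sketch. This is not the repository's code: the function names, the radius of `sqrt(dim)`, and the guidance scale are hypothetical, and the CFG step uses the common `uncond + scale * (cond - uncond)` form, which may differ from what the paper uses. The sketch only shows that a guided vector generally leaves the sphere and that rescaling puts it back at a constant $\ell_2$ norm.

```python
import numpy as np


def project_to_sphere(x, radius=1.0, eps=1e-8):
    """Rescale each vector along the last axis so its L2 norm equals `radius`."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return radius * x / np.maximum(norm, eps)


def cfg_on_sphere(cond, uncond, guidance_scale, radius=1.0):
    """Apply a standard CFG mix, then project the result back onto the sphere.

    The CFG formula here is the widely used one and is an assumption,
    not necessarily the exact form used by SphereAR.
    """
    guided = uncond + guidance_scale * (cond - uncond)
    return project_to_sphere(guided, radius)


rng = np.random.default_rng(0)
dim = 16
radius = float(np.sqrt(dim))  # hypothetical radius, chosen only for illustration

# Two batches of latent tokens that already lie on the sphere.
cond = project_to_sphere(rng.normal(size=(4, dim)), radius)
uncond = project_to_sphere(rng.normal(size=(4, dim)), radius)

# Without re-projection, the guided tokens have varying norms.
raw = uncond + 3.0 * (cond - uncond)
guided = cfg_on_sphere(cond, uncond, guidance_scale=3.0, radius=radius)

print("norms before projection:", np.linalg.norm(raw, axis=-1))
print("norms after projection: ", np.linalg.norm(guided, axis=-1))
```

After projection every token has the same norm, so only its direction carries information; this is the sense in which the scale component is removed.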

The model is a pure next-token AR generator that decodes in raster order. On ImageNet 256×256 generation, SphereAR-H (943M parameters) sets a new state of the art among AR models, reaching an FID of 1.34.

For more details, including implementation, training, and evaluation scripts, see the [official GitHub repository](https://github.com/guolinke/SphereAR).

## Model Checkpoints

Pre-trained model checkpoints are available:

| Name | Params | FID (256×256) | Weights |
|---|:---:|:---:|:---:|
| S-VAE | 75M | - | [vae.pt](https://huggingface.co/guolinke/SphereAR/blob/main/vae.pt) |
| SphereAR-B | 208M | 1.92 | [SphereAR_B.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_B.pt) |
| SphereAR-L | 479M | 1.54 | [SphereAR_L.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_L.pt) |
| SphereAR-H | 943M | 1.34 | [SphereAR_H.pt](https://huggingface.co/guolinke/SphereAR/blob/main/SphereAR_H.pt) |
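
The table links point to the Hub's file viewer (`/blob/main/`). As a hedged sketch of fetching the weights programmatically, the snippet below builds the corresponding direct-download URLs (`/resolve/main/`) and wraps `hf_hub_download` from the `huggingface_hub` package. The helper names `direct_url` and `download` are hypothetical, and how a downloaded `.pt` file is loaded into a model is defined by the GitHub repository, not by this snippet.

```python
REPO_ID = "guolinke/SphereAR"

# File names taken from the checkpoint table.
CHECKPOINTS = {
    "S-VAE": "vae.pt",
    "SphereAR-B": "SphereAR_B.pt",
    "SphereAR-L": "SphereAR_L.pt",
    "SphereAR-H": "SphereAR_H.pt",
}


def direct_url(name, revision="main"):
    """Return the raw-file URL for a checkpoint (hypothetical helper)."""
    filename = CHECKPOINTS[name]
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"


def download(name):
    """Download a checkpoint into the local Hub cache and return its path.

    Requires `pip install huggingface_hub` and network access.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[name])


print(direct_url("SphereAR-H"))
```

Calling `download("SphereAR-B")` would return a local file path; the S-VAE checkpoint is presumably needed alongside any of the AR checkpoints, since the AR models operate on its latents.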

## Citation

If you find this work useful, please consider citing the paper:

```bibtex
@article{ke2025hyperspherical,
  title={Hyperspherical Latents Improve Continuous-Token Autoregressive Generation},
  author={Guolin Ke and Hui Xue},
  journal={arXiv preprint arXiv:2509.24335},
  year={2025}
}
```