nielsr HF Staff committed on
Commit d097927 · verified · 1 Parent(s): 276f5c5

Improve model card: add pipeline tag, paper info, code link, and sample usage


This PR enhances the model card by:
- Adding the `pipeline_tag: text-to-image` to correctly categorize the model for discovery on the Hugging Face Hub.
- Including the paper title and a link to its official Hugging Face paper page.
- Providing the full abstract from the paper for comprehensive understanding.
- Adding a direct link to the official GitHub repository for easy access to the code.
- Incorporating a sample usage section, with installation instructions and inference commands taken from the official GitHub repository's README, so users can run the model directly.
- Adding a visual representation of the model from the GitHub README.

Files changed (1)
  1. README.md +66 -3
README.md CHANGED
@@ -1,3 +1,66 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: text-to-image
+ ---
+
+ # Go with Your Gut: Scaling Confidence for Autoregressive Image Generation
+
+ This repository contains the official implementation of the paper [Go with Your Gut: Scaling Confidence for Autoregressive Image Generation](https://huggingface.co/papers/2509.26376).
+
+ ## Abstract
+
+ Test-time scaling (TTS) has demonstrated remarkable success in enhancing large language models, yet its application to next-token prediction (NTP) autoregressive (AR) image generation remains largely uncharted. Existing TTS approaches for visual AR (VAR), which rely on frequent partial decoding and external reward models, are ill-suited for NTP-based image generation due to the inherent incompleteness of intermediate decoding results. To bridge this gap, we introduce ScalingAR, the first TTS framework specifically designed for NTP-based AR image generation that eliminates the need for early decoding or auxiliary rewards. ScalingAR leverages token entropy as a novel signal in visual token generation and operates at two complementary scaling levels: (i) Profile Level, which streams a calibrated confidence state by fusing intrinsic and conditional signals; and (ii) Policy Level, which utilizes this state to adaptively terminate low-confidence trajectories and dynamically schedule guidance for phase-appropriate conditioning strength. Experiments on both general and compositional benchmarks show that ScalingAR (1) improves base models by 12.5% on GenEval and 15.2% on TIIF-Bench, (2) efficiently reduces visual token consumption by 62.0% while outperforming baselines, and (3) successfully enhances robustness, mitigating performance drops by 26.0% in challenging scenarios.
+
+ <div align="center">
+ <img src="https://github.com/EnVision-Research/ScalingAR/raw/main/asset/scalingar.png" alt="ScalingAR overview image">
+ </div>
+
+ **Code:** [https://github.com/EnVision-Research/ScalingAR](https://github.com/EnVision-Research/ScalingAR)
+
+ ## Sample Usage
+
+ ### Installation
+
+ 1. Clone this repository and navigate to the source folder:
+ ```bash
+ git clone https://github.com/EnVision-Research/ScalingAR
+ cd ScalingAR
+ ```
+
+ 2. Build the environment:
+ ```bash
+ echo "Creating conda environment"
+ conda create -n ScalingAR python=3.10
+ conda activate ScalingAR
+
+ echo "Installing dependencies"
+ pip install -r requirements.txt
+ ```
+
+ ### Inference
+
+ **LlamaGen**
+
+ ```bash
+ PYTHONPATH=. python llamagen/sample_entropy.py --vq-ckpt ${VQ_CKPT} --gpt-ckpt ${LlamaGen_CKPT} --gpt-model GPT-XL --t5-path ${T5_PATH} --image-size 512
+ ```
+
+ **AR-GRPO**
+
+ ```bash
+ PYTHONPATH=. python AR_GRPO/sample_entropy.py --ckpt-path ${AR-GRPO_CKPT} --t5-path ${T5_PATH} --delay_load_text_encoder True --image-size 256
+ ```
+
+ ## Citation
+
+ If you find our code useful, please consider citing our paper:
+
+ ```bibtex
+ @article{chen2025go,
+ title={Go with Your Gut: Scaling Confidence for Autoregressive Image Generation},
+ author={Chen, Harold Haodong and Wu, Xianfeng and Shu, Wen-Jie and Guo, Rongjin and Lan, Disen and Yang, Harry and Chen, Ying-Cong},
+ journal={arXiv preprint arXiv:2509.26376},
+ year={2025}
+ }
+ ```
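The abstract added above leans on token entropy as a confidence signal for terminating low-confidence trajectories. As a rough illustration of that idea only — a hypothetical sketch with made-up names and thresholds, not the code in `sample_entropy.py` — entropy over next-token logits and a simple early-termination rule could look like:

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over next-token logits.

    A peaked distribution (one dominant token) gives low entropy, i.e. high
    confidence; a flat distribution gives high entropy, i.e. low confidence.
    """
    z = logits - logits.max()            # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def should_terminate(entropies, threshold=2.0, window=4):
    """Hypothetical policy: stop a trajectory once the mean entropy of the
    last `window` generated tokens exceeds `threshold`."""
    if len(entropies) < window:
        return False
    return float(np.mean(entropies[-window:])) > threshold

# Peaked logits -> low entropy (confident token choice)
confident = token_entropy(np.array([10.0, 0.0, 0.0, 0.0]))
# Uniform logits -> maximum entropy ln(4) (uncertain token choice)
uncertain = token_entropy(np.array([1.0, 1.0, 1.0, 1.0]))
```

The `threshold` and `window` values here are arbitrary; the paper's Profile/Policy levels calibrate and act on the confidence state in a more involved way.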