Improve model card: add pipeline tag, paper info, code link, and sample usage
This PR enhances the model card by:
- Adding the `pipeline_tag: text-to-image` to correctly categorize the model for discovery on the Hugging Face Hub.
- Including the paper title and a link to its official Hugging Face paper page.
- Providing the full abstract from the paper for comprehensive understanding.
- Adding a direct link to the official GitHub repository for easy access to the code.
- Incorporating a detailed sample usage section, including installation instructions and inference commands, directly from the official GitHub repository's README to guide users on how to run the model.
- Adding a visual representation of the model from the GitHub README.
README.md (changed):

---
license: mit
pipeline_tag: text-to-image
---

# Go with Your Gut: Scaling Confidence for Autoregressive Image Generation

This repository contains the official implementation of the paper [Go with Your Gut: Scaling Confidence for Autoregressive Image Generation](https://huggingface.co/papers/2509.26376).
| 9 |
+
|
| 10 |
+
## Abstract
|
| 11 |
+
|
| 12 |
+
Test-time scaling (TTS) has demonstrated remarkable success in enhancing large language models, yet its application to next-token prediction (NTP) autoregressive (AR) image generation remains largely uncharted. Existing TTS approaches for visual AR (VAR), which rely on frequent partial decoding and external reward models, are ill-suited for NTP-based image generation due to the inherent incompleteness of intermediate decoding results. To bridge this gap, we introduce ScalingAR, the first TTS framework specifically designed for NTP-based AR image generation that eliminates the need for early decoding or auxiliary rewards. ScalingAR leverages token entropy as a novel signal in visual token generation and operates at two complementary scaling levels: (i) Profile Level, which streams a calibrated confidence state by fusing intrinsic and conditional signals; and (ii) Policy Level, which utilizes this state to adaptively terminate low-confidence trajectories and dynamically schedule guidance for phase-appropriate conditioning strength. Experiments on both general and compositional benchmarks show that ScalingAR (1) improves base models by 12.5% on GenEval and 15.2% on TIIF-Bench, (2) efficiently reduces visual token consumption by 62.0% while outperforming baselines, and (3) successfully enhances robustness, mitigating performance drops by 26.0% in challenging scenarios.
|
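To make the Profile Level idea above concrete, here is a minimal sketch of computing per-token entropy from next-token logits and folding it into a running confidence state. This is an illustration only: `fuse_confidence` and its exponential-moving-average smoothing are assumptions for exposition, not the paper's actual calibration.

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy of the next-token distribution implied by `logits`."""
    z = logits - logits.max()          # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax
    return float(-(p * np.log(p + 1e-12)).sum())

def fuse_confidence(state: float, entropy: float, beta: float = 0.9) -> float:
    """Hypothetical confidence-state update: an exponential moving average
    of negative entropy (a higher state means a more confident model)."""
    return beta * state + (1 - beta) * (-entropy)

peaked = np.array([10.0, 0.0, 0.0, 0.0])   # confident next-token prediction
flat = np.array([1.0, 1.0, 1.0, 1.0])      # uncertain next-token prediction
assert token_entropy(peaked) < token_entropy(flat)
```

A peaked distribution yields low entropy (high confidence), a near-uniform one yields entropy close to `log(vocab_size)`; the streamed state can then drive policy decisions downstream.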
<div align="center">
<img src="https://github.com/EnVision-Research/ScalingAR/raw/main/asset/scalingar.png" alt="ScalingAR overview image">
</div>

**Code:** [https://github.com/EnVision-Research/ScalingAR](https://github.com/EnVision-Research/ScalingAR)
| 19 |
+
|
| 20 |
+
## Sample Usage
|
| 21 |
+
|
| 22 |
+
### Installation
|
| 23 |
+
|
| 24 |
+
1. Clone this repository and navigate to the source folder
|
| 25 |
+
```bash
|
| 26 |
+
git clone https://github.com/EnVision-Research/ScalingAR
|
| 27 |
+
cd ScalingAR
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
2. Build Environment
|
| 31 |
+
|
| 32 |
+
```Shell
|
| 33 |
+
echo "Creating conda environment"
|
| 34 |
+
conda create -n ScalingAR python=3.10
|
| 35 |
+
conda activate ScalingAR
|
| 36 |
+
|
| 37 |
+
echo "Installing dependencies"
|
| 38 |
+
pip install -r requirements.txt
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
### Inference
|
| 42 |
+
|
| 43 |
+
**LlamaGen**
|
| 44 |
+
|
| 45 |
+
```bash
|
| 46 |
+
PYTHONPATH=. python llamagen/sample_entropy.py --vq-ckpt ${VQ_CKPT} --gpt-ckpt ${LlamaGen_CKPT} --gpt-model GPT-XL --t5-path ${T5_PATH} --image-size 512
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
**AR-GRPO**
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
PYTHONPATH=. python AR_GRPO/sample_entropy.py --ckpt-path ${AR-GRPO_CKPT} --t5-path ${T5_PATH} --delay_load_text_encoder True --image-size 256
|
| 53 |
+
```
|
| 54 |
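The scripts above sample with entropy-aware scaling. To convey the Policy Level behavior of terminating low-confidence trajectories early (which is how token consumption is reduced), here is a hypothetical sketch; the trajectory scores, the threshold, and the best-of-N selection are illustrative assumptions, not the released code.

```python
def sample_with_pruning(step_confidences, threshold=-1.0):
    """Consume per-step confidence scores; terminate the trajectory
    as soon as confidence drops below `threshold`."""
    consumed = 0
    for c in step_confidences:
        consumed += 1
        if c < threshold:
            return None, consumed      # pruned early: remaining tokens saved
    return step_confidences, consumed  # survived to completion

def best_of_n(trajectories, threshold=-1.0):
    """Among trajectories that survive pruning, pick the one with the
    highest mean confidence; also report total tokens consumed."""
    best, best_score, consumed = None, float("-inf"), 0
    for i, traj in enumerate(trajectories):
        kept, used = sample_with_pruning(traj, threshold)
        consumed += used
        if kept is not None:
            score = sum(kept) / len(kept)
            if score > best_score:
                best, best_score = i, score
    return best, consumed

# Trajectory 1 dips below the threshold and is cut after 2 of its 3 steps.
trajectories = [[-0.2, -0.3, -0.4], [-0.1, -2.0, -0.1], [-0.5, -0.6, -0.7]]
print(best_of_n(trajectories))  # -> (0, 8): trajectory 0 wins, 8 of 9 tokens used
```

Pruning a weak trajectory mid-generation is what lets a best-of-N search spend fewer visual tokens than decoding every candidate to the end.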
+
|
| 55 |
+
## Citation
|
| 56 |
+
|
| 57 |
+
Please consider citing our paper if our code is useful:
|
| 58 |
+
|
| 59 |
+
```bib
|
| 60 |
+
@article{chen2025go,
|
| 61 |
+
title={Go with Your Gut: Scaling Confidence for Autoregressive Image Generation},
|
| 62 |
+
author={Chen, Harold Haodong and Wu, Xianfeng and Shu, Wen-Jie and Guo, Rongjin and Lan, Disen and Yang, Harry and Chen, Ying-Cong},
|
| 63 |
+
journal={arXiv preprint arXiv:2509.26376},
|
| 64 |
+
year={2025}
|
| 65 |
+
}
|
| 66 |
+
```
|