EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Introduction
We propose EchoGen, the first feed-forward subject-driven image synthesis framework built on Visual Auto-Regressive (VAR) models. It generates faithful renditions of a given subject in arbitrary text-described scenes without any test-time optimization. Unlike prior subject-driven approaches, EchoGen exploits the efficiency and coarse-to-fine hierarchical generation of VAR models. Evaluated on DreamBench and human preference benchmarks, EchoGen achieves subject fidelity, text alignment, and image quality comparable to or better than leading diffusion-based methods, while offering substantially faster sampling.
Note
This repo hosts EchoGen's checkpoints. For more details, please refer to our paper.
Citation
If our work contributes to your research, please don't hesitate to give us a star ⭐ or cite us as follows:
@inproceedings{dong2026echogen,
title={EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model},
author={Ruixiao Dong and Zhendong Wang and Keli Liu and Li Li and Ying Chen and Kai Li and Daowen Li and Houqiang Li},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ctmyCjo18u}
}