EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
Introduction
We propose EchoGen, the first feed-forward subject-driven image synthesis framework built on Visual Auto-Regressive (VAR) models. It generates faithful renditions of a given subject in arbitrary text-described scenes without any test-time optimization. Unlike prior subject-driven approaches, EchoGen exploits the efficiency and coarse-to-fine hierarchical generation of VAR models. Evaluated on DreamBench and human preference benchmarks, EchoGen achieves subject fidelity, text alignment, and image quality comparable to or better than leading diffusion-based methods, while offering substantially faster sampling.
Note
This repo hosts EchoGen's checkpoints. For more details, please refer to our paper.
Citation
If our work contributes to your research, please don't hesitate to give us a star ⭐ or cite us as follows:
@inproceedings{dong2026echogen,
title={EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model},
author={Ruixiao Dong and Zhendong Wang and Keli Liu and Li Li and Ying Chen and Kai Li and Daowen Li and Houqiang Li},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=ctmyCjo18u}
}