EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model



πŸ“– Introduction

We propose EchoGen, the first feed-forward subject-driven image synthesis framework built on Visual Auto-Regressive (VAR) models. Given a reference subject, EchoGen generates faithful renditions of that subject in arbitrary text-described scenes without any test-time optimization. Unlike prior subject-driven approaches, it exploits the efficiency and hierarchical generation of visual autoregressive models. Evaluated on DreamBench and human preference benchmarks, EchoGen matches or surpasses leading diffusion-based methods in subject fidelity, text alignment, and image quality, while sampling substantially faster.

πŸ“Œ Note

This repo hosts EchoGen's checkpoints. For implementation details, please refer to the code repository.
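Since this repo serves the checkpoints, they can be fetched programmatically with the `huggingface_hub` library. This is a minimal sketch, not an official loader: the repo id `UmiSonoda16/EchoGen` is assumed from this page, and the local directory name is arbitrary.

```python
# Hedged sketch: download the EchoGen checkpoints hosted in this repo.
# Assumes the repo id "UmiSonoda16/EchoGen"; adjust if the repo moves.
from huggingface_hub import snapshot_download


def fetch_echogen_checkpoints(local_dir: str = "./echogen_ckpts") -> str:
    """Download all files from the Hub repo and return the local path."""
    return snapshot_download(repo_id="UmiSonoda16/EchoGen", local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access and (if the repo is gated) a Hub token.
    ckpt_dir = fetch_echogen_checkpoints()
    print(f"Checkpoints downloaded to: {ckpt_dir}")
```

How the checkpoints are actually consumed (file names, model classes) is defined in the code repository, not here.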

πŸ“– Citation

If our work contributes to your research, please don't hesitate to give us a star ⭐ or cite us as follows:

@inproceedings{dong2026echogen,
    title={EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model},
    author={Ruixiao Dong and Zhendong Wang and Keli Liu and Li Li and Ying Chen and Kai Li and Daowen Li and Houqiang Li},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=ctmyCjo18u}
}