---
pipeline_tag: image-to-image
library_name: transformers
---
This repository contains the model described in [Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations](https://huggingface.co/papers/2506.18898).

Project page: https://tar.csuhan.com