---
base_model:
- Efficient-Large-Model/Sana_1600M_1024px_BF16
- VIPL-GENUN/Jodi
tags:
- Diffusion
- Text-to-Image
- Controllable-Generation
- Image-Perception
---

# Jodi

We introduce Jodi, a diffusion framework that unifies visual generation and understanding by jointly modeling the image domain and multiple label domains.

- **arXiv**:
- **Project page**:
- **GitHub**:
- **Joint-1.6M Dataset**:

![](./assets/banner.jpg)

# Citation

If you find this project helpful, please consider citing:

```bibtex
@article{xu2025jodi,
  title={Jodi: Unification of Visual Generation and Understanding via Joint Modeling},
  author={Xu, Yifeng and He, Zhenliang and Kan, Meina and Shan, Shiguang and Chen, Xilin},
  journal={arXiv preprint arXiv:2505.19084},
  year={2025}
}
```