---
base_model:
- Efficient-Large-Model/Sana_1600M_1024px_BF16
- VIPL-GENUN/Jodi
tags:
- Diffusion
- Text-to-Image
- Controllable-Generation
- Image-Perception
pipeline_tag: image-to-image
library_name: diffusers
license: apache-2.0
---

# Jodi

We introduce Jodi, a diffusion framework that unifies visual generation and understanding by jointly modeling the image domain and multiple label domains.

- **arXiv**: <https://arxiv.org/abs/2505.19084>
- **Project page**: <https://VIPL-GENUN.github.io/Project-Jodi>
- **GitHub**: <https://github.com/VIPL-GENUN/Jodi>