---
base_model:
- black-forest-labs/FLUX.1-dev
- stabilityai/stable-diffusion-3.5-medium
library_name: diffusers
license: mit
pipeline_tag: text-to-image
---
TACA: Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
1The University of Hong Kong
2Nanjing University
3University of Chinese Academy of Sciences 4Nanyang Technological University
5Harbin Institute of Technology
(*Equal Contribution. ‡Project Leader. †Corresponding Author.)
Paper | Project Page | LoRA Weights | Code
About
We propose TACA, a parameter-efficient method that dynamically rebalances cross-modal attention in multimodal diffusion transformers to improve text-image alignment.
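The summary above does not spell out the mechanism, so the following is only a toy sketch of one plausible reading: in joint attention over concatenated text and image tokens, the logits from image queries to text keys are multiplied by a factor `gamma > 1` before the softmax, which shifts attention mass toward the text tokens. The function names, the `gamma` parameter, and the token counts are all hypothetical choices for illustration; this is not the released implementation, and it omits any timestep dependence or LoRA component.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def rebalanced_attention_weights(logits, num_text, gamma=1.0):
    """Row-wise softmax over joint [text | image] keys.

    Assumption for illustration: for image-query rows, the logits that point
    at text keys are multiplied by `gamma` (a hypothetical knob).
    `logits` is a square list of lists with text tokens first.
    """
    weights = []
    for q, row in enumerate(logits):
        is_image_query = q >= num_text
        scaled = [
            x * gamma if (is_image_query and k < num_text) else x
            for k, x in enumerate(row)
        ]
        weights.append(softmax(scaled))
    return weights


def text_mass(weights, num_text):
    """Average attention mass that image queries place on text keys."""
    image_rows = weights[num_text:]
    return sum(sum(r[:num_text]) for r in image_rows) / len(image_rows)


# Toy setup: 2 text tokens, 6 image tokens, identical positive logits.
NUM_TEXT, NUM_IMAGE = 2, 6
n = NUM_TEXT + NUM_IMAGE
toy_logits = [[1.0] * n for _ in range(n)]

baseline_weights = rebalanced_attention_weights(toy_logits, NUM_TEXT, gamma=1.0)
boosted_weights = rebalanced_attention_weights(toy_logits, NUM_TEXT, gamma=1.5)

baseline = text_mass(baseline_weights, NUM_TEXT)  # uniform case: 2 of 8 keys are text
boosted = text_mass(boosted_weights, NUM_TEXT)    # larger share goes to text keys
```

With equal logits, text tokens receive attention only in proportion to their count, so a short prompt is easily outweighed by many image tokens. In this toy case, scaling the image-to-text logits raises the text share without adding any parameters. Note that multiplying by `gamma` only raises the share when the affected logits are positive, as they are here.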