lijiang/Omni-Diffusion

lijiang committed on Mar 12

Commit 2233ee1 · verified · 1 Parent(s): 71ca865

Update README.md

Files changed (1)

README.md +1 -5

@@ -12,10 +12,6 @@ Omni-Diffusion is the first any-to-any multimodal language model built entirely
 - **Project Page:** [https://omni-diffusion.github.io](https://omni-diffusion.github.io)
 - **Repository:** [https://github.com/VITA-MLLM/Omni-Diffusion](https://github.com/VITA-MLLM/Omni-Diffusion)
-## Model Description
-Omni-Diffusion employs a unified mask-based discrete diffusion model to capture the joint distribution over discrete multimodal tokens. This approach supports not only bimodal tasks (such as text-to-image or speech-to-text) but also more complex scenarios involving multiple modalities simultaneously, such as spoken visual question answering. On a diverse set of benchmarks, the method outperforms or performs on par with existing multimodal systems, highlighting the potential of diffusion models for multimodal foundation models.
 ## Usage
 As the model uses a custom architecture, it can be loaded using the `transformers` library with `trust_remote_code=True`:
@@ -35,7 +31,7 @@ If you find this work helpful for your research, please consider citing:
 ```bibtex
 @article{li2026omni,
   title={Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion},
-  author={Li, Lijiang and Long, Zuwei) and Shen, Yunhang and Gao, Heting and Cao, Haoyu and Sun, Xing and Shan, Caifeng and He, Ran and Fu, Chaoyou},
+  author={Li, Lijiang and Long, Zuwei and Shen, Yunhang and Gao, Heting and Cao, Haoyu and Sun, Xing and Shan, Caifeng and He, Ran and Fu, Chaoyou},
   journal={arXiv preprint arXiv:2603.06577},
   year={2026}
 }