Update README.md
README.md
CHANGED
@@ -12,10 +12,6 @@ Omni-Diffusion is the first any-to-any multimodal language model built entirely
 - **Project Page:** [https://omni-diffusion.github.io](https://omni-diffusion.github.io)
 - **Repository:** [https://github.com/VITA-MLLM/Omni-Diffusion](https://github.com/VITA-MLLM/Omni-Diffusion)
 
-## Model Description
-
-Omni-Diffusion employs a unified mask-based discrete diffusion model to capture the joint distribution over discrete multimodal tokens. This approach supports not only bimodal tasks (such as text-to-image or speech-to-text) but also more complex scenarios involving multiple modalities simultaneously, such as spoken visual question answering. On a diverse set of benchmarks, the method outperforms or performs on par with existing multimodal systems, highlighting the potential of diffusion models for multimodal foundation models.
-
 ## Usage
 
 As the model uses a custom architecture, it can be loaded using the `transformers` library with `trust_remote_code=True`:
@@ -35,7 +31,7 @@ If you find this work helpful for your research, please consider citing:
 ```bibtex
 @article{li2026omni,
 title={Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion},
-author={Li, Lijiang and Long, Zuwei
+author={Li, Lijiang and Long, Zuwei and Shen, Yunhang and Gao, Heting and Cao, Haoyu and Sun, Xing and Shan, Caifeng and He, Ran and Fu, Chaoyou},
 journal={arXiv preprint arXiv:2603.06577},
 year={2026}
 }