Update README.md
README.md CHANGED
@@ -334,18 +334,37 @@ For more details, please refer to the [Contribution Guidelines](docs/Contributio
 # ✏️ Citing
 
 
+
 ```bibtex
-
+@misc{lin2025uniworldhighresolutionsemanticencoders,
+title={UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation},
+author={Bin Lin and Zongjian Li and Xinhua Cheng and Yuwei Niu and Yang Ye and Xianyi He and Shenghai Yuan and Wangbo Yu and Shaodong Wang and Yunyang Ge and Yatian Pang and Li Yuan},
+year={2025},
+eprint={2506.03147},
+archivePrefix={arXiv},
+primaryClass={cs.CV},
+url={https://arxiv.org/abs/2506.03147},
+}
 ```
 
-
+
+```bibtex
+@article{niu2025wise,
+title={Wise: A world knowledge-informed semantic evaluation for text-to-image generation},
+author={Niu, Yuwei and Ning, Munan and Zheng, Mengren and Lin, Bin and Jin, Peng and Liao, Jiaqi and Ning, Kunpeng and Zhu, Bin and Yuan, Li},
+journal={arXiv preprint arXiv:2503.07265},
+year={2025}
+}
+```
+
+```bibtex
 @article{lin2024open,
 title={Open-Sora Plan: Open-Source Large Video Generation Model},
 author={Lin, Bin and Ge, Yunyang and Cheng, Xinhua and Li, Zongjian and Zhu, Bin and Wang, Shaodong and He, Xianyi and Ye, Yang and Yuan, Shenghai and Chen, Liuhan and others},
 journal={arXiv preprint arXiv:2412.00131},
 year={2024}
 }
-```
+```
 
 
 # 🤝 Community contributors