Based on the Qwen2.5 language model, Ola is trained on text, image, video, and audio data.
Ola offers an on-demand solution to seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths.

- **Repository:** https://github.com/Ola-Omni/Ola
- **Languages:** English, Chinese
- **Paper:** https://arxiv.org/abs/2502.04328

## Use

- **Code:** PyTorch

## Citation

```bibtex
@article{liu2025ola,
  title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment},
  author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
  journal={arXiv preprint arXiv:2502.04328},
  year={2025}
}
```