utter-project/TowerVideo-2B

Video-Text-to-Text

llava_onevision

image-text-to-text


Guilherme Viveiros committed on Oct 28, 2025

Commit 5247229 (verified) · 1 parent: fd859ee

Update README.md

Files changed (1)

README.md +8 -6


@@ -191,12 +191,14 @@ TowerVision excels particularly in multimodal multilingual translation benchmark
 If you find TowerVideo useful in your research, please consider citing the following paper:
 ```bibtex
-@article{towervision2025,
-  title={Understanding and Improving Multilinguality in Vision-Language Models},
-  author={[Authors to be added]},
-  journal={[Journal to be added]},
-  year={2025},
-  note={Paper in preparation}
 }
 ```

 If you find TowerVideo useful in your research, please consider citing the following paper:
 ```bibtex
+@misc{viveiros2025towervisionunderstandingimprovingmultilinguality,
+      title={TowerVision: Understanding and Improving Multilinguality in Vision-Language Models},
+      author={André G. Viveiros and Patrick Fernandes and Saul Santos and Sonal Sannigrahi and Emmanouil Zaranis and Nuno M. Guerreiro and Amin Farajian and Pierre Colombo and Graham Neubig and André F. T. Martins},
+      year={2025},
+      eprint={2510.21849},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/2510.21849},
 }
 ```