---
license: bsd-3-clause
pipeline_tag: video-text-to-text
---

# VideoMind-2B-FT-QVHighlights
VideoMind is a multi-modal agent framework that enhances video reasoning by emulating *human-like* processes, such as *breaking down tasks*, *localizing and verifying moments*, and *synthesizing answers*.

## 🔖 Model Details

### Model Description

- **Model type:** Multi-modal Large Language Model
- **Language(s):** English
- **License:** BSD-3-Clause

### More Details

Please refer to our [GitHub Repository](https://github.com/yeliudev/VideoMind) for more details about this model.

## 📖 Citation

Please cite our paper if you find this project helpful.

```
@article{liu2025videomind,
  title={VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning},
  author={Liu, Ye and Lin, Kevin Qinghong and Chen, Chang Wen and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.13444},
  year={2025}
}
```