Mar2Ding/Dispider


Mar2Ding committed on Mar 11, 2025

Commit 0f246e9 (verified) · 1 Parent(s): 89170b7

Update README.md

Files changed (1)

README.md +45 -3

@@ -1,3 +1,45 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction (CVPR 2025)
+[[Paper Link]](https://arxiv.org/abs/2501.03218)   [[GitHub repo]](https://github.com/Mark12Ding/Dispider)
+This repository contains the checkpoint (`ckpt`) of **Dispider**, a multimodal model designed as an online VideoLLM for active real-time video interaction.
+## Quick Start
+First, download the checkpoints from this repository into a local folder.
+**Important**: Update the ``mm_compressor`` path in `config.json` to match your local environment. The ``mm_compressor`` checkpoint is located in a sub-folder of this repository.
+For detailed evaluation instructions, please refer to the [GitHub repo](https://github.com/Mark12Ding/Dispider).
+## ✒️ Citation
+If you find our work helpful for your research, please consider giving the repo a star ⭐ and citing our papers 📝.
+```bibtex
+@article{qian2025dispider,
+  title={Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction},
+  author={Qian, Rui and Ding, Shuangrui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Lin, Dahua and Wang, Jiaqi},
+  journal={arXiv preprint arXiv:2501.03218},
+  year={2025}
+}
+@article{qian2025streaming,
+  title={Streaming long video understanding with large language models},
+  author={Qian, Rui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Ding, Shuangrui and Lin, Dahua and Wang, Jiaqi},
+  journal={Advances in Neural Information Processing Systems},
+  volume={37},
+  pages={119336--119360},
+  year={2025}
+}
+```
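
The Quick Start steps in the README above (download the checkpoint, then point ``mm_compressor`` at a local path) could be scripted roughly as follows. This is a minimal sketch, not the authors' tooling: it assumes `mm_compressor` is a top-level key in `config.json`, the helper names `download_checkpoint` and `patch_mm_compressor_path` are hypothetical, and the compressor sub-folder name is not stated in the README, so the caller has to supply it. The download step relies on `snapshot_download` from the `huggingface_hub` package.

```python
import json
from pathlib import Path


def download_checkpoint(local_dir, repo_id="Mar2Ding/Dispider"):
    """Download the repository snapshot into local_dir and return its path.

    Requires `pip install huggingface_hub`; imported lazily so the
    config-patching helper below works without it.
    """
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


def patch_mm_compressor_path(ckpt_dir, compressor_subdir, key="mm_compressor"):
    """Rewrite the compressor path in config.json to a local absolute path.

    Assumption: `key` is a top-level entry of config.json. `compressor_subdir`
    is the sub-folder of the checkpoint that holds the compressor weights;
    its name is not given in the README, so it must be passed in.
    """
    ckpt_dir = Path(ckpt_dir)
    config_path = ckpt_dir / "config.json"
    compressor_path = (ckpt_dir / compressor_subdir).resolve()

    if not compressor_path.is_dir():
        raise FileNotFoundError(f"compressor folder not found: {compressor_path}")

    config = json.loads(config_path.read_text(encoding="utf-8"))
    if key not in config:
        raise KeyError(f"{key!r} is not a top-level key in {config_path}")

    # Only the compressor path changes; every other setting is preserved.
    config[key] = str(compressor_path)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config[key]
```

A typical call would be `ckpt = download_checkpoint("./Dispider")` followed by `patch_mm_compressor_path(ckpt, "<compressor sub-folder>")`, with the placeholder replaced by the actual sub-folder name found in the downloaded repository. Raising an error when the folder or key is missing is a deliberate choice here, so a wrong assumption about the layout fails loudly instead of silently writing a bad path.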