Update README.md
README.md
CHANGED
|
@@ -1,3 +1,45 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
---
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
---
|
| 7 |
+
# Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction (CVPR 2025)
|
| 8 |
+
|
| 9 |
+
[[Paper Link]](https://arxiv.org/abs/2501.03218) [[GitHub repo]](https://github.com/Mark12Ding/Dispider)
|
| 10 |
+
|
| 11 |
+
This repository contains the checkpoint (`ckpt`) of the **Dispider** model, a multimodal model designed as an online VideoLLM for active, real-time interaction with video.
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
## Quick Start
|
| 15 |
+
First, download the checkpoints from this repository into a local folder.
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
**Important**: Update the ``mm_compressor`` path in ``config.json`` to match your local environment. The ``mm_compressor`` checkpoint is located in a sub-folder of this repository.
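
The two Quick Start steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' setup script: the helper names `download_checkpoint` and `set_mm_compressor_path` are hypothetical, the repository id and the compressor sub-folder name must be supplied by you, and the code assumes the path is stored under a top-level `mm_compressor` key in `config.json`. The download helper relies on `snapshot_download` from the `huggingface_hub` package.

```python
import json
from pathlib import Path


def download_checkpoint(repo_id: str, local_dir: str) -> Path:
    """Download all files of a Hugging Face model repo into local_dir."""
    # Imported lazily so the config helper below works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


def set_mm_compressor_path(ckpt_dir: str, compressor_subdir: str) -> Path:
    """Point the mm_compressor entry of config.json at a local sub-folder."""
    ckpt_path = Path(ckpt_dir).resolve()
    compressor_path = ckpt_path / compressor_subdir
    if not compressor_path.is_dir():
        raise FileNotFoundError(f"compressor folder not found: {compressor_path}")

    config_file = ckpt_path / "config.json"
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Assumption: the path lives under a top-level "mm_compressor" key.
    config["mm_compressor"] = str(compressor_path)
    config_file.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return compressor_path
```

A typical call would be `set_mm_compressor_path(ckpt_dir, compressor_subdir)` after the download finishes, where `compressor_subdir` is the name of the sub-folder that actually holds the ``mm_compressor`` checkpoint in your local copy. Writing an absolute path avoids breakage when the model is loaded from a different working directory.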
|
| 19 |
+
|
| 20 |
+
For detailed evaluation instructions, refer to the [GitHub repo](https://github.com/Mark12Ding/Dispider).
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
## ✒️ Citation
|
| 28 |
+
If you find our work helpful for your research, please consider giving the repository a star ⭐ and citing our papers 📝.
|
| 29 |
+
```bibtex
|
| 30 |
+
@article{qian2025dispider,
|
| 31 |
+
title={Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction},
|
| 32 |
+
author={Qian, Rui and Ding, Shuangrui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Lin, Dahua and Wang, Jiaqi},
|
| 33 |
+
journal={arXiv preprint arXiv:2501.03218},
|
| 34 |
+
year={2025}
|
| 35 |
+
}
|
| 36 |
+
|
| 37 |
+
@article{qian2025streaming,
|
| 38 |
+
title={Streaming long video understanding with large language models},
|
| 39 |
+
author={Qian, Rui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Ding, Shuangrui and Lin, Dahua and Wang, Jiaqi},
|
| 40 |
+
journal={Advances in Neural Information Processing Systems},
|
| 41 |
+
volume={37},
|
| 42 |
+
pages={119336--119360},
|
| 43 |
+
year={2025}
|
| 44 |
+
}
|
| 45 |
+
```
|