license: apache-2.0

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction (CVPR 2025)

[Paper Link] [Github repo]

This repository contains the checkpoint (ckpt) of the Dispider model, a multimodal model designed for [online VideoLLM].

Quick Start

First, download the checkpoints from this repository to a local folder.

Important: Modify the mm_compressor path in config.json to align with your local environment. The checkpoint for mm_compressor is located within a sub-folder of this repository.
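
The path update described above can also be scripted. The following is a minimal sketch, not part of the official instructions: it assumes `mm_compressor` is a top-level key in `config.json` whose value is a path string, and the helper name `set_mm_compressor_path` is hypothetical. Check the actual layout of `config.json` before relying on it.

```python
import json
from pathlib import Path


def set_mm_compressor_path(config_path, compressor_dir):
    """Point the `mm_compressor` entry of config.json at a local directory.

    Hypothetical helper: assumes `mm_compressor` is a top-level key
    holding a path string. All other keys are left unchanged.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Store an absolute path so loading works from any working directory.
    config["mm_compressor"] = str(Path(compressor_dir).resolve())

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config["mm_compressor"]
```

Call it with the path to the downloaded `config.json` and the path to the sub-folder that holds the mm_compressor checkpoint; the sub-folder name is not stated here, so look it up in the downloaded files.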

For detailed evaluation instructions, please refer to the GitHub repo.

✒️ Citation

If you find our work helpful for your research, please consider giving a star ⭐ and citation 📝.

@article{qian2025dispider,
  title={Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction},
  author={Qian, Rui and Ding, Shuangrui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Cao, Yuhang and Lin, Dahua and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2501.03218},
  year={2025}
}

@article{qian2025streaming,
  title={Streaming long video understanding with large language models},
  author={Qian, Rui and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Ding, Shuangrui and Lin, Dahua and Wang, Jiaqi},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={119336--119360},
  year={2025}
}