---
license: apache-2.0
---

# TimeChatOnline-7B
This is the official implementation of TimeChatOnline-7B from the paper *"TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos"*.
**Paper**: [arXiv:2504.17343](https://arxiv.org/pdf/2504.17343)
**Project page**: [https://timechat-online.github.io/](https://timechat-online.github.io/)
# TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

<p align="center">
🔗 <a href="https://timechat-online.github.io/" target="_blank">Project Page</a> · 📖 <a href="https://arxiv.org/abs/2504.17343" target="_blank">Paper</a> · ⭐ <a href="https://github.com/yaolinli/TimeChat-Online" target="_blank">GitHub</a> · 📊 <a href="https://huggingface.co/datasets/yaolily/TimeChat-Online-139K" target="_blank">Dataset</a> · 🤗 <a href="https://huggingface.co/wyccccc/TimeChatOnline-7B" target="_blank">Checkpoints</a>
</p>
📰 **News**
- **[2025-05-01]** Our paper and project page are now available.
🚀 **Coming Soon**
- [ ] Model checkpoints and inference code
- [ ] Full training code, scripts, and benchmark evaluation tools
## Introduction
**TimeChat-Online** is a novel online VideoLLM designed for efficient streaming video understanding. Its core innovation, the **Differential Token Drop (DTD)** module, tackles visual redundancy by selectively preserving only meaningful temporal changes while eliminating static content between frames. Our experiments show that over 80% of streaming video content is naturally redundant without requiring user-query guidance.
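The frame-differencing idea behind DTD can be illustrated with a toy sketch. The function and parameter names below are hypothetical, and the paper's actual dropping criterion operates on vision-encoder patch tokens and may differ; this only shows the principle of keeping tokens whose content changed between consecutive frames:

```python
import numpy as np

def differential_token_drop(prev_tokens, curr_tokens, threshold=0.1):
    """Toy sketch of Differential Token Drop: keep only the current-frame
    tokens that changed sufficiently relative to the previous frame.
    (Hypothetical names; not the paper's exact implementation.)"""
    # L2 distance between co-located patch tokens of consecutive frames
    diff = np.linalg.norm(curr_tokens - prev_tokens, axis=-1)
    kept = diff > threshold  # static patches are dropped
    return curr_tokens[kept], kept

# A mostly static "frame": 16 patch tokens of dim 8, only 3 patches change
rng = np.random.default_rng(0)
prev = rng.normal(size=(16, 8))
curr = prev.copy()
curr[[2, 7, 11]] += 1.0  # simulate motion in 3 patches
kept_tokens, mask = differential_token_drop(prev, curr)
print(int(mask.sum()))  # 3 tokens kept, i.e. ~81% of tokens dropped
```

Here 13 of 16 tokens are identical across frames and get dropped, mirroring the observation that most streaming-video tokens are redundant without any query guidance.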

Install the vision utilities used by the inference script:

```shell
pip install qwen-vl-utils[decord]==0.0.8
```

```python
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info
import time
from datetime import datetime

# Note: import the model class from this repo's eval folder (eval/qwen2_5_vl),
# not from transformers
from eval.qwen2_5_vl import Qwen2_5_VLForConditionalGeneration

curr_time = datetime.now().strftime("%Y%m%d_%H%M%S")

# ... (model loading, video preprocessing, and generation)

output_text = processor.batch_decode(...)  # decode arguments elided
print(output_text)
```
## Dataset: TimeChat-Online-139K
For flexible real-time interaction, we introduce a comprehensive streaming video dataset covering backward tracing, real-time visual perception, and forward active responding scenarios.
- **11,043** visually informative videos (average duration: 11.1 minutes)
- **139K** question-answer pairs covering backward tracing, real-time visual perception, and forward active responding
- Average of **87.8** scene-oriented key frames per video (~7.14 seconds between consecutive frames)
**[TODO]** We will release the video frames (extracted at 1 fps) and the question-answer pairs soon.
## Training
We use the ms-swift framework for model training. Note that the training script requires modifications to both the ms-swift and transformers source code; refer to the guidelines in [`train/readme.md`](./train/readme.md) before running.
## Citation
If you find our work helpful, please consider citing: