YuchiWang committed
Commit 4aa3f6f · 1 Parent(s): 12053df

wyc-readme

Files changed (1): README.md (+1, -31)
README.md CHANGED
@@ -1,11 +1,6 @@
  ---
  license: apache-2.0
  ---
- # TimeChatOnline-7B
-
- This is the official implementation of TimeChatOnline-7B from the paper *"TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos"*.
- **Paper**: [arXiv:2504.17343](https://arxiv.org/pdf/2504.17343)
- **Project page**: [https://timechat-online.github.io/](https://timechat-online.github.io/)

  # TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos</h1>

@@ -13,19 +8,9 @@ This is the official implementation of TimeChatOnline-7B from the paper *"TimeCh
  🔗 <a href="https://timechat-online.github.io/" target="_blank">Project Page</a> · 📖 <a href="https://arxiv.org/abs/2504.17343" target="_blank">Paper</a> · ⭐ <a href="https://github.com/yaolinli/TimeChat-Online" target="_blank">GitHub</a> · 📊 <a href="https://huggingface.co/datasets/yaolily/TimeChat-Online-139K" target="_blank">Dataset</a> · 🤗 <a href="https://huggingface.co/wyccccc/TimeChatOnline-7B" target="_blank">Checkpoints</a>
  </p>

- 📰 **News**
-
- - **[2025-05-01]** Our paper and project page are now available.
-
- 🚀 **Coming Soon**
-
- - [ ] Model checkpoints and inference code
- - [ ] Full training code, scripts, and benchmark evaluation tools

  ## Introduction

-
-
  **TimeChat-Online** is a novel online VideoLLM designed for efficient streaming video understanding. Its core innovation, the **Differential Token Drop (DTD)** module, tackles visual redundancy by selectively preserving only meaningful temporal changes while eliminating static content between frames. Our experiments show that over 80% of streaming video content is naturally redundant without requiring user-query guidance.

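The Introduction hunk above leaves the Differential Token Drop idea abstract. As a rough sketch only — the function name, feature shapes, and threshold below are illustrative assumptions, not the paper's actual DTD implementation — dropping static patch tokens between consecutive frames can look like this:

```python
import numpy as np

def drop_static_tokens(prev_tokens: np.ndarray,
                       curr_tokens: np.ndarray,
                       tau: float = 0.1):
    """Keep only the current frame's tokens that changed noticeably
    since the previous frame (hypothetical sketch of token dropping).

    prev_tokens, curr_tokens: (num_tokens, dim) patch features
    tau: illustrative threshold on mean absolute feature change
    Returns (kept_tokens, kept_indices).
    """
    diff = np.abs(curr_tokens - prev_tokens).mean(axis=1)  # per-token change
    keep = diff > tau                                      # static tokens are dropped
    return curr_tokens[keep], np.nonzero(keep)[0]

# toy example: 4 patch tokens, only token 2 changes between frames
prev = np.zeros((4, 8))
curr = prev.copy()
curr[2] += 1.0
kept, idx = drop_static_tokens(prev, curr)
print(idx)  # [2]
```

In this toy case three of four tokens are unchanged and dropped, mirroring the claim that most streaming-video tokens are redundant without any query guidance.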
@@ -81,7 +66,7 @@ pip install qwen-vl-utils[decord]==0.0.8
  from transformers import AutoProcessor
  from qwen_vl_utils import process_vision_info
  import time
- #pay attention to this line, not import from transformers
+ #pay attention to this line, not import from transformers, import from our GitHub repo's eval folder qwen2_5_vl
  from eval.qwen2_5_vl import Qwen2_5_VLForConditionalGeneration

  curr_time = datetime.now().strftime("%Y%m%d_%H%M%S")
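One caveat when reusing the snippet in this hunk: the context lines import `time` but then call `datetime.now()`, which needs its own import. A minimal standalone version of just the timestamp line (the rest of the inference script is unchanged):

```python
# The hunk imports `time`, but datetime.now() lives in the
# datetime module, so this import is also required:
from datetime import datetime

curr_time = datetime.now().strftime("%Y%m%d_%H%M%S")
print(curr_time)  # e.g. "20250501_120000": 8-digit date, "_", 6-digit time
```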
@@ -150,21 +135,6 @@ output_text = processor.batch_decode(
  print(output_text)
  ```

-
- ## Dataset: TimeChat-Online-139K
-
- For flexible real-time interaction, we introduce a comprehensive streaming video dataset with backward-tracing, real-time visual perception, and future-responding scenarios.
-
- - **11,043** visually informative videos (average duration: 11.1 minutes)
- - **139K** question-answer pairs covering backward tracing, real-time visual perception, and forward active responding
- - Average of **87.8** scene-oriented key frames per video (~7.14 seconds between consecutive frames)
-
- [todo] We will release the video frames at 1 fps and the question-answer pairs soon.
-
- ## Training
- We utilize the ms-swift framework for model training. Please note that the training script requires modifications to both ms-swift and transformers code. For detailed instructions, refer to the guidelines in [`train/readme.md`](./train/readme.md) before execution.
-
-
  ## Citation

  If you find our work helpful, please consider citing: