---
license: apache-2.0
---

# TimeChatOnline-7B
This is the official implementation of TimeChatOnline-7B from the paper *"TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos"*.
**Paper**: [arXiv:2504.17343](https://arxiv.org/pdf/2504.17343)
**Project page**: [https://timechat-online.github.io/](https://timechat-online.github.io/)
# TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

<p align="center">
🔗 <a href="https://timechat-online.github.io/" target="_blank">Project Page</a> · 📖 <a href="https://arxiv.org/abs/2504.17343" target="_blank">Paper</a> · ⭐ <a href="https://github.com/yaolinli/TimeChat-Online" target="_blank">GitHub</a> · 📊 <a href="https://huggingface.co/datasets/yaolily/TimeChat-Online-139K" target="_blank">Dataset</a> · 🤗 <a href="https://huggingface.co/wyccccc/TimeChatOnline-7B" target="_blank">Checkpoints</a>
</p>
📰 **News**
- **[2025-05-01]** Our paper and project page are now available.
🚀 **Coming Soon**
- [ ] Model checkpoints and inference code
- [ ] Full training code, scripts, and benchmark evaluation tools
## Introduction
**TimeChat-Online** is a novel online VideoLLM designed for efficient streaming video understanding. Its core innovation, the **Differential Token Drop (DTD)** module, tackles visual redundancy by selectively preserving only meaningful temporal changes while eliminating static content between frames. Our experiments show that over 80% of streaming video content is naturally redundant without requiring user-query guidance.
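The frame-differencing idea behind DTD can be illustrated with a toy sketch. The function and parameter names below are hypothetical, and the paper's actual dropping criterion operates on vision-encoder patch tokens and may differ; this only shows the principle of keeping tokens whose content changed between consecutive frames:

```python
import numpy as np

def differential_token_drop(prev_tokens, curr_tokens, threshold=0.1):
    """Toy sketch of Differential Token Drop: keep only the current-frame
    tokens that changed sufficiently relative to the previous frame.
    (Hypothetical names; not the paper's exact implementation.)"""
    # L2 distance between co-located patch tokens of consecutive frames
    diff = np.linalg.norm(curr_tokens - prev_tokens, axis=-1)
    kept = diff > threshold  # static patches are dropped
    return curr_tokens[kept], kept

# A mostly static "frame": 16 patch tokens of dim 8, only 3 patches change
rng = np.random.default_rng(0)
prev = rng.normal(size=(16, 8))
curr = prev.copy()
curr[[2, 7, 11]] += 1.0  # simulate motion in 3 patches
kept_tokens, mask = differential_token_drop(prev, curr)
print(int(mask.sum()))  # 3 tokens kept, i.e. ~81% of tokens dropped
```

Here 13 of 16 tokens are identical across frames and get dropped, mirroring the observation that most streaming-video tokens are redundant without any query guidance.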

Install the vision utilities used by the inference script:

```shell
pip install qwen-vl-utils[decord]==0.0.8
```

```python
from transformers import AutoProcessor
from qwen_vl_utils import process_vision_info
import time
from datetime import datetime

# Note: import the model class from this repo's eval folder (eval/qwen2_5_vl),
# not from transformers
from eval.qwen2_5_vl import Qwen2_5_VLForConditionalGeneration

curr_time = datetime.now().strftime("%Y%m%d_%H%M%S")

# ... (model loading, video preprocessing, and generation)

output_text = processor.batch_decode(...)  # decode arguments elided
print(output_text)
```
## Dataset: TimeChat-Online-139K
For flexible real-time interaction, we introduce a comprehensive streaming video dataset covering backward tracing, real-time visual perception, and forward active responding scenarios.
- **11,043** visually informative videos (average duration: 11.1 minutes)
- **139K** question-answer pairs covering backward tracing, real-time visual perception, and forward active responding
- Average of **87.8** scene-oriented key frames per video (~7.14 seconds between consecutive frames)
**[TODO]** We will release the video frames (extracted at 1 fps) and the question-answer pairs soon.
## Training
We use the ms-swift framework for model training. Note that the training script requires modifications to both the ms-swift and transformers source code; refer to the guidelines in [`train/readme.md`](./train/readme.md) before running.
## Citation
If you find our work helpful, please consider citing: