JungleGym committed · Commit b01958c · verified · 1 Parent(s): 3c8f7cb

Update README.md

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -11,6 +11,11 @@ tags:
   - qwen2-vl
 library_name: transformers
 pipeline_tag: video-text-to-text
+datasets:
+- TencentARC/TimeLens-100K
+- TencentARC/TimeLens-Bench
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 ---
 
 # TimeLens-7B
@@ -22,7 +27,7 @@ pipeline_tag: video-text-to-text
 
 **TimeLens-7B** is an MLLM with strong video temporal grounding (VTG) capability, fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). It is trained with a carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe and improved timestamp encoding strategy proposed in our [paper](TODO), utilizing our high-quality VTG training dataset [TimeLens-100K](https://huggingface.co/datasets/TencentARC/TimeLens-100K).
 
-## Performance
+## 📊 Performance
 
 TimeLens-7B achieves strong video temporal grounding performance:
 
@@ -122,7 +127,8 @@ Install the following packages:
 ```bash
 pip install transformers==4.57.1 accelerate==1.6.0 torch==2.6.0 torchvision==0.21.0
 pip install qwen-vl-utils[decord]==0.0.14
-pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir # Flash-Attention 2 to speed up generation
+# use Flash-Attention 2 to speed up generation
+pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
 ```
 
 Using 🤗Transformers for Inference:
@@ -201,4 +207,4 @@ If you find our work helpful for your research and applications, please cite our
 
 ```bibtex
 TODO
-```
+```
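For readers unfamiliar with how video temporal grounding performance is scored, a minimal sketch of the standard temporal IoU metric follows. The metric definition itself is standard in VTG work; whether TimeLens-Bench uses exactly this formulation is an assumption, not something this commit confirms.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) spans in seconds.

    Standard VTG metric sketch; the exact evaluation protocol of
    TimeLens-Bench is assumed here, not documented by this commit.
    """
    # Overlap length, clamped at zero for disjoint spans.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Union = sum of lengths minus overlap.
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((10.0, 20.0), (15.0, 25.0)))  # partially overlapping spans
```

Benchmarks typically report mean IoU (mIoU) over a test set, or recall at IoU thresholds such as R1@0.5 and R1@0.7.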