Update README.md

README.md CHANGED

@@ -11,6 +11,11 @@ tags:
 - qwen3-vl
 library_name: transformers
 pipeline_tag: video-text-to-text
+datasets:
+- TencentARC/TimeLens-100K
+- TencentARC/TimeLens-Bench
+base_model:
+- Qwen/Qwen3-VL-8B-Instruct
 ---
 
 # TimeLens-8B
@@ -22,7 +27,7 @@ pipeline_tag: video-text-to-text
 
 **TimeLens-8B** is an MLLM with state-of-the-art video temporal grounding performance among open-source models, finetuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct). It is trained with a carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe proposed in our [paper](TODO), utilizing our high-quality VTG training dataset [TimeLens-100K](https://huggingface.co/datasets/TencentARC/TimeLens-100K).
 
-## Performance
+## 📊 Performance
 
 TimeLens-8B achieves state-of-the-art video temporal grounding performance among open-source models:
 
@@ -122,7 +127,8 @@ Install the following packages:
 ```bash
 pip install transformers==4.57.1 accelerate==1.6.0 torch==2.6.0 torchvision==0.21.0
 pip install qwen-vl-utils[decord]==0.0.14
-
+# use Flash-Attention 2 to speed up generation
+pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
 ```
 
 Using 🤗Transformers for Inference:
@@ -210,4 +216,4 @@ If you find our work helpful for your research and applications, please cite our
 
 ```bibtex
 TODO
-```
+```
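The README this diff touches points at a Transformers-based inference section that is not part of the hunks shown here. As a minimal, hedged sketch of the piece that *is* implied by the dependencies above (qwen-vl-utils' chat-message format for video inputs): the helper name, the video path, the question, and the `fps` default below are all illustrative placeholders, not anything stated in this commit.

```python
# Hypothetical sketch: the chat-message structure that Qwen-VL-style
# video inference examples pass to the processor / qwen-vl-utils.
# All concrete values (path, question, fps) are placeholders.
def build_video_messages(video_path: str, question: str, fps: float = 2.0):
    """Return a single-turn message list with one video part and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path, "fps": fps},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_video_messages("demo.mp4", "When does the person open the door?")
```

This only builds the message payload; actually running TimeLens-8B would additionally load the model and processor with the pinned `transformers==4.57.1` from the install block above.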