Update README.md

README.md CHANGED

@@ -11,6 +11,11 @@ tags:
 - qwen3-vl
 library_name: transformers
 pipeline_tag: video-text-to-text
+datasets:
+- TencentARC/TimeLens-100K
+- TencentARC/TimeLens-Bench
+base_model:
+- Qwen/Qwen3-VL-8B-Instruct
 ---
 
 # TimeLens-8B
@@ -22,7 +27,7 @@ pipeline_tag: video-text-to-text
 
 **TimeLens-8B** is an MLLM with state-of-the-art video temporal grounding performance among open-source models, finetuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct). It is trained with a carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe proposed in our [paper](TODO), utilizing our high-quality VTG training dataset [TimeLens-100K](https://huggingface.co/datasets/TencentARC/TimeLens-100K).
 
-## Performance
+## 📊 Performance
 
 TimeLens-8B achieves state-of-the-art video temporal grounding performance among open-source models:
 
@@ -122,7 +127,8 @@ Install the following packages:
 ```bash
 pip install transformers==4.57.1 accelerate==1.6.0 torch==2.6.0 torchvision==0.21.0
 pip install qwen-vl-utils[decord]==0.0.14
-
+# use Flash-Attention 2 to speed up generation
+pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
 ```
 
 Using 🤗Transformers for Inference:
@@ -210,4 +216,4 @@ If you find our work helpful for your research and applications, please cite our
 
 ```bibtex
 TODO
-```
+```
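The README this diff touches points at a Transformers-based inference section that is not part of the hunks shown here. As a minimal, hedged sketch of the piece that *is* implied by the dependencies above (qwen-vl-utils' chat-message format for video inputs): the helper name, the video path, the question, and the `fps` default below are all illustrative placeholders, not anything stated in this commit.

```python
# Hypothetical sketch: the chat-message structure that Qwen-VL-style
# video inference examples pass to the processor / qwen-vl-utils.
# All concrete values (path, question, fps) are placeholders.
def build_video_messages(video_path: str, question: str, fps: float = 2.0):
    """Return a single-turn message list with one video part and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path, "fps": fps},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_video_messages("demo.mp4", "When does the person open the door?")
```

This only builds the message payload; actually running TimeLens-8B would additionally load the model and processor with the pinned `transformers==4.57.1` from the install block above.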