Update README.md
README.md
CHANGED
@@ -11,6 +11,11 @@ tags:
 - qwen2-vl
 library_name: transformers
 pipeline_tag: video-text-to-text
+datasets:
+- TencentARC/TimeLens-100K
+- TencentARC/TimeLens-Bench
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
 ---
 
 # TimeLens-7B
@@ -22,7 +27,7 @@ pipeline_tag: video-text-to-text
 
 **TimeLens-7B** is an MLLM with strong video temporal grounding (VTG) capability, fine-tuned from [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). It is trained with a carefully crafted RLVR (reinforcement learning with verifiable rewards) recipe and an improved timestamp encoding strategy proposed in our [paper](TODO), utilizing our high-quality VTG training dataset [TimeLens-100K](https://huggingface.co/datasets/TencentARC/TimeLens-100K).
 
-## Performance
+## 📊 Performance
 
 TimeLens-7B achieves strong video temporal grounding performance:
 
@@ -122,7 +127,8 @@ Install the following packages:
 ```bash
 pip install transformers==4.57.1 accelerate==1.6.0 torch==2.6.0 torchvision==0.21.0
 pip install qwen-vl-utils[decord]==0.0.14
-
+# use Flash-Attention 2 to speed up generation
+pip install flash-attn==2.7.4.post1 --no-build-isolation --no-cache-dir
 ```
 
 Using 🤗Transformers for Inference:
@@ -201,4 +207,4 @@ If you find our work helpful for your research and applications, please cite our
 
 ```bibtex
 TODO
-```
+```
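The flash-attn package added in the install hunk above only pays off if the model is loaded with Flash-Attention 2 enabled. The README's actual "Using 🤗Transformers for Inference" snippet lies outside the changed hunks, so the sketch below only illustrates the usual Qwen2.5-VL-style video inference flow that section refers to, assuming TimeLens-7B loads with `Qwen2_5_VLForConditionalGeneration`; the repo id `TencentARC/TimeLens-7B`, the video path, and the question are illustrative placeholders, not taken from the model card.

```python
# Hedged sketch: load TimeLens-7B as a standard Qwen2.5-VL checkpoint and ask it
# to temporally ground an event in a video. Repo id, video path, and question
# below are placeholder assumptions.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "TencentARC/TimeLens-7B"  # assumed repo id
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # enabled by the flash-attn install above
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "file:///path/to/video.mp4", "fps": 1.0},
            {
                "type": "text",
                "text": "When does the person open the door? Give the start and end time.",
            },
        ],
    }
]

# Render the chat template and sample video frames with qwen-vl-utils.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then drop the prompt tokens before decoding the answer.
output_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```

Since the new `base_model` field points at Qwen/Qwen2.5-VL-7B-Instruct, the processor, chat template, and the `fps` control over video frame sampling should carry over from the base model unchanged.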