prithivMLmods/TimeLens-8B-GGUF

Image-Text-to-Text · text-generation-inference

prithivMLmods committed on Dec 23, 2025

Commit 2ae2a27 · verified · Parent: c8828d0

Update README.md

Files changed (1)

README.md +5 -1

README.md CHANGED

@@ -2,4 +2,8 @@
 license: apache-2.0
 base_model:
 - TencentARC/TimeLens-8B
----
+---
+# **TimeLens-8B**
+TimeLens-8B from TencentARC is an 8B-parameter multimodal vision-language model fine-tuned from Qwen3-VL-8B-Instruct with a novel RLVR (reinforcement learning with verifiable rewards) recipe on the high-quality TimeLens-100K video temporal grounding (VTG) dataset. It achieves state-of-the-art video temporal grounding performance among open-source models, with R1@0.3 (Recall@1 at a temporal IoU threshold of 0.3) of 72.0% on Charades-TimeLens, 64.5% on ActivityNet-TimeLens, and 75.6% on QVHighlights-TimeLens, significantly outperforming baselines such as Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B. The model is designed to localize visual events described by natural-language queries, and it outputs each timestamped segment in the format "The event happens in <start time> - <end time> seconds". For efficient video processing, it samples frames at a low rate of 2 FPS (`min_pixels=64*28*28`, `total_pixels=14336*28*28`) and runs via Transformers with Flash-Attention-2 support. It is released with code, a project page, and the TimeLens-Bench evaluation suite, and targets research in video understanding, temporal reasoning, and event detection.
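A minimal sketch of how the stated answer format and the R1@0.3 metric fit together, assuming responses follow exactly the "The event happens in <start time> - <end time> seconds" pattern with plain decimal seconds. It parses the predicted segment with a regular expression, computes temporal IoU against a ground-truth segment, and reports R1@0.3 as the fraction of queries whose single predicted segment reaches IoU >= 0.3. The helpers `parse_segment`, `temporal_iou`, and `recall_at_1` are hypothetical illustrations, not part of the TimeLens code or TimeLens-Bench, and the official scorer may differ in details such as how unparseable answers are handled (here they count as misses).

```python
import re
from typing import Optional, Sequence, Tuple

Segment = Tuple[float, float]

# Pattern for the answer format stated in the model card:
#   "The event happens in <start time> - <end time> seconds"
# Assumption: both times are plain decimal numbers of seconds.
_SEGMENT_RE = re.compile(
    r"The event happens in\s+(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*seconds"
)


def parse_segment(text: str) -> Optional[Segment]:
    """Return (start, end) in seconds, or None if the answer does not match."""
    match = _SEGMENT_RE.search(text)
    if match is None:
        return None
    start, end = float(match.group(1)), float(match.group(2))
    # Swap reversed predictions so that start <= end.
    return (min(start, end), max(start, end))


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    preds: Sequence[Optional[Segment]],
    gts: Sequence[Segment],
    threshold: float = 0.3,
) -> float:
    """R1@threshold: share of queries whose top-1 segment has IoU >= threshold.

    Unparseable answers (None) count as misses.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(
        1
        for pred, gt in zip(preds, gts)
        if pred is not None and temporal_iou(pred, gt) >= threshold
    )
    return hits / len(gts)


if __name__ == "__main__":
    # Made-up responses and ground truth, for illustration only.
    responses = [
        "The event happens in 2.0 - 8.5 seconds",
        "The event happens in 30 - 35 seconds",
        "I could not find the event.",
    ]
    ground_truth = [(3.0, 9.0), (10.0, 20.0), (0.0, 5.0)]
    preds = [parse_segment(r) for r in responses]
    print(preds)  # [(2.0, 8.5), (30.0, 35.0), None]
    print(f"R1@0.3 = {recall_at_1(preds, ground_truth, 0.3):.3f}")  # R1@0.3 = 0.333
```

Computing the union as the sum of both lengths minus the intersection (rather than the span of the two intervals) keeps the formula correct for any pair of intervals, and disjoint segments simply score an IoU of 0.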