Add library_name and usage example

This PR improves the model card by:
- Adding `library_name: transformers` to ensure the "how to use" widget appears with an automated code snippet.
- Including a "Quick Inference Code" section from the GitHub README as a "Usage" section in the model card, making it easier for users to get started with the model.
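
The metadata change in the first bullet can be checked locally before merging. The following is a minimal sketch and not part of this PR: `parse_front_matter` is a hypothetical helper that only handles the flat `key: value` and `- item` forms used in this card's front matter (a real YAML parser would be more robust), and `CARD` is an abbreviated copy of the updated README.md header from the diff.

```python
def parse_front_matter(text):
    """Parse simple model-card front matter (hypothetical helper, not a full YAML parser).

    Supports only top-level `key: value` pairs and `- item` lists.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the file

    meta, key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the metadata block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if key is not None and isinstance(meta[key], list):
                meta[key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        # A key with no inline value is assumed to open a list.
        meta[key] = value if value else []
    return meta


# Abbreviated copy of the updated card header (taken from the diff in this PR).
CARD = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
- multimodal
library_name: transformers
---
# TimeSearch-R-7B
"""

meta = parse_front_matter(CARD)
print(meta["library_name"], meta["pipeline_tag"])
```

Running the same check against the previous header would raise a `KeyError` for `library_name`, which is the field this PR adds.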

README.md +70 -4

@@ -1,18 +1,84 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
 pipeline_tag: video-text-to-text
 tags:
 - multimodal
 ---
 # TimeSearch-R-7B
 - **Code:** https://github.com/Time-Search/TimeSearch-R
 - **Paper:** [TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning](https://arxiv.org/abs/2511.05489)
 ## Citation
 If you find our work helpful, please consider citing it.
@@ -24,4 +90,4 @@ If you find our work helpful, please consider citing it.
   journal={arXiv preprint arXiv:2511.05489},
   year={2025}
 }
-```

 ---
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+language:
+- en
+license: apache-2.0
 pipeline_tag: video-text-to-text
 tags:
 - multimodal
+library_name: transformers
 ---
 # TimeSearch-R-7B
 - **Code:** https://github.com/Time-Search/TimeSearch-R
 - **Paper:** [TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning](https://arxiv.org/abs/2511.05489)
+## Usage
+We provide a simple generation example for running inference with our model. For more details, refer to the [GitHub repository](https://github.com/Time-Search/TimeSearch-R).
+```python
+import numpy as np
+import torch
+from longvu.builder import load_pretrained_model
+from longvu.constants import (
+    DEFAULT_IMAGE_TOKEN,
+    IMAGE_TOKEN_INDEX,
+)
+from longvu.conversation import conv_templates, SeparatorStyle
+from longvu.mm_datautils import (
+    KeywordsStoppingCriteria,
+    process_images,
+    tokenizer_image_token,
+)
+from decord import cpu, VideoReader
+tokenizer, model, image_processor, context_len = load_pretrained_model(
+    "./checkpoints/longvu_qwen", None, "cambrian_qwen",
+)
+model.eval()
+video_path = "./examples/video1.mp4"
+qs = "Describe this video in detail"
+vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
+fps = float(vr.get_avg_fps())
+frame_indices = np.array([i for i in range(0, len(vr), round(fps),)])
+video = []
+for frame_index in frame_indices:
+    img = vr[frame_index].asnumpy()
+    video.append(img)
+video = np.stack(video)
+image_sizes = [video[0].shape[:2]]
+video = process_images(video, image_processor, model.config)
+video = [item.unsqueeze(0) for item in video]
+qs = DEFAULT_IMAGE_TOKEN + "\n" + qs
+conv = conv_templates["qwen"].copy()
+conv.append_message(conv.roles[0], qs)
+conv.append_message(conv.roles[1], None)
+prompt = conv.get_prompt()
+input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(model.device)
+stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
+keywords = [stop_str]
+stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
+with torch.inference_mode():
+    output_ids = model.generate(
+        input_ids,
+        images=video,
+        image_sizes=image_sizes,
+        do_sample=False,
+        temperature=0.2,
+        max_new_tokens=128,
+        use_cache=True,
+        stopping_criteria=[stopping_criteria],
+    )
+pred = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
+```
 ## Citation
 If you find our work helpful, please consider citing it.
   journal={arXiv preprint arXiv:2511.05489},
   year={2025}
 }
+```