Update README.md
Browse files
README.md
CHANGED
|
@@ -29,6 +29,7 @@ You can use the raw model for temporal video grounding.
|
|
| 29 |
Here is how to use this model to get the logits of a given video and text in PyTorch:
|
| 30 |
```python
|
| 31 |
import av
|
|
|
|
| 32 |
import numpy as np
|
| 33 |
import torch
|
| 34 |
from huggingface_hub import hf_hub_download
|
|
@@ -118,7 +119,21 @@ data = processor(
|
|
| 118 |
|
| 119 |
output = model(**data)
|
| 120 |
|
| 121 |
-
print(output)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 122 |
```
|
| 123 |
|
| 124 |
### Limitations and bias
|
|
|
|
| 29 |
Here is how to use this model to get the logits of a given video and text in PyTorch:
|
| 30 |
```python
|
| 31 |
import av
|
| 32 |
+
import cv2
|
| 33 |
import numpy as np
|
| 34 |
import torch
|
| 35 |
from huggingface_hub import hf_hub_download
|
|
|
|
| 119 |
|
| 120 |
output = model(**data)
|
| 121 |
|
| 122 |
+
print(f"The model output is {output}")
|
| 123 |
+
|
| 124 |
+
def get_video_duration(filename):
    """Return the duration of the video at *filename* in seconds.

    Duration is computed as frame_count / fps as reported by OpenCV.
    Returns -1 if the file cannot be opened or the reported FPS is 0
    (guards against ZeroDivisionError on broken metadata).
    """
    cap = cv2.VideoCapture(filename)
    try:
        if cap.isOpened():
            # Named constants instead of the magic indices 5 and 7.
            rate = cap.get(cv2.CAP_PROP_FPS)
            frame_num = cap.get(cv2.CAP_PROP_FRAME_COUNT)
            if rate > 0:
                return frame_num / rate
        return -1
    finally:
        # Always release the capture handle to avoid leaking it.
        cap.release()
|
| 132 |
+
|
| 133 |
+
# Map the model's normalized [start, end] prediction onto the clip's length.
duration = get_video_duration(file)
span = output['logits'].tolist()[0]
start = round(span[0] * duration, 1)
end = round(span[1] * duration, 1)
print(f"The time slot of the video corresponding to the text is from {start}s to {end}s")
|
| 137 |
```
|
| 138 |
|
| 139 |
### Limitations and bias
|