Spaces:

vidhi0405
/

VideoToText

Sleeping

avinashHuggingface108 commited on Sep 18, 2025

Commit

ea0b5ea

1 Parent(s): 475892a

Fix SmolVLM2 scoring: Force numerical scores in responses

- Updated prompt to require 'Score: X/10' format at start of response
- Added regex pattern to extract the new score format
- Fixes issue where all segments got default 6.0 score
- Now produces varied scores (2.0-9.0) based on actual content analysis
- Tested with ooty_resort.mp4: excellent results with proper scoring

Files changed (1) hide show

audio_enhanced_highlights_final.py +6 -3

audio_enhanced_highlights_final.py CHANGED Viewed

@@ -141,8 +141,10 @@ class AudioVisualAnalyzer:
             try:
                 def generate_with_timeout():
                     prompt = ("Analyze this video frame for interesting, engaging, or highlight-worthy content. "
-                             "Rate the excitement/interest level from 1-10 and explain what makes it noteworthy. "
-                             "Focus on action, emotion, important moments, or visually striking elements.")
                     return self.vlm_handler.generate_response(frame_path, prompt)
                 # Run with timeout protection
@@ -205,8 +207,9 @@ class AudioVisualAnalyzer:
         """Extract numeric score from analysis text"""
         import re
-        # Look for patterns like "8/10", "score: 7", "rate: 6.5", etc.
         patterns = [
             r'(\d+(?:\.\d+)?)\s*/\s*10',  # "8/10" or "7.5/10"
             r'(?:score|rating|rate)(?:\s*[:=]\s*)(\d+(?:\.\d+)?)',  # "score: 8" or "rating=7.5"
             r'(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10',  # "8 out of 10"

             try:
                 def generate_with_timeout():
                     prompt = ("Analyze this video frame for interesting, engaging, or highlight-worthy content. "
+                             "IMPORTANT: Start your response with 'Score: X/10' where X is a number from 1-10. "
+                             "Then explain what makes it noteworthy. Focus on action, emotion, important moments, or visually striking elements. "
+                             "Rate based on: Action/movement (high scores), People talking/interacting (medium-high), "
+                             "Static scenes (low-medium), Boring/empty scenes (low scores).")
                     return self.vlm_handler.generate_response(frame_path, prompt)
                 # Run with timeout protection
         """Extract numeric score from analysis text"""
         import re
+        # Look for patterns like "Score: 8/10", "8/10", "score: 7", etc.
         patterns = [
+            r'score:\s*(\d+(?:\.\d+)?)\s*/\s*10',  # "Score: 8/10" (our new format)
             r'(\d+(?:\.\d+)?)\s*/\s*10',  # "8/10" or "7.5/10"
             r'(?:score|rating|rate)(?:\s*[:=]\s*)(\d+(?:\.\d+)?)',  # "score: 8" or "rating=7.5"
             r'(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10',  # "8 out of 10"