subashpoudel committed on
Commit 8e937fe · verified · 1 Parent(s): 1281f7f

Update app.py

Files changed (1)
  1. app.py +13 -9
app.py CHANGED
@@ -41,31 +41,35 @@ def initialize_agent():
 
  # --- Prompt Template ---
  subtitle_prompt_tuned = '''
- You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when they are spoken.
+ You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when each word is spoken.
 
  Please follow these instructions strictly:
 
- 1. For **every three consecutive spoken words**, include:
+ 1. For **every single spoken word**, include:
  - A unique **line number**
  - The **exact start time** and **end time** in the format: HH:MM:SS,mmm --> HH:MM:SS,mmm
- - The **three spoken words** on the next line (exactly three words per block, separated by spaces)
- 2. Do **not** include more or fewer than three words per timestamp.
+ - The **spoken word** on the next line (exactly one word per block)
+ 2. Do **not** include more or fewer than one word per timestamp.
  3. Do not summarize, paraphrase, or skip any spoken content — include **all spoken words verbatim**.
  4. Do **not** include any sound effects or non-verbal cues like [Music], [Laughter], etc.
  5. Your output must be a **raw transcription** — no extra formatting, no explanations, no commentary.
  6. Maintain the exact **chronological order** as spoken in the video.
 
- ***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for every group of three words**.
+ ***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for each word**.
 
  Example format:
 
  1
- 00:00:01,000 --> 00:00:01,800
- Hello everyone welcome
+ 00:00:01,000 --> 00:00:01,300
+ Hello
 
  2
- 00:00:01,810 --> 00:00:02,600
- to this tutorial
+ 00:00:01,310 --> 00:00:01,600
+ everyone
+
+ 3
+ 00:00:01,610 --> 00:00:01,900
+ welcome
 
  ...
 
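
Review note: the revised prompt asks for exactly one word per timestamp block, so a downstream check on the model's output may be useful. The following is a minimal sketch only, assuming the output arrives as plain text in the block layout shown in the prompt's example. The names `parse_blocks`, `find_violations`, and `to_ms` are hypothetical and do not appear in app.py.

```python
import re

# Hypothetical helpers, not part of app.py. They assume the model returns
# blocks shaped like the prompt's example: index line, timestamp line, text.
TIMESTAMP_LINE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert HH, MM, SS, mmm string fields to integer milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_blocks(text):
    """Return a list of (index, start_ms, end_ms, words) tuples."""
    blocks = []
    # Blocks are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            raise ValueError(f"malformed block: {chunk!r}")
        match = TIMESTAMP_LINE.match(lines[1])
        if match is None:
            raise ValueError(f"bad timestamp line: {lines[1]!r}")
        fields = match.groups()
        start_ms = to_ms(*fields[:4])
        end_ms = to_ms(*fields[4:])
        words = " ".join(lines[2:]).split()
        blocks.append((int(lines[0]), start_ms, end_ms, words))
    return blocks


def find_violations(blocks):
    """List (index, reason) pairs for blocks that break the prompt's rules."""
    problems = []
    previous_end = 0
    for index, start_ms, end_ms, words in blocks:
        if len(words) != 1:
            problems.append((index, "expected exactly one word"))
        if end_ms <= start_ms:
            problems.append((index, "end time is not after start time"))
        if start_ms < previous_end:
            problems.append((index, "overlaps the previous block"))
        previous_end = end_ms
    return problems


sample = """1
00:00:01,000 --> 00:00:01,300
Hello

2
00:00:01,310 --> 00:00:01,600
everyone

3
00:00:01,610 --> 00:00:01,900
welcome
"""

violations = find_violations(parse_blocks(sample))
```

Run against the example from the new prompt, `violations` is an empty list. A block in the old three-word style would be reported with the reason "expected exactly one word". Whether app.py should retry, split, or reject such output is a design choice this sketch leaves open.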