Update app.py
app.py
@@ -41,31 +41,35 @@ def initialize_agent():
 
 # --- Prompt Template ---
 subtitle_prompt_tuned = '''
-You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when
+You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when each word is spoken.
 
 Please follow these instructions strictly:
 
-1. For **every
+1. For **every single spoken word**, include:
 - A unique **line number**
 - The **exact start time** and **end time** in the format: HH:MM:SS,mmm --> HH:MM:SS,mmm
-- The **
-2. Do **not** include more or fewer than
+- The **spoken word** on the next line (exactly one word per block)
+2. Do **not** include more or fewer than one word per timestamp.
 3. Do not summarize, paraphrase, or skip any spoken content — include **all spoken words verbatim**.
 4. Do **not** include any sound effects or non-verbal cues like [Music], [Laughter], etc.
 5. Your output must be a **raw transcription** — no extra formatting, no explanations, no commentary.
 6. Maintain the exact **chronological order** as spoken in the video.
 
-***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for
+***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for each word**.
 
 Example format:
 
 1
-00:00:01,000 --> 00:00:01,
-Hello
+00:00:01,000 --> 00:00:01,300
+Hello
 
 2
-00:00:01,
-
+00:00:01,310 --> 00:00:01,600
+everyone
+
+3
+00:00:01,610 --> 00:00:01,900
+welcome
 
 ...
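The prompt relies on the model to follow this block layout exactly, and nothing in this diff shows how app.py consumes the response. A caller could therefore validate the raw output against the stated rules before using it. The following is a minimal sketch of such a check. `parse_word_blocks`, `to_ms`, and `TIMESTAMP_LINE` are hypothetical names that are not part of app.py. The checks also encode one reading of the prompt: three lines per block, unique line numbers, one word per block, and start times that never go backwards.

```python
import re
from typing import List, Tuple

# Hypothetical post-processing helper (not part of app.py).
# Matches one timestamp line in the format the prompt requests:
# HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP_LINE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert HH, MM, SS, mmm fields to a total number of milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_word_blocks(text: str) -> List[Tuple[int, int, int, str]]:
    """Parse the model's output into (number, start_ms, end_ms, word) tuples.

    Raises ValueError on the first block that breaks one of the prompt's
    rules as interpreted here.
    """
    entries: List[Tuple[int, int, int, str]] = []
    seen_numbers = set()
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) != 3:
            raise ValueError(f"expected 3 lines per block, got {lines!r}")
        number_line, time_line, word_line = lines
        match = TIMESTAMP_LINE.match(time_line)
        if not number_line.isdigit() or match is None:
            raise ValueError(f"malformed number or timestamp: {lines[:2]!r}")
        number = int(number_line)
        # Rule 1: each block carries a unique line number.
        if number in seen_numbers:
            raise ValueError(f"line number {number} is not unique")
        seen_numbers.add(number)
        fields = match.groups()
        start, end = to_ms(*fields[:4]), to_ms(*fields[4:])
        # Rule 2: exactly one word per timestamp.
        if len(word_line.split()) != 1:
            raise ValueError(f"block {number} must contain exactly one word")
        if end < start:
            raise ValueError(f"block {number} ends before it starts")
        # Rule 6: chronological order, i.e. start times never go backwards.
        if entries and start < entries[-1][1]:
            raise ValueError(f"block {number} is out of chronological order")
        entries.append((number, start, end, word_line))
    return entries
```

The sketch raises on the first violation to keep it short. A caller could instead collect every problem and send them back to the model in a follow-up request.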
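The example blocks in the prompt can also be generated from known word timings, for instance to build additional few-shot examples in the same layout. This is a minimal sketch that assumes the timings are already available as millisecond integers. `format_timestamp` and `render_word_blocks` are hypothetical helpers, not code from this commit.

```python
from typing import Iterable, Tuple

# Hypothetical helpers (not part of app.py) that reproduce the block layout
# shown under "Example format" in the prompt.


def format_timestamp(ms: int) -> str:
    """Render a non-negative millisecond count as HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def render_word_blocks(words: Iterable[Tuple[str, int, int]]) -> str:
    """Render (word, start_ms, end_ms) tuples as numbered blocks separated by blank lines."""
    blocks = []
    for number, (word, start, end) in enumerate(words, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{word}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_word_blocks([("Hello", 1000, 1300), ("everyone", 1310, 1600)]))
```

Run as a script, this prints the first two blocks of the prompt's example. Because the numbering comes from `enumerate`, every block gets a unique line number by construction.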