Spaces: subashpoudel/Subtitle-Generator

subashpoudel committed on Sep 15, 2025

Commit 8e937fe (verified)

1 Parent(s): 1281f7f

Update app.py

Files changed (1)

app.py

@@ -41,31 +41,35 @@ def initialize_agent():
 # --- Prompt Template ---
 subtitle_prompt_tuned = '''
-You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when they are spoken.
 Please follow these instructions strictly:
-1. For **every three consecutive spoken words**, include:
    - A unique **line number**
    - The **exact start time** and **end time** in the format: HH:MM:SS,mmm --> HH:MM:SS,mmm
-   - The **three spoken words** on the next line (exactly three words per block, separated by spaces)
-2. Do **not** include more or fewer than three words per timestamp.
 3. Do not summarize, paraphrase, or skip any spoken content — include **all spoken words verbatim**.
 4. Do **not** include any sound effects or non-verbal cues like [Music], [Laughter], etc.
 5. Your output must be a **raw transcription** — no extra formatting, no explanations, no commentary.
 6. Maintain the exact **chronological order** as spoken in the video.
-***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for every group of three words**.
 Example format:
 1
-00:00:01,000 --> 00:00:01,800
-Hello everyone welcome
 2
-00:00:01,810 --> 00:00:02,600
-to this tutorial
 ...

 # --- Prompt Template ---
 subtitle_prompt_tuned = '''
+You are given a video. Your task is to extract the **spoken words** along with the **exact timestamps** of when each word is spoken.
 Please follow these instructions strictly:
+1. For **every single spoken word**, include:
    - A unique **line number**
    - The **exact start time** and **end time** in the format: HH:MM:SS,mmm --> HH:MM:SS,mmm
+   - The **spoken word** on the next line (exactly one word per block)
+2. Do **not** include more or fewer than one word per timestamp.
 3. Do not summarize, paraphrase, or skip any spoken content — include **all spoken words verbatim**.
 4. Do **not** include any sound effects or non-verbal cues like [Music], [Laughter], etc.
 5. Your output must be a **raw transcription** — no extra formatting, no explanations, no commentary.
 6. Maintain the exact **chronological order** as spoken in the video.
+***FINAL AND CRITICAL REMINDER***: The **timestamp accuracy is the highest priority**. Focus on getting the **precise start and end time for each word**.
 Example format:
 1
+00:00:01,000 --> 00:00:01,300
+Hello
 2
+00:00:01,310 --> 00:00:01,600
+everyone
+3
+00:00:01,610 --> 00:00:01,900
+welcome
 ...
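
Because the updated prompt asks for exactly one word per timestamp block, the model's reply can be checked mechanically before it is written out as a subtitle file. The following is a minimal sketch of such a check, not code from app.py: the names `TIMESTAMP_RE`, `_to_ms`, and `parse_word_blocks` are hypothetical, and it assumes the reply contains only numbered blocks in the `HH:MM:SS,mmm --> HH:MM:SS,mmm` format shown in the example, with or without blank lines between blocks.

```python
import re

# Timestamp line in the format the prompt requests:
# HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMESTAMP_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (strings) to total milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_word_blocks(text):
    """Parse a reply into (line_number, start_ms, end_ms, word) tuples.

    Hypothetical helper: raises ValueError if a block is malformed,
    holds more or fewer than one word, or breaks chronological order.
    """
    # Drop blank lines so both compact and blank-line-separated output parse.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three non-empty lines per block")

    blocks = []
    for i in range(0, len(lines), 3):
        number, stamp, word = lines[i:i + 3]
        if not number.isdecimal():
            raise ValueError(f"bad line number: {number!r}")
        match = TIMESTAMP_RE.match(stamp)
        if match is None:
            raise ValueError(f"bad timestamp line: {stamp!r}")
        if len(word.split()) != 1:
            raise ValueError(f"expected exactly one word, got: {word!r}")

        fields = match.groups()
        start, end = _to_ms(*fields[:4]), _to_ms(*fields[4:])
        if end <= start:
            raise ValueError(f"end time is not after start time: {stamp!r}")
        if blocks and start < blocks[-1][1]:
            raise ValueError(f"block {number} is out of chronological order")
        blocks.append((int(number), start, end, word))
    return blocks
```

Run against the example from the new prompt, this returns three tuples, one per word. A block in the old three-word style (for instance `Hello everyone welcome` on the text line) raises `ValueError`, which a caller could use as a signal to retry the request.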