LPX55 committed on
Commit
b8edc35
·
verified ·
1 Parent(s): 1838e8c

Upload folder using huggingface_hub

Files changed (5)
  1. .gitignore +2 -0
  2. README.md +72 -6
  3. app.py +484 -0
  4. main.py +348 -0
  5. requirements.txt +39 -0
.gitignore ADDED
@@ -0,0 +1,2 @@
1
+ .env
2
+ .venv
README.md CHANGED
@@ -1,13 +1,79 @@
1
  ---
2
- title: Deepgram Srt Generation
3
- emoji: 👁
4
  colorFrom: indigo
5
- colorTo: yellow
6
  sdk: gradio
7
- sdk_version: 6.18.0
8
- python_version: '3.13'
9
  app_file: app.py
10
  pinned: false
 
11
  ---
12
 
13
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Deepgram SRT Generator
3
+ emoji: 🎙️
4
  colorFrom: indigo
5
+ colorTo: purple
6
  sdk: gradio
7
+ sdk_version: 4.44.0
 
8
  app_file: app.py
9
  pinned: false
10
+ license: mit
11
  ---
12
 
13
+ # Deepgram SRT Generation
14
+
15
+ This project extracts audio from a video file, sends it to Deepgram for transcription, and generates an SRT file with captions.
16
+
17
+ ## Setup
18
+
19
+ 1. Clone the repository:
20
+ ```sh
21
+ git clone https://github.com/bradcypert/deepgram-srt-generation.git
22
+ cd deepgram-srt-generation
23
+ ```
24
+
25
+ 2. Install the required dependencies:
26
+ ```sh
27
+ pip install -r requirements.txt
28
+ ```
29
+
30
+ 3. Set your Deepgram API key as an environment variable:
31
+ ```sh
32
+ export DEEPGRAM_API_KEY=your_deepgram_api_key
33
+ ```
34
+
35
+ ## Usage
36
+
37
+ Run the script with the path to your video or audio file as an argument:
38
+ ```sh
39
+ python main.py path/to/your/file.mp4
40
+ ```
41
+
42
+ This writes the captions to `<input path>-captions.srt` (for example, `file.mp4-captions.srt`) in the same directory as your media file.
43
+
44
+ ### CLI Options
45
+
46
+ You can customize the transcription with the following flags:
47
+ * `-m` / `--model`: Deepgram model to use (default: `nova-3`).
48
+ * `-l` / `--language`: Language tag of the audio (e.g. `ko`, `en`, `es`), or `auto` / `detect` to enable automatic language detection.
49
+ * `--no-diarize`: Disable speaker diarization.
50
+ * `-t` / `--translate-to`: Translate the generated subtitles to the given language using DeepL (requires `DEEPL_API_KEY` or `DEEPL_AUTH_KEY`).
51
+
52
+ Example transcribing Korean audio and translating it to English:
53
+ ```sh
54
+ export DEEPL_API_KEY=your_deepl_api_key
55
+ python main.py path/to/your/korean_audio.mp3 -l ko -t en
56
+ ```
57
+
58
+ ### Translate-Only Mode
59
+
60
+ If you already have an SRT file and want to translate it to another language without transcribing again, simply pass the `.srt` file and target language:
61
+ ```sh
62
+ python main.py path/to/your/subtitles.srt -t ja
63
+ ```
64
+
65
+ ## Dependencies
66
+
67
+ - `httpx`
68
+ - `moviepy`
69
+ - `deepgram-sdk`
70
+ - `deepgram-captions`
71
+ - `deepl`
72
+ - `python-dotenv`
73
+
74
+ Make sure to install these dependencies using `pip` if they are not already installed.
75
+
76
+ ## License
77
+
78
+ This project is licensed under the MIT License.
79
+
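Note on output naming: the README above says the SRT file lands next to the media file. Judging from `main.py` in this commit, the name appears to be built by appending `-captions.srt` to the full input path (extension included), while translate-only mode replaces the `.srt` suffix with `.<lang>.srt`. The sketch below restates that naming in one place; `caption_paths` is a hypothetical helper name that does not exist in the commit.

```python
import os


def caption_paths(filepath, translate_to=None):
    """Return (original_srt, translated_srt) as main.py appears to name them.

    Hypothetical helper for illustration only; it is not part of the commit.
    """
    _, ext = os.path.splitext(filepath.lower())
    lang = translate_to.lower() if translate_to else None

    if ext == ".srt":
        # Translate-only mode: nothing is transcribed, so there is no
        # "original" output, only the translated file next to the input.
        base, _ = os.path.splitext(filepath)
        return None, (f"{base}.{lang}.srt" if lang else None)

    # Transcription mode: the suffix is appended to the whole input path.
    original = f"{filepath}-captions.srt"
    translated = f"{filepath}-captions.{lang}.srt" if lang else None
    return original, translated


if __name__ == "__main__":
    print(caption_paths("path/to/your/file.mp4"))
    print(caption_paths("path/to/your/korean_audio.mp3", "en"))
    print(caption_paths("path/to/your/subtitles.srt", "ja"))
```

Keeping the media extension in the caption name avoids collisions between, say, `talk.mp4` and `talk.mp3` in the same directory, at the cost of a slightly unusual file name.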
app.py ADDED
@@ -0,0 +1,484 @@
1
+ import os
2
+ import tempfile
3
+ import re
4
+ import httpx
5
+ from datetime import datetime
6
+ import gradio as gr
7
+ from dotenv import load_dotenv
8
+
9
+ # Load local environment variables
10
+ load_dotenv()
11
+
12
+ # Import core logic from main.py
13
+ from main import cleanup_srt_punctuation, translate_srt_content
14
+ from deepgram import DeepgramClient, PrerecordedOptions
15
+ from deepgram_captions import DeepgramConverter, srt
16
+ from moviepy.video.io.VideoFileClip import VideoFileClip
17
+
18
+ # CSS styling for a premium glassmorphism dark-mode look
19
+ custom_css = """
20
+ @import url('https://fonts.googleapis.com/css2?family=Outfit:wght@400;600;800&family=Inter:wght@400;500;600&display=swap');
21
+
22
+ body, .gradio-container {
23
+ font-family: 'Inter', sans-serif !important;
24
+ background: #0b0f19 !important;
25
+ }
26
+
27
+ /* Main card styling */
28
+ .glass-container {
29
+ background: rgba(17, 24, 39, 0.7) !important;
30
+ backdrop-filter: blur(16px);
31
+ -webkit-backdrop-filter: blur(16px);
32
+ border: 1px solid rgba(255, 255, 255, 0.08) !important;
33
+ border-radius: 20px !important;
34
+ padding: 30px !important;
35
+ box-shadow: 0 10px 40px 0 rgba(0, 0, 0, 0.5) !important;
36
+ }
37
+
38
+ /* Glowing text title */
39
+ .glow-title {
40
+ background: linear-gradient(135deg, #a5b4fc 0%, #c084fc 50%, #818cf8 100%);
41
+ -webkit-background-clip: text;
42
+ -webkit-text-fill-color: transparent;
43
+ font-weight: 800;
44
+ text-align: center;
45
+ font-size: 2.8rem;
46
+ margin-bottom: 8px;
47
+ font-family: 'Outfit', sans-serif;
48
+ letter-spacing: -0.5px;
49
+ }
50
+
51
+ .sub-title {
52
+ color: #9ca3af;
53
+ text-align: center;
54
+ font-size: 1.15rem;
55
+ margin-bottom: 30px;
56
+ font-family: 'Inter', sans-serif;
57
+ }
58
+
59
+ /* Styled primary action button */
60
+ .action-btn {
61
+ background: linear-gradient(90deg, #6366f1 0%, #8b5cf6 100%) !important;
62
+ color: white !important;
63
+ border: none !important;
64
+ font-weight: 600 !important;
65
+ border-radius: 12px !important;
66
+ padding: 12px 24px !important;
67
+ transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
68
+ box-shadow: 0 4px 20px rgba(99, 102, 241, 0.3) !important;
69
+ }
70
+
71
+ .action-btn:hover {
72
+ transform: translateY(-2px);
73
+ box-shadow: 0 6px 25px rgba(99, 102, 241, 0.5) !important;
74
+ opacity: 0.95;
75
+ }
76
+
77
+ .action-btn:active {
78
+ transform: translateY(1px);
79
+ }
80
+
81
+ /* Inputs styling */
82
+ input, textarea, select {
83
+ background: #1f2937 !important;
84
+ border: 1px solid #374151 !important;
85
+ border-radius: 8px !important;
86
+ color: #f3f4f6 !important;
87
+ }
88
+
89
+ input:focus, textarea:focus, select:focus {
90
+ border-color: #818cf8 !important;
91
+ }
92
+
93
+ /* Tab styling */
94
+ .tabs {
95
+ border-bottom: 2px solid #1f2937 !important;
96
+ margin-bottom: 20px;
97
+ }
98
+
99
+ .tab-nav button {
100
+ font-family: 'Outfit', sans-serif;
101
+ font-size: 1.05rem !important;
102
+ color: #9ca3af !important;
103
+ padding: 10px 20px !important;
104
+ }
105
+
106
+ .tab-nav button.selected {
107
+ color: #818cf8 !important;
108
+ border-bottom: 2px solid #818cf8 !important;
109
+ }
110
+
111
+ /* Footer styling */
112
+ .footer-text {
113
+ text-align: center;
114
+ color: #4b5563;
115
+ font-size: 0.85rem;
116
+ margin-top: 40px;
117
+ }
118
+ """
119
+
120
+ def extract_audio(video_path):
121
+ """Extract audio track from video file using MoviePy."""
122
+ temp_dir = tempfile.gettempdir()
123
+ audio_path = os.path.join(temp_dir, f"extracted_{os.path.basename(video_path)}.mp3")
124
+
125
+ try:
126
+ with VideoFileClip(video_path) as video_clip:
127
+ audio_clip = video_clip.audio
128
+ # Write audio without verbose logging
129
+ audio_clip.write_audiofile(audio_path, logger=None)
130
+ return audio_path
131
+ except Exception as e:
132
+ raise gr.Error(f"Failed to extract audio from video: {str(e)}")
133
+
134
+ def process_transcribe(
135
+ file_path,
136
+ model,
137
+ language,
138
+ diarize,
139
+ translate_to,
140
+ dg_key_override,
141
+ dl_key_override
142
+ ):
143
+ """Core transcription and translation pipeline for audio/video input."""
144
+ if not file_path:
145
+ raise gr.Error("Please upload a file first.")
146
+
147
+ # Resolve Deepgram API Key
148
+ dg_key = dg_key_override.strip() if dg_key_override else os.getenv("DEEPGRAM_API_KEY")
149
+ if not dg_key:
150
+ raise gr.Error("Deepgram API Key is required. Please provide it in the UI or environment.")
151
+
152
+ # Resolve DeepL API Key (if translation requested)
153
+ dl_key = None
154
+ if translate_to:
155
+ dl_key = dl_key_override.strip() if dl_key_override else (os.getenv("DEEPL_API_KEY") or os.getenv("DEEPL_AUTH_KEY"))
156
+ if not dl_key:
157
+ raise gr.Error("DeepL API Key is required for translation. Please provide it in the UI or environment.")
158
+
159
+ # Check extension to determine if audio extraction is needed
160
+ _, ext = os.path.splitext(file_path.lower())
161
+ is_audio = ext in {'.mp3', '.wav', '.m4a', '.flac', '.ogg', '.aac', '.wma', '.opus', '.webm', '.m4b', '.mp4a', '.aiff', '.aif', '.mp2'}
162
+
163
+ audio_filepath = file_path
164
+ temp_audio_to_cleanup = None
165
+
166
+ if not is_audio:
167
+ gr.Info("Video file detected. Extracting audio track...")
168
+ audio_filepath = extract_audio(file_path)
169
+ temp_audio_to_cleanup = audio_filepath
170
+
171
+ try:
172
+ # Read the file data
173
+ with open(audio_filepath, "rb") as file:
174
+ buffer_data = file.read()
175
+
176
+ payload = {"buffer": buffer_data}
177
+
178
+ # Configure Deepgram options
179
+ options_dict = {
180
+ "model": model,
181
+ "smart_format": True,
182
+ "utterances": True,
183
+ "punctuate": True,
184
+ "diarize": diarize,
185
+ }
186
+ if language:
187
+ if language.lower() in {"auto", "detect"}:
188
+ options_dict["detect_language"] = True
189
+ else:
190
+ options_dict["language"] = language
191
+
192
+ options = PrerecordedOptions(**options_dict)
193
+ deepgram = DeepgramClient(dg_key)
194
+
195
+ gr.Info("Transcribing audio via Deepgram...")
196
+ response = deepgram.listen.rest.v("1").transcribe_file(
197
+ payload, options, timeout=httpx.Timeout(30000.0, connect=10.0)
198
+ )
199
+
200
+ # Check that the response contains words before building captions (silent audio yields none)
201
+ has_words = False
202
+ try:
203
+ if hasattr(response, 'results') and response.results:
204
+ if response.results.channels and response.results.channels[0].alternatives:
205
+ if response.results.channels[0].alternatives[0].words:
206
+ has_words = True
207
+ except Exception:
208
+ pass
209
+
210
+ if not has_words:
211
+ original_srt = ""
212
+ gr.Warning("No speech detected in the audio file.")
213
+ else:
214
+ transcription = DeepgramConverter(response)
215
+ original_srt = srt(transcription)
216
+ original_srt = cleanup_srt_punctuation(original_srt)
217
+
218
+ # Write original SRT to temp file
219
+ temp_dir = tempfile.gettempdir()
220
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
221
+
222
+ orig_file_path = os.path.join(temp_dir, f"transcription_{timestamp}.srt")
223
+ with open(orig_file_path, "w", encoding="utf-8") as f:
224
+ f.write(original_srt)
225
+
226
+ translated_srt = ""
227
+ trans_file_path = None
228
+
229
+ # Handle translation if requested
230
+ if translate_to and original_srt:
231
+ gr.Info(f"Translating subtitles to {translate_to} using DeepL...")
232
+ target_lang = translate_to.upper()
233
+ if target_lang == "EN":
234
+ target_lang = "EN-US"
235
+ elif target_lang == "PT":
236
+ target_lang = "PT-BR"
237
+
238
+ translated_srt = translate_srt_content(original_srt, dl_key, target_lang)
239
+ translated_srt = cleanup_srt_punctuation(translated_srt)
240
+
241
+ trans_file_path = os.path.join(temp_dir, f"transcription_{timestamp}.{translate_to.lower()}.srt")
242
+ with open(trans_file_path, "w", encoding="utf-8") as f:
243
+ f.write(translated_srt)
244
+
245
+ return original_srt, orig_file_path, translated_srt, trans_file_path
246
+
247
+ except Exception as e:
248
+ raise gr.Error(f"An error occurred: {str(e)}")
249
+ finally:
250
+ # Cleanup temporary extracted audio
251
+ if temp_audio_to_cleanup and os.path.exists(temp_audio_to_cleanup):
252
+ try:
253
+ os.remove(temp_audio_to_cleanup)
254
+ except Exception:
255
+ pass
256
+
257
+ def process_translate_srt(srt_file, translate_to, dl_key_override):
258
+ """Translate an existing SRT file."""
259
+ if not srt_file:
260
+ raise gr.Error("Please upload an SRT file.")
261
+
262
+ dl_key = dl_key_override.strip() if dl_key_override else (os.getenv("DEEPL_API_KEY") or os.getenv("DEEPL_AUTH_KEY"))
263
+ if not dl_key:
264
+ raise gr.Error("DeepL API Key is required. Please provide it in the UI or environment.")
265
+
266
+ try:
267
+ with open(getattr(srt_file, "name", srt_file), "r", encoding="utf-8") as f:
268
+ original_content = f.read()
269
+
270
+ target_lang = translate_to.upper()
271
+ if target_lang == "EN":
272
+ target_lang = "EN-US"
273
+ elif target_lang == "PT":
274
+ target_lang = "PT-BR"
275
+
276
+ gr.Info(f"Translating SRT file to {translate_to} using DeepL...")
277
+ translated_content = translate_srt_content(original_content, dl_key, target_lang)
278
+ cleaned_content = cleanup_srt_punctuation(translated_content)
279
+
280
+ # Write to temp file
281
+ temp_dir = tempfile.gettempdir()
282
+ timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
283
+ translated_path = os.path.join(temp_dir, f"translated_{timestamp}.srt")
284
+ with open(translated_path, "w", encoding="utf-8") as f:
285
+ f.write(cleaned_content)
286
+
287
+ return cleaned_content, translated_path
288
+
289
+ except Exception as e:
290
+ raise gr.Error(f"Translation error: {str(e)}")
291
+
292
+
293
+ # ------------------ Build Interface ------------------
294
+
295
+ # Supported languages list
296
+ language_choices = [
297
+ ("Auto Detect", "auto"),
298
+ ("English", "en"),
299
+ ("Korean", "ko"),
300
+ ("Spanish", "es"),
301
+ ("French", "fr"),
302
+ ("German", "de"),
303
+ ("Italian", "it"),
304
+ ("Japanese", "ja"),
305
+ ("Chinese", "zh"),
306
+ ("Portuguese", "pt"),
307
+ ]
308
+
309
+ translation_choices = [
310
+ ("None", ""),
311
+ ("Korean", "ko"),
312
+ ("English", "en"),
313
+ ("Japanese", "ja"),
314
+ ("Spanish", "es"),
315
+ ("French", "fr"),
316
+ ("German", "de"),
317
+ ("Italian", "it"),
318
+ ("Chinese", "zh"),
319
+ ("Portuguese", "pt"),
320
+ ]
321
+
322
+ model_choices = [
323
+ ("Nova-3 (Latest / Recommended)", "nova-3"),
324
+ ("Nova-2 (Fast & Accurate)", "nova-2"),
325
+ ("Enhanced", "enhanced"),
326
+ ("Base", "base"),
327
+ ]
328
+
329
+ with gr.Blocks(css=custom_css, title="Deepgram SRT Generator & Translator") as demo:
330
+ with gr.Column(elem_classes="glass-container"):
331
+ gr.HTML("<h1 class='glow-title'>Deepgram SRT Subtitles</h1>")
332
+ gr.HTML("<p class='sub-title'>Generate and translate SRT subtitles with state-of-the-art accuracy</p>")
333
+
334
+ # API Keys Accordion (Collapsible for cleaner layout)
335
+ with gr.Accordion("🔑 API Credentials (Optional Override)", open=False):
336
+ with gr.Row():
337
+ dg_key_input = gr.Textbox(
338
+ label="Deepgram API Key",
339
+ placeholder="Enter key to override DEEPGRAM_API_KEY environment variable",
340
+ type="password"
341
+ )
342
+ dl_key_input = gr.Textbox(
343
+ label="DeepL API Key",
344
+ placeholder="Enter key to override DEEPL_API_KEY environment variable",
345
+ type="password"
346
+ )
347
+
348
+ with gr.Tabs(elem_classes="tabs"):
349
+
350
+ # --- Tab 1: Video Transcription ---
351
+ with gr.TabItem("🎥 Transcribe Video"):
352
+ with gr.Row():
353
+ with gr.Column(scale=1):
354
+ video_input = gr.Video(label="Upload Video", sources=["upload"])
355
+
356
+ with gr.Row():
357
+ video_model = gr.Dropdown(
358
+ choices=model_choices, value="nova-3", label="Deepgram Model"
359
+ )
360
+ video_lang = gr.Dropdown(
361
+ choices=language_choices, value="auto", label="Audio Language", allow_custom_value=True
362
+ )
363
+
364
+ with gr.Row():
365
+ video_diarize = gr.Checkbox(label="Speaker Diarization", value=True)
366
+ video_trans = gr.Dropdown(
367
+ choices=translation_choices, value="", label="Translate Subtitles to (DeepL)"
368
+ )
369
+
370
+ video_btn = gr.Button("Generate Subtitles", elem_classes="action-btn")
371
+
372
+ with gr.Column(scale=1):
373
+ with gr.Tabs():
374
+ with gr.TabItem("Original Subtitles"):
375
+ video_original_srt = gr.Textbox(label="SRT Output", show_copy_button=True, lines=15)
376
+ video_original_file = gr.File(label="Download original SRT")
377
+ with gr.TabItem("Translated Subtitles"):
378
+ video_translated_srt = gr.Textbox(label="Translated SRT Output", show_copy_button=True, lines=15)
379
+ video_translated_file = gr.File(label="Download translated SRT")
380
+
381
+ video_btn.click(
382
+ fn=process_transcribe,
383
+ inputs=[
384
+ video_input,
385
+ video_model,
386
+ video_lang,
387
+ video_diarize,
388
+ video_trans,
389
+ dg_key_input,
390
+ dl_key_input
391
+ ],
392
+ outputs=[
393
+ video_original_srt,
394
+ video_original_file,
395
+ video_translated_srt,
396
+ video_translated_file
397
+ ],
398
+ api_name="transcribe_video"
399
+ )
400
+
401
+ # --- Tab 2: Audio Transcription ---
402
+ with gr.TabItem("🎵 Transcribe Audio"):
403
+ with gr.Row():
404
+ with gr.Column(scale=1):
405
+ audio_input = gr.Audio(label="Upload Audio", type="filepath", sources=["upload"])
406
+
407
+ with gr.Row():
408
+ audio_model = gr.Dropdown(
409
+ choices=model_choices, value="nova-3", label="Deepgram Model"
410
+ )
411
+ audio_lang = gr.Dropdown(
412
+ choices=language_choices, value="auto", label="Audio Language", allow_custom_value=True
413
+ )
414
+
415
+ with gr.Row():
416
+ audio_diarize = gr.Checkbox(label="Speaker Diarization", value=True)
417
+ audio_trans = gr.Dropdown(
418
+ choices=translation_choices, value="", label="Translate Subtitles to (DeepL)"
419
+ )
420
+
421
+ audio_btn = gr.Button("Generate Subtitles", elem_classes="action-btn")
422
+
423
+ with gr.Column(scale=1):
424
+ with gr.Tabs():
425
+ with gr.TabItem("Original Subtitles"):
426
+ audio_original_srt = gr.Textbox(label="SRT Output", show_copy_button=True, lines=15)
427
+ audio_original_file = gr.File(label="Download original SRT")
428
+ with gr.TabItem("Translated Subtitles"):
429
+ audio_translated_srt = gr.Textbox(label="Translated SRT Output", show_copy_button=True, lines=15)
430
+ audio_translated_file = gr.File(label="Download translated SRT")
431
+
432
+ audio_btn.click(
433
+ fn=process_transcribe,
434
+ inputs=[
435
+ audio_input,
436
+ audio_model,
437
+ audio_lang,
438
+ audio_diarize,
439
+ audio_trans,
440
+ dg_key_input,
441
+ dl_key_input
442
+ ],
443
+ outputs=[
444
+ audio_original_srt,
445
+ audio_original_file,
446
+ audio_translated_srt,
447
+ audio_translated_file
448
+ ],
449
+ api_name="transcribe_audio"
450
+ )
451
+
452
+ # --- Tab 3: SRT Translation ---
453
+ with gr.TabItem("📄 Translate SRT File"):
454
+ with gr.Row():
455
+ with gr.Column(scale=1):
456
+ srt_input = gr.File(label="Upload SRT File", file_types=[".srt"])
457
+ srt_trans_lang = gr.Dropdown(
458
+ choices=translation_choices[1:], value="ko", label="Translate Subtitles to (DeepL)"
459
+ )
460
+ srt_btn = gr.Button("Translate File", elem_classes="action-btn")
461
+
462
+ with gr.Column(scale=1):
463
+ srt_output_text = gr.Textbox(label="Translated SRT Output", show_copy_button=True, lines=15)
464
+ srt_output_file = gr.File(label="Download translated SRT")
465
+
466
+ srt_btn.click(
467
+ fn=process_translate_srt,
468
+ inputs=[
469
+ srt_input,
470
+ srt_trans_lang,
471
+ dl_key_input
472
+ ],
473
+ outputs=[
474
+ srt_output_text,
475
+ srt_output_file
476
+ ],
477
+ api_name="translate_srt"
478
+ )
479
+
480
+ gr.HTML("<div class='footer-text'>Deepgram SRT Subtitle Tool • Powered by Deepgram & DeepL</div>")
481
+
482
+ if __name__ == "__main__":
483
+ demo.queue()
484
+ demo.launch(server_name="0.0.0.0", server_port=7860)
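Review note: `app.py` and `main.py` each repeat two small pieces of logic, the DeepL target-code override (`EN` to `EN-US`, `PT` to `PT-BR`) and the construction of the Deepgram options dictionary with the `auto` / `detect` language switch. A minimal sketch of how both could be factored out is shown below. The function names `deepl_target_lang` and `deepgram_options` are assumptions made for illustration; the commit itself inlines this logic, and the resulting dictionary would still be passed to `PrerecordedOptions(**options)` as in the source.

```python
def deepl_target_lang(code):
    """Map a user-facing language code to the DeepL target code used in the commit."""
    target = code.upper()
    # The commit overrides only these two codes; everything else is
    # passed through upper-cased.
    overrides = {"EN": "EN-US", "PT": "PT-BR"}
    return overrides.get(target, target)


def deepgram_options(model, language=None, diarize=True):
    """Build the keyword arguments that the commit passes to PrerecordedOptions."""
    options = {
        "model": model,
        "smart_format": True,
        "utterances": True,
        "punctuate": True,
        "diarize": diarize,
    }
    if language:
        if language.lower() in {"auto", "detect"}:
            # Let Deepgram detect the language instead of pinning one.
            options["detect_language"] = True
        else:
            options["language"] = language
    return options


if __name__ == "__main__":
    print(deepl_target_lang("en"))
    print(deepgram_options("nova-3", "auto"))
    print(deepgram_options("nova-3", "ko", diarize=False))
```

With helpers like these, the three call sites for the language override and the two call sites for the options dictionary would stay in sync when a new override or option is added.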
main.py ADDED
@@ -0,0 +1,348 @@
1
+ import sys
2
+ import httpx
3
+ import os
4
+ import argparse
5
+ from datetime import datetime
6
+ from dotenv import load_dotenv
7
+ from deepgram import DeepgramClient, PrerecordedOptions
8
+ from deepgram_captions import DeepgramConverter, srt
9
+ from moviepy.video.io.VideoFileClip import VideoFileClip
10
+ from moviepy.audio.io.AudioFileClip import AudioFileClip
11
+ import deepl
12
+ import re
13
+
14
+ load_dotenv()
15
+
16
+ def cleanup_srt_punctuation(srt_content):
17
+ # Split the SRT content into blocks
18
+ blocks = re.split(r'\n\s*\n', srt_content.strip())
19
+
20
+ parsed_blocks = []
21
+ for block in blocks:
22
+ lines = block.split('\n')
23
+ if len(lines) >= 2:
24
+ index = lines[0]
25
+ timecode = lines[1]
26
+ text = "\n".join(lines[2:]) if len(lines) > 2 else ""
27
+ parsed_blocks.append({
28
+ "index": index,
29
+ "timecode": timecode,
30
+ "text": text
31
+ })
32
+
33
+ # Rule 1: Clean up spaces before punctuation within each block
34
+ for block in parsed_blocks:
35
+ if "text" in block:
36
+ block["text"] = re.sub(r'\s+([.,!?~:;。、])', r'\1', block["text"])
37
+
38
+ # Rule 2 & 3: Handle leading punctuation and punctuation-only blocks
39
+ for i in range(len(parsed_blocks)):
40
+ block = parsed_blocks[i]
41
+ if "text" not in block:
42
+ continue
43
+
44
+ text = block["text"].strip()
45
+
46
+ # Check if the block is only punctuation
47
+ if text and all(c in ".,!?~:;。、" or c.isspace() for c in text):
48
+ for j in range(i - 1, -1, -1):
49
+ prev_block = parsed_blocks[j]
50
+ if "text" in prev_block and prev_block["text"].strip():
51
+ prev_block["text"] = prev_block["text"].rstrip() + " " + text
52
+ prev_block["text"] = re.sub(r'\s+([.,!?~:;。、])', r'\1', prev_block["text"])
53
+ break
54
+ block["text"] = ""
55
+ continue
56
+
57
+ # Check if the block starts with leading punctuation (e.g. ", text")
58
+ match = re.match(r'^([.,!?~:;。、\s]+)(.*)', block["text"], re.DOTALL)
59
+ if match:
60
+ lead_punct = match.group(1).strip()
61
+ remaining_text = match.group(2)
62
+ if lead_punct:
63
+ for j in range(i - 1, -1, -1):
64
+ prev_block = parsed_blocks[j]
65
+ if "text" in prev_block and prev_block["text"].strip():
66
+ prev_block["text"] = prev_block["text"].rstrip() + " " + lead_punct
67
+ prev_block["text"] = re.sub(r'\s+([.,!?~:;。、])', r'\1', prev_block["text"])
68
+ break
69
+ block["text"] = remaining_text
70
+
71
+ # Reconstruct and re-index the SRT string, filtering out empty blocks
72
+ reconstructed = []
73
+ entry = 1
74
+ for block in parsed_blocks:
75
+ text = block["text"].strip()
76
+ if text:
77
+ reconstructed.append(f"{entry}\n{block['timecode']}\n{text}")
78
+ entry += 1
79
+
80
+ return "\n\n".join(reconstructed) + "\n"
81
+
82
+ def translate_srt_content(srt_content, deepl_api_key, target_lang):
83
+ import deepl
84
+
85
+ # Split the SRT content into blocks
86
+ blocks = re.split(r'\n\s*\n', srt_content.strip())
87
+
88
+ parsed_blocks = []
89
+ text_list = []
90
+
91
+ for block in blocks:
92
+ lines = block.split('\n')
93
+ if len(lines) >= 2:
94
+ index = lines[0]
95
+ timecode = lines[1]
96
+ text = "\n".join(lines[2:]) if len(lines) > 2 else ""
97
+
98
+ # Extract speaker tag if any (e.g. "[speaker 0] Hello" or "[Speaker 1]")
99
+ tag = ""
100
+ clean_text = text
101
+ match = re.match(r'^(\[speaker \d+\]\s*)(.*)', text, re.IGNORECASE | re.DOTALL)
102
+ if match:
103
+ tag = match.group(1)
104
+ clean_text = match.group(2)
105
+
106
+ parsed_blocks.append({
107
+ "index": index,
108
+ "timecode": timecode,
109
+ "tag": tag,
110
+ "clean_text": clean_text
111
+ })
112
+ if clean_text.strip():
113
+ text_list.append(clean_text)
114
+ else:
115
+ parsed_blocks.append({
116
+ "raw": block
117
+ })
118
+
119
+ # Translate clean texts using DeepL text translation
120
+ translator = deepl.Translator(deepl_api_key)
121
+ translated_texts = []
122
+
123
+ # Chunk text requests to avoid hitting DeepL payload size limits
124
+ chunk_size = 50
125
+ for i in range(0, len(text_list), chunk_size):
126
+ chunk = text_list[i:i + chunk_size]
127
+ try:
128
+ results = translator.translate_text(chunk, target_lang=target_lang)
129
+ translated_texts.extend([r.text for r in results])
130
+ except Exception as e:
131
+ print(f"Error translating chunk: {e}")
132
+ translated_texts.extend(chunk)
133
+
134
+ # Reassemble the parsed blocks
135
+ text_idx = 0
136
+ reconstructed = []
137
+ entry = 1
138
+
139
+ for block in parsed_blocks:
140
+ if "raw" in block:
141
+ reconstructed.append(block["raw"])
142
+ else:
143
+ clean_text = block["clean_text"]
144
+ tag = block["tag"]
145
+
146
+ if clean_text.strip():
147
+ translated_text = translated_texts[text_idx] if text_idx < len(translated_texts) else clean_text
148
+ text_idx += 1
149
+ full_text = tag + translated_text
150
+ else:
151
+ full_text = tag + clean_text
152
+
153
+ # Filter out empty blocks after translation and re-index sequentially
154
+ stripped_text = full_text.strip()
155
+ if stripped_text:
156
+ reconstructed.append(f"{entry}\n{block['timecode']}\n{stripped_text}")
157
+ entry += 1
158
+
159
+ return "\n\n".join(reconstructed) + "\n"
160
+
161
+ def main():
162
+ parser = argparse.ArgumentParser(description="Transcribe video/audio to SRT subtitles using Deepgram.")
163
+ parser.add_argument("filepath", type=str, help="Path to the audio or video file to transcribe.")
164
+ parser.add_argument("-m", "--model", type=str, default="nova-3", help="Deepgram model to use (default: %(default)s).")
165
+ parser.add_argument("-l", "--language", type=str, default=None, help="BCP-47 language tag (e.g. 'en', 'es', 'fr'), or 'auto'/'detect' to enable automatic language detection.")
166
+ parser.add_argument("--no-diarize", dest="diarize", action="store_false", help="Disable speaker diarization.")
167
+ parser.add_argument("-t", "--translate-to", type=str, default=None, help="Translate the generated subtitles to this BCP-47 language tag (e.g. 'ko', 'en', 'ja') using DeepL.")
168
+ parser.set_defaults(diarize=True)
169
+
170
+ args = parser.parse_args()
171
+ filepath = args.filepath
172
+
173
+ # Resolve filepath. If it doesn't exist directly but exists in 'media/', use it from there.
174
+ if not os.path.exists(filepath):
175
+ media_fallback = os.path.join("media", filepath)
176
+ if os.path.exists(media_fallback):
177
+ filepath = media_fallback
178
+
179
+ if not os.path.exists(filepath):
180
+ print(f"Error: File '{filepath}' not found.")
181
+ print("Please check the path or place the file in the 'media' directory.")
182
+ return
183
+
184
+ _, ext = os.path.splitext(filepath.lower())
185
+
186
+ if ext == '.srt':
187
+ if not args.translate_to:
188
+ print("Error: When passing an .srt file, you must specify a target language using -t or --translate-to.")
189
+ return
190
+
191
+ deepl_api_key = os.getenv("DEEPL_API_KEY") or os.getenv("DEEPL_AUTH_KEY")
192
+ if not deepl_api_key:
193
+ print("Error: DEEPL_API_KEY or DEEPL_AUTH_KEY environment variable is not set.")
194
+ print("Please set it in your environment or add it to your .env file to use translation.")
195
+ return
196
+
197
+ try:
198
+ target_lang = args.translate_to.upper()
199
+ if target_lang == "EN":
200
+ target_lang = "EN-US"
201
+ elif target_lang == "PT":
202
+ target_lang = "PT-BR"
203
+
204
+ base, _ = os.path.splitext(filepath)
205
+ translated_srt_path = f"{base}.{args.translate_to.lower()}.srt"
206
+
207
+ print(f"Translating {filepath} to {args.translate_to} using DeepL...")
208
+ with open(filepath, "r", encoding="utf-8") as f:
209
+ original_content = f.read()
210
+
211
+ translated_content = translate_srt_content(original_content, deepl_api_key, target_lang)
212
+ cleaned_content = cleanup_srt_punctuation(translated_content)
213
+
214
+ with open(translated_srt_path, "w", encoding="utf-8") as f:
215
+ f.write(cleaned_content)
216
+
217
+ print(f"Successfully translated subtitles. Saved to: {translated_srt_path}")
218
+ except Exception as translate_err:
219
+ print(f"An error occurred during translation: {translate_err}")
220
+ return
221
+
222
+ api_key = os.getenv("DEEPGRAM_API_KEY")
223
+ if not api_key:
224
+ print("Error: DEEPGRAM_API_KEY environment variable is not set.")
225
+ print("Please set it in your environment or add it to a .env file in the project directory.")
226
+ return
227
+
228
+ try:
229
+ deepgram = DeepgramClient(api_key)
230
+ is_audio = ext in {'.mp3', '.wav', '.m4a', '.flac', '.ogg', '.aac', '.wma', '.opus', '.webm', '.m4b', '.mp4a', '.aiff', '.aif', '.mp2'}
231
+
232
+ audio_filepath = filepath
233
+ should_remove_audio = False
234
+
235
+ if not is_audio:
236
+ audio_filepath = f"{filepath}-audio.mp3"
237
+ should_remove_audio = False
238
+
239
+ audio_exists = False
240
+ if os.path.exists(audio_filepath) and os.path.getsize(audio_filepath) > 0:
241
+ try:
242
+ with VideoFileClip(filepath) as video_clip:
243
+ video_duration = video_clip.duration
244
+ with AudioFileClip(audio_filepath) as audio_clip:
245
+ audio_duration = audio_clip.duration
246
+
247
+ if abs(video_duration - audio_duration) < 1.0:
248
+ audio_exists = True
249
+ print(f"Found existing audio file '{audio_filepath}' with matching duration. Skipping extraction.")
250
+ except Exception as check_err:
251
+ print(f"Could not verify existing audio file: {check_err}. Re-extracting...")
252
+
253
+ if not audio_exists:
254
+ try:
255
+ with VideoFileClip(filepath) as video_clip:
256
+ audio_clip = video_clip.audio
257
+ audio_clip.write_audiofile(audio_filepath)
258
+ except Exception as e:
259
+ print(f"An error occurred extracting audio from video: {e}")
260
+ return
261
+
262
+ with open(audio_filepath, "rb") as file:
263
+ buffer_data = file.read()
264
+
265
+ payload = {"buffer": buffer_data}
266
+
267
+ options_dict = {
268
+ "model": args.model,
269
+ "smart_format": True,
270
+ "utterances": True,
271
+ "punctuate": True,
272
+ "diarize": args.diarize,
273
+ }
274
+ if args.language:
275
+ if args.language.lower() in {"auto", "detect"}:
276
+ options_dict["detect_language"] = True
277
+ else:
278
+ options_dict["language"] = args.language
279
+ options = PrerecordedOptions(**options_dict)
280
+
281
+ print("Making request to deepgram")
282
+
283
+ before = datetime.now()
284
+ response = deepgram.listen.rest.v("1").transcribe_file(
285
+ payload, options, timeout=httpx.Timeout(30000.0, connect=10.0)
286
+ )
287
+ after = datetime.now()
288
+ print(f"Got response from Deepgram in {(after - before).total_seconds():.1f}s")
289
+
290
+ print(response.to_json(indent=4))
291
+
292
+ # Check if the transcription contains words to avoid IndexError on silent audio files
293
+ has_words = False
294
+ try:
295
+ if hasattr(response, 'results') and response.results:
296
+ if response.results.channels and response.results.channels[0].alternatives:
297
+ if response.results.channels[0].alternatives[0].words:
298
+ has_words = True
299
+ except Exception:
300
+ pass
301
+
302
+ if not has_words:
303
+ print("No speech or words detected in the audio file. Generating empty subtitle file.")
304
+ captions = ""
305
+ else:
306
+ transcription = DeepgramConverter(response)
307
+ captions = srt(transcription)
308
+
309
+ original_srt_path = f"{filepath}-captions.srt"
310
+ cleaned_captions = cleanup_srt_punctuation(captions)
311
+ with open(original_srt_path, "w", encoding="utf-8") as f:
312
+ f.write(cleaned_captions)
313
+
314
+ if args.translate_to:
315
+ print(f"Translating subtitles to {args.translate_to} using DeepL...")
316
+ deepl_api_key = os.getenv("DEEPL_API_KEY") or os.getenv("DEEPL_AUTH_KEY")
317
+ if not deepl_api_key:
318
+ print("Error: DEEPL_API_KEY or DEEPL_AUTH_KEY environment variable is not set.")
319
+ print("Please set it in your environment or add it to your .env file to use translation.")
320
+ else:
321
+ try:
322
+ target_lang = args.translate_to.upper()
323
+ # DeepL-specific target language code overrides
324
+ if target_lang == "EN":
325
+ target_lang = "EN-US"
326
+ elif target_lang == "PT":
327
+ target_lang = "PT-BR"
328
+
329
+ translated_srt_path = f"{filepath}-captions.{args.translate_to.lower()}.srt"
330
+
331
+ # Translate and post-process
332
+ translated_content = translate_srt_content(cleaned_captions, deepl_api_key, target_lang)
333
+ cleaned_content = cleanup_srt_punctuation(translated_content)
334
+ with open(translated_srt_path, "w", encoding="utf-8") as f:
335
+ f.write(cleaned_content)
336
+
337
+ print(f"Successfully translated subtitles. Saved to: {translated_srt_path}")
338
+ except Exception as translate_err:
339
+ print(f"An error occurred during translation: {translate_err}")
340
+
341
+ if should_remove_audio:
342
+ os.remove(audio_filepath)
343
+
344
+ except Exception as e:
345
+ print(f"Exception: {e}")
346
+
347
+ if __name__ == "__main__":
348
+ main()
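Review note: both `cleanup_srt_punctuation` and `translate_srt_content` rest on the same two steps, splitting the SRT text into blocks on blank lines and then working on the text of each block. The sketch below isolates those steps so they can be tried on a small sample. It is a simplified illustration under stated assumptions, not the commit's implementation: `parse_srt` and `tidy_punctuation` are hypothetical names, malformed blocks are skipped rather than kept raw, and `re.DOTALL` is used so that a cue whose text spans several lines keeps every line after the speaker tag.

```python
import re

# Same speaker-tag pattern as in main.py; DOTALL lets "(.*)" span line breaks.
SPEAKER_TAG = re.compile(r'^(\[speaker \d+\]\s*)(.*)', re.IGNORECASE | re.DOTALL)


def parse_srt(srt_content):
    """Split SRT text into cues with index, timecode, speaker tag, and text."""
    cues = []
    for block in re.split(r'\n\s*\n', srt_content.strip()):
        lines = block.split('\n')
        if len(lines) < 2:
            # Simplification: the commit keeps such blocks as raw text.
            continue
        text = "\n".join(lines[2:])
        tag, body = "", text
        match = SPEAKER_TAG.match(text)
        if match:
            tag, body = match.group(1), match.group(2)
        cues.append({"index": lines[0], "timecode": lines[1], "tag": tag, "text": body})
    return cues


def tidy_punctuation(text):
    """Rule 1 from cleanup_srt_punctuation: drop whitespace before punctuation."""
    return re.sub(r'\s+([.,!?~:;。、])', r'\1', text)


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:01,000\n[speaker 0] Hello ,\nworld !\n\n"
        "2\n00:00:01,000 --> 00:00:02,000\nBye"
    )
    for cue in parse_srt(sample):
        print(cue["index"], repr(cue["tag"]), repr(tidy_punctuation(cue["text"])))
```

Separating the speaker tag before translation, as the commit does, keeps labels such as `[speaker 0]` out of the text sent to DeepL, so they can be reattached unchanged afterwards.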
requirements.txt ADDED
@@ -0,0 +1,39 @@
1
+ aenum==3.1.15
2
+ aiofiles==24.1.0
3
+ aiohappyeyeballs==2.6.1
4
+ aiohttp==3.11.18
5
+ aiosignal==1.3.2
6
+ anyio==4.9.0
7
+ attrs==25.3.0
8
+ certifi==2025.1.31
9
+ dataclasses-json==0.6.7
10
+ decorator==5.2.1
11
+ deepgram-captions==1.2.0
12
+ deepgram-sdk==3.11.0
13
+ deprecation==2.1.0
14
+ frozenlist==1.6.0
15
+ h11==0.14.0
16
+ httpcore==1.0.8
17
+ httpx==0.28.1
18
+ idna==3.10
19
+ imageio==2.37.0
20
+ imageio-ffmpeg==0.6.0
21
+ marshmallow==3.26.1
22
+ moviepy==2.1.2
23
+ multidict==6.4.3
24
+ mypy-extensions==1.1.0
25
+ numpy==2.2.5
26
+ packaging==25.0
27
+ pillow==10.4.0
28
+ proglog==0.1.11
29
+ propcache==0.3.1
30
+ python-dotenv==1.1.0
31
+ sniffio==1.3.1
32
+ tqdm==4.67.1
33
+ typing-extensions==4.13.2
34
+ typing-inspect==0.9.0
35
+ websockets==15.0.1
36
+ yarl==1.20.0
37
+ deepl==1.30.0
38
+ gradio>=4.44.0
39
+