Commit 2302206 · Parent(s): 58d3731 · Build error

Add video caption app with Whisper auto-captioning and styling options

Files changed:
- .gitignore +41 -0
- README.md +25 -0
- app.py +633 -0
- requirements.txt +9 -0
- setup.sh +24 -0
.gitignore ADDED
@@ -0,0 +1,41 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Distribution / packaging
+dist/
+build/
+*.egg-info/
+
+# Virtual environments
+venv/
+env/
+ENV/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# Temporary files
+temp/
+tmp/
+*.temp
+*.tmp
+
+# OS-specific files
+.DS_Store
+Thumbs.db
+
+# Model weights/large files
+*.pt
+*.pth
+*.model
+
+# Logs
+logs/
+*.log
+
+# Testing
+.coverage
+htmlcov/
+.pytest_cache/
README.md CHANGED
@@ -9,4 +9,29 @@ app_file: app.py
 pinned: false
 ---
 
+# Video Caption Generator
+
+This tool allows you to add captions to your videos with precise control over styling and positioning. You can either auto-generate captions using Whisper AI speech recognition or provide your own captions in SRT, ASS, or VTT format.
+
+## Features
+
+- **Auto Caption Generation**: Extract and transcribe audio from your video using OpenAI's Whisper model
+- **Manual Caption Support**: Input your own captions in popular formats (SRT, ASS, VTT)
+- **Customizable Styling**: Control font, size, color, and positioning of captions
+- **High-Quality Output**: Burn captions directly into your video with FFmpeg
+
+## How to Use
+
+1. Upload your video file
+2. Choose whether to auto-generate captions or provide your own
+3. Customize font, size, color, and alignment
+4. Click "Generate Captioned Video" and wait for processing
+5. Download the resulting video with embedded captions
+
+Perfect for creating accessible content, adding subtitles to multilingual videos, or emphasizing important information in educational content.
+
+## Note
+
+Processing time depends on video length and complexity. Auto-caption generation utilizes Whisper and may take longer for larger files.
+
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
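The README's workflow compresses to two FFmpeg invocations wrapped around a Whisper transcription call. A minimal sketch of how those command lines are assembled (the file names `in.mp4`, `audio.wav`, `caps.ass`, `out.mp4` are illustrative, not the app's actual temp paths):

```python
import shlex

def build_extract_cmd(video_path: str, wav_path: str) -> list:
    # Mono 16 kHz PCM WAV matches what Whisper's audio loader expects.
    return ["ffmpeg", "-i", video_path, "-vn", "-acodec", "pcm_s16le",
            "-ac", "1", "-ar", "16000", "-y", wav_path]

def build_burn_cmd(video_path: str, ass_path: str, out_path: str) -> list:
    # The libass-backed 'ass' filter burns styled subtitles into the video stream.
    return ["ffmpeg", "-i", video_path, "-vf", f"ass='{ass_path}'",
            "-c:v", "libx264", "-c:a", "aac", "-y", out_path]

extract = build_extract_cmd("in.mp4", "audio.wav")
burn = build_burn_cmd("in.mp4", "caps.ass", "out.mp4")
print(shlex.join(extract))
```

Building the commands as argument lists (rather than one shell string) sidesteps quoting problems with user-supplied file names; `app.py` below takes the same approach.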
app.py ADDED
@@ -0,0 +1,633 @@
+import os
+import tempfile
+import gradio as gr
+import ffmpeg
+import logging
+import whisper as openai_whisper  # Renamed to avoid potential conflicts
+import numpy as np
+import torch
+import datetime
+import subprocess
+import shlex
+from pathlib import Path
+import re  # For parsing ASS/SRT
+
+# Configure logging
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+logger = logging.getLogger(__name__)
+
+# Define fonts directory - adapt for Hugging Face environment if needed
+FONTS_DIR = '/usr/share/fonts/truetype'  # Common Linux font location
+# Check common font locations for other OS if needed
+if not os.path.exists(FONTS_DIR) and os.path.exists('/System/Library/Fonts'):  # macOS
+    FONTS_DIR = '/System/Library/Fonts'
+elif not os.path.exists(FONTS_DIR) and os.path.exists(r'C:\Windows\Fonts'):  # Windows (raw string avoids invalid escape sequences)
+    FONTS_DIR = r'C:\Windows\Fonts'
+
+FONT_PATHS = {}
+ACCEPTABLE_FONTS = ['Arial', 'Helvetica', 'Times New Roman']  # Start with common fallbacks
+try:
+    if FONTS_DIR and os.path.exists(FONTS_DIR):
+        logger.info(f"Searching for fonts in: {FONTS_DIR}")
+        found_fonts = []
+        for root, dirs, files in os.walk(FONTS_DIR):
+            for file in files:
+                if file.lower().endswith(('.ttf', '.otf', '.ttc')):
+                    font_path = os.path.join(root, file)
+                    font_name = os.path.splitext(file)[0]
+                    # Basic name cleanup
+                    base_font_name = re.sub(r'[-_ ]?(bold|italic|regular|medium|light|condensed)?$', '', font_name, flags=re.IGNORECASE)
+                    if base_font_name not in FONT_PATHS:
+                        FONT_PATHS[base_font_name] = font_path
+                        found_fonts.append(base_font_name)
+        if found_fonts:
+            ACCEPTABLE_FONTS = sorted(list(set(found_fonts + ACCEPTABLE_FONTS)))
+            logger.info(f"Found system fonts: {ACCEPTABLE_FONTS}")
+        else:
+            logger.warning(f"No font files found in {FONTS_DIR}. Using defaults.")
+    else:
+        logger.warning(f"Font directory {FONTS_DIR} not found. Using defaults: {ACCEPTABLE_FONTS}")
+except Exception as e:
+    logger.warning(f"Could not load system fonts from {FONTS_DIR}: {e}. Using defaults: {ACCEPTABLE_FONTS}")
+
+# Global variable for Whisper model to avoid reloading
+whisper_model = None
+
+def generate_style_line(options):
+    """Generate ASS style line from options. Uses common defaults.
+    Ensure color format is correct (&HBBGGRRAA or &HAABBGGRR depending on FFmpeg build).
+    Using &HBBGGRR format for PrimaryColour based on common FFmpeg usage.
+    """
+    # Convert hex color picker (#FFFFFF) to ASS format (&HBBGGRR)
+    def hex_to_ass_bgr(hex_color):
+        hex_color = hex_color.lstrip('#')
+        if len(hex_color) == 6:
+            r, g, b = tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))
+            return f"&H{b:02X}{g:02X}{r:02X}"
+        return '&H00FFFFFF'  # Default to white if format is wrong
+
+    primary_color_ass = hex_to_ass_bgr(options.get('primary_color', '#FFFFFF'))
+
+    style_options = {
+        'Name': 'Default',
+        'Fontname': options.get('font_name', 'Arial'),  # Ensure this font is accessible to FFmpeg
+        'Fontsize': options.get('font_size', 24),
+        'PrimaryColour': primary_color_ass,
+        'SecondaryColour': '&H000000FF',  # Often unused, but good to define
+        'OutlineColour': '&H00000000',  # Black outline
+        'BackColour': '&H80000000',  # Semi-transparent black background/shadow
+        'Bold': 0,  # Use -1 for True, 0 for False in ASS
+        'Italic': 0,
+        'Underline': 0,
+        'StrikeOut': 0,
+        'ScaleX': 100,
+        'ScaleY': 100,
+        'Spacing': 0,
+        'Angle': 0,
+        'BorderStyle': 1,  # 1 = Outline + Shadow
+        'Outline': 2,  # Outline thickness
+        'Shadow': 1,  # Shadow distance
+        'Alignment': options.get('alignment', 2),  # 2 = Bottom Center
+        'MarginL': 10,
+        'MarginR': 10,
+        'MarginV': 10,  # Bottom margin
+        'Encoding': 1  # Default ANSI encoding
+    }
+    logger.info(f"Generated ASS Style Options: {style_options}")
+    return f"Style: {','.join(map(str, style_options.values()))}"
+
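The colour handling in `generate_style_line` is the subtle part: HTML colour pickers emit `#RRGGBB`, while ASS styles store the bytes reversed as `&HBBGGRR`. A standalone sketch of that conversion (same logic as the helper above, isolated for testing):

```python
def hex_to_ass_bgr(hex_color: str) -> str:
    # ASS stores colours blue-first (&HBBGGRR), the reverse of HTML #RRGGBB.
    hex_color = hex_color.lstrip('#')
    if len(hex_color) == 6:
        r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
        return f"&H{b:02X}{g:02X}{r:02X}"
    return '&H00FFFFFF'  # fall back to white on malformed input

print(hex_to_ass_bgr('#FF8000'))  # orange: red FF, green 80, blue 00 -> &H0080FF
```

A colour that renders blue instead of red is the classic symptom of skipping this byte swap.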
+def transcribe_audio(audio_path, progress=None):
+    """Transcribe audio using Whisper ASR model."""
+    global whisper_model
+    logger.info(f"Starting transcription for: {audio_path}")
+    try:
+        if whisper_model is None:
+            safe_progress_update(progress, 0.1, "Loading Whisper model...")
+            device = "cuda" if torch.cuda.is_available() else "cpu"
+            logger.info(f"Using device: {device} for Whisper")
+            # Use a smaller model if only CPU is available to potentially speed things up
+            model_size = "base" if device == "cuda" else "tiny.en"  # or "tiny"
+            logger.info(f"Loading Whisper model size: {model_size}")
+            whisper_model = openai_whisper.load_model(model_size, device=device)
+            safe_progress_update(progress, 0.3, "Model loaded, processing audio...")
+
+        result = whisper_model.transcribe(audio_path, fp16=torch.cuda.is_available())
+        logger.info(f"Transcription result (first 100 chars): {str(result)[:100]}")
+        safe_progress_update(progress, 0.7, "Transcription complete, formatting captions...")
+        return result
+    except Exception:
+        logger.exception(f"Error transcribing audio: {audio_path}")  # Use logger.exception to include traceback
+        raise
+
+def format_time(seconds):
+    """Format time in ASS format (H:MM:SS.cc)."""
+    # ASS format uses H:MM:SS.xx (hundredths of a second)
+    hundredths = int((seconds % 1) * 100)
+    s = int(seconds) % 60
+    m = int(seconds / 60) % 60
+    h = int(seconds / 3600)
+    return f"{h}:{m:02d}:{s:02d}.{hundredths:02d}"
+
+def format_time_srt(seconds):
+    """Format time in SRT format (HH:MM:SS,ms)."""
+    ms = int((seconds % 1) * 1000)
+    s = int(seconds) % 60
+    m = int(seconds / 60) % 60
+    h = int(seconds / 3600)
+    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
+
+def generate_srt_from_transcript(segments):
+    """Convert whisper segments to SRT format."""
+    srt_content = ""
+    for i, segment in enumerate(segments):
+        start_time = format_time_srt(segment["start"])
+        end_time = format_time_srt(segment["end"])
+        text = segment["text"].strip()
+        srt_content += f"{i+1}\n{start_time} --> {end_time}\n{text}\n\n"
+    logger.info(f"Generated SRT (first 200 chars): {srt_content[:200]}")
+    return srt_content.strip()
+
+def generate_ass_dialogue_line(segment, style_name='Default'):
+    """Generate a single ASS dialogue line from a segment."""
+    start_time = format_time(segment["start"])
+    end_time = format_time(segment["end"])
+    text = segment["text"].strip().replace('\n', '\\N')  # Replace newline with ASS newline
+    # Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
+    return f"Dialogue: 0,{start_time},{end_time},{style_name},,0,0,0,,{text}"
+
+def generate_ass_from_transcript(segments, style_options):
+    """Convert whisper segments to ASS format including style header."""
+    style_line = generate_style_line(style_options)
+    ass_header = f"""[Script Info]
+Title: Generated Captions
+ScriptType: v4.00+
+WrapStyle: 0
+PlayResX: 384
+PlayResY: 288
+ScaledBorderAndShadow: yes
+
+[V4+ Styles]
+Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
+{style_line}
+
+[Events]
+Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
+"""
+    dialogue_lines = [generate_ass_dialogue_line(seg) for seg in segments]
+    full_ass_content = ass_header + "\n".join(dialogue_lines)
+    logger.info(f"Generated ASS (first 300 chars): {full_ass_content[:300]}")
+    return full_ass_content
+
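The SRT generation above is simple enough to verify in isolation: each Whisper segment becomes a numbered block with comma-separated millisecond timestamps. A self-contained sketch with hypothetical segment data (Whisper's real segments carry more keys, but only `start`, `end`, and `text` matter here):

```python
def format_time_srt(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    ms = int((seconds % 1) * 1000)
    s, m, h = int(seconds) % 60, int(seconds / 60) % 60, int(seconds / 3600)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    # Each segment becomes: index, "start --> end", text, blank line.
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{format_time_srt(seg['start'])} --> "
                      f"{format_time_srt(seg['end'])}\n{seg['text'].strip()}")
    return "\n\n".join(blocks)

srt = segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": " Hello world."},
    {"start": 2.5, "end": 5.0, "text": " Second line."},
])
print(srt)
```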
+def extract_audio(video_path, output_path):
+    """Extract audio from video file using ffmpeg subprocess."""
+    logger.info(f"Attempting to extract audio from {video_path} to {output_path}")
+    try:
+        command = [
+            "ffmpeg", "-i", video_path,
+            "-vn",  # No video
+            "-acodec", "pcm_s16le",  # Standard WAV format
+            "-ac", "1",  # Mono
+            "-ar", "16000",  # 16kHz sample rate (common for ASR)
+            "-y",  # Overwrite output
+            output_path
+        ]
+        logger.info(f"Running audio extraction command: {' '.join(map(shlex.quote, command))}")
+        process = subprocess.run(
+            command,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            encoding='utf-8',  # Explicitly set encoding
+            check=False
+        )
+
+        if process.returncode != 0:
+            logger.error(f"FFmpeg audio extraction error (Code {process.returncode}):\nSTDOUT:\n{process.stdout}\nSTDERR:\n{process.stderr}")
+            return False, f"FFmpeg failed (Code {process.returncode}): {process.stderr[:500]}..."
+
+        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
+            logger.error(f"Audio extraction failed: Output file not created or empty. FFmpeg stderr: {process.stderr}")
+            return False, f"Output audio file not created or empty. FFmpeg stderr: {process.stderr[:500]}..."
+
+        logger.info(f"Audio extracted successfully to {output_path}, size: {os.path.getsize(output_path)} bytes")
+        return True, ""
+    except Exception as e:
+        logger.exception(f"Exception during audio extraction from {video_path}")
+        return False, str(e)
+
+def run_ffmpeg_with_subtitles(video_path, subtitle_path, output_path, style_options=None):
+    """Burn subtitles into video using ffmpeg subprocess.
+
+    Args:
+        video_path: Path to input video
+        subtitle_path: Path to ASS subtitle file
+        output_path: Path to save output video
+        style_options: Optional style parameters (not directly used, but kept for consistency)
+
+    Returns:
+        tuple: (success, error_message)
+    """
+    logger.info(f"Attempting to burn subtitles from {subtitle_path} into {video_path}")
+
+    # Check if the subtitle file exists and is not empty
+    if not os.path.exists(subtitle_path) or os.path.getsize(subtitle_path) == 0:
+        return False, f"Subtitle file {subtitle_path} does not exist or is empty"
+
+    # Check if the video file exists
+    if not os.path.exists(video_path):
+        return False, f"Video file {video_path} does not exist"
+
+    # Validate the video file using ffprobe
+    try:
+        probe_cmd = [
+            "ffprobe", "-v", "error",
+            "-select_streams", "v:0",
+            "-show_entries", "stream=codec_name,width,height",
+            "-of", "json",
+            video_path
+        ]
+        probe_result = subprocess.run(
+            probe_cmd,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            encoding='utf-8'
+        )
+
+        if probe_result.returncode != 0:
+            logger.error(f"FFprobe validation failed: {probe_result.stderr}")
+            return False, f"FFprobe validation failed: {probe_result.stderr[:200]}..."
+    except Exception as e:
+        logger.exception(f"Exception during video validation: {video_path}")
+        return False, f"Video validation failed: {str(e)}"
+
+    try:
+        # The subtitle path needs to be properly escaped for the filter complex
+        # On Windows, backslashes need special handling
+        subtitle_path_esc = subtitle_path.replace('\\', '\\\\')
+
+        # Ensure paths are properly quoted for the shell command
+        command = [
+            "ffmpeg",
+            "-i", video_path,
+            "-vf", f"ass='{subtitle_path_esc}'",
+            "-c:v", "libx264",  # Use H.264 codec for broad compatibility
+            "-preset", "medium",  # Balance between speed and quality
+            "-crf", "23",  # Reasonable quality setting (lower is better)
+            "-c:a", "aac",  # Use AAC for audio
+            "-b:a", "128k",  # Decent audio bitrate
+            "-movflags", "+faststart",  # Optimize for web playback
+            "-y",  # Overwrite output if exists
+            output_path
+        ]
+
+        logger.info(f"Running subtitle burn command: {' '.join(map(shlex.quote, command))}")
+
+        process = subprocess.run(
+            command,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            encoding='utf-8',
+            check=False
+        )
+
+        if process.returncode != 0:
+            logger.error(f"FFmpeg subtitle burn error (Code {process.returncode}):\nSTDOUT:\n{process.stdout}\nSTDERR:\n{process.stderr}")
+            return False, f"FFmpeg failed (Code {process.returncode}): {process.stderr[:500]}..."
+
+        # Verify output file was created and is not empty
+        if not os.path.exists(output_path) or os.path.getsize(output_path) == 0:
+            logger.error(f"Subtitle burning failed: Output file not created or empty. FFmpeg stderr: {process.stderr}")
+            return False, f"Output video file not created or empty. FFmpeg stderr: {process.stderr[:500]}..."
+
+        logger.info(f"Subtitles burned successfully, output: {output_path}, size: {os.path.getsize(output_path)} bytes")
+        return True, ""
+
+    except Exception as e:
+        logger.exception(f"Exception during subtitle burning: {video_path}")
+        return False, str(e)
+
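The `replace('\\', '\\\\')` above covers one hazard, but FFmpeg filtergraph syntax also treats `:` as an option separator, so Windows drive letters like `C:` can still break the `ass=` filter. A fuller escaping sketch (one level of escaping; depending on how the filtergraph is quoted, FFmpeg can require a second pass, so treat this as an assumption to verify against your FFmpeg build):

```python
def escape_for_ass_filter(path: str) -> str:
    # In an FFmpeg filtergraph, '\' is the escape character and ':' separates
    # filter options, so both must be escaped inside the subtitle filename.
    return path.replace('\\', '\\\\').replace(':', '\\:')

print(escape_for_ass_filter('C:\\subs\\captions.ass'))
```

POSIX paths without colons pass through unchanged, which is why the bug only surfaces on Windows.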
+def safe_progress_update(progress_callback, value, desc=""):
+    """Safely update progress without crashing if progress_callback is None or fails."""
+    if progress_callback is not None:
+        try:
+            progress_callback(value, desc)
+        except Exception as e:
+            # Avoid flooding logs for simple progress updates
+            # logger.warning(f"Progress update failed: {e}")
+            pass  # Silently ignore progress update errors
+
+def parse_srt_to_dialogue(srt_content):
+    """Basic SRT parser to list of dialogue events for ASS conversion."""
+    dialogue = []
+    # Regex to find index, timecodes, and text blocks
+    # Allows comma or period for milliseconds separator
+    pattern = re.compile(
+        r'^\s*(\d+)\s*$\n?'  # Index line
+        r'(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*'  # Start time
+        r'(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})\s*$\n'  # End time
+        r'(.*?)(?=\n\s*\n\d+\s*$|\Z)',  # Text block (non-greedy) until blank line and next index or end of string
+        re.DOTALL | re.MULTILINE
+    )
+
+    logger.info("Attempting to parse SRT/VTT content...")
+    matches_found = 0
+    last_index = 0
+    for match in pattern.finditer(srt_content):
+        matches_found += 1
+        try:
+            index = int(match.group(1))
+            sh, sm, ss, sms = map(int, match.group(2, 3, 4, 5))
+            eh, em, es, ems = map(int, match.group(6, 7, 8, 9))
+            start_sec = sh * 3600 + sm * 60 + ss + sms / 1000.0
+            end_sec = eh * 3600 + em * 60 + es + ems / 1000.0
+            text = match.group(10).strip().replace('\n', '\\N')  # Replace newline with ASS \N
+
+            # Basic validation
+            if end_sec < start_sec:
+                logger.warning(f"SRT parse warning: End time {end_sec} before start time {start_sec} at index {index}. Skipping.")
+                continue
+            if not text:
+                logger.warning(f"SRT parse warning: Empty text content at index {index}. Skipping.")
+                continue
+
+            dialogue.append({'start': start_sec, 'end': end_sec, 'text': text})
+            last_index = match.end()
+
+        except Exception as e:
+            logger.warning(f"Could not parse SRT block starting near index {match.group(1)}: {e}")
+
+    # Check if parsing consumed a reasonable amount of the input
+    if matches_found > 0 and last_index < len(srt_content) * 0.8:
+        logger.warning(f"SRT parsing finished early. Found {matches_found} blocks, but stopped near character {last_index} of {len(srt_content)}. Input format might be inconsistent.")
+    elif matches_found == 0 and len(srt_content) > 10:
+        logger.error(f"SRT parsing failed. No dialogue blocks found in content starting with: {srt_content[:100]}...")
+
+    logger.info(f"Parsed {len(dialogue)} dialogue events from SRT/VTT content.")
+    return dialogue
+
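The SRT regex above is dense; a stripped-down version (same pattern, minus the logging and ASS newline substitution) shows the shape of the events it produces on a two-block input:

```python
import re

# One regex pass per SRT block: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm",
# then the text up to the next blank-line-plus-index (or the end of input).
SRT_BLOCK = re.compile(
    r'^\s*(\d+)\s*$\n?'
    r'(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*'
    r'(\d{1,2}):(\d{2}):(\d{2})[,.](\d{3})\s*$\n'
    r'(.*?)(?=\n\s*\n\d+\s*$|\Z)',
    re.DOTALL | re.MULTILINE,
)

def parse_srt(srt: str):
    events = []
    for m in SRT_BLOCK.finditer(srt):
        sh, sm, ss, sms = map(int, m.group(2, 3, 4, 5))
        eh, em, es, ems = map(int, m.group(6, 7, 8, 9))
        events.append({
            'start': sh * 3600 + sm * 60 + ss + sms / 1000.0,
            'end': eh * 3600 + em * 60 + es + ems / 1000.0,
            'text': m.group(10).strip(),
        })
    return events

sample = "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n2\n00:00:03,000 --> 00:00:04,500\nWorld\n"
events = parse_srt(sample)
print(events)
```

Accepting either `,` or `.` before the milliseconds is what lets the same parser handle both SRT and (simple) VTT timecodes.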
+def parse_ass_to_dialogue(ass_content):
+    """Basic ASS parser to extract dialogue events."""
+    dialogue = []
+    # Regex for ASS Dialogue line - make capturing groups non-optional where possible
+    # Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
+    pattern = re.compile(
+        r'^Dialogue:\s*'
+        r'(?P<layer>\d+),\s*'
+        r'(?P<start>\d+:\d{2}:\d{2}\.\d{2}),\s*'
+        r'(?P<end>\d+:\d{2}:\d{2}\.\d{2}),\s*'
+        r'(?P<style>[^,]*),\s*'  # Style name
+        r'(?P<name>[^,]*),\s*'  # Actor name
+        r'(?P<marginL>\d+),\s*'
+        r'(?P<marginR>\d+),\s*'
+        r'(?P<marginV>\d+),\s*'
+        r'(?P<effect>[^,]*),\s*'  # Effect
+        r'(?P<text>.*?)$',  # Text (rest of line)
+        re.IGNORECASE
+    )
+
+    # Helper to convert H:MM:SS.xx to seconds
+    def time_to_seconds(time_str):
+        try:
+            parts = time_str.split(':')
+            h = int(parts[0])
+            m = int(parts[1])
+            s_parts = parts[2].split('.')
+            s = int(s_parts[0])
+            cs = int(s_parts[1])
+            return h * 3600 + m * 60 + s + cs / 100.0
+        except Exception as e:
+            logger.error(f"Failed to parse time string '{time_str}': {e}")
+            return 0.0  # Return 0 on failure to avoid crashing, but log it
+
+    logger.info("Attempting to parse ASS content...")
+    lines_parsed = 0
+    for line in ass_content.splitlines():
+        line = line.strip()
+        if not line.lower().startswith('dialogue:'):
+            continue
+
+        match = pattern.match(line)
+        if match:
+            lines_parsed += 1
+            try:
+                start_sec = time_to_seconds(match.group('start'))
+                end_sec = time_to_seconds(match.group('end'))
+                text = match.group('text').strip()  # Already handles \N from ASS spec
+
+                if end_sec < start_sec:
+                    logger.warning(f"ASS parse warning: End time {end_sec} before start time {start_sec} in line: '{line}'. Skipping.")
+                    continue
+                if not text:
+                    logger.warning(f"ASS parse warning: Empty text content in line: '{line}'. Skipping.")
+                    continue
+
+                dialogue.append({'start': start_sec, 'end': end_sec, 'text': text})
+            except Exception as e:
+                logger.warning(f"Could not parse ASS dialogue line: '{line}'. Error: {e}")
+        else:
+            logger.warning(f"ASS dialogue line did not match expected pattern: '{line}'")
+
+    if lines_parsed == 0 and len(ass_content) > 50:  # Check if content was substantial
+        logger.error(f"ASS parsing failed. No dialogue lines matched the expected pattern in content starting with: {ass_content[:200]}...")
+
+    logger.info(f"Parsed {len(dialogue)} dialogue events from {lines_parsed} matched ASS lines.")
+    return dialogue
+
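The `time_to_seconds` helper above inverts `format_time`: ASS timestamps are `H:MM:SS.cc` with centiseconds, not milliseconds. Isolated as a sketch (without the logging fallback the app adds):

```python
def ass_time_to_seconds(t: str) -> float:
    # ASS timestamps look like H:MM:SS.cc, where cc is centiseconds.
    h, m, rest = t.split(':')
    s, cs = rest.split('.')
    return int(h) * 3600 + int(m) * 60 + int(s) + int(cs) / 100.0

print(ass_time_to_seconds('0:01:05.50'))  # 65.5
```

Dividing by 1000 here (as SRT milliseconds would suggest) is an easy off-by-10x mistake; the two-digit fraction is the giveaway.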
+def process_video_with_captions(video, captions, caption_type, font_name, font_size,
+                                primary_color, alignment, auto_caption):
+    """Main processing function."""
+    progress = gr.Progress(track_tqdm=True)
+    temp_dir = None
+    try:
+        progress(0, desc="Initializing...")
+        temp_dir = tempfile.mkdtemp()
+        logger.info(f"Created temp dir: {temp_dir}")
+
+        video_path = os.path.join(temp_dir, "input_video.mp4")
+        output_path = os.path.join(temp_dir, "output_video.mp4")
+        # Removed initial_subtitle_path, only need final
+        final_ass_path = os.path.join(temp_dir, "captions_final.ass")
+
+        # --- Handle Video Input ---
+        progress(0.05, desc="Saving video...")
+        if hasattr(video, 'name') and video.name and os.path.exists(video.name):
+            import shutil
+            shutil.copy(video.name, video_path)
+            logger.info(f"Copied input video from Gradio temp file {video.name} to {video_path}")
+        elif isinstance(video, str) and os.path.exists(video):
+            import shutil
+            shutil.copy(video, video_path)
+            logger.info(f"Copied input video from path {video} to {video_path}")
+        else:
+            raise gr.Error("Could not access uploaded video file. Please try uploading again.")
+
+        # --- Prepare Styles ---
+        progress(0.1, desc="Preparing styles...")
+        generated_captions_display_text = ""
+        alignment_map = {"Bottom Center": 2, "Bottom Left": 1, "Bottom Right": 3}
+        style_options = {
+            'font_name': font_name,
+            'font_size': font_size,
+            'primary_color': primary_color,
+            'alignment': alignment_map.get(alignment, 2)
+        }
+
+        # --- Auto-Generate or Process Provided Captions ---
+        dialogue_events = []  # To hold {'start': float, 'end': float, 'text': str}
+
+        if auto_caption:
+            logger.info("Auto-generating captions...")
+            progress(0.15, desc="Extracting audio...")
+            audio_path = os.path.join(temp_dir, "audio.wav")
+            success, error_msg = extract_audio(video_path, audio_path)
+            if not success: raise gr.Error(f"Audio extraction failed: {error_msg}")
+
+            progress(0.25, desc="Transcribing audio...")
+            transcript = transcribe_audio(audio_path, progress=progress)
+            if not transcript or not transcript.get("segments"): raise gr.Error("No speech detected.")
+            dialogue_events = transcript["segments"]  # Use segments directly
+            progress(0.6, desc="Generating ASS captions...")
+
+        else:  # Use provided captions
+            logger.info(f"Using provided {caption_type} captions.")
+            if not captions or captions.strip() == "": raise gr.Error("Caption input is empty.")
+
+            progress(0.6, desc=f"Processing {caption_type} captions...")
+            if caption_type.lower() == 'ass':
+                logger.info("Parsing provided ASS content.")
+                dialogue_events = parse_ass_to_dialogue(captions)
+                if not dialogue_events:
+                    raise gr.Error("Could not parse dialogue lines from provided ASS content.")
+            elif caption_type.lower() in ['srt', 'vtt']:
+                logger.info(f"Parsing provided {caption_type} content.")
+                dialogue_events = parse_srt_to_dialogue(captions)
+                if not dialogue_events:
+                    raise gr.Error(f"Could not parse provided {caption_type} content.")
+            else:
+                raise gr.Error(f"Unsupported caption type: {caption_type}")
+
+        # --- Generate Final ASS File ---
+        if not dialogue_events:
+            raise gr.Error("No caption dialogue events found or generated.")
+
+        logger.info(f"Generating final ASS file with {len(dialogue_events)} events and UI styles.")
+        final_ass_content = generate_ass_from_transcript(dialogue_events, style_options)
+        generated_captions_display_text = final_ass_content  # Show the final generated ASS
+
+        with open(final_ass_path, 'w', encoding='utf-8') as f:
+            f.write(final_ass_content)
+        logger.info(f"Written final styled ASS to {final_ass_path}")
+
+        # Verify file creation
+        if not os.path.exists(final_ass_path) or os.path.getsize(final_ass_path) == 0:
+            raise gr.Error(f"Internal error: Failed to write final ASS file to {final_ass_path}")
+
+        # --- Burn Subtitles ---
+        progress(0.7, desc="Burning subtitles into video...")
+        success, error_msg = run_ffmpeg_with_subtitles(
+            video_path, final_ass_path, output_path, style_options
+        )
+        if not success:
+            logger.error(f"Subtitle burning failed. Video: {video_path}, ASS: {final_ass_path}")
+            raise gr.Error(f"FFmpeg failed to burn subtitles: {error_msg}")
+
+        progress(1.0, desc="Processing complete!")
+        logger.info(f"Output video generated: {output_path}")
+
+        return output_path, generated_captions_display_text
+
+    except Exception as e:
+        logger.exception("Error in process_video_with_captions")
|
| 544 |
+
if temp_dir and os.path.exists(temp_dir):
|
| 545 |
+
try:
|
| 546 |
+
files = os.listdir(temp_dir)
|
| 547 |
+
logger.error(f"Files in temp dir {temp_dir} during error: {files}")
|
| 548 |
+
except Exception as list_e:
|
| 549 |
+
logger.error(f"Could not list temp dir {temp_dir}: {list_e}")
|
| 550 |
+
if isinstance(e, gr.Error): raise e
|
| 551 |
+
else: raise gr.Error(f"An unexpected error occurred: {str(e)}")
|
| 552 |
+
|
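The manual-caption branch above relies on `parse_srt_to_dialogue`, which is defined earlier in app.py and is not part of this hunk. Purely as an illustration (not the app's actual implementation), a minimal parser producing the `{'start': float, 'end': float, 'text': str}` event shape noted in the code could look like this:

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS.mmm" (SRT uses ",", VTT uses ".")
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

def parse_srt_to_dialogue(captions: str):
    """Parse SRT/VTT text into [{'start': float, 'end': float, 'text': str}, ...]."""
    events = []
    for block in re.split(r"\n\s*\n", captions.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIME_RE.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in m.groups())
                events.append({
                    "start": h1 * 3600 + m1 * 60 + s1 + ms1 / 1000.0,
                    "end": h2 * 3600 + m2 * 60 + s2 + ms2 / 1000.0,
                    # Everything after the timing line is caption text
                    "text": " ".join(lines[i + 1:]).strip(),
                })
                break
    return [e for e in events if e["text"]]
```

Accepting both `,` and `.` as the millisecond separator is what lets one parser handle the app's `srt` and `vtt` cases in a single code path.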
```python
# Function to toggle interactivity
def toggle_captions_input(auto_generate):
    """Toggle the interactivity of the captions input."""
    return gr.update(interactive=not auto_generate)

# --- Gradio Interface ---
with gr.Blocks(title="Video Caption Generator") as app:
    gr.Markdown("## Video Caption Generator")
    gr.Markdown("Upload a video, choose styling, and add captions. Use auto-generation or provide your own SRT/ASS/VTT.")

    with gr.Row():
        with gr.Column(scale=1):
            gr.Markdown("**Input & Options**")
            video_input = gr.Video(label="Upload Video")
            auto_caption = gr.Checkbox(label="Auto-generate captions (Overrides below)", value=False)
            captions_input = gr.Textbox(
                label="Or Enter Captions Manually",
                placeholder="1\n00:00:01,000 --> 00:00:05,000\nHello World\n\n2\n...",
                lines=8,
                interactive=True
            )
            caption_type = gr.Dropdown(
                choices=["srt", "ass", "vtt"],
                value="srt",
                label="Format (if providing captions manually)"
            )

            gr.Markdown("**Caption Styling** (Applied to auto-generated or converted ASS)")
            with gr.Row():
                font_name = gr.Dropdown(
                    choices=ACCEPTABLE_FONTS,
                    value=ACCEPTABLE_FONTS[0] if ACCEPTABLE_FONTS else "Arial",
                    label="Font"
                )
                font_size = gr.Slider(minimum=10, maximum=60, value=24, step=1, label="Font Size")
            with gr.Row():
                primary_color = gr.ColorPicker(value="#FFFFFF", label="Text Color")
                alignment = gr.Dropdown(
                    choices=["Bottom Center", "Bottom Left", "Bottom Right"],
                    value="Bottom Center",
                    label="Alignment"
                )

            process_btn = gr.Button("Generate Captioned Video", variant="primary")

        with gr.Column(scale=1):
            gr.Markdown("**Output**")
            video_output = gr.Video(label="Captioned Video")
            generated_captions_output = gr.Textbox(
                label="Generated Captions (ASS format if auto-generated)",
                lines=10,
                interactive=False
            )

    # Link checkbox to captions input interactivity
    auto_caption.change(
        fn=toggle_captions_input,
        inputs=[auto_caption],
        outputs=[captions_input]
    )

    # Define the main processing function call for the button
    process_btn.click(
        fn=process_video_with_captions,
        inputs=[
            video_input,
            captions_input,
            caption_type,
            font_name,
            font_size,
            primary_color,
            alignment,
            auto_caption
        ],
        outputs=[video_output, generated_captions_output],
        # api_name="generate_captions"
    )

# Launch the app
if __name__ == "__main__":
    app.launch(debug=True, share=False)  # Enable debug for local testing
```
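The `style_options` dict built in `process_video_with_captions` feeds `generate_ass_from_transcript`, which is also defined earlier in app.py. Two fiddly details any such generator must handle are worth sketching (illustrative helpers under assumed names, not the app's actual code): ASS stores colors as `&HAABBGGRR&` — alpha, blue, green, red, the reverse byte order of the ColorPicker's `#RRGGBB` — and ASS timestamps use `H:MM:SS.cc` with centisecond precision:

```python
def hex_to_ass_color(hex_rgb: str) -> str:
    """Convert a web '#RRGGBB' color to ASS '&HAABBGGRR&' (AA=00, fully opaque)."""
    digits = hex_rgb.lstrip("#")
    r, g, b = digits[0:2], digits[2:4], digits[4:6]
    return f"&H00{b}{g}{r}&".upper()

def ass_timestamp(seconds: float) -> str:
    """Format a float number of seconds as ASS 'H:MM:SS.cc' (centiseconds)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

print(hex_to_ass_color("#FF0000"))  # -> &H000000FF&
print(ass_timestamp(3661.25))       # -> 1:01:01.25
```

Getting the BGR byte order wrong is a common bug here: passing `#FF0000` through unconverted would render blue text instead of red.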
requirements.txt (new file, 9 lines):

```text
gradio>=3.50.2
ffmpeg-python>=0.2.0
opencv-python-headless>=4.8.0
numpy>=1.22.0
openai-whisper>=20231117
tqdm>=4.66.0
torch>=2.0.0
transformers>=4.35.0
pathlib>=1.0.1
```
setup.sh (new file, 24 lines):

```bash
#!/bin/bash

# Install FFmpeg if not already installed
if ! command -v ffmpeg &> /dev/null
then
    echo "FFmpeg not found, installing..."
    apt-get update && apt-get install -y ffmpeg
else
    echo "FFmpeg is already installed"
fi

# Install FFprobe if not already installed (should come with FFmpeg but checking to be safe)
if ! command -v ffprobe &> /dev/null
then
    echo "FFprobe not found, installing..."
    apt-get update && apt-get install -y ffmpeg
else
    echo "FFprobe is already installed"
fi

# Make sure the script has appropriate permissions in case it needs execution
chmod -R 755 .

echo "Setup complete!"
```