---
title: SRT Processing Tool
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---
# 🎬 SRT Processing Tool

A production-ready web application for processing SRT subtitle files, powered by Gradio and ready for Hugging Face Spaces.

**Resegment and translate your subtitle files easily in your browser!**
## ✨ Features

- **🎤 Audio to SRT**: Transcribe audio files using NVIDIA Parakeet TDT
- **📝 SRT Resegmentation**: Optimize subtitle segments by character limits, respecting punctuation boundaries
- **🌍 SRT Translation**: Translate subtitle files using AI (OpenAI, Aliyun DashScope, or OpenRouter)
- **⚡ One-Stop Workflow**: Transcribe, resegment, and translate in a single integrated process!
- **🚀 Production Ready**: Optimized for Hugging Face Spaces deployment
## 🚀 Live Demo

**Try it live:** [https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool](https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool)

This app is deployed on Hugging Face Spaces! To deploy your own version:

1. Fork this repository
2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
3. Create a new Space
4. Connect your GitHub repository
5. Select Gradio as the SDK
6. Set the app file to `app.py`
7. Add your API keys as secrets (see below)
8. Deploy!
## 🔑 API Keys Configuration

For translation features, add your API keys as secrets in Hugging Face Spaces:

1. Go to your Space settings
2. Navigate to "Variables and secrets"
3. Add the following secrets:

### Required Secrets (choose based on provider)

- **Aliyun DashScope**: `DASHSCOPE_API_KEY`
- **OpenAI**: `OPENAI_API_KEY`
- **OpenRouter**: `OPENROUTER_API_KEY`

### Optional Secrets (for OpenRouter attribution)

- `OPENROUTER_SITE_URL` (maps to `HTTP-Referer`)
- `OPENROUTER_APP_TITLE` (maps to `X-Title`)
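In code, these secrets are normally read from the environment at runtime. A minimal sketch of that lookup, assuming a simple provider-to-variable mapping (the helper name and map are illustrative, not the app's actual API):

```python
import os

# Illustrative mapping from provider name to the env var holding its key.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def resolve_api_key(provider: str) -> str:
    """Return the API key for a provider, failing loudly when it is missing."""
    var = PROVIDER_KEYS.get(provider)
    if var is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} secret for provider {provider!r}")
    return key
```

Failing early with the exact variable name makes misconfigured Spaces much easier to debug than a generic authentication error from the provider.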
## 📦 Local Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool
cd SRT-Processing-Tool

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
## 🚀 Local Run

```bash
python app.py
```

The app will be available at `http://localhost:7860`.
## 📖 Usage

1. Open the app in your browser
2. Select the input type: **SRT File** or **Audio File**
3. Upload your file
4. Choose an operation:
   - **Transcribe only** (audio only): Just transcribe audio to SRT
   - **Translate only**: Translate subtitles to the target language
   - **Resegment only**: Optimize subtitle segments by character limits
5. Configure settings:
   - **Translation Settings**: Target language, provider, model, workers
   - **Resegmentation Settings**: Maximum characters per segment
6. Click "🚀 Process File"
7. Download your processed file!
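The resegmentation operation above can be pictured as greedily packing punctuation-delimited chunks into segments that fit a character budget. A simplified sketch of that idea, not the tool's actual algorithm:

```python
import re

def resegment(text: str, max_chars: int = 125) -> list[str]:
    """Greedily pack punctuation-delimited chunks into segments of at most max_chars."""
    # Split after sentence/clause punctuation, keeping the punctuation attached.
    chunks = [c.strip() for c in re.split(r"(?<=[.!?,;])\s+", text) if c.strip()]
    segments: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current} {chunk}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = chunk  # a single over-long chunk stays whole in this sketch
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Splitting only at punctuation keeps each subtitle segment a readable clause rather than cutting mid-word at the character limit.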
## 🔧 Configuration

### ASR Model

- **NVIDIA Parakeet TDT**: `nvidia/parakeet-tdt-0.6b-v3` (default)

### Default Models

- **OpenAI**: `gpt-4.1` (uses Responses API)
- **Aliyun DashScope**: `qwen-max`
- **OpenRouter**: `openai/gpt-4o`

### Environment Variables

You can also use a `.env` file for local development:

```env
# Aliyun DashScope
DASHSCOPE_API_KEY=your_key_here

# OpenAI
OPENAI_API_KEY=your_key_here

# OpenRouter
OPENROUTER_API_KEY=your_key_here
OPENROUTER_SITE_URL=https://your-site.com
OPENROUTER_APP_TITLE=Your App Title

# Optional: override the model for all providers
MODEL=your_model_name
```
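Projects usually load a `.env` file with a library such as `python-dotenv`, but the mechanics are simple enough to sketch by hand (illustrative only; the app may use a library instead):

```python
import os

def load_env_file(path: str) -> None:
    """Load KEY=value lines from a .env file into os.environ.

    Comments and blank lines are skipped; variables already set in the
    real environment are not overwritten.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Using `setdefault` means secrets exported in your shell (or set by Hugging Face Spaces) always win over values in the file.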
## 💻 CLI Usage

You can also use the SRT processor from the command line:

```bash
# Resegment only
python tools/srt_processor.py input.srt output.srt --operation resegment --max-chars 125

# Translate (OpenAI)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openai --model gpt-4.1 --workers 5

# Translate (OpenRouter)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openrouter --model openai/gpt-4o --workers 5

# Translate (DashScope)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider dashscope --model qwen-max --workers 5
```
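The CLI consumes standard SRT files, whose timestamps have the fixed form `HH:MM:SS,mmm`. A minimal sketch of parsing one such timestamp (illustrative; not `tools/srt_processor.py`'s actual code):

```python
import re

# SRT timestamps are always HH:MM:SS,mmm with a comma before milliseconds.
TIMESTAMP_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def srt_timestamp_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp like '00:01:02,500' to milliseconds."""
    m = TIMESTAMP_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"Malformed SRT timestamp: {stamp!r}")
    h, mnt, s, ms = map(int, m.groups())
    return ((h * 60 + mnt) * 60 + s) * 1000 + ms
```

Working in integer milliseconds avoids the floating-point drift that creeps in when timestamps are repeatedly split and rejoined during resegmentation.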
## 🏗️ Project Structure

```
.
├── app.py                   # Main Gradio application
├── tools/
│   ├── __init__.py
│   ├── srt_processor.py     # Core SRT processing logic
│   └── audio_transcriber.py # Audio transcription (NeMo ASR)
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
## 📄 License

MIT License

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

**Made with ❤️ for subtitle processing**