---
title: SRT Processing Tool
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---
# 🎬 SRT Processing Tool
A production-ready web application for processing SRT subtitle files, built with Gradio and ready to deploy on Hugging Face Spaces.
**Resegment and translate your subtitle files easily in your browser!**
## ✨ Features
- **🎤 Audio to SRT**: Transcribe audio files using NVIDIA Parakeet TDT
- **🔄 SRT Resegmentation**: Optimize subtitle segments by character limits, respecting punctuation boundaries
- **🌍 SRT Translation**: Translate subtitle files using AI (OpenAI, Aliyun DashScope, or OpenRouter)
- **⚑ One-Stop Workflow**: Transcribe, resegment, and translate in a single integrated process!
- **🚀 Production Ready**: Optimized for Hugging Face Spaces deployment
## 🚀 Live Demo
**Try it live:** [https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool](https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool)
This app is deployed on Hugging Face Spaces! To deploy your own version:
1. Fork this repository
2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
3. Create a new Space
4. Connect your GitHub repository
5. Select Gradio as the SDK
6. Set the app file to `app.py`
7. Add your API keys as secrets (see below)
8. Deploy!
## 🔑 API Keys Configuration
For translation features, add your API keys as secrets in Hugging Face Spaces:
1. Go to your Space settings
2. Navigate to "Variables and secrets"
3. Add the following secrets:
### Required Secrets (add the key for your chosen provider):
- **Aliyun DashScope**: `DASHSCOPE_API_KEY`
- **OpenAI**: `OPENAI_API_KEY`
- **OpenRouter**: `OPENROUTER_API_KEY`
### Optional Secrets (for OpenRouter attribution):
- `OPENROUTER_SITE_URL` (maps to `HTTP-Referer`)
- `OPENROUTER_APP_TITLE` (maps to `X-Title`)
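As a rough sketch of how these secrets are typically consumed at runtime (the helper names below are illustrative, not the app's actual code):

```python
import os

# Environment variable names match the secrets listed above.
PROVIDER_ENV = {
    "openai": "OPENAI_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def resolve_api_key(provider: str) -> str:
    """Look up the secret for the selected provider, failing loudly if unset."""
    var = PROVIDER_ENV[provider]
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Missing secret {var} for provider '{provider}'")
    return key

def openrouter_headers() -> dict:
    """Build the optional OpenRouter attribution headers described above."""
    headers = {}
    if os.environ.get("OPENROUTER_SITE_URL"):
        headers["HTTP-Referer"] = os.environ["OPENROUTER_SITE_URL"]
    if os.environ.get("OPENROUTER_APP_TITLE"):
        headers["X-Title"] = os.environ["OPENROUTER_APP_TITLE"]
    return headers
```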
## 📦 Local Installation
```bash
# Clone the repository
git clone https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool
cd SRT-Processing-Tool
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```
## πŸƒ Local Run
```bash
python app.py
```
The app will be available at `http://localhost:7860`.
## 📖 Usage
1. Open the app in your browser
2. Select Input Type: **SRT File** or **Audio File**
3. Upload your file
4. Choose operation:
- **Transcribe only** (Audio only): Just transcribe audio to SRT
- **Translate only**: Translate subtitles to target language
- **Resegment only**: Optimize subtitle segments by character limits
5. Configure settings:
- **Translation Settings**: Target language, provider, model, workers
- **Resegmentation Settings**: Maximum characters per segment
6. Click "🚀 Process File"
7. Download your processed file!
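The resegmentation step can be pictured roughly like this: merge the subtitle text, then split it into chunks of at most `max_chars`, preferring punctuation boundaries. A minimal sketch of that idea (an illustration, not the tool's exact algorithm):

```python
import re

def resegment(text: str, max_chars: int = 125) -> list[str]:
    """Split text into segments of at most max_chars, breaking at punctuation."""
    # Split after sentence/clause punctuation followed by whitespace.
    pieces = re.split(r"(?<=[.!?,;])\s+", text.strip())
    segments, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > max_chars:
            # Adding this piece would overflow: close the current segment.
            segments.append(current)
            current = piece
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

A single clause longer than `max_chars` is kept whole rather than cut mid-word, which mirrors the "respect punctuation boundaries" behavior described above.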
## 🔧 Configuration
### ASR Model
- **NVIDIA Parakeet TDT**: `nvidia/parakeet-tdt-0.6b-v3` (default)
### Default Models
- **OpenAI**: `gpt-4.1` (uses Responses API)
- **Aliyun DashScope**: `qwen-max`
- **OpenRouter**: `openai/gpt-4o`
### Environment Variables
You can also use a `.env` file for local development:
```env
# Aliyun DashScope
DASHSCOPE_API_KEY=your_key_here
# OpenAI
OPENAI_API_KEY=your_key_here
# OpenRouter
OPENROUTER_API_KEY=your_key_here
OPENROUTER_SITE_URL=https://your-site.com
OPENROUTER_APP_TITLE=Your App Title
# Optional: override model for all providers
MODEL=your_model_name
```
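A minimal, dependency-free sketch of how such a `.env` file can be read (the app itself may use a library such as python-dotenv; this is only to illustrate the format):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Parse simple KEY=value lines into os.environ; '#' starts a comment."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Existing environment variables win over the .env file.
            os.environ.setdefault(key.strip(), value.strip())
```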
## 💻 CLI Usage
You can also use the SRT processor from the command line:
```bash
# Resegment only
python tools/srt_processor.py input.srt output.srt --operation resegment --max-chars 125
# Translate (OpenAI)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openai --model gpt-4.1 --workers 5
# Translate (OpenRouter)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openrouter --model openai/gpt-4o --workers 5
# Translate (DashScope)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider dashscope --model qwen-max --workers 5
```
## πŸ—οΈ Project Structure
```
.
├── app.py                   # Main Gradio application
├── tools/
│   ├── __init__.py
│   ├── srt_processor.py     # Core SRT processing logic
│   └── audio_transcriber.py # Audio transcription (NeMo ASR)
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
## πŸ“ License
MIT License
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
---
**Made with ❤️ for subtitle processing**