---
title: SRT Processing Tool
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---

# 🎬 SRT Processing Tool

A Gradio web application for transcribing, resegmenting, and translating SRT subtitle files, ready to deploy on Hugging Face Spaces.

**Transcribe, resegment, and translate your subtitle files in your browser!**

## ✨ Features

- **🎤 Audio to SRT**: Transcribe audio files using NVIDIA Parakeet TDT
- **🔄 SRT Resegmentation**: Optimize subtitle segments by character limits, respecting punctuation boundaries
- **🌍 SRT Translation**: Translate subtitle files using AI (OpenAI, Aliyun DashScope, or OpenRouter)
- **⚡ One-Stop Workflow**: Transcribe, resegment, and translate in a single integrated process!
- **🚀 Production Ready**: Optimized for Hugging Face Spaces deployment

## 🚀 Live Demo

**Try it live:** [https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool](https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool)

This app is deployed on Hugging Face Spaces! To deploy your own version:

1. Fork this repository
2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
3. Create a new Space
4. Connect your GitHub repository
5. Select Gradio as the SDK
6. Set the app file to `app.py`
7. Add your API keys as secrets (see below)
8. Deploy!

## 🔑 API Keys Configuration

For translation features, add your API keys as secrets in Hugging Face Spaces:

1. Go to your Space settings
2. Navigate to "Variables and secrets"
3. Add the following secrets:

### Required Secrets (choose based on provider):

- **Aliyun DashScope**: `DASHSCOPE_API_KEY`
- **OpenAI**: `OPENAI_API_KEY`
- **OpenRouter**: `OPENROUTER_API_KEY`

### Optional Secrets (for OpenRouter attribution):

- `OPENROUTER_SITE_URL` (maps to `HTTP-Referer`)
- `OPENROUTER_APP_TITLE` (maps to `X-Title`)
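
To illustrate the mapping above, here is a minimal sketch of how the OpenRouter secrets could be turned into request headers. The helper name `openrouter_headers` is hypothetical and is not taken from `tools/srt_processor.py`; the actual implementation may differ.

```python
import os


def openrouter_headers(env=None):
    """Build OpenRouter request headers from environment-style settings.

    Hypothetical helper for illustration; not the app's actual code.
    """
    env = os.environ if env is None else env
    # The API key is required; a missing key raises KeyError.
    headers = {"Authorization": f"Bearer {env['OPENROUTER_API_KEY']}"}
    # The attribution headers are optional and only sent when configured.
    site_url = env.get("OPENROUTER_SITE_URL")
    app_title = env.get("OPENROUTER_APP_TITLE")
    if site_url:
        headers["HTTP-Referer"] = site_url
    if app_title:
        headers["X-Title"] = app_title
    return headers
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out without touching real secrets.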

## 📦 Local Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool
cd SRT-Processing-Tool

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

## 🏃 Local Run

```bash
python app.py
```

The app will be available at `http://localhost:7860`.

## 📖 Usage

1. Open the app in your browser
2. Select Input Type: **SRT File** or **Audio File**
3. Upload your file
4. Choose an operation:
   - **Transcribe only** (audio input only): Transcribe audio to SRT
   - **Translate only**: Translate subtitles to the target language
   - **Resegment only**: Optimize subtitle segments by character limits
5. Configure settings:
   - **Translation Settings**: Target language, provider, model, workers
   - **Resegmentation Settings**: Maximum characters per segment
6. Click "🚀 Process File"
7. Download your processed file!
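
For intuition about what **Resegment only** does, the sketch below splits text into chunks of at most `max_chars` characters and prefers to break after punctuation. It is a simplified, text-only illustration: the function name `resegment_text` and the punctuation set are assumptions, and the real logic in `tools/srt_processor.py` also has to handle subtitle timing, which is left out here.

```python
import re

# Punctuation treated as a preferred break point (an assumption for this sketch).
BREAK_AFTER = re.compile(r"[.!?,;:。！？，；：]")


def resegment_text(text, max_chars=125):
    """Split text into chunks of at most max_chars, preferring punctuation breaks."""
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    chunks = []
    while len(text) > max_chars:
        # Look for the last punctuation mark that still fits in the limit.
        matches = list(BREAK_AFTER.finditer(text[:max_chars]))
        if matches:
            cut = matches[-1].end()
        else:
            # No punctuation: fall back to the last space, then to a hard cut.
            space = text.rfind(" ", 0, max_chars + 1)
            cut = space if space > 0 else max_chars
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks
```

This greedy approach also breaks after punctuation inside numbers such as `3.14`, so a production version would need extra rules for cases like that.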

## 🔧 Configuration

### ASR Model
- **NVIDIA Parakeet TDT**: `nvidia/parakeet-tdt-0.6b-v3` (default)

### Default Models

- **OpenAI**: `gpt-4.1` (uses Responses API)
- **Aliyun DashScope**: `qwen-max`
- **OpenRouter**: `openai/gpt-4o`

### Environment Variables

You can also use a `.env` file for local development:

```env
# Aliyun DashScope
DASHSCOPE_API_KEY=your_key_here

# OpenAI
OPENAI_API_KEY=your_key_here

# OpenRouter
OPENROUTER_API_KEY=your_key_here
OPENROUTER_SITE_URL=https://your-site.com
OPENROUTER_APP_TITLE=Your App Title

# Optional: override model for all providers
MODEL=your_model_name
```
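
As a rough illustration of the `MODEL` override, the sketch below picks a model for a provider: it returns `MODEL` when that variable is set and otherwise falls back to the defaults listed under Default Models. The function name `resolve_model` is hypothetical, and how the override interacts with the `--model` CLI flag or the model chosen in the UI is not specified here, so treat the precedence as an assumption.

```python
import os

# Defaults as listed in the "Default Models" section of this README.
DEFAULT_MODELS = {
    "openai": "gpt-4.1",
    "dashscope": "qwen-max",
    "openrouter": "openai/gpt-4o",
}


def resolve_model(provider, env=None):
    """Return the MODEL override if set, else the provider's default model.

    Hypothetical helper for illustration; not the app's actual code.
    """
    env = os.environ if env is None else env
    override = (env.get("MODEL") or "").strip()
    if override:
        return override
    try:
        return DEFAULT_MODELS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
```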

## 💻 CLI Usage

You can also use the SRT processor from the command line:

```bash
# Resegment only
python tools/srt_processor.py input.srt output.srt --operation resegment --max-chars 125

# Translate (OpenAI)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openai --model gpt-4.1 --workers 5

# Translate (OpenRouter)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openrouter --model openai/gpt-4o --workers 5

# Translate (DashScope)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider dashscope --model qwen-max --workers 5
```
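
To translate a whole folder of subtitle files, the CLI can be driven from a short Python script. The sketch below only uses the flags shown in the commands above; the helper names `build_command` and `translate_folder` and the `.translated.srt` output suffix are assumptions made for this example.

```python
import subprocess
import sys
from pathlib import Path


def build_command(src, dst, target_lang="zh", provider="openai",
                  model="gpt-4.1", workers=5):
    """Build the translate command line shown above as an argument list."""
    return [
        sys.executable, "tools/srt_processor.py", str(src), str(dst),
        "--operation", "translate",
        "--target-lang", target_lang,
        "--provider", provider,
        "--model", model,
        "--workers", str(workers),
    ]


def translate_folder(folder, **options):
    """Translate every .srt file in a folder (run from the repository root)."""
    for src in sorted(Path(folder).glob("*.srt")):
        if src.stem.endswith(".translated"):
            continue  # skip outputs produced by an earlier run
        dst = src.with_name(f"{src.stem}.translated.srt")
        # check=True stops the batch as soon as one file fails.
        subprocess.run(build_command(src, dst, **options), check=True)
```

For example, `translate_folder("subs", provider="dashscope", model="qwen-max")` would process each file in a `subs` directory in turn, provided the matching API key is set in the environment.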

## 🏗️ Project Structure

```
.
├── app.py                 # Main Gradio application
├── tools/
│   ├── __init__.py
│   ├── srt_processor.py   # Core SRT processing logic
│   └── audio_transcriber.py # Audio transcription (NeMo ASR)
├── requirements.txt       # Python dependencies
└── README.md              # This file
```

## 📝 License

MIT License

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

**Made with ❤️ for subtitle processing**