---
title: SRT Processing Tool
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
---
# 🎬 SRT Processing Tool

A production-ready web application for processing SRT subtitle files, powered by Gradio and ready for Hugging Face Spaces.

**Resegment and translate your subtitle files easily in your browser!**
## ✨ Features

- **🎤 Audio to SRT**: Transcribe audio files using NVIDIA Parakeet TDT
- **📝 SRT Resegmentation**: Optimize subtitle segments by character limits, respecting punctuation boundaries
- **🌍 SRT Translation**: Translate subtitle files using AI (OpenAI, Aliyun DashScope, or OpenRouter)
- **⚡ One-Stop Workflow**: Transcribe, resegment, and translate in a single integrated process!
- **🚀 Production Ready**: Optimized for Hugging Face Spaces deployment
## 🚀 Live Demo

**Try it live:** [https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool](https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool)

This app is deployed on Hugging Face Spaces! To deploy your own version:

1. Fork this repository
2. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
3. Create a new Space
4. Connect your GitHub repository
5. Select Gradio as the SDK
6. Set the app file to `app.py`
7. Add your API keys as secrets (see below)
8. Deploy!
## 🔑 API Keys Configuration

For translation features, add your API keys as secrets in Hugging Face Spaces:

1. Go to your Space settings
2. Navigate to "Variables and secrets"
3. Add the following secrets:

### Required Secrets (choose based on provider)

- **Aliyun DashScope**: `DASHSCOPE_API_KEY`
- **OpenAI**: `OPENAI_API_KEY`
- **OpenRouter**: `OPENROUTER_API_KEY`

### Optional Secrets (for OpenRouter attribution)

- `OPENROUTER_SITE_URL` (maps to `HTTP-Referer`)
- `OPENROUTER_APP_TITLE` (maps to `X-Title`)
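In code, these secrets are normally read from the environment at runtime. A minimal sketch of that lookup, assuming a simple provider-to-variable mapping (the helper name and map are illustrative, not the app's actual API):

```python
import os

# Illustrative mapping from provider name to the env var holding its key.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "dashscope": "DASHSCOPE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def resolve_api_key(provider: str) -> str:
    """Return the API key for a provider, failing loudly when it is missing."""
    var = PROVIDER_KEYS.get(provider)
    if var is None:
        raise ValueError(f"Unknown provider: {provider!r}")
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} secret for provider {provider!r}")
    return key
```

Failing early with the exact variable name makes misconfigured Spaces much easier to debug than a generic authentication error from the provider.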
## 📦 Local Installation

```bash
# Clone the repository
git clone https://huggingface.co/spaces/BiliSakura/SRT-Processing-Tool
cd SRT-Processing-Tool

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```
## 🚀 Local Run

```bash
python app.py
```

The app will be available at `http://localhost:7860`.
## 📖 Usage

1. Open the app in your browser
2. Select the input type: **SRT File** or **Audio File**
3. Upload your file
4. Choose an operation:
   - **Transcribe only** (audio only): Just transcribe audio to SRT
   - **Translate only**: Translate subtitles to the target language
   - **Resegment only**: Optimize subtitle segments by character limits
5. Configure settings:
   - **Translation Settings**: Target language, provider, model, workers
   - **Resegmentation Settings**: Maximum characters per segment
6. Click "🚀 Process File"
7. Download your processed file!
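The resegmentation operation above can be pictured as greedily packing punctuation-delimited chunks into segments that fit a character budget. A simplified sketch of that idea, not the tool's actual algorithm:

```python
import re

def resegment(text: str, max_chars: int = 125) -> list[str]:
    """Greedily pack punctuation-delimited chunks into segments of at most max_chars."""
    # Split after sentence/clause punctuation, keeping the punctuation attached.
    chunks = [c.strip() for c in re.split(r"(?<=[.!?,;])\s+", text) if c.strip()]
    segments: list[str] = []
    current = ""
    for chunk in chunks:
        candidate = f"{current} {chunk}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = chunk  # a single over-long chunk stays whole in this sketch
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Splitting only at punctuation keeps each subtitle segment a readable clause rather than cutting mid-word at the character limit.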
## 🔧 Configuration

### ASR Model

- **NVIDIA Parakeet TDT**: `nvidia/parakeet-tdt-0.6b-v3` (default)

### Default Models

- **OpenAI**: `gpt-4.1` (uses Responses API)
- **Aliyun DashScope**: `qwen-max`
- **OpenRouter**: `openai/gpt-4o`

### Environment Variables

You can also use a `.env` file for local development:

```env
# Aliyun DashScope
DASHSCOPE_API_KEY=your_key_here

# OpenAI
OPENAI_API_KEY=your_key_here

# OpenRouter
OPENROUTER_API_KEY=your_key_here
OPENROUTER_SITE_URL=https://your-site.com
OPENROUTER_APP_TITLE=Your App Title

# Optional: override the model for all providers
MODEL=your_model_name
```
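Projects usually load a `.env` file with a library such as `python-dotenv`, but the mechanics are simple enough to sketch by hand (illustrative only; the app may use a library instead):

```python
import os

def load_env_file(path: str) -> None:
    """Load KEY=value lines from a .env file into os.environ.

    Comments and blank lines are skipped; variables already set in the
    real environment are not overwritten.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Using `setdefault` means secrets exported in your shell (or set by Hugging Face Spaces) always win over values in the file.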
## 💻 CLI Usage

You can also use the SRT processor from the command line:

```bash
# Resegment only
python tools/srt_processor.py input.srt output.srt --operation resegment --max-chars 125

# Translate (OpenAI)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openai --model gpt-4.1 --workers 5

# Translate (OpenRouter)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider openrouter --model openai/gpt-4o --workers 5

# Translate (DashScope)
python tools/srt_processor.py input.srt output.srt --operation translate --target-lang zh --provider dashscope --model qwen-max --workers 5
```
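The CLI consumes standard SRT files, whose timestamps have the fixed form `HH:MM:SS,mmm`. A minimal sketch of parsing one such timestamp (illustrative; not `tools/srt_processor.py`'s actual code):

```python
import re

# SRT timestamps are always HH:MM:SS,mmm with a comma before milliseconds.
TIMESTAMP_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def srt_timestamp_to_ms(stamp: str) -> int:
    """Convert an SRT timestamp like '00:01:02,500' to milliseconds."""
    m = TIMESTAMP_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"Malformed SRT timestamp: {stamp!r}")
    h, mnt, s, ms = map(int, m.groups())
    return ((h * 60 + mnt) * 60 + s) * 1000 + ms
```

Working in integer milliseconds avoids the floating-point drift that creeps in when timestamps are repeatedly split and rejoined during resegmentation.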
## 🏗️ Project Structure

```
.
├── app.py                   # Main Gradio application
├── tools/
│   ├── __init__.py
│   ├── srt_processor.py     # Core SRT processing logic
│   └── audio_transcriber.py # Audio transcription (NeMo ASR)
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
## 📄 License

MIT License

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

**Made with ❤️ for subtitle processing**