
🎬 Transcribe and Translate Subtitles

A powerful, privacy-first tool for transcribing and translating video subtitles

Privacy First · ONNX Runtime · Multi-Platform


🔒 Privacy Guarantee

🚨 All processing runs completely offline

  • No internet connection required, ensuring maximum privacy and data security.

🚀 Quick Start

Prerequisites

# Install FFmpeg
conda install ffmpeg

# Install Python dependencies
pip install -r requirements.txt

# Install the ONNX Runtime package that matches your hardware platform:
# ----------------------------------------
# For CPU only
# onnxruntime>=1.23.2
# ----------------------------------------
# For Linux + AMD
# Follow these guides to set up ROCm before running `pip install onnxruntime-rocm`:
# https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-onnx.html
# https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html
# onnxruntime>=1.23.2
# onnxruntime-rocm>=1.23.0
# ----------------------------------------
# For Windows + (Intel or AMD)
# onnxruntime>=1.23.2
# onnxruntime-directml>=1.23.0
# ----------------------------------------
# For Intel OpenVINO (CPU, GPU, and NPU)
# onnxruntime>=1.23.2
# onnxruntime-openvino>=1.23.0
# ----------------------------------------
# For NVIDIA CUDA
# onnxruntime>=1.23.2
# onnxruntime-gpu>=1.23.2
# ----------------------------------------
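As a quick post-install sanity check, you can ask ONNX Runtime which execution providers it can actually use in your environment. This is a minimal sketch (not part of this project's code) that assumes only that one of the `onnxruntime` packages above installed cleanly:

```python
# List the ONNX Runtime execution providers usable in this environment.
# Returns [] instead of raising when onnxruntime is not installed yet.
import importlib


def available_providers():
    """Return ONNX Runtime provider names, or [] if onnxruntime is missing."""
    try:
        ort = importlib.import_module("onnxruntime")
    except ImportError:
        return []
    return list(ort.get_available_providers())


print(available_providers())
```

If your hardware-specific provider (e.g. `CUDAExecutionProvider` or `DmlExecutionProvider`) does not appear in the list, the matching package or driver setup above is the first thing to recheck.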

Setup

  1. Download Models: Get the required models from HuggingFace. Download only the models you need, and keep the folder paths as currently defined; there is no need to download everything.
  2. Download Script: Place run.py in your Transcribe_and_Translate_Subtitles folder.
  3. Add Media: Place your audio/video files in Transcribe_and_Translate_Subtitles/Media/.
  4. Run: Execute python run.py from inside the Transcribe_and_Translate_Subtitles folder, then open the web interface.
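Before launching, it can help to verify the layout the steps above describe. This is a small sketch using only the folder names from those steps (nothing project-internal is assumed):

```python
# Check the folder layout from the Setup steps: run.py and Media/ must sit
# inside the Transcribe_and_Translate_Subtitles project folder.
from pathlib import Path


def layout_ok(root):
    """True when run.py and the Media/ folder are inside the project folder."""
    root = Path(root)
    return (root / "run.py").is_file() and (root / "Media").is_dir()


print(layout_ok("Transcribe_and_Translate_Subtitles"))
```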

Results

Find your processed subtitles in:

Transcribe_and_Translate_Subtitles/Results/Subtitles/
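To collect the output files programmatically, a one-liner glob over that folder is enough. This sketch assumes subtitles are written as .srt files; adjust the pattern if your output format differs:

```python
# List generated subtitle files in Results/Subtitles/ (path from above).
# Assumes .srt output; change the glob pattern for other formats.
from pathlib import Path


def list_subtitles(root="Transcribe_and_Translate_Subtitles"):
    sub_dir = Path(root) / "Results" / "Subtitles"
    if not sub_dir.is_dir():
        return []
    return sorted(p.name for p in sub_dir.glob("*.srt"))


print(list_subtitles())
```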

Ready to get started? 🎉

demo


✨ Features

🔇 Noise Reduction Models

🎤 Voice Activity Detection (VAD)

🗣️ Speech Recognition (ASR)

Multilingual Models

🤖 Translation Models (LLM)


🖥️ Hardware Support

💻 CPU
  • Apple Silicon
  • AMD
  • ARM
  • Intel

🎮 GPU
  • Apple CoreML
  • AMD ROCm
  • Intel OpenVINO
  • NVIDIA CUDA
  • Windows DirectML

🧠 NPU
  • Apple CoreML
  • AMD Ryzen-VitisAI
  • Intel OpenVINO
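Each backend above corresponds to an ONNX Runtime execution provider. A common pattern is to pick the most capable provider present on the machine and fall back to CPU; this is an illustrative sketch (the provider names are ONNX Runtime's official ones, but the preference ordering here is an assumption, not this project's logic):

```python
# Pick the most preferred ONNX Runtime execution provider from those
# available, falling back to CPU. Ordering is illustrative only.
PREFERENCE = [
    "TensorrtExecutionProvider",   # NVIDIA TensorRT
    "CUDAExecutionProvider",       # NVIDIA CUDA
    "ROCMExecutionProvider",       # AMD ROCm
    "MIGraphXExecutionProvider",   # AMD MIGraphX
    "OpenVINOExecutionProvider",   # Intel OpenVINO
    "DmlExecutionProvider",        # Windows DirectML
    "CoreMLExecutionProvider",     # Apple CoreML
    "VitisAIExecutionProvider",    # AMD Ryzen VitisAI
    "CPUExecutionProvider",        # always-available fallback
]


def pick_provider(available):
    """Return the most preferred provider present in `available`."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"


print(pick_provider(["CPUExecutionProvider", "DmlExecutionProvider"]))  # → DmlExecutionProvider
```

The chosen name would then be passed as `providers=[...]` when creating an `onnxruntime.InferenceSession`.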

📊 Performance Benchmarks

Test Conditions: Ubuntu 24.04, Intel i3-12300, 7602-second video

| OS           | Backend      | Denoiser    | VAD    | ASR                    | LLM                 | Real-Time Factor |
|--------------|--------------|-------------|--------|------------------------|---------------------|------------------|
| Ubuntu 24.04 | CPU i3-12300 | -           | Silero | SenseVoiceSmall        | -                   | 0.08             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | Silero | SenseVoiceSmall        | Qwen2.5-7B-Instruct | 0.50             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | FSMN   | SenseVoiceSmall        | -                   | 0.054            |
| Ubuntu 24.04 | CPU i3-12300 | ZipEnhancer | FSMN   | SenseVoiceSmall        | -                   | 0.39             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | Silero | Whisper-Large-V3       | -                   | 0.20             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | FSMN   | Whisper-Large-V3-Turbo | -                   | 0.148            |
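Real-time factor (RTF) in the table is the standard metric: processing time divided by audio duration, so values below 1.0 mean faster than real time. A minimal sketch:

```python
# Real-time factor (RTF): processing time / audio duration.
# RTF < 1.0 means the pipeline runs faster than real time.
def real_time_factor(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds


# Example: the 7602-second test video processed in ~608 s gives RTF ≈ 0.08,
# matching the first row (Silero + SenseVoiceSmall, no denoiser or LLM).
print(round(real_time_factor(608, 7602), 2))  # → 0.08
```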

🛠️ Troubleshooting

Common Issues

  • Silero VAD Error: Simply restart the application on the first run.
  • libc++ Error (Linux):
    sudo apt update
    sudo apt install libc++1

  • Apple Silicon: Avoid installing onnxruntime-openvino, as it causes errors.

📋 Update History

🆕 2026/1/04 - Release

  • Added ASR:
    • FunASR-Nano-2512
    • FunASR-Nano-MLT-2512
  • Performance Improvements:
    • Improved long-audio accuracy for SenseVoice and Paraformer
    • Improved audio-segmentation accuracy for Nvidia_VAD, Ten_VAD, and HumAware-VAD
    • Fixed the LLM occasionally outputting garbled text during translation

🆕 2025/9/19 - Major Release

  • Added ASR:
    • 28 region-fine-tuned Whisper models
  • Added Denoiser: MossFormer2_SE_48K
  • Added LLM Models:
    • Qwen3-4B-Instruct-2507-abliterated
    • Qwen3-8B-abliterated-v2
    • Hunyuan-MT-7B-abliterated
    • Seed-X-PRO-7B
  • Performance Improvements:
    • Applied Beam Search and Repeat Penalty for Whisper-like ASR models
    • Applied ONNX Runtime IOBinding for maximum speedup (10%+ faster than a plain ort_session.run())
    • Support for 20-second audio segments per single inference run
    • Improved multi-threading performance
  • Hardware Support Expansion:
    • AMD ROCm Execution Provider
    • AMD MIGraphX Execution Provider
    • NVIDIA TensorRT Execution Provider
    • (The environment must be configured first, or these will not work)
  • Accuracy Improvements:
    • SenseVoice
    • Paraformer
    • FireRedASR
    • Dolphin
    • ZipEnhancer
    • MossFormerGAN_SE_16K
    • NVIDIA-NeMo-VAD
  • Speed Improvements:
    • MelBandRoformer (speed boost from converting audio to a mono channel)
  • Removed Models:
    • FSMN-VAD
    • Qwen3-4B-Official
    • Qwen3-8B-Official
    • Gemma3-4B-it
    • Gemma3-12B-it
    • InternLM3
    • Phi-4-Instruct

2025/7/5 - Noise Reduction Enhancement

  • Added noise reduction model: MossFormerGAN_SE_16K

2025/6/11 - VAD Models Expansion

  • Added VAD Models:
    • HumAware-VAD
    • NVIDIA-NeMo-VAD
    • TEN-VAD

2025/6/3 - Asian Language Support

  • Added the Dolphin ASR model to support Asian languages

2025/5/13 - GPU Acceleration

  • Added Float16/32 ASR models to support CUDA/DirectML GPU usage
  • GPU Performance: these models achieve >99% GPU operator deployment

2025/5/9 - Major Feature Release

  • Flexibility Improvements:
    • Added an option to run without VAD (Voice Activity Detection)
  • Added Models:
    • Noise reduction: MelBandRoformer
    • ASR: CrisperWhisper
    • ASR: Whisper-Large-v3.5-Distil (English fine-tuned)
    • ASR: FireRedASR-AED-L (Chinese + dialects support)
    • Three Japanese-anime fine-tuned Whisper models
  • Performance Optimizations:
    • Removed the IPEX-LLM framework to improve overall performance
    • Dropped LLM quantization options, standardizing on the Q4F32 format
    • Improved Whisper-series inference speed by over 10%
  • Accuracy Improvements:
    • Improved FSMN-VAD accuracy
    • Improved Paraformer recognition accuracy
    • Improved SenseVoice recognition accuracy
  • LLM support with 100% GPU operator deployment in ONNX Runtime:
    • Qwen3-4B/8B
    • InternLM3-8B
    • Phi-4-mini-Instruct
    • Gemma3-4B/12B-it
  • Hardware Support Expansion:
    • Intel OpenVINO
    • NVIDIA CUDA GPU
    • Windows DirectML GPU (supports integrated and discrete GPUs)

🗺️ Roadmap

  • Beam Search for LLM
  • Video Upscaling - enhance resolution
  • Real-time Player - live transcription and translation