🎬 视频字幕转录和翻译 / Transcribe and Translate Subtitles
一个强大的、隐私优先的视频字幕转录和翻译工具
A powerful, privacy-first tool for transcribing and translating video subtitles
🔒 隐私保证 / Privacy Guarantee
🚨 所有处理完全离线运行 / All processing runs completely offline
- 无需互联网连接,确保最大程度的隐私和数据安全
- No internet connection required, ensuring maximum privacy and data security.
- GitHub
🚀 快速入门 / Quick Start
环境准备 / Prerequisites
# 安装 FFmpeg / Install FFmpeg
conda install ffmpeg

# 安装 Python 依赖 / Install Python dependencies
pip install -r requirements.txt
# 请根据您的硬件平台安装正确的包 / Install the package that matches your hardware platform
# ----------------------------------------
# For CPU only
# onnxruntime>=1.23.2
# ----------------------------------------
# For Linux + AMD
# 请先按照 URL 设置 ROCm / Please follow the URLs below to set up ROCm before installing onnxruntime-rocm
# https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-onnx.html
# https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html
# onnxruntime>=1.23.2
# onnxruntime-rocm>=1.23.0
# ----------------------------------------
# For Windows + (Intel or AMD)
# onnxruntime>=1.23.2
# onnxruntime-directml>=1.23.0
# ----------------------------------------
# For Intel OpenVINO CPU & GPU & NPU
# onnxruntime>=1.23.2
# onnxruntime-openvino>=1.23.0
# ----------------------------------------
# For NVIDIA-CUDA
# onnxruntime>=1.23.2
# onnxruntime-gpu>=1.23.2
# ----------------------------------------
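Whichever package you install, ONNX Runtime exposes it as an execution provider at runtime. A minimal sketch of picking one in a fixed preference order (the `pick_provider` helper is hypothetical, not part of `run.py`; the provider strings are the standard ONNX Runtime names for the packages listed above):

```python
# Standard ONNX Runtime execution-provider names, fastest first.
# "CPUExecutionProvider" is present in every build, so it is the fallback.
PREFERENCE = [
    "TensorrtExecutionProvider",   # onnxruntime-gpu + TensorRT
    "CUDAExecutionProvider",       # onnxruntime-gpu
    "ROCMExecutionProvider",       # onnxruntime-rocm
    "DmlExecutionProvider",        # onnxruntime-directml
    "OpenVINOExecutionProvider",   # onnxruntime-openvino
    "CPUExecutionProvider",        # plain onnxruntime
]

def pick_provider(available: list[str]) -> str:
    """Return the most preferred provider present in `available`.

    `available` is what onnxruntime.get_available_providers() returns.
    """
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"
```

For example, on a machine with only the base package, `pick_provider(["CPUExecutionProvider"])` falls through to the CPU provider.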
设置
- 下载模型: 从 HuggingFace 获取所需模型,只下载您想要的模型并保持文件夹路径与当前定义相同,无需全部下载。
- 下载脚本: 将 `run.py` 放置在您的 `Transcribe_and_Translate_Subtitles` 文件夹中
- 添加媒体: 将您的音视频放置在 `Transcribe_and_Translate_Subtitles/Media/` 目录下
- 运行: 务必在 `Transcribe_and_Translate_Subtitles` 目录下执行 `python run.py` 并打开 Web 界面
Setup
- Download Models: Get the required models from HuggingFace. Download only the models you want and keep the folder paths as currently defined; there is no need to download them all.
- Download Script: Place `run.py` in your `Transcribe_and_Translate_Subtitles` folder
- Add Media: Place your audio/video files in `Transcribe_and_Translate_Subtitles/Media/`
- Run: Execute `python run.py` from inside the `Transcribe_and_Translate_Subtitles` folder and open the web interface
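The expected layout from the steps above can be sanity-checked before launching. A small illustrative helper (not part of `run.py`; only the folder names come from the setup steps):

```python
from pathlib import Path

def check_layout(root: Path) -> list[str]:
    """Return the required paths from the setup steps that are missing."""
    required = [root / "run.py", root / "Media"]
    return [str(p) for p in required if not p.exists()]

# Example: report anything missing, then `cd` into the folder and
# run `python run.py` (it must be launched from inside that folder).
missing = check_layout(Path("Transcribe_and_Translate_Subtitles"))
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("Layout OK")
```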
结果 / Results
在以下位置找到您处理后的字幕 / Find your processed subtitles in:
Transcribe_and_Translate_Subtitles/Results/Subtitles/
准备好开始了吗?/ Ready to get started? 🎉
✨ 功能特性 / Features
🔇 降噪模型 / Noise Reduction Models
🎤 语音活动检测 (VAD) / Voice Activity Detection (VAD)
- Faster-Whisper-Silero
- Official-Silero-v6
- HumAware
- NVIDIA-NeMo-VAD-v2.0
- TEN-VAD
- Pyannote-Segmentation-3.0
  - 注意:您需要接受 Pyannote 的使用条款并下载 Pyannote 的 `pytorch_model.bin` 文件,将其放置在 `VAD/pyannote_segmentation` 文件夹中。
  - Note: You need to accept Pyannote's terms of use and download Pyannote's `pytorch_model.bin` file. Place it in the `VAD/pyannote_segmentation` folder.
🗣️ 语音识别 (ASR) / Speech Recognition (ASR)
多语言模型 / Multilingual Models
- Fun-ASR-Nano-2512-Multilingual
- Fun-ASR-MLT-Nano-2512-Multilingual
- SenseVoice-Small-Multilingual
- Dolphin-Small-Asian 亚洲语言
- Paraformer-Large-Chinese 中文
- Paraformer-Large-English 英语
- FireRedASR-AED-L Chinese 中文
- Official-Whisper-Large-v3-Multilingual
- Official-Whisper-Large-v3-Turbo-Multilingual
- 阿拉伯语 / Arabic
- 巴斯克语 / Basque
- 粤语 / Cantonese-Yue
- 中文 / Chinese
- 台湾客家话 / Chinese-Hakka
- 台湾闽南语 / Chinese-Minnan
- 台湾华语 / Chinese-Taiwan
- CrisperWhisper-Multilingual
- 丹麦语 / Danish
- 印度英语 / English-Indian
- 英语 v3.5 / English-v3.5
- 法语 / French
- 瑞士德语 / German-Swiss
- 德语 / German
- 希腊语 / Greek
- 意大利语 / Italian
- 日语-动漫 / Japanese-Anime
- 日语 / Japanese
- 韩语 / Korean
- 马来语 / Malaysian
- 波斯语 / Persian
- 波兰语 / Polish
- 葡萄牙语 / Portuguese
- 俄语 / Russian
- 塞尔维亚语 / Serbian
- 西班牙语 / Spanish
- 泰语 / Thai
- 土耳其语 / Turkish
- 乌尔都语 / Urdu
- 越南语 / Vietnamese
🤖 翻译模型 (LLM) / Translation Models (LLM)
🖥️ 硬件支持 / Hardware Support
| 💻 中央处理器 (CPU) | 🎮 图形处理器 (GPU) | 🧠 神经网络处理单元 (NPU) |
|---|---|---|
| ✅ | ✅ | ✅ |
📊 性能基准测试 / Performance Benchmarks
测试条件 / Test Conditions: Ubuntu 24.04, Intel i3-12300, 7602 秒视频 / 7602-second video
| 操作系统 (OS) | 后端 (Backend) | 降噪器 (Denoiser) | VAD | 语音识别 (ASR) | 大语言模型 (LLM) | 实时率 (Real-Time Factor) |
|---|---|---|---|---|---|---|
| Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | 0.08 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | 0.054 |
| Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | 0.20 |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148 |
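The Real-Time Factor column is processing time divided by media duration, so lower is faster and values below 1.0 are faster than real time. A sketch of the conversion, using the 7602-second test video from the conditions above:

```python
def rtf(processing_seconds: float, media_seconds: float) -> float:
    """Real-Time Factor = processing time / media duration (lower is faster)."""
    return processing_seconds / media_seconds

# Example: at RTF 0.08, the 7602 s test video takes about 608 s to process.
video = 7602.0
print(round(0.08 * video))  # → 608
```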
🛠️ 问题排查 / Troubleshooting
常见问题 / Common Issues
- Silero VAD 错误 / Silero VAD Error: 首次运行时只需重启应用程序 / Simply restart the application on first run
- libc++ 错误 (Linux) / libc++ Error (Linux): 运行 / Run `sudo apt update && sudo apt install libc++1`
- 苹果芯片 / Apple Silicon: 请避免安装 `onnxruntime-openvino`,因为它会导致错误 / Avoid installing `onnxruntime-openvino` as it will cause errors
📋 更新历史 / Update History
🆕 2026/1/04 - 更新 / Update
- ✅ 新增 ASR / Added ASR:
- FunASR-Nano-2512
- FunASR-Nano-MLT-2512
- ✅ 性能改进 / Performance Improvements:
- 改善 SenseVoice & Paraformer 长音频的准确度
- 改善 Nvidia_VAD, Ten_VAD, HumAware_VAD 音频切割准确度
- 修复 LLM 在翻译时偶尔输出乱码文字
- Improve the long audio accuracy of SenseVoice and Paraformer
- Improve the accuracy of Nvidia_VAD, Ten_VAD, and HumAwareVAD audio segmentation
- Fix LLM occasionally outputting garbled text during translation
🆕 2025/9/19 - 重大更新 / Major Release
- ✅ 新增 ASR / Added ASR:
- 28 个地区微调的 Whisper 模型
- 28 region fine-tuned Whisper models
- ✅ 新增降噪器 / Added Denoiser: MossFormer2_SE_48K
- ✅ 新增 LLM 模型 / Added LLM Models:
- Qwen3-4B-Instruct-2507-abliterated
- Qwen3-8B-abliterated-v2
- Hunyuan-MT-7B-abliterated
- Seed-X-PRO-7B
- ✅ 性能改进 / Performance Improvements:
- 为类 Whisper 的 ASR 模型应用了束搜索(Beam Search)和重复惩罚(Repeat Penalty)
- 应用 ONNX Runtime IOBinding 实现最大加速(比常规 ort_session.run() 快 10%以上)
- 支持单次推理处理 20 秒的音频片段
- 改进了多线程性能
- Applied Beam Search & Repeat Penalty for Whisper-like ASR models
- Applied ONNX Runtime IOBinding for maximum speed up (10%+ faster than normal ort_session.run())
- Support for 20 seconds audio segment per single run inference
- Improved multi-threads performance
- ✅ 硬件支持扩展 / Hardware Support Expansion:
- AMD-ROCm 执行提供程序 / Execution Provider
- AMD-MIGraphX 执行提供程序 / Execution Provider
- NVIDIA TensorRT 执行提供程序 / Execution Provider
- (必须先配置环境,否则无法工作 / Must config the env first or it will not work)
- ✅ 准确性改进 / Accuracy Improvements:
- SenseVoice
- Paraformer
- FireRedASR
- Dolphin
- ZipEnhancer
- MossFormerGAN_SE_16K
- NVIDIA-NeMo-VAD
- ✅ 速度改进 / Speed Improvements:
- MelBandRoformer (通过转换为单声道提升速度 / speed boost by converting to mono channel)
- ❌ 移除的模型 / Removed Models:
- FSMN-VAD
- Qwen3-4B-Official
- Qwen3-8B-Official
- Gemma3-4B-it
- Gemma3-12B-it
- InternLM3
- Phi-4-Instruct
2025/7/5 - 降噪增强 / Noise Reduction Enhancement
- ✅ 新增降噪模型 / Added noise reduction model: MossFormerGAN_SE_16K
2025/6/11 - VAD 模型扩展 / VAD Models Expansion
- ✅ 新增 VAD 模型 / Added VAD Models:
- HumAware-VAD
- NVIDIA-NeMo-VAD
- TEN-VAD
2025/6/3 - 亚洲语言支持 / Asian Language Support
- ✅ 新增 Dolphin ASR 模型以支持亚洲语言 / Added Dolphin ASR model to support Asian languages
2025/5/13 - GPU 加速 / GPU Acceleration
- ✅ 新增 Float16/32 ASR 模型以支持 CUDA/DirectML GPU / Added Float16/32 ASR models to support CUDA/DirectML GPU usage
- ✅ GPU 性能 / GPU Performance: 这些模型可以实现超过 99% 的 GPU 算子部署 / These models can achieve >99% GPU operator deployment
2025/5/9 - 主要功能发布 / Major Feature Release
- ✅ 灵活性改进 / Flexibility Improvements:
- 新增不使用 VAD(语音活动检测)的选项 / Added option to not use VAD (Voice Activity Detection)
- ✅ 新增模型 / Added Models:
- 降噪 / Noise reduction: MelBandRoformer
- ASR: CrisperWhisper
- ASR: Whisper-Large-v3.5-Distil (英语微调 / English fine-tuned)
- ASR: FireRedASR-AED-L (支持中文及方言 / Chinese + dialects support)
- 三个日语动漫微调的 Whisper 模型 / Three Japanese anime fine-tuned Whisper models
- ✅ 性能优化 / Performance Optimizations:
- 移除 IPEX-LLM 框架以提升整体性能 / Removed IPEX-LLM framework to enhance overall performance
- 取消 LLM 量化选项,统一使用 Q4F32 格式 / Cancelled LLM quantization options, standardized on Q4F32 format
- Whisper 系列推理速度提升 10% 以上 / Improved Whisper series inference speed by over 10%
- ✅ 准确性改进 / Accuracy Improvements:
- 提升 FSMN-VAD 准确率 / Improved FSMN-VAD accuracy
- 提升 Paraformer 识别准确率 / Improved Paraformer recognition accuracy
- 提升 SenseVoice 识别准确率 / Improved SenseVoice recognition accuracy
- ✅ LLM 支持 ONNX Runtime 100% GPU 算子部署 / LLM Support with ONNX Runtime 100% GPU operator deployment:
- Qwen3-4B/8B
- InternLM3-8B
- Phi-4-mini-Instruct
- Gemma3-4B/12B-it
- ✅ 硬件支持扩展 / Hardware Support Expansion:
- Intel OpenVINO
- NVIDIA CUDA GPU
- Windows DirectML GPU (支持集成显卡和独立显卡 / supports integrated and discrete GPUs)
🗺️ 路线图 / Roadmap
- Beam Search for LLM
- 视频超分 / Video Upscaling - 提升分辨率 / Enhance resolution
- 实时播放器 / Real-time Player - 实时转录和翻译 / Live transcription and translation