
🎬 Transcribe and Translate Subtitles

A powerful, privacy-first tool for transcribing and translating video subtitles

Privacy First · ONNX Runtime · Multi-Platform


🔒 Privacy Guarantee

🚨 All processing runs completely offline

  • No internet connection required, ensuring maximum privacy and data security.

🚀 Quick Start

Prerequisites

# Install FFmpeg
conda install ffmpeg

# Install Python dependencies
pip install -r requirements.txt

# Install the ONNX Runtime package that matches your hardware platform:
# ----------------------------------------
# For CPU only
# onnxruntime>=1.23.2
# ----------------------------------------
# For Linux + AMD
# Follow these guides to set up ROCm before running `pip install onnxruntime-rocm`:
# https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-onnx.html
# https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html
# onnxruntime>=1.23.2
# onnxruntime-rocm>=1.23.0
# ----------------------------------------
# For Windows + (Intel or AMD)
# onnxruntime>=1.23.2
# onnxruntime-directml>=1.23.0
# ----------------------------------------
# For Intel OpenVINO (CPU, GPU, and NPU)
# onnxruntime>=1.23.2
# onnxruntime-openvino>=1.23.0
# ----------------------------------------
# For NVIDIA CUDA
# onnxruntime>=1.23.2
# onnxruntime-gpu>=1.23.2
# ----------------------------------------
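As a quick post-install sanity check, you can ask ONNX Runtime which execution providers it can actually use in your environment. This is a minimal sketch (not part of this project's code) that assumes only that one of the `onnxruntime` packages above installed cleanly:

```python
# List the ONNX Runtime execution providers usable in this environment.
# Returns [] instead of raising when onnxruntime is not installed yet.
import importlib


def available_providers():
    """Return ONNX Runtime provider names, or [] if onnxruntime is missing."""
    try:
        ort = importlib.import_module("onnxruntime")
    except ImportError:
        return []
    return list(ort.get_available_providers())


print(available_providers())
```

If your hardware-specific provider (e.g. `CUDAExecutionProvider` or `DmlExecutionProvider`) does not appear in the list, the matching package or driver setup above is the first thing to recheck.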

Setup

  1. Download Models: Get the required models from HuggingFace. Download only the models you need, and keep the folder paths as currently defined; there is no need to download everything.
  2. Download Script: Place run.py in your Transcribe_and_Translate_Subtitles folder.
  3. Add Media: Place your audio/video files in Transcribe_and_Translate_Subtitles/Media/.
  4. Run: Execute python run.py from inside the Transcribe_and_Translate_Subtitles folder, then open the web interface.
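Before launching, it can help to verify the layout the steps above describe. This is a small sketch using only the folder names from those steps (nothing project-internal is assumed):

```python
# Check the folder layout from the Setup steps: run.py and Media/ must sit
# inside the Transcribe_and_Translate_Subtitles project folder.
from pathlib import Path


def layout_ok(root):
    """True when run.py and the Media/ folder are inside the project folder."""
    root = Path(root)
    return (root / "run.py").is_file() and (root / "Media").is_dir()


print(layout_ok("Transcribe_and_Translate_Subtitles"))
```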

Results

Find your processed subtitles in:

Transcribe_and_Translate_Subtitles/Results/Subtitles/
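To collect the output files programmatically, a one-liner glob over that folder is enough. This sketch assumes subtitles are written as .srt files; adjust the pattern if your output format differs:

```python
# List generated subtitle files in Results/Subtitles/ (path from above).
# Assumes .srt output; change the glob pattern for other formats.
from pathlib import Path


def list_subtitles(root="Transcribe_and_Translate_Subtitles"):
    sub_dir = Path(root) / "Results" / "Subtitles"
    if not sub_dir.is_dir():
        return []
    return sorted(p.name for p in sub_dir.glob("*.srt"))


print(list_subtitles())
```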

Ready to get started? 🎉

demo


✨ Features

🔇 Noise Reduction Models

🎤 Voice Activity Detection (VAD)

🗣️ Speech Recognition (ASR)

Multilingual Models

🤖 Translation Models (LLM)


🖥️ Hardware Support

💻 CPU
  • Apple Silicon
  • AMD
  • ARM
  • Intel

🎮 GPU
  • Apple CoreML
  • AMD ROCm
  • Intel OpenVINO
  • NVIDIA CUDA
  • Windows DirectML

🧠 NPU
  • Apple CoreML
  • AMD Ryzen-VitisAI
  • Intel OpenVINO
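Each backend above corresponds to an ONNX Runtime execution provider. A common pattern is to pick the most capable provider present on the machine and fall back to CPU; this is an illustrative sketch (the provider names are ONNX Runtime's official ones, but the preference ordering here is an assumption, not this project's logic):

```python
# Pick the most preferred ONNX Runtime execution provider from those
# available, falling back to CPU. Ordering is illustrative only.
PREFERENCE = [
    "TensorrtExecutionProvider",   # NVIDIA TensorRT
    "CUDAExecutionProvider",       # NVIDIA CUDA
    "ROCMExecutionProvider",       # AMD ROCm
    "MIGraphXExecutionProvider",   # AMD MIGraphX
    "OpenVINOExecutionProvider",   # Intel OpenVINO
    "DmlExecutionProvider",        # Windows DirectML
    "CoreMLExecutionProvider",     # Apple CoreML
    "VitisAIExecutionProvider",    # AMD Ryzen VitisAI
    "CPUExecutionProvider",        # always-available fallback
]


def pick_provider(available):
    """Return the most preferred provider present in `available`."""
    for name in PREFERENCE:
        if name in available:
            return name
    return "CPUExecutionProvider"


print(pick_provider(["CPUExecutionProvider", "DmlExecutionProvider"]))  # → DmlExecutionProvider
```

The chosen name would then be passed as `providers=[...]` when creating an `onnxruntime.InferenceSession`.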

📊 Performance Benchmarks

Test Conditions: Ubuntu 24.04, Intel i3-12300, 7602-second video

| OS           | Backend      | Denoiser    | VAD    | ASR                    | LLM                 | Real-Time Factor |
|--------------|--------------|-------------|--------|------------------------|---------------------|------------------|
| Ubuntu 24.04 | CPU i3-12300 | -           | Silero | SenseVoiceSmall        | -                   | 0.08             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | Silero | SenseVoiceSmall        | Qwen2.5-7B-Instruct | 0.50             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | FSMN   | SenseVoiceSmall        | -                   | 0.054            |
| Ubuntu 24.04 | CPU i3-12300 | ZipEnhancer | FSMN   | SenseVoiceSmall        | -                   | 0.39             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | Silero | Whisper-Large-V3       | -                   | 0.20             |
| Ubuntu 24.04 | CPU i3-12300 | GTCRN       | FSMN   | Whisper-Large-V3-Turbo | -                   | 0.148            |
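Real-time factor (RTF) in the table is the standard metric: processing time divided by audio duration, so values below 1.0 mean faster than real time. A minimal sketch:

```python
# Real-time factor (RTF): processing time / audio duration.
# RTF < 1.0 means the pipeline runs faster than real time.
def real_time_factor(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds


# Example: the 7602-second test video processed in ~608 s gives RTF ≈ 0.08,
# matching the first row (Silero + SenseVoiceSmall, no denoiser or LLM).
print(round(real_time_factor(608, 7602), 2))  # → 0.08
```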

🛠️ Troubleshooting

Common Issues

  • Silero VAD Error: Simply restart the application on the first run.
  • libc++ Error (Linux):
    sudo apt update
    sudo apt install libc++1

  • Apple Silicon: Avoid installing onnxruntime-openvino, as it causes errors.

📋 Update History

🆕 2026/1/04 - Release

  • Added ASR:
    • FunASR-Nano-2512
    • FunASR-Nano-MLT-2512
  • Performance Improvements:
    • Improved long-audio accuracy for SenseVoice and Paraformer
    • Improved audio-segmentation accuracy for Nvidia_VAD, Ten_VAD, and HumAware-VAD
    • Fixed the LLM occasionally outputting garbled text during translation

🆕 2025/9/19 - Major Release

  • Added ASR:
    • 28 region-fine-tuned Whisper models
  • Added Denoiser: MossFormer2_SE_48K
  • Added LLM Models:
    • Qwen3-4B-Instruct-2507-abliterated
    • Qwen3-8B-abliterated-v2
    • Hunyuan-MT-7B-abliterated
    • Seed-X-PRO-7B
  • Performance Improvements:
    • Applied Beam Search and Repeat Penalty for Whisper-like ASR models
    • Applied ONNX Runtime IOBinding for maximum speedup (10%+ faster than a plain ort_session.run())
    • Support for 20-second audio segments per single inference run
    • Improved multi-threading performance
  • Hardware Support Expansion:
    • AMD ROCm Execution Provider
    • AMD MIGraphX Execution Provider
    • NVIDIA TensorRT Execution Provider
    • (The environment must be configured first, or these will not work)
  • Accuracy Improvements:
    • SenseVoice
    • Paraformer
    • FireRedASR
    • Dolphin
    • ZipEnhancer
    • MossFormerGAN_SE_16K
    • NVIDIA-NeMo-VAD
  • Speed Improvements:
    • MelBandRoformer (speed boost from converting audio to a mono channel)
  • Removed Models:
    • FSMN-VAD
    • Qwen3-4B-Official
    • Qwen3-8B-Official
    • Gemma3-4B-it
    • Gemma3-12B-it
    • InternLM3
    • Phi-4-Instruct

2025/7/5 - Noise Reduction Enhancement

  • Added noise reduction model: MossFormerGAN_SE_16K

2025/6/11 - VAD Models Expansion

  • Added VAD Models:
    • HumAware-VAD
    • NVIDIA-NeMo-VAD
    • TEN-VAD

2025/6/3 - Asian Language Support

  • Added the Dolphin ASR model to support Asian languages

2025/5/13 - GPU Acceleration

  • Added Float16/32 ASR models to support CUDA/DirectML GPU usage
  • GPU Performance: these models achieve >99% GPU operator deployment

2025/5/9 - Major Feature Release

  • Flexibility Improvements:
    • Added an option to run without VAD (Voice Activity Detection)
  • Added Models:
    • Noise reduction: MelBandRoformer
    • ASR: CrisperWhisper
    • ASR: Whisper-Large-v3.5-Distil (English fine-tuned)
    • ASR: FireRedASR-AED-L (Chinese + dialects support)
    • Three Japanese-anime fine-tuned Whisper models
  • Performance Optimizations:
    • Removed the IPEX-LLM framework to improve overall performance
    • Dropped LLM quantization options, standardizing on the Q4F32 format
    • Improved Whisper-series inference speed by over 10%
  • Accuracy Improvements:
    • Improved FSMN-VAD accuracy
    • Improved Paraformer recognition accuracy
    • Improved SenseVoice recognition accuracy
  • LLM support with 100% GPU operator deployment in ONNX Runtime:
    • Qwen3-4B/8B
    • InternLM3-8B
    • Phi-4-mini-Instruct
    • Gemma3-4B/12B-it
  • Hardware Support Expansion:
    • Intel OpenVINO
    • NVIDIA CUDA GPU
    • Windows DirectML GPU (supports integrated and discrete GPUs)

🗺️ Roadmap

  • Beam Search for LLM
  • Video Upscaling - enhance resolution
  • Real-time Player - live transcription and translation