Benchmark: FunASR 170x realtime vs Whisper 13x — non-autoregressive alternative
#233, opened by langgz
For those looking for faster alternatives to Whisper, FunASR offers non-autoregressive models that are significantly faster:
| Model | Speed (GPU) | Speed (CPU) | Languages |
|---|---|---|---|
| FunASR SenseVoice | 170x realtime | 17x realtime | 5 |
| FunASR Paraformer | 120x realtime | 15x realtime | 2 |
| Whisper-large-v3 | 13x realtime | ❌ | 57 |
Key advantage: by the numbers above, FunASR on CPU (15-17x realtime) is faster than Whisper-large-v3 on GPU (13x realtime).
FunASR also includes built-in speaker diarization and emotion detection.
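To put the table in wall-clock terms, here is a small sketch that converts each speed factor into the time needed for one hour of audio. It assumes "Nx realtime" means audio duration divided by processing time, which the post does not define explicitly. The `seconds_to_transcribe` helper and the `SPEEDS` dictionary are illustrative names, not part of FunASR or Whisper. The Whisper CPU entry is left out because the table gives no number for it.

```python
# Speed factors copied from the table above ("x realtime").
# Assumption: Nx realtime = audio duration / processing time.
SPEEDS = {
    "FunASR SenseVoice (GPU)": 170,
    "FunASR SenseVoice (CPU)": 17,
    "FunASR Paraformer (GPU)": 120,
    "FunASR Paraformer (CPU)": 15,
    "Whisper-large-v3 (GPU)": 13,
}


def seconds_to_transcribe(audio_seconds: float, realtime_factor: float) -> float:
    """Estimated processing time for a clip at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    one_hour = 3600.0
    for name, factor in SPEEDS.items():
        estimate = seconds_to_transcribe(one_hour, factor)
        print(f"{name}: {estimate:.1f} s per hour of audio")
```

These are back-of-the-envelope estimates only; actual throughput depends on hardware, batch size, and audio content.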
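```bash
pip install funasr
```

```python
from funasr import AutoModel

# Load SenseVoiceSmall from the Hugging Face hub, with FSMN-VAD for voice activity detection
model = AutoModel(model="FunAudioLLM/SenseVoiceSmall", hub="hf", vad_model="funasr/fsmn-vad", device="cuda")
result = model.generate(input="audio.wav")
```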
Benchmark details: https://modelscope.github.io/FunASR/benchmark.html
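If you want to check a speed figure on your own hardware, a rough sketch like the one below may help. It is not the benchmark's methodology: it assumes realtime factor is audio duration divided by the best wall-clock time over a few runs, and it only handles WAV files via the standard library. `wav_duration_seconds` and `realtime_factor` are hypothetical helper names introduced here.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def realtime_factor(transcribe: Callable[[str], object], path: str, runs: int = 3) -> float:
    """Audio duration divided by the fastest of `runs` timed calls.

    `transcribe` is any callable that takes a file path and blocks until
    the transcription is finished.
    """
    duration = wav_duration_seconds(path)
    transcribe(path)  # warm-up call, excluded from timing (model load, caches)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(path)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time from a trivially fast callable.
    return duration / max(best, 1e-9)
```

With the FunASR model from the snippet above, usage would look like `realtime_factor(lambda p: model.generate(input=p), "audio.wav")`. Results from a single short clip are noisy, so a longer or more varied set of recordings gives a fairer comparison.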