Allow single quotes "'" and hyphens "-"
#4
by sanchit-gandhi - opened
Remove the single quote `'` (id 6) and the hyphen `-` (id 12) from `suppress_tokens`. These tokens should not be suppressed during generation: the official Whisper repo accepts them as valid generated tokens:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
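The change itself amounts to filtering two ids out of a list. A minimal sketch, assuming `suppress_tokens` is a plain list of integer token ids; the helper name `allow_tokens` and the shortened example list are hypothetical and are not the actual values from this repo's config:

```python
def allow_tokens(suppress_tokens, allowed_ids):
    """Return a copy of suppress_tokens without the ids in allowed_ids."""
    allowed = set(allowed_ids)
    # Preserve the original order of the remaining ids.
    return [tok for tok in suppress_tokens if tok not in allowed]


# Hypothetical, shortened suppress list used for illustration only.
example_suppress_tokens = [1, 2, 6, 7, 8, 12, 14]

# Stop suppressing the single quote (id 6) and the hyphen (id 12).
updated = allow_tokens(example_suppress_tokens, allowed_ids=[6, 12])
print(updated)
```

The same filter could presumably be applied to whichever config object holds `suppress_tokens`, as long as the value is a list of integer ids.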
Check that we're removing the right tokens:
```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")
print(tokenizer.decode(6))
print(tokenizer.decode(12))
```
Print Output:
```
'
-
```
ArthurZ changed pull request status to merged