How to use openai/whisper-large-v2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-large-v2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v2")
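The "Load model directly" lines stop once the processor and model are loaded. Below is a minimal sketch of the remaining inference step, following the usual processor, `generate`, `batch_decode` pattern for Whisper. It assumes `audio` is already a 16 kHz mono float array of about 30 s or less; by default the feature extractor pads or truncates input to a single 30 s window, which is what the pipeline's `chunk_length_s` option works around. The names `MODEL_ID`, `load_whisper`, and `transcribe` are hypothetical helpers, not part of Transformers.

```python
from functools import lru_cache

MODEL_ID = "openai/whisper-large-v2"


@lru_cache(maxsize=1)
def load_whisper(model_id: str = MODEL_ID):
    """Load the processor and model once and reuse them on later calls."""
    # Imported lazily so this sketch can be defined without loading the model
    from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
    return processor, model


def transcribe(audio, sampling_rate: int = 16_000, model_id: str = MODEL_ID) -> str:
    """Transcribe one 16 kHz mono float array (hypothetical helper)."""
    processor, model = load_whisper(model_id)
    # The feature extractor turns raw samples into log-mel input_features
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    # generate() encodes the features and decodes text tokens autoregressively
    predicted_ids = model.generate(inputs.input_features)
    # Strip special tokens such as language and task markers from the output
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

Usage would look like `text = transcribe(audio_array)`. Caching the loaded model avoids re-reading the large checkpoint on every call.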
How to use a normal file with model?
#6, opened by eikosa
How can I run this model on an audio file I have, for example an MP3?
eikosa changed discussion status to closed
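Answer:
import torchaudio

# torchaudio.load returns a (channels, frames) tensor and the file's sample rate
speech, sr = torchaudio.load("asd.ogg")
sampling_rate = 16_000  # Whisper expects 16 kHz input
# Average the channels to get mono audio (this also drops the channel dimension)
speech = speech.mean(dim=0)
resampler = torchaudio.transforms.Resample(sr, sampling_rate)
speech = resampler(speech)
# Plain NumPy array, ready for the Whisper processor or pipeline
input_speech = speech.numpy()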
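The torchaudio steps above (channel handling and `Resample`) exist because Whisper's feature extractor expects mono audio sampled at 16 kHz. The NumPy sketch below only illustrates what those steps do to the array. It swaps in plain linear interpolation, which applies no anti-aliasing filter and is therefore lower quality than torchaudio's band-limited resampling, so keep using `torchaudio.transforms.Resample` in practice. `TARGET_SR` and `to_whisper_input` are hypothetical names.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate Whisper's feature extractor works at


def to_whisper_input(waveform: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Down-mix to mono and resample to target_sr (illustrative linear interpolation)."""
    audio = np.asarray(waveform, dtype=np.float64)
    # torchaudio.load returns (channels, frames): average channels to get mono
    if audio.ndim == 2:
        audio = audio.mean(axis=0)
    if orig_sr == target_sr:
        return audio.astype(np.float32)
    # The number of samples scales with the ratio of sampling rates
    n_out = int(round(len(audio) * target_sr / orig_sr))
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    # Linear interpolation: a simple stand-in for band-limited resampling
    return np.interp(t_out, t_in, audio).astype(np.float32)
```

For instance, one second of 44.1 kHz stereo audio, shape `(2, 44100)`, comes out as a 1-D array of 16,000 samples.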
Alternatively, you can use the pipeline, as in the demo at https://huggingface.co/spaces/sanchit-gandhi/whisper-large-v2:
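import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise run on the CPU
device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-large-v2",
    chunk_length_s=30,  # split long recordings into 30 s windows
    device=device,
)
out = pipe(audio)["text"]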
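where audio is either the path to an audio file or an already-loaded audio array (see https://github.com/huggingface/transformers/blob/c1b9a11dd4be8af32b3274be7c9774d5a917c56d/src/transformers/pipelines/automatic_speech_recognition.py#L201). Decoding a file path relies on ffmpeg being installed. A bare array is assumed to be sampled at 16 kHz; for audio at another rate, pass a dict such as {"raw": array, "sampling_rate": sr} instead.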
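With `chunk_length_s=30`, the pipeline splits long recordings into overlapping 30 s windows, transcribes each one, and merges the overlapping transcriptions. The sketch below only illustrates the window arithmetic; it is a simplified model, not the pipeline's implementation. It assumes the overlap (stride) on each side defaults to `chunk_length_s / 6`, i.e. 5 s here, so consecutive windows start 20 s apart. `chunk_windows` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def chunk_windows(
    total_s: float, chunk_s: float = 30.0, stride_s: Optional[float] = None
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping windows covering total_s."""
    if stride_s is None:
        # Assumption: mirrors the pipeline's default of chunk_length_s / 6 per side
        stride_s = chunk_s / 6
    # Each window overlaps its neighbours by stride_s on both sides
    step = chunk_s - 2 * stride_s
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows
```

For a 60 s file this yields windows from 0 to 30 s, 20 to 50 s, and 40 to 60 s; the overlaps give the merging step shared context at each boundary.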