Automatic Speech Recognition
Transformers
PyTorch
TensorFlow
JAX
Safetensors
whisper
audio
hf-asr-leaderboard
Eval Results
Instructions to use openai/whisper-large-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openai/whisper-large-v2 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")# Load model directly from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq processor = AutoProcessor.from_pretrained("openai/whisper-large-v2") model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v2") - Notebooks
- Google Colab
- Kaggle
How can I detect the language of the audio by loading the model named WhisperForConditionalGeneration?
#40
by lnpwcd68730 - opened
This is how I expect to load the model
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
This is the language detection method mentioned in the README of whisper
import whisper
model = whisper.load_model("base")
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
See the last code cell of this Colab: https://colab.research.google.com/drive/1rS1L4YSJqKUH_3YxIQHBI982zso23wor?usp=sharing#scrollTo=Mh_e6rV62QUM
sanchit-gandhi changed discussion status to closed
thank you!