nocaptions
#4
by mmichelli - opened
The <|nocaptions|> token is missing from this model's tokenizer:

```python
from faster_whisper import WhisperModel

model = WhisperModel("NbAiLab/nb-whisper-large", device="cuda")
# Look up the special token in the underlying Hugging Face tokenizer
nocaptions_token_id = model.hf_tokenizer.token_to_id("<|nocaptions|>")
print(f"<|nocaptions|> token ID: {nocaptions_token_id}")
```

This prints:

```
<|nocaptions|> token ID: None
```

With the tiny model, which has been updated more recently, the same lookup prints:

```
<|nocaptions|> token ID: 50362
```
Hi,
It was renamed to <|nospeech|> in later versions of the large Whisper model.
```json
...
{
  "id": 50363,
  "content": "<|nospeech|>",
  "single_word": false,
  "lstrip": false,
  "rstrip": false,
  "normalized": false,
  "special": true
},
...
```
Cheers.
versae changed discussion status to closed
Thanks :)
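For anyone hitting the same `None` result: since the token goes by different names in different checkpoints, one option is to try both names and use whichever the tokenizer knows. The following is only a minimal sketch. The helper `find_no_speech_token` and the candidate list are hypothetical names made up for illustration, and the two dictionaries are stand-ins that hold only the IDs quoted in this thread, so the example runs without downloading a model. With a real model, `model.hf_tokenizer.token_to_id` (as used in the original post) could be passed in place of `dict.get`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Candidate names for the same special token, newest name first.
# (Hypothetical list for illustration; both names come from this thread.)
NO_SPEECH_TOKEN_NAMES = ("<|nospeech|>", "<|nocaptions|>")


def find_no_speech_token(
    token_to_id: Callable[[str], Optional[int]],
    names: Sequence[str] = NO_SPEECH_TOKEN_NAMES,
) -> Optional[Tuple[str, int]]:
    """Return (name, id) for the first candidate the tokenizer knows, else None.

    `token_to_id` is any callable that maps a token string to its ID and
    returns None for unknown tokens.
    """
    for name in names:
        token_id = token_to_id(name)
        if token_id is not None:
            return name, token_id
    return None


# Stand-in vocabularies containing only the IDs reported in this thread.
large_vocab = {"<|nospeech|>": 50363}
tiny_vocab = {"<|nocaptions|>": 50362}

print(find_no_speech_token(large_vocab.get))
print(find_no_speech_token(tiny_vocab.get))
print(find_no_speech_token({}.get))
```

Passing the lookup function in, rather than the model itself, keeps the helper independent of any particular library and easy to check offline.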