Automatic Speech Recognition
NeMo
PyTorch
automatic-speech-translation
speech
audio
Transformer
FastConformer
Conformer
hf-asr-leaderboard
How to use nvidia/canary-1b-v2 with NeMo:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/canary-1b-v2")
transcriptions = asr_model.transcribe(["file.wav"])
Is it possible to use "prompt" or "hotwords" to steer decoding similar to Whisper?
#8
by spashii - opened
^title
It should be possible with a text corpus at least, using an n-gram language model: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/asr_customization/ngpulm_language_modeling_and_customization.html#ngpulm-ngram-modeling
Training the n-gram model is really fast.
But whenever I add it and transcribe an audio file longer than about 40 seconds (still under a minute), the output drops a bunch of sentences.
They also have word boosting, but that doesn't seem to work on an AED (attention encoder-decoder) model like Canary.
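On the dropped sentences for clips over 40 seconds: one way to check whether the problem is tied to clip length is to cut the file into shorter pieces and transcribe each piece separately. Below is a minimal sketch using only the Python standard library. The helper names `chunk_bounds` and `split_wav` and the 30-second default are my own assumptions, not anything from NeMo, and it only handles uncompressed WAV input. Fixed-length cuts can land in the middle of a word, so treat this as a diagnostic rather than a fix.

```python
import wave
from pathlib import Path


def chunk_bounds(total_frames: int, frame_rate: int, max_seconds: float = 30.0):
    """Return (start_frame, end_frame) pairs covering the audio in
    consecutive, non-overlapping pieces of at most max_seconds each."""
    step = int(frame_rate * max_seconds)
    if step < 1:
        raise ValueError("max_seconds is too small for this frame rate")
    return [(start, min(start + step, total_frames))
            for start in range(0, total_frames, step)]


def split_wav(path: str, out_dir: str, max_seconds: float = 30.0) -> list[str]:
    """Write each piece of a WAV file to out_dir and return the new paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bounds = chunk_bounds(src.getnframes(), src.getframerate(), max_seconds)
        for i, (start, end) in enumerate(bounds):
            src.setpos(start)
            frames = src.readframes(end - start)
            chunk_path = out / f"{Path(path).stem}_{i:03d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(str(chunk_path))
    return chunk_paths


# Hypothetical usage with the model loaded as in the snippet on the model page:
# paths = split_wav("file.wav", "chunks", max_seconds=30.0)
# transcriptions = asr_model.transcribe(paths)
```

If the shorter pieces come back complete with the n-gram model enabled, that points at a length-related interaction rather than at the corpus itself.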