Transcription repeats the same word
Hi,
Thanks for making this model available!
I tried it out and it works like a charm on audio up to 1 minute long. Unfortunately, if I try to transcribe a recording longer than 1 minute, it only transcribes the first 2-3 sentences, then gets stuck and repeats a single word for the rest of the output. I have a 1-hour recording I'm trying to transcribe; if I crop it to 1 minute, the result is perfect. A 5-minute clip already shows the repeated-word problem, and even its first minute is not transcribed properly. I used the basic example code for English transcription. Do you have an idea how to solve this issue?
I have an Nvidia RTX 4090 GPU with 24 GB of memory, and I only need to run inference.
Thanks,
Agi
Hi, thanks for trying out the model! We have a special script for inference on longer samples here: https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed_chunked_infer.py. It should fix your issues.
Oh, perfect!
It took me a bit of time to figure out that I needed to install NeMo from source (from git, via pip) rather than from the regular pip package, but otherwise it worked smoothly and transcribed the 1-hour recording.
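For readers curious about the general idea behind chunked inference, here is a minimal sketch: it splits a long WAV file into short pieces and transcribes each piece separately. It is not the NeMo script linked above, which likely handles chunk boundaries more carefully. The names `split_wav` and `transcribe_long` are hypothetical, the 30-second default is an arbitrary choice below the 1 minute reported to work in this thread, and `transcribe_fn` is assumed to take a list of file paths and return a list of strings.

```python
import os
import tempfile
import wave


def split_wav(path, chunk_seconds, out_dir):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file reached
            chunk_path = os.path.join(out_dir, f"chunk_{index:05d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def transcribe_long(path, transcribe_fn, chunk_seconds=30.0):
    """Transcribe a long WAV file chunk by chunk and join the results.

    transcribe_fn is an assumption: any callable that takes a list of
    paths and returns a list of strings, one per path.
    """
    texts = []
    with tempfile.TemporaryDirectory() as tmp:
        for chunk in split_wav(path, chunk_seconds, tmp):
            result = transcribe_fn([chunk])
            texts.append(result[0].strip())
    return " ".join(t for t in texts if t)
```

With a loaded model this could be called as `transcribe_long("recording.wav", asr_model.transcribe)`, assuming `transcribe` returns plain strings in the installed NeMo version. Cutting at fixed offsets can split a word in two, so a naive splitter like this may lose words at chunk edges.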