Tags: Automatic Speech Recognition, NeMo, PyTorch, automatic-speech-translation, speech, audio, Transformer, FastConformer, Conformer, hf-asr-leaderboard
How to use nvidia/canary-1b with NeMo:

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/canary-1b")
transcriptions = asr_model.transcribe(["file.wav"])
```
Update README.md (#7), opened by steveheh

Changed file: README.md
```diff
@@ -402,7 +402,7 @@ The model outputs the transcribed/translated text corresponding to the input aud
 ## Training
 
 Canary-1B is trained using the NVIDIA NeMo toolkit [4] for 150k steps with dynamic bucketing and a batch duration of 360s per GPU on 128 NVIDIA A100 80GB GPUs.
 
-The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/
+The model can be trained using this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/speech_multitask/speech_to_text_aed.py) and [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/speech_multitask/fast-conformer_aed.yaml).
 
 The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
 
```
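To make "dynamic bucketing with a batch duration of 360s per GPU" concrete, here is a minimal sketch of duration-capped batching: utterances of similar length are grouped into buckets, and each batch is filled up to a total-audio cap rather than a fixed utterance count. This is a simplified stand-in, not NeMo's or Lhotse's actual sampler. `Utterance`, `bucketed_duration_batches`, and `max_audio_hours` are hypothetical names. The last helper works out the implied audio budget under the stated assumptions (every per-GPU batch filled to the cap, no gradient accumulation). Under those assumptions, 360 s × 128 GPUs gives at most 46,080 s (12.8 h) of audio per step, or at most about 1.92 million hours processed over 150k steps, counting repeated passes over the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """Hypothetical manifest entry: an audio ID and its length in seconds."""
    audio_id: str
    duration: float


def bucketed_duration_batches(
    utts: List[Utterance], max_batch_duration: float, num_buckets: int
) -> List[List[Utterance]]:
    """Simplified duration-based bucketing (not NeMo's or Lhotse's sampler).

    Utterances are sorted by length and split into `num_buckets` buckets of
    similar duration. Each bucket is then greedily packed into batches whose
    total audio stays at or below `max_batch_duration` seconds. A single
    utterance longer than the cap still gets its own (oversized) batch.
    """
    ordered = sorted(utts, key=lambda u: u.duration)
    bucket_size = max(1, -(-len(ordered) // num_buckets))  # ceiling division
    batches: List[List[Utterance]] = []
    for start in range(0, len(ordered), bucket_size):
        bucket = ordered[start:start + bucket_size]
        current: List[Utterance] = []
        total = 0.0
        for utt in bucket:
            # Close the current batch if adding this utterance would exceed the cap.
            if current and total + utt.duration > max_batch_duration:
                batches.append(current)
                current, total = [], 0.0
            current.append(utt)
            total += utt.duration
        if current:
            batches.append(current)
    return batches


def max_audio_hours(batch_duration_s: float, num_gpus: int, steps: int) -> float:
    """Upper bound on audio hours processed, assuming every per-GPU batch is
    filled to `batch_duration_s` and there is no gradient accumulation."""
    return batch_duration_s * num_gpus * steps / 3600.0


# Toy example: six utterances, a 30 s cap, two length buckets.
utts = [Utterance(f"utt{i}", d) for i, d in enumerate([2.0, 15.0, 3.5, 16.0, 4.0, 12.0])]
for batch in bucketed_duration_batches(utts, max_batch_duration=30.0, num_buckets=2):
    print([u.duration for u in batch])
# [2.0, 3.5, 4.0]
# [12.0, 15.0]
# [16.0]

# Audio budget implied by the README figures (upper bounds).
print(max_audio_hours(360, 128, 1))        # 12.8 hours per step
print(max_audio_hours(360, 128, 150_000))  # 1920000.0 hours over 150k steps
```

Capping batches by total duration keeps per-GPU memory roughly constant: short utterances are packed many to a batch and long ones few to a batch. Bucketing by length first also reduces padding inside each batch.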