How to use nvidia/canary-1b with NeMo:
    import nemo.collections.asr as nemo_asr
    asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/canary-1b")
    transcriptions = asr_model.transcribe(["file.wav"])
Update README.md
--- a/README.md
+++ b/README.md
@@ -331,7 +331,7 @@ Another recommended option is to use a json manifest as input, where each line i
 # Example of a line in input_manifest.json
 {
     "audio_filepath": "/path/to/audio.wav",  # path to the audio file
-    "duration":
+    "duration": [SET TO THE ACTUAL DURATION OF AUDIO],  # duration of the audio
     "taskname": "asr",  # use "ast" for speech-to-text translation
     "source_lang": "en",  # language of the audio input, set `source_lang`==`target_lang` for ASR, choices=['en','de','es','fr']
     "target_lang": "en",  # language of the text output, choices=['en','de','es','fr']
@@ -364,7 +364,7 @@ An example manifest for transcribing English audios can be:
 # Example of a line in input_manifest.json
 {
     "audio_filepath": "/path/to/audio.wav",  # path to the audio file
-    "duration":
+    "duration": [SET TO THE ACTUAL DURATION OF AUDIO],  # duration of the audio
     "taskname": "asr",
     "source_lang": "en",  # language of the audio input, set `source_lang`==`target_lang` for ASR, choices=['en','de','es','fr']
     "target_lang": "en",  # language of the text output, choices=['en','de','es','fr']
@@ -382,7 +382,7 @@ An example manifest for transcribing English audios into German text can be:
 # Example of a line in input_manifest.json
 {
     "audio_filepath": "/path/to/audio.wav",  # path to the audio file
-    "duration":
+    "duration": [SET TO THE ACTUAL DURATION OF AUDIO],  # duration of the audio
     "taskname": "ast",
     "source_lang": "en",  # language of the audio input, choices=['en','de','es','fr']
     "target_lang": "de",  # language of the text output, choices=['en','de','es','fr']
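The updated examples leave `[SET TO THE ACTUAL DURATION OF AUDIO]` as a placeholder to be filled in for each file. One way to fill it is sketched below, assuming the audio is an uncompressed PCM WAV file that Python's standard `wave` module can read. The helper names `wav_duration_seconds` and `manifest_line` are hypothetical and not part of NeMo. The sketch also assumes that an actual manifest line is plain JSON, so the `#` comments shown in the README example are left out.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def manifest_line(path, taskname="asr", source_lang="en", target_lang="en"):
    """Build one manifest entry as a single-line JSON string (hypothetical helper).

    The keys mirror the README example; `duration` is measured from the file
    instead of being left as a placeholder.
    """
    entry = {
        "audio_filepath": path,
        "duration": wav_duration_seconds(path),
        "taskname": taskname,        # "asr" or "ast"
        "source_lang": source_lang,  # one of 'en', 'de', 'es', 'fr'
        "target_lang": target_lang,  # one of 'en', 'de', 'es', 'fr'
    }
    return json.dumps(entry)
```

Writing one such string per audio file, each on its own line, would give a manifest in the shape of the examples above. For English-to-German translation, the call would be `manifest_line(path, taskname="ast", target_lang="de")`.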