---
language:
- kk
- ru
- uz
- en
license: cc-by-nc-4.0
tags:
- automatic-speech-recognition
- nemo
- fastconformer
- ctc
- multilingual
library_name: nemo
pipeline_tag: automatic-speech-recognition
---
# FastConformer Multilingual ASR

A multilingual automatic speech recognition model supporting **Kazakh, Russian, Uzbek, and English**. Built on NVIDIA NeMo's FastConformer-CTC architecture.

## Languages

| Language | Code |
|----------|------|
| Kazakh | `kk` |
| Russian | `ru` |
| Uzbek (Latin) | `uz` |
| English | `en` |

## Results

Full test set (76,739 samples):

| Language | Samples | CER | WER |
|----------|---------|-----|-----|
| Russian | 10,203 | 2.34% | 9.16% |
| Kazakh | 33,964 | 8.27% | 14.09% |
| Uzbek | 16,184 | 7.10% | 28.82% |
| English | 16,388 | 9.53% | 22.29% |
| **Overall** | **76,739** | **7.73%** | **16.86%** |
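WER is the word-level edit distance divided by the number of reference words; CER is the same computed over characters. For reference, a minimal self-contained sketch of both metrics (illustrative only; this is not the evaluation script used for the table above):

```python
def edit_distance(ref, hyp):
    # Classic one-row dynamic-programming Levenshtein distance over token sequences
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion
                dp[j - 1] + 1,      # insertion
                prev + (r != h),    # substitution (free if tokens match)
            )
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why CER and WER can diverge sharply for morphologically rich or agglutinative languages.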
## Usage

### Requirements

```bash
pip install "nemo_toolkit[asr]"
```
### Transcribe Audio Files

```python
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.restore_from("fastconformer_multilingual.nemo")
model.freeze()

transcriptions = model.transcribe(["audio.wav"])
print(transcriptions[0])
```
### Real-Time Streaming Transcription

```python
import os
import queue
import tempfile

import numpy as np
import sounddevice as sd
import soundfile as sf
import nemo.collections.asr as nemo_asr

SAMPLE_RATE = 16000   # model expects 16 kHz mono audio
CHUNK_SEC = 3         # transcribe every 3 seconds of captured audio
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SEC

model = nemo_asr.models.ASRModel.restore_from("fastconformer_multilingual.nemo")
model.freeze()

audio_queue = queue.Queue()

def audio_callback(indata, frames, time, status):
    # Push the mono channel of each captured block onto the queue
    audio_queue.put(indata[:, 0].copy())

def transcribe_stream():
    buffer = np.array([], dtype=np.float32)
    with sd.InputStream(
        samplerate=SAMPLE_RATE,
        channels=1,
        callback=audio_callback,
        blocksize=SAMPLE_RATE,
    ):
        print("Listening... (Ctrl+C to stop)")
        while True:
            chunk = audio_queue.get()
            buffer = np.concatenate([buffer, chunk])
            if len(buffer) >= CHUNK_SAMPLES:
                # Write the chunk to a temporary WAV file for transcription
                tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
                tmp.close()  # close before writing so this also works on Windows
                sf.write(tmp.name, buffer[:CHUNK_SAMPLES], SAMPLE_RATE)
                result = model.transcribe([tmp.name])
                # Newer NeMo versions return Hypothesis objects instead of strings
                text = result[0] if isinstance(result[0], str) else result[0].text
                if text.strip():
                    print(f"> {text}")
                os.unlink(tmp.name)
                buffer = buffer[CHUNK_SAMPLES:]

if __name__ == "__main__":
    transcribe_stream()
```
### Batch Transcription

```python
import nemo.collections.asr as nemo_asr
from pathlib import Path

model = nemo_asr.models.ASRModel.restore_from("fastconformer_multilingual.nemo")
model.freeze()

audio_files = list(Path("audio_dir").glob("*.wav"))
transcriptions = model.transcribe([str(f) for f in audio_files], batch_size=32)

for path, text in zip(audio_files, transcriptions):
    t = text if isinstance(text, str) else text.text
    print(f"{path.name}: {t}")
```
## Model Details

- **Architecture**: FastConformer-CTC
- **Framework**: NVIDIA NeMo
- **Audio**: 16 kHz mono WAV
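Inputs recorded at other sample rates or with multiple channels should be converted to 16 kHz mono first. A minimal NumPy sketch using linear interpolation (fine for quick experiments; a dedicated resampler such as librosa or torchaudio gives better quality):

```python
import numpy as np

def to_16k_mono(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Downmix (samples, channels) audio to mono and resample to target_sr."""
    if audio.ndim == 2:
        audio = audio.mean(axis=1)          # average channels to mono
    if sr == target_sr:
        return audio.astype(np.float32)
    n_out = int(round(len(audio) * target_sr / sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio).astype(np.float32)
```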
## Limitations

- Optimized for clear speech; performance may degrade on noisy audio
- No punctuation or capitalization in output
- Language is auto-detected, not explicitly specified

## License

This model is released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Non-commercial use only.