---
language: be
tags:
- conformer
- ctc
- mlx
- apple-silicon
- speech-recognition
- asr
- belarusian
- nemo
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_10_0
metrics:
- wer
base_model: nvidia/stt_be_conformer_ctc_large
pipeline_tag: automatic-speech-recognition
---

# Conformer-CTC Belarusian (MLX)

**Code:** [molind/mlx-conformer](https://github.com/molind/mlx-conformer)

NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for [MLX](https://github.com/ml-explore/mlx) inference on Apple Silicon.

Original model: [nvidia/stt_be_conformer_ctc_large](https://huggingface.co/nvidia/stt_be_conformer_ctc_large)

## Results

| Dataset | WER | Speed |
|---------|-----|-------|
| CommonVoice 24.0 test (500 samples) | **7.58%** | 8.2 samples/s |

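For readers who want to reproduce a number like the one in the table, here is a minimal sketch of corpus-level word error rate. The card does not say how its WER was computed, so the whitespace tokenization, the lack of text normalization, and the corpus-level averaging below are all assumptions, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words (assumed definition)."""
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


refs = ["добры дзень", "як справы"]
hyps = ["добры дзень", "як справа"]
print(corpus_wer(refs, hyps))  # 0.25: one substitution over four reference words
```

Lowercasing or stripping punctuation before scoring can change the result noticeably, so any comparison with the table should use the same normalization on both sides.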
## Usage

```bash
# Install the Python dependencies
pip install mlx numpy pyyaml torch

# Get the inference code
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer

# Download the original NVIDIA checkpoint into ./models
python mlx_conformer.py \
  --download nvidia/stt_be_conformer_ctc_large \
  --output models

# Transcribe an audio file
python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3
```

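To transcribe several files, the command above can be wrapped in a small script. This is only a sketch: it uses the `--model` and `--audio` flags shown in this section and assumes the script prints the transcript to standard output, which the card does not confirm. `build_command` and `transcribe_folder` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

MODEL_DIR = "models/stt_be_conformer_ctc_large"  # path used in the commands above


def build_command(audio_path, model_dir=MODEL_DIR):
    """Return the argument list for one transcription call."""
    return [sys.executable, "mlx_conformer.py",
            "--model", str(model_dir), "--audio", str(audio_path)]


def transcribe_folder(folder, pattern="*.mp3"):
    """Run the CLI once per matching file and yield (path, stdout) pairs.

    Assumption: the CLI writes the transcript to stdout.
    """
    for audio in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(build_command(audio),
                                capture_output=True, text=True, check=True)
        yield audio, result.stdout.strip()
```

Run from inside the cloned `mlx-conformer` directory, `for path, text in transcribe_folder("recordings"): print(path.name, text)` would process every MP3 in a `recordings` folder. Each call starts a new process and reloads the model, so this is convenient rather than fast.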
## Architecture

- 18 Conformer layers, d_model=512, 8 heads
- Conv kernel size 31, 4x subsampling
- 128 BPE vocabulary + blank
- ~120M parameters

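The "128 BPE vocabulary + blank" entry means the CTC head emits 129 scores per output frame. As an illustration of how such scores become a token sequence, here is a minimal pure-Python sketch of greedy CTC decoding: take the best symbol per frame, merge consecutive repeats, then drop blanks. Placing the blank at the last index (128) is an assumption, and this is not necessarily the decoder the repository uses.

```python
VOCAB_SIZE = 128        # BPE tokens, from the list above
BLANK_ID = VOCAB_SIZE   # assumption: blank is the last output index


def ctc_greedy_decode(frame_scores, blank_id=BLANK_ID):
    """Collapse per-frame scores (time x (vocab + 1)) into token ids."""
    tokens = []
    prev = None
    for row in frame_scores:
        idx = max(range(len(row)), key=row.__getitem__)  # argmax for this frame
        if idx != prev and idx != blank_id:
            tokens.append(idx)  # new non-blank symbol
        prev = idx
    return tokens


def one_hot_rows(ids, width=VOCAB_SIZE + 1):
    """Build toy score rows where the given id wins each frame."""
    rows = []
    for i in ids:
        row = [-10.0] * width
        row[i] = 0.0
        rows.append(row)
    return rows


frames = [5, 5, BLANK_ID, 5, 7, 7, BLANK_ID]
print(ctc_greedy_decode(one_hot_rows(frames)))  # [5, 5, 7]
```

The blank between the two runs of `5` is what lets the same token appear twice in a row in the output. The resulting ids would then be mapped back to text by the model's BPE tokenizer.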
## License

Original model by NVIDIA, licensed under CC-BY-4.0.
