---
language: be
tags:
- conformer
- ctc
- mlx
- apple-silicon
- speech-recognition
- asr
- belarusian
- nemo
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_10_0
metrics:
- wer
base_model: nvidia/stt_be_conformer_ctc_large
pipeline_tag: automatic-speech-recognition
---

# Conformer-CTC Belarusian (MLX)

**Code:** [molind/mlx-conformer](https://github.com/molind/mlx-conformer)

NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for [MLX](https://github.com/ml-explore/mlx) inference on Apple Silicon.

Original model: [nvidia/stt_be_conformer_ctc_large](https://huggingface.co/nvidia/stt_be_conformer_ctc_large)

## Results

| Dataset | WER | Speed |
|---------|-----|-------|
| CommonVoice 24.0 test (500 samples) | **7.58%** | 8.2 samples/s |

## Usage

```bash
pip install mlx numpy pyyaml torch
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer
python mlx_conformer.py \
  --download nvidia/stt_be_conformer_ctc_large \
  --output models
python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3
```

## Architecture

- 18 Conformer layers, d_model=512, 8 heads
- Conv kernel size 31, 4x subsampling
- 128 BPE vocabulary + blank
- ~120M parameters

## License

Original model by NVIDIA, licensed under CC-BY-4.0.
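
## Appendix: CTC greedy decoding sketch

Because the model is CTC-based with a 128-token BPE vocabulary plus a blank symbol, its per-frame outputs have to be collapsed into a token sequence before detokenization. The snippet below is a minimal sketch of greedy CTC decoding in plain Python, not the code used in `mlx_conformer.py`. The function name `ctc_greedy_decode` is hypothetical, and the blank index of 128 (the last class) is an assumption based on the vocabulary size listed above.

```python
BLANK_ID = 128  # assumption: blank is the last class, after the 128 BPE tokens


def ctc_greedy_decode(scores, blank_id=BLANK_ID):
    """Collapse per-frame scores (frames x classes) into a list of token IDs.

    Hypothetical helper for illustration; not part of mlx-conformer.
    """
    tokens = []
    prev = None
    for row in scores:
        # Pick the highest-scoring class for this frame.
        idx = max(range(len(row)), key=row.__getitem__)
        # Emit a token only when it differs from the previous frame's
        # symbol and is not the blank symbol.
        if idx != prev and idx != blank_id:
            tokens.append(idx)
        prev = idx
    return tokens


def one_hot_frames(symbols, num_classes=129):
    """Build toy score rows where each frame strongly prefers one symbol."""
    frames = []
    for sym in symbols:
        row = [-10.0] * num_classes
        row[sym] = 0.0
        frames.append(row)
    return frames


# Toy input: six frames over 129 classes (128 tokens + blank).
decoded = ctc_greedy_decode(one_hot_frames([5, 5, 128, 5, 7, 128]))
print(decoded)  # [5, 5, 7]
```

Repeated frames of the same symbol merge into one token, while a blank between two identical symbols keeps them separate, which is why the toy input yields token 5 twice. The resulting IDs would then be mapped back to text by the model's BPE tokenizer.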