Conformer-CTC Belarusian (MLX)
Code: molind/mlx-conformer
NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for MLX inference on Apple Silicon.
Original model: nvidia/stt_be_conformer_ctc_large
Results
| Dataset | WER | Speed |
|---|---|---|
| CommonVoice 24.0 test (500 samples) | 7.58% | 8.2 samples/s |
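For readers who want to check the WER figure on their own transcripts, here is a minimal sketch of corpus-level word error rate. It assumes the usual definition (word-level edit distance summed over all utterances, divided by the total number of reference words) and whitespace tokenization. The function names `edit_distance` and `corpus_wer` are illustrative, not part of the repository, and the evaluation behind the table may apply text normalization that is not shown here.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()  # ASSUMPTION: whitespace tokenization
        hyp_words = hyp.split()
        total_errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    refs = ["добры дзень", "як справы ў цябе"]
    hyps = ["добры дзень", "як справа ў цябе"]
    # One substitution over six reference words.
    print(f"WER: {corpus_wer(refs, hyps):.2%}")
```

Summing errors across the corpus before dividing weights long utterances more heavily than averaging per-utterance WER does, so the two conventions can give different numbers on the same data.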
Usage
pip install mlx numpy pyyaml torch
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer
python mlx_conformer.py \
  --download nvidia/stt_be_conformer_ctc_large \
  --output models
python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3
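To transcribe a folder of recordings, the single-file command above can be wrapped in a short script. This is a sketch, not a feature of the repository: it uses only the `--model` and `--audio` flags shown above, assumes it is run from inside the cloned `mlx-conformer` directory, and treats whatever the script prints to stdout as the transcript, since the output format is not documented here. The helper names `build_transcribe_cmd` and `transcribe_all` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_transcribe_cmd(model_dir, audio_path, script="mlx_conformer.py"):
    """Build the same command line as the single-file usage example."""
    return [sys.executable, script,
            "--model", str(model_dir),
            "--audio", str(audio_path)]


def transcribe_all(model_dir, audio_dir, pattern="*.mp3"):
    """Run the CLI once per matching file and collect raw stdout per file."""
    results = {}
    for audio in sorted(Path(audio_dir).glob(pattern)):
        proc = subprocess.run(
            build_transcribe_cmd(model_dir, audio),
            capture_output=True, text=True, check=True,
        )
        # ASSUMPTION: stdout carries the transcript; the format is not parsed.
        results[audio.name] = proc.stdout.strip()
    return results


if __name__ == "__main__":
    # "recordings" is a placeholder directory name.
    for name, text in transcribe_all("models/stt_be_conformer_ctc_large",
                                     "recordings").items():
        print(f"{name}\t{text}")
```

Each call starts a new process and reloads the model, so this is convenient rather than fast.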
Architecture
- 18 Conformer layers, d_model=512, 8 heads
- Conv kernel size 31, 4x subsampling
- 128-token BPE vocabulary plus the CTC blank (129 output classes)
- ~120M parameters
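The hyperparameters above imply a few derived quantities, sketched below together with the greedy collapse step that CTC decoding uses. The class and function names are illustrative and do not come from the repository. Two values are assumptions rather than facts from this card: a 10 ms feature hop before subsampling, and the blank token sitting at the last index (128).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformerCTCConfig:
    """Hyperparameters listed in this card (names are illustrative)."""
    n_layers: int = 18
    d_model: int = 512
    n_heads: int = 8
    conv_kernel_size: int = 31
    subsampling_factor: int = 4
    vocab_size: int = 128
    feature_hop_ms: float = 10.0  # ASSUMPTION: not stated in this card

    @property
    def head_dim(self):
        # Per-head width of the self-attention projections.
        return self.d_model // self.n_heads

    @property
    def n_classes(self):
        # BPE tokens plus one CTC blank.
        return self.vocab_size + 1

    @property
    def encoder_frame_ms(self):
        # Audio time covered by one encoder output frame after subsampling.
        return self.feature_hop_ms * self.subsampling_factor


def ctc_greedy_collapse(frame_ids, blank_id):
    """Merge consecutive repeats, then drop blanks (greedy CTC decoding)."""
    tokens = []
    prev = None
    for token in frame_ids:
        if token != prev and token != blank_id:
            tokens.append(token)
        prev = token
    return tokens


if __name__ == "__main__":
    cfg = ConformerCTCConfig()
    blank = cfg.vocab_size  # ASSUMPTION: blank is the last class index
    print(cfg.head_dim, cfg.n_classes, cfg.encoder_frame_ms)
    print(ctc_greedy_collapse([blank, 5, 5, blank, 5, 7, 7, blank], blank))
```

A blank between two identical token ids keeps them as separate tokens, which is how CTC can emit the same token twice in a row.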
License
Original model by NVIDIA, licensed under CC-BY-4.0.