---
language: be
tags:
- conformer
- ctc
- mlx
- apple-silicon
- speech-recognition
- asr
- belarusian
- nemo
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_10_0
metrics:
- wer
base_model: nvidia/stt_be_conformer_ctc_large
pipeline_tag: automatic-speech-recognition
---
# Conformer-CTC Belarusian (MLX)
**Code:** [molind/mlx-conformer](https://github.com/molind/mlx-conformer)
NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for [MLX](https://github.com/ml-explore/mlx) inference on Apple Silicon.
Original model: [nvidia/stt_be_conformer_ctc_large](https://huggingface.co/nvidia/stt_be_conformer_ctc_large)
## Results
| Dataset | WER | Speed |
|---------|-----|-------|
| Common Voice 24.0 test (500 samples) | **7.58%** | 8.2 samples/s |
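The WER figure above is the standard word error rate: word-level edit distance divided by the number of reference words. A minimal self-contained sketch (this is the generic metric, not the repo's evaluation script):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = min edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / len(r)

print(wer("добры дзень сябры", "добры дзень сябар"))  # 1 substitution / 3 words ≈ 0.333
```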
## Usage
```bash
pip install mlx numpy pyyaml torch
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer
python mlx_conformer.py \
--download nvidia/stt_be_conformer_ctc_large \
--output models
python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3
```
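Under the hood, a CTC model emits per-frame log-probabilities over the vocabulary plus a blank token; decoding collapses repeats and drops blanks. The repo's decoder isn't reproduced here, but a minimal sketch of standard CTC greedy decoding (vocabulary and blank index are illustrative) looks like:

```python
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, vocab: list[str], blank_id: int) -> str:
    """Standard CTC greedy decoding: argmax per frame, merge repeats, drop blanks.

    log_probs: array of shape (T, V) — T frames, V = vocab size + blank.
    """
    ids = log_probs.argmax(axis=-1)  # best token id per frame
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(vocab[i])
        prev = i
    return "".join(out)
```

For example, the frame-wise argmax sequence `[а, а, ␣, б, б, ␣, а]` (where `␣` is the blank) decodes to `аба`: the repeated `а` and `б` frames merge, and blanks separate genuine repetitions.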
## Architecture
- 18 Conformer layers, d_model=512, 8 heads
- Conv kernel size 31, 4x subsampling
- 128-token BPE vocabulary + blank
- ~120M parameters
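The 4x convolutional subsampling determines how many encoder frames (and hence CTC emissions) an utterance produces. Assuming a 10 ms mel-spectrogram hop (NeMo's typical preprocessor default; not confirmed from this repo's config), a quick sketch of the output length:

```python
def output_frames(num_samples: int, sample_rate: int = 16_000,
                  hop_ms: float = 10.0, subsampling: int = 4) -> int:
    """Approximate encoder output length for a raw waveform.

    Assumes a 10 ms mel hop (hypothetical default) and the model's
    4x convolutional subsampling; ignores padding/edge effects.
    """
    hop_samples = int(sample_rate * hop_ms / 1000)   # 160 samples at 16 kHz
    mel_frames = num_samples // hop_samples
    return mel_frames // subsampling

# 10 s of 16 kHz audio → 1000 mel frames → 250 encoder outputs (one per 40 ms)
print(output_frames(160_000))
```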
## License
Original model by NVIDIA, licensed under CC-BY-4.0.