---
language: be
tags:
- conformer
- ctc
- mlx
- apple-silicon
- speech-recognition
- asr
- belarusian
- nemo
license: cc-by-4.0
datasets:
- mozilla-foundation/common_voice_10_0
metrics:
- wer
base_model: nvidia/stt_be_conformer_ctc_large
pipeline_tag: automatic-speech-recognition
---

# Conformer-CTC Belarusian (MLX)

**Code:** [molind/mlx-conformer](https://github.com/molind/mlx-conformer)

NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for [MLX](https://github.com/ml-explore/mlx) inference on Apple Silicon.

Original model: [nvidia/stt_be_conformer_ctc_large](https://huggingface.co/nvidia/stt_be_conformer_ctc_large)

## Results

| Dataset | WER | Speed |
|---------|-----|-------|
| CommonVoice 24.0 test (500 samples) | **7.58%** | 8.2 samples/s |
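WER here is the standard word error rate: the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal sketch of the computation (illustrative only, not the evaluation script used for the table above):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```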

## Usage

```bash
pip install mlx numpy pyyaml torch

git clone https://github.com/molind/mlx-conformer
cd mlx-conformer

python mlx_conformer.py \
    --download nvidia/stt_be_conformer_ctc_large \
    --output models

python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3
```

## Architecture

- 18 Conformer layers, d_model=512, 8 heads
- Conv kernel size 31, 4x subsampling
- 128-token BPE vocabulary + CTC blank
- ~120M parameters
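The CTC head emits one logit vector per subsampled frame over the 129-entry vocabulary (128 BPE tokens plus blank). Greedy decoding then collapses repeated tokens and drops blanks; a minimal sketch of that standard step (token IDs and blank index are illustrative):

```python
def ctc_greedy_decode(frame_ids: list[int], blank: int = 0) -> list[int]:
    """Collapse consecutive repeats, then remove blanks (standard CTC greedy decode)."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out
```

For example, per-frame argmax IDs `[5, 5, 0, 7, 7, 0, 0, 5]` with `blank=0` decode to the token sequence `[5, 7, 5]`.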

## License

Original model by NVIDIA, licensed under CC-BY-4.0.