metadata
language: be
tags:
  - conformer
  - ctc
  - mlx
  - apple-silicon
  - speech-recognition
  - asr
  - belarusian
  - nemo
license: cc-by-4.0
datasets:
  - mozilla-foundation/common_voice_10_0
metrics:
  - wer
base_model: nvidia/stt_be_conformer_ctc_large
pipeline_tag: automatic-speech-recognition

Conformer-CTC Belarusian (MLX)

Code: molind/mlx-conformer

NVIDIA's Conformer-CTC Large model for Belarusian speech recognition, packaged for MLX inference on Apple Silicon.

Original model: nvidia/stt_be_conformer_ctc_large

Results

Dataset | WER | Speed
CommonVoice 24.0 test (500 samples) | 7.58% | 8.2 samples/s
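For readers who want to check a WER figure like the one above on their own transcripts, here is a minimal sketch of corpus-level word error rate in plain Python. It is an illustration only: the card does not say how the reported number was computed, so the whitespace tokenization and the absence of any text normalization (casing, punctuation) are assumptions, and the function names `edit_distance` and `corpus_wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words.

    ASSUMPTION: words are split on whitespace and compared verbatim;
    the evaluation behind the reported WER may normalize text differently.
    """
    errors = 0
    words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        words += len(ref_words)
    return errors / words if words else 0.0
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps short utterances from dominating the score.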

Usage

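# Install dependencies
pip install mlx numpy pyyaml torch

# Get the inference code
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer

# Fetch the original NVIDIA checkpoint into models/stt_be_conformer_ctc_large
python mlx_conformer.py \
    --download nvidia/stt_be_conformer_ctc_large \
    --output models

# Transcribe an audio file
python mlx_conformer.py --model models/stt_be_conformer_ctc_large --audio test.mp3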
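To transcribe several files, the command above can be wrapped in a small Python helper. This is a sketch under stated assumptions: it only uses the `--model` and `--audio` flags shown above, it assumes it is run from the `mlx-conformer` directory, and it assumes the script prints the transcript to standard output, which this card does not confirm. The helper names `build_transcribe_cmd` and `transcribe_folder` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_transcribe_cmd(model_dir, audio_path, script="mlx_conformer.py"):
    """Build the CLI invocation shown in the Usage section."""
    return [sys.executable, script,
            "--model", str(model_dir),
            "--audio", str(audio_path)]


def transcribe_folder(model_dir, audio_dir, pattern="*.mp3"):
    """Run the CLI once per matching file and collect its output.

    ASSUMPTION: the transcript is written to stdout.
    """
    results = {}
    for audio in sorted(Path(audio_dir).glob(pattern)):
        proc = subprocess.run(build_transcribe_cmd(model_dir, audio),
                              capture_output=True, text=True, check=True)
        results[audio.name] = proc.stdout.strip()
    return results
```

Starting one process per file reloads the model each time, so this suits small batches; for larger jobs, calling the repository's Python code directly is likely faster.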

Architecture

  • 18 Conformer layers, d_model=512, 8 heads
  • Conv kernel size 31, 4x subsampling
  • 128-token BPE vocabulary + blank
  • ~120M parameters
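The hyperparameters above imply a few derived quantities, sketched below together with a greedy CTC collapse step. This is an illustration, not the repository's implementation: the class and function names are hypothetical, placing the blank at the last index follows the usual NeMo convention and is an assumption here, and the card does not state which decoding strategy the script uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConformerConfig:
    """Hyperparameters as listed in the Architecture section."""
    n_layers: int = 18
    d_model: int = 512
    n_heads: int = 8
    conv_kernel_size: int = 31
    subsampling_factor: int = 4
    vocab_size: int = 128  # BPE tokens, blank not included

    @property
    def head_dim(self):
        # Width of each attention head.
        return self.d_model // self.n_heads

    @property
    def num_classes(self):
        # CTC output layer size: BPE tokens plus one blank.
        return self.vocab_size + 1

    @property
    def blank_id(self):
        # ASSUMPTION: blank is the last class index.
        return self.vocab_size


def ctc_greedy_collapse(frame_ids, blank_id):
    """Merge repeated frame predictions, then drop blanks."""
    out = []
    prev = None
    for token in frame_ids:
        if token != prev and token != blank_id:
            out.append(token)
        prev = token
    return out
```

For example, with the blank at index 128, the per-frame predictions `[5, 5, 128, 5, 7, 7, 128]` collapse to `[5, 5, 7]`: the blank between the two runs of 5 keeps them as separate tokens.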

License

Original model by NVIDIA, licensed under CC-BY-4.0.