Conformer-Transducer (RNN-T) Belarusian (MLX)

Code: molind/mlx-conformer

NVIDIA's Conformer-Transducer Large model for Belarusian speech recognition, packaged for MLX inference on Apple Silicon.

Original model: nvidia/stt_be_conformer_transducer_large

Results

Dataset                               WER     Speed
CommonVoice 24.0 test (500 samples)   6.29%   7.6 samples/s

To our knowledge, this is the best open-source Belarusian ASR result reported on the CommonVoice 24.0 test set.
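The WER figure above is the standard word-level edit distance divided by the reference length. A minimal sketch of how such a score can be computed (assuming whitespace tokenization and no extra text normalization; this is illustrative, not the evaluation script used for the table):

```python
import numpy as np

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i, j] = edit distance between ref[:i] and hyp[:j]
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)   # deletions only
    d[0, :] = np.arange(len(hyp) + 1)   # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i, j] = min(sub, d[i - 1, j] + 1, d[i, j - 1] + 1)
    return d[len(ref), len(hyp)] / max(len(ref), 1)
```

One substituted word out of three reference words yields a WER of 1/3.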

Usage

# Install dependencies
pip install mlx numpy pyyaml sentencepiece torch

# Get the conversion/inference script
git clone https://github.com/molind/mlx-conformer
cd mlx-conformer

# Download the original NVIDIA checkpoint and convert it for MLX
python mlx_conformer.py \
    --download nvidia/stt_be_conformer_transducer_large \
    --output models

# Transcribe an audio file
python mlx_conformer.py \
    --model models/stt_be_conformer_transducer_large \
    --type transducer \
    --audio test.mp3

Architecture

  • 17 Conformer encoder layers, d_model=512, 8 heads
  • LSTM prediction network (1 layer, 640 hidden)
  • Joint network (640 hidden, ReLU)
  • 1024 BPE vocabulary + blank
  • ~120M parameters
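The prediction and joint networks above drive RNN-T inference: at each encoder frame, the decoder emits tokens until the joint network predicts blank, then advances to the next frame. A minimal greedy-decoding sketch (illustrative only; `predict` and `joint` are hypothetical stand-ins, not this repo's API):

```python
import numpy as np

def greedy_transducer_decode(enc, predict, joint, blank_id, max_symbols_per_frame=10):
    """Greedy RNN-T decoding.

    enc: (T, D) array of encoder outputs, one row per frame.
    predict(tokens): prediction-network output for the token history so far.
    joint(f, g): logits over vocabulary + blank for frame output f and
                 prediction output g.
    """
    hyp = []
    for t in range(enc.shape[0]):
        emitted = 0
        # Emit tokens at this frame until blank (or a safety cap) is reached.
        while emitted < max_symbols_per_frame:
            g = predict(hyp)
            logits = joint(enc[t], g)
            k = int(np.argmax(logits))
            if k == blank_id:
                break  # advance to the next encoder frame
            hyp.append(k)
            emitted += 1
    return hyp
```

The per-frame symbol cap guards against a degenerate loop where blank is never predicted.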

License

Original model by NVIDIA, licensed under CC-BY-4.0.
