---
license: cc-by-4.0
language:
- en
metrics:
- wer
base_model:
- nvidia/parakeet-tdt_ctc-110m
pipeline_tag: automatic-speech-recognition
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
---

Parakeet-TDT-CTC 110M — CoreML
CoreML export of nvidia/parakeet-tdt_ctc-110m for on-device speech recognition on Apple Silicon via FluidAudio.
CoreML Components
| File | Size | Description |
|---|---|---|
| Preprocessor.mlmodelc | 207 MB | Fused mel-spectrogram + FastConformer encoder |
| Decoder.mlmodelc | 7.5 MB | 1-layer LSTM prediction network |
| JointDecision.mlmodelc | 2.7 MB | Single-step joint network (token + duration) |
| parakeet_vocab.json | 18 KB | 1024-token BPE vocabulary |
| config.json | 2.5 KB | Model metadata and I/O contracts |
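
The decoder emits token IDs, which presumably index into parakeet_vocab.json. The sketch below covers only the final step, joining looked-up BPE pieces back into text, and assumes the vocabulary uses SentencePiece's `▁` (U+2581) word-boundary marker, as NeMo's Parakeet tokenizers do. The `detokenize` helper is hypothetical, and the ID-to-piece lookup is left out because the JSON layout is not documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins BPE pieces into text, assuming SentencePiece's
# "▁" (U+2581) marks the start of a new word.
detokenize() {
  local joined
  joined=$(printf '%s' "$@")    # concatenate pieces with no separator
  joined=${joined//▁/ }         # word-boundary marker -> space
  printf '%s\n' "${joined# }"   # drop the leading space before the first word
}

# Usage:
detokenize "▁hel" "lo" "▁world"   # prints: hello world
```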
Input: 16 kHz mono audio, fixed 15-second window (240,000 samples).

Output: token IDs, probabilities, and TDT duration predictions per encoder frame.
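
Because the encoder takes a fixed window of 16,000 samples/s × 15 s = 240,000 samples, longer audio has to be split and the last chunk padded. The arithmetic below sketches a naive, non-overlapping split; FluidAudio's actual chunking strategy (overlap, stitching) is not described in this card and may differ, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
SAMPLE_RATE=16000
WINDOW_SECONDS=15
WINDOW_SAMPLES=$(( SAMPLE_RATE * WINDOW_SECONDS ))   # 240000

# Number of 15 s windows needed to cover N samples (ceiling division).
num_windows() {
  local n=$1
  echo $(( (n + WINDOW_SAMPLES - 1) / WINDOW_SAMPLES ))
}

# Zero-padding needed so N samples fill a whole number of windows.
pad_samples() {
  local n=$1
  local rem=$(( n % WINDOW_SAMPLES ))
  echo $(( rem == 0 ? 0 : WINDOW_SAMPLES - rem ))
}

# Example: a 40 s clip at 16 kHz is 640000 samples.
num_windows 640000   # 3
pad_samples 640000   # 80000 (5 s of silence)
```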
Performance
Benchmarked with FluidAudio CLI on Apple M2 (release build):
| Metric | Value |
|---|---|
| LibriSpeech test-clean | 3.0% |
| RTFx (overall) | 102x real-time |
| Peak memory | 0.3 GB |
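
RTFx is the inverse real-time factor: seconds of audio processed per second of wall-clock time. The hypothetical helpers below compute it from a measured run, or invert it to estimate processing time; `awk` handles the floating-point division.

```shell
#!/usr/bin/env bash
# RTFx = audio duration / processing time (higher is faster).
rtfx() {
  awk -v audio="$1" -v elapsed="$2" 'BEGIN { printf "%.1f\n", audio / elapsed }'
}

# Wall-clock seconds needed to process a given amount of audio at a given RTFx.
processing_seconds() {
  awk -v audio="$1" -v speed="$2" 'BEGIN { printf "%.1f\n", audio / speed }'
}

rtfx 3600 35.3                # 102.0
processing_seconds 3600 102   # 35.3
```

At the 102x measured above, one hour of audio takes roughly 35 seconds of processing.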
NVIDIA's reference WER (greedy, GPU):
| Benchmark | WER |
|---|---|
| LibriSpeech test-clean | 2.4% |
| LibriSpeech test-other | 5.2% |
| AMI | 15.88% |
| Earnings-22 | 12.42% |
| GigaSpeech | 10.52% |
| TEDLIUM-v3 | 4.16% |
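
Both tables report word error rate, the word-level edit distance between the recognized text and the reference transcript divided by the number of reference words. With $S$ substitutions, $D$ deletions, $I$ insertions and $N$ reference words:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

For example, 3 substitutions, 1 deletion and 2 insertions against a 200-word reference give $(3 + 1 + 2)/200 = 3.0\%$. Because insertions are counted, WER can exceed 100%.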
Usage with FluidAudio
```shell
# Transcribe
fluidaudiocli transcribe audio.wav --model-version tdt-ctc-110m

# Benchmark
fluidaudiocli asr-benchmark --subset test-clean --model-version tdt-ctc-110m
```
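
To transcribe a folder of recordings, the documented `transcribe` command can be wrapped in a loop. This is a minimal sketch that assumes the files are already WAV; `transcribe_dir` is a hypothetical helper that uses only the command and `--model-version` flag shown above. If the source audio is not 16 kHz mono, converting it first (for example `ffmpeg -i in.m4a -ar 16000 -ac 1 out.wav`) matches the model's input format; whether FluidAudio resamples on its own is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: transcribe every .wav file in a directory, one at a time,
# with the tdt-ctc-110m model. The body runs in a subshell so the nullglob
# setting does not leak, and it stops at the first failing file.
transcribe_dir() (
  shopt -s nullglob                  # an empty directory yields no iterations
  dir=${1:?usage: transcribe_dir DIR}
  for f in "$dir"/*.wav; do
    echo "== $f"
    fluidaudiocli transcribe "$f" --model-version tdt-ctc-110m || exit 1
  done
)

# Usage:
# transcribe_dir ./recordings
```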
Models auto-download from this repo on first use. To pre-fetch:
```shell
fluidaudiocli download --model-version tdt-ctc-110m
```
Conversion
Exported from NeMo using `mobius/models/stt/parakeet-tdt-ctc-110m/coreml/convert-tdt-coreml.py`:
- Preprocessor fuses mel-spectrogram extraction and the FastConformer encoder into a single CoreML model
- JointDecision is the single-step variant (encoder_step + decoder_step inputs) used by FluidAudio's TDT decoder
- All models exported as MLProgram (iOS 17+ / macOS 14+), float32 precision
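
JointDecision exposes one step of TDT decoding: for the current encoder frame $t$ and the decoder state after $u$ emitted tokens, it scores the next token over the vocabulary $\mathcal{V}$ plus blank $\varnothing$, and, separately, how many encoder frames to advance from a duration set $\mathcal{D}$. The duration set is assumed here to be $\{0, 1, 2, 3, 4\}$, as is common in NeMo TDT configs; this card does not confirm it for the export. A greedy decoder repeats:

$$
\hat{v} = \arg\max_{v \in \mathcal{V} \cup \{\varnothing\}} P_T(v \mid t, u), \qquad
\hat{d} = \arg\max_{d \in \mathcal{D}} P_D(d \mid t, u)
$$

$$
t \leftarrow t + \hat{d}, \qquad u \leftarrow u + \mathbb{1}[\hat{v} \neq \varnothing]
$$

The decoder state only advances when a non-blank token is emitted. If a blank comes with $\hat{d} = 0$, decoding would stall, so implementations typically advance at least one frame in that case. Predicted durations larger than one let the host loop skip encoder frames instead of visiting each one, which is why a single-step joint called repeatedly from FluidAudio's TDT decoder is sufficient.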