license: cc-by-4.0
language:
  - en
metrics:
  - wer
base_model:
  - nvidia/parakeet-tdt_ctc-110m
pipeline_tag: automatic-speech-recognition
tags:
  - automatic-speech-recognition
  - speech
  - audio
  - Transducer
  - TDT
  - FastConformer
  - Conformer
  - pytorch
  - NeMo
  - hf-asr-leaderboard

Parakeet-TDT-CTC 110M — CoreML

CoreML export of nvidia/parakeet-tdt_ctc-110m for on-device speech recognition on Apple Silicon via FluidAudio.

CoreML Components

File	Size	Description
`Preprocessor.mlmodelc`	207 MB	Fused mel-spectrogram + FastConformer encoder
`Decoder.mlmodelc`	7.5 MB	1-layer LSTM prediction network
`JointDecision.mlmodelc`	2.7 MB	Single-step joint network (token + duration)
`parakeet_vocab.json`	18 KB	1024-token BPE vocabulary
`config.json`	2.5 KB	Model metadata and I/O contracts

Input: 16 kHz mono audio in a fixed 15-second window (240,000 samples).
Output: token IDs, probabilities, and TDT duration predictions per encoder frame.
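
Because the window length is fixed, audio of any other length has to be padded or split before it reaches the model. The sketch below shows one simple way to do that: non-overlapping 240,000-sample windows, with the last one zero-padded. This is an assumption for illustration only; the model card does not describe how FluidAudio actually chunks long audio, and the function name `split_into_windows` is hypothetical.

```python
SAMPLE_RATE = 16_000                            # Hz, from the model card
WINDOW_SECONDS = 15                             # fixed window length, from the model card
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 240,000 samples


def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Split mono samples into fixed-size windows, zero-padding the last one.

    Hypothetical helper: non-overlapping windows are an assumption,
    not necessarily what FluidAudio does internally.
    """
    windows = []
    for start in range(0, len(samples), window):
        chunk = list(samples[start:start + window])
        # Pad a short final chunk with silence so every window has the same length.
        chunk.extend([0.0] * (window - len(chunk)))
        windows.append(chunk)
    return windows
```

With this scheme, a 250,000-sample recording yields two windows: one full window and one holding the remaining 10,000 samples followed by silence.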

Performance

Benchmarked with FluidAudio CLI on Apple M2 (release build):

Metric	Value
WER (LibriSpeech test-clean)	3.0%
RTFx (overall)	102x real-time
Peak memory	0.3 GB
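
RTFx is the inverse real-time factor: seconds of audio divided by seconds of processing. As a rough illustration, the reported 102x figure can be turned into an expected processing time. The figure is an overall benchmark average, so treating it as a per-file latency is an assumption, and the function names here are hypothetical.

```python
def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: how many seconds of audio are processed per second."""
    return audio_seconds / processing_seconds


def processing_time(audio_seconds, rtfx_value):
    """Estimated processing time, assuming the RTFx holds uniformly (an assumption)."""
    return audio_seconds / rtfx_value


print(round(processing_time(15.0, 102.0), 3))    # 0.147 s for one 15-second window
print(round(processing_time(3600.0, 102.0), 1))  # 35.3 s for one hour of audio
```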

NVIDIA's reference WER (greedy, GPU):

Benchmark	WER
LibriSpeech test-clean	2.4%
LibriSpeech test-other	5.2%
AMI	15.88%
Earnings-22	12.42%
GigaSpeech	10.52%
TEDLIUM-v3	4.16%
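
For readers comparing the two tables, WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a minimal implementation of that definition. It splits on whitespace only; benchmark pipelines usually normalize text (case, punctuation, numerals) before scoring, and neither this README nor the sketch specifies that step.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" gives one deletion over six reference words, a WER of about 16.7%.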

Usage with FluidAudio

# Transcribe
fluidaudiocli transcribe audio.wav --model-version tdt-ctc-110m

# Benchmark
fluidaudiocli asr-benchmark --subset test-clean --model-version tdt-ctc-110m

Models auto-download from this repo on first use. To pre-fetch:

fluidaudiocli download --model-version tdt-ctc-110m

Conversion

Exported from NeMo using mobius/models/stt/parakeet-tdt-ctc-110m/coreml/convert-tdt-coreml.py:

- Preprocessor fuses mel-spectrogram extraction and the FastConformer encoder into a single CoreML model.
- JointDecision is the single-step variant (encoder_step + decoder_step inputs) used by FluidAudio's TDT decoder.
- All models are exported as MLProgram (iOS 17+ / macOS 14+) with float32 precision.
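
To show how a single-step joint model that predicts both a token and a duration is typically driven, here is a minimal greedy TDT decoding loop. It is a sketch under stated assumptions, not FluidAudio's implementation: `joint_decision` and `advance_decoder` are hypothetical callables standing in for the JointDecision and Decoder models, the blank ID is passed in rather than assumed, and the forced one-frame advance is a common safeguard against stalling rather than something this README documents.

```python
def tdt_greedy_decode(encoder_frames, joint_decision, advance_decoder,
                      initial_state, blank_id, max_symbols_per_frame=10):
    """Greedy TDT decoding sketch.

    joint_decision(frame, state) -> (token_id, duration)   # hypothetical stand-in
    advance_decoder(token_id, state) -> new_state          # hypothetical stand-in
    """
    tokens = []
    state = initial_state
    t = 0                 # current encoder frame index
    emitted_here = 0      # symbols emitted without advancing the frame index

    while t < len(encoder_frames):
        token, duration = joint_decision(encoder_frames[t], state)

        if token != blank_id:
            # Non-blank: emit the token and update the prediction network state.
            tokens.append(token)
            state = advance_decoder(token, state)
            emitted_here += 1

        # Guarantee progress: a blank with duration 0, or too many symbols
        # on the same frame, would otherwise loop forever.
        if duration == 0 and (token == blank_id or emitted_here >= max_symbols_per_frame):
            duration = 1

        if duration > 0:
            t += duration   # skip ahead by the predicted duration
            emitted_here = 0

    return tokens
```

The key difference from a conventional transducer loop is the `t += duration` step: the predicted duration lets the decoder skip several encoder frames at once instead of advancing one frame per blank.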
