Parakeet TDT v3 - CoreML INT4

CoreML conversion of NVIDIA Parakeet-TDT 0.6B v3 with an INT4-quantized encoder for Apple Neural Engine acceleration.

Models

| Model | Description | Compute | Quantization |
| --- | --- | --- | --- |
| encoder.mlmodelc | FastConformer encoder (24L, 1024 hidden) | CPU + Neural Engine | INT4 palettized |
| decoder.mlmodelc | LSTM prediction network (2L, 640 hidden) | CPU + Neural Engine | FP16 |
| joint.mlmodelc | TDT dual-head joint (token + duration logits) | CPU + Neural Engine | FP16 |
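The joint model's two heads (token logits and duration logits) suggest a decoding loop in which each step both picks a token and decides how many encoder frames to skip. The card does not describe the loop it uses, so the following is only a minimal sketch of generic greedy TDT decoding. The `JointStep` closure, the `durations` table, `blankID`, and `maxSymbolsPerFrame` are all hypothetical names and values introduced for illustration; in a real pipeline the closure would wrap calls to decoder.mlmodelc and joint.mlmodelc.

```swift
// Hypothetical joint step: for an encoder frame index and the last emitted
// token, return the token logits and the duration logits.
typealias JointStep = (_ frame: Int, _ lastToken: Int?) -> (tokens: [Float], durations: [Float])

// Index of the largest element (returns 0 for an empty array).
func argmax(_ xs: [Float]) -> Int {
    var best = 0
    for i in xs.indices where xs[i] > xs[best] { best = i }
    return best
}

// Greedy TDT decoding sketch: emit the best non-blank token at each step and
// advance by the predicted duration.
func tdtGreedyDecode(frameCount: Int,
                     blankID: Int,
                     durations: [Int],
                     maxSymbolsPerFrame: Int = 10,
                     joint: JointStep) -> [Int] {
    var output: [Int] = []
    var lastToken: Int? = nil
    var t = 0
    var symbolsAtFrame = 0
    while t < frameCount {
        let (tokenLogits, durationLogits) = joint(t, lastToken)
        let token = argmax(tokenLogits)
        var skip = durations[argmax(durationLogits)]
        if token != blankID {
            output.append(token)
            lastToken = token      // only non-blank tokens update the prediction state
            symbolsAtFrame += 1
        }
        // Guarantee progress: a blank with duration 0, or too many symbols
        // emitted on one frame, forces a one-frame advance.
        if skip == 0 && (token == blankID || symbolsAtFrame >= maxSymbolsPerFrame) {
            skip = 1
        }
        if skip > 0 { symbolsAtFrame = 0 }
        t += skip
    }
    return output
}
```

Injecting the joint step as a closure keeps the control flow testable without any CoreML model loaded.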

Additional Files

| File | Description |
| --- | --- |
| vocab.json | SentencePiece vocabulary (1024 tokens) |
| config.json | Model configuration |

Notes

  • Mel preprocessing is done in Swift using Accelerate/vDSP (not CoreML) because torch.stft tracing bakes audio length as a constant, breaking per-feature normalization for variable-length inputs.
  • The encoder uses EnumeratedShapes (100–3000 mel frames, covering 1–30 s of audio) to avoid BNNS crashes with dynamic shapes.
  • Performance: roughly 110x faster than real time on an M4 Pro via the Neural Engine.
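To illustrate why per-feature normalization depends on the true audio length, here is a minimal sketch that normalizes each mel bin using only the valid (unpadded) frames. It uses plain Swift loops rather than Accelerate/vDSP so that it runs anywhere. The function name, the `[bin][frame]` layout, the epsilon value, and the choice of the unbiased (n - 1) variance estimator are assumptions, not details taken from this repository.

```swift
/// Normalizes each mel bin to zero mean and unit variance, computing the
/// statistics over the first `validFrames` frames only. Padding frames are
/// set to zero. `mel` is laid out as [bin][frame] (an assumed layout).
func normalizePerFeature(_ mel: [[Float]],
                         validFrames: Int,
                         epsilon: Float = 1e-5) -> [[Float]] {
    return mel.map { (row: [Float]) -> [Float] in
        let n = min(validFrames, row.count)
        guard n > 0 else { return row }
        let valid = row[0..<n]
        let mean = valid.reduce(0, +) / Float(n)
        let sumSq = valid.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) }
        // Unbiased estimator; falls back to dividing by 1 for a single frame.
        let std = (sumSq / Float(max(n - 1, 1))).squareRoot()
        var out = row
        for i in 0..<n { out[i] = (row[i] - mean) / (std + epsilon) }
        for i in n..<row.count { out[i] = 0 }
        return out
    }
}
```

Because the mean and standard deviation change with `validFrames`, a traced graph that freezes the frame count at conversion time would compute the wrong statistics for any other input length.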

Usage

Used by the ParakeetASR module in qwen3-asr-swift:

let model = try await ParakeetASRModel.fromPretrained()
let text = try model.transcribeAudio(samples, sampleRate: 16000)
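The call above passes `samples` together with a 16 kHz sample rate. The card does not state the expected sample type, so the sketch below assumes a mono `[Float]` buffer in the range [-1, 1] and shows one way to derive it from 16-bit PCM, plus a helper for checking a clip's duration against the 1–30 s range the encoder shapes cover. Both function names are hypothetical and not part of qwen3-asr-swift.

```swift
/// Converts signed 16-bit PCM samples to floats in [-1, 1).
/// Assumes mono audio; the [Float] target type is an assumption.
func floatSamples(fromPCM16 pcm: [Int16]) -> [Float] {
    return pcm.map { Float($0) / 32768.0 }
}

/// Duration of a mono buffer in seconds at the given sample rate.
func durationSeconds(sampleCount: Int, sampleRate: Int = 16000) -> Double {
    return Double(sampleCount) / Double(sampleRate)
}
```

For example, a buffer of 480,000 samples at 16 kHz is 30 seconds long, the upper end of the range mentioned in the notes.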