CoreML conversion of NVIDIA Parakeet-TDT 0.6B v2 with INT8-quantized encoder for Apple Neural Engine acceleration.
| Model | Description | Compute | Quantization |
|---|---|---|---|
| encoder.mlmodelc | FastConformer encoder (24L, 1024 hidden) | CPU + Neural Engine | INT8 palettized |
| decoder.mlmodelc | LSTM prediction network (2L, 640 hidden) | CPU + Neural Engine | FP16 |
| joint.mlmodelc | TDT dual-head joint (token + duration logits) | CPU + Neural Engine | FP16 |
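
To illustrate how the two heads of the joint network can be consumed, here is a minimal sketch of one greedy TDT decoding step. It is not taken from speech-swift: `TDTStep`, `tdtGreedyStep`, the `durations` table, and `blankId` are hypothetical names and values chosen for illustration, and the real duration set and blank index come from the model configuration.

```swift
// Result of one greedy step (hypothetical type, for illustration only).
struct TDTStep {
    let token: Int      // index of the highest-scoring token
    let advance: Int    // number of encoder frames to move forward
    let isBlank: Bool   // true when the token is the blank symbol
}

// Index of the largest value in a non-empty array.
func argmax(_ values: [Float]) -> Int {
    precondition(!values.isEmpty, "logits must not be empty")
    var best = 0
    for i in 1..<values.count where values[i] > values[best] {
        best = i
    }
    return best
}

// Picks the token from the token head and the frame advance from the duration head.
// `durations` maps each duration-head index to a frame count (assumed values).
func tdtGreedyStep(tokenLogits: [Float], durationLogits: [Float],
                   durations: [Int], blankId: Int) -> TDTStep {
    precondition(durationLogits.count == durations.count,
                 "one duration value per duration logit")
    let token = argmax(tokenLogits)
    var advance = durations[argmax(durationLogits)]
    let isBlank = token == blankId
    // A blank that advances zero frames would stall the loop, so force progress.
    if isBlank && advance == 0 { advance = 1 }
    return TDTStep(token: token, advance: advance, isBlank: isBlank)
}
```

A decoding loop would call this once per joint evaluation, feed non-blank tokens back into the prediction network, and move the encoder frame index forward by `advance`.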
| File | Description |
|---|---|
| vocab.json | SentencePiece vocabulary (1024 tokens) |
| config.json | Model configuration |
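
As a rough illustration of how SentencePiece pieces turn back into text, the sketch below joins decoded pieces and converts the word-boundary marker `▁` (U+2581) into spaces. The function name `detokenize` is hypothetical, and the layout of `vocab.json` is not assumed here: the function takes pieces that have already been looked up.

```swift
import Foundation

// Joins SentencePiece pieces into plain text.
// "\u{2581}" is the SentencePiece word-boundary marker.
func detokenize(_ pieces: [String]) -> String {
    pieces.joined()
        .replacingOccurrences(of: "\u{2581}", with: " ")
        .trimmingCharacters(in: .whitespaces)
}

// Example: three pieces forming two words.
let text = detokenize(["\u{2581}hel", "lo", "\u{2581}world"])
print(text)  // prints "hello world"
```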
- torch.stft tracing bakes the audio length in as a constant, which breaks per-feature normalization for variable-length inputs.
- EnumeratedShapes (100–3000 mel frames, covering 1–30 s of audio) are used to avoid BNNS crashes with dynamic shapes.

Used by the speech-swift ParakeetASR module:
```swift
let model = try await ParakeetASRModel.fromPretrained(modelId: ParakeetASRModel.int8ModelId)
let text = try model.transcribeAudio(samples, sampleRate: 16000)
```
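
The notes on per-feature normalization and enumerated shapes can be sketched as follows. This is an illustration under stated assumptions, not the speech-swift implementation: the `[bin][frame]` layout, the unbiased standard deviation, the epsilon value, and the list of enumerated frame counts are all hypothetical. The point is that statistics are computed from the actual number of valid frames at run time, and that the input is then padded to one of a fixed set of lengths.

```swift
// Normalizes each mel bin to zero mean and unit variance over the valid frames only.
// `mel` is laid out as [bin][frame]; frames at or beyond `validFrames` are padding.
func normalizePerFeature(_ mel: [[Float]], validFrames: Int,
                         epsilon: Float = 1e-5) -> [[Float]] {
    precondition(validFrames > 1, "need at least two frames for a variance estimate")
    return mel.map { row in
        precondition(row.count >= validFrames, "row shorter than validFrames")
        let valid = row[0..<validFrames]
        let mean = valid.reduce(0, +) / Float(validFrames)
        let sumSq = valid.reduce(Float(0)) { $0 + ($1 - mean) * ($1 - mean) }
        // Unbiased estimate (N - 1); epsilon guards against division by zero.
        let std = (sumSq / Float(validFrames - 1)).squareRoot() + epsilon
        // Padding frames stay at zero so they do not carry normalized garbage.
        return row.enumerated().map { i, x in i < validFrames ? (x - mean) / std : 0 }
    }
}

// Picks the smallest enumerated frame count that fits, or nil if none does.
// The candidate list is supplied by the caller; real values depend on the model.
func pickShape(frames: Int, enumerated: [Int]) -> Int? {
    enumerated.sorted().first { $0 >= frames }
}
```

For example, with a hypothetical shape list `[100, 500, 1000, 3000]`, an utterance of 250 mel frames would be padded to 500 frames, while anything above 3000 frames returns `nil` and would need to be chunked.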
Base model
nvidia/parakeet-tdt-0.6b-v2