Parakeet TDT 0.6B v3 - CoreML

CoreML conversion of nvidia/parakeet-tdt-0.6b-v3 for Apple Silicon (ANE + GPU).

Architecture

Split architecture optimized for Apple Neural Engine:

  • Encoder (encoder.mlmodelc): Conformer encoder compiled for ANE/GPU (~1.18 GB)
  • Predictor + Joint (predictor_joint.safetensors): LSTM predictor + Joint network as float32 safetensors (~69 MB)

The encoder runs on ANE/GPU via CoreML, while the predictor and joint networks run on CPU via Accelerate for optimal performance.
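
As a quick way to inspect what the predictor and joint weights file contains, the sketch below reads the header of a safetensors file using only the Python standard library. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The tensor names inside predictor_joint.safetensors are not documented here, so the code only lists whatever it finds. The helper names `read_safetensors_header` and `summarize` are illustrative.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict of tensor entries."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Yield (name, dtype, shape, n_bytes) for each tensor, sorted by name."""
    for name, info in sorted(header.items()):
        start, end = info["data_offsets"]
        yield name, info["dtype"], info["shape"], end - start


# Example (path is an assumption about where the file was downloaded):
# for row in summarize(read_safetensors_header("predictor_joint.safetensors")):
#     print(*row)
```

Since the file is described as float32, every entry would be expected to report dtype `F32`, and the byte counts should add up to roughly 69 MB.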

TDT (Token-and-Duration Transducer)

  • Conformer encoder: 24 layers, 1024 hidden, 8 attention heads
  • LSTM predictor: unidirectional, 640 hidden
  • Joint network: 640 hidden, 5 duration classes (0–4)
  • Vocabulary: 8192 SentencePiece tokens
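
To illustrate how the token and duration outputs interact, here is a minimal sketch of greedy TDT decoding in pure Python. It is not the decoding code shipped with this model. `joint_step` is a hypothetical callable standing in for the predictor and joint networks, and the LSTM state is collapsed into "the last emitted token" for brevity. The blank index and the `max_symbols_per_frame` guard are also assumptions.

```python
def argmax(values):
    """Index of the largest element (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def tdt_greedy_decode(num_frames, joint_step, blank_id,
                      durations=(0, 1, 2, 3, 4), max_symbols_per_frame=10):
    """Greedy TDT decoding over `num_frames` encoder frames.

    joint_step(t, last_token) is a hypothetical callable returning
    (token_logits, duration_logits) for encoder frame t, where last_token
    is the most recent non-blank token (None before the first emission).
    """
    tokens = []
    last_token = None
    t = 0
    emitted_here = 0  # non-blank tokens emitted without advancing t
    while t < num_frames:
        token_logits, duration_logits = joint_step(t, last_token)
        token = argmax(token_logits)
        skip = durations[argmax(duration_logits)]
        if token != blank_id:
            tokens.append(token)
            last_token = token
            emitted_here += 1
        # Avoid stalling: a blank with duration 0, or too many emissions
        # on one frame, forces the decoder to move on by one frame.
        if skip == 0 and (token == blank_id or emitted_here >= max_symbols_per_frame):
            skip = 1
        if skip > 0:
            t += skip
            emitted_here = 0
    return tokens
```

The predicted duration is what lets the decoder jump over several encoder frames at once, which is the main difference from a standard transducer that advances one frame per blank.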

Audio Specifications

Parameter       Value
Sample rate     16,000 Hz
FFT size        512
Hop length      160 samples (10 ms)
Mel bins        128
Max frequency   8,000 Hz
Window          Hann
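
The values above determine the shape of the mel spectrogram. The arithmetic can be sketched as follows. The frame count assumes a centered STFT (one frame per hop plus one), which is a common convention but is not stated for this model.

```python
SAMPLE_RATE = 16_000  # Hz
N_FFT = 512
HOP_LENGTH = 160      # samples
N_MELS = 128


def hop_ms():
    """Time between consecutive frames, in milliseconds."""
    return 1000 * HOP_LENGTH / SAMPLE_RATE


def fft_bin_hz():
    """Width of one FFT frequency bin, in Hz."""
    return SAMPLE_RATE / N_FFT


def nyquist_hz():
    """Highest representable frequency; matches the 8,000 Hz maximum."""
    return SAMPLE_RATE / 2


def num_mel_frames(duration_s):
    """Frame count for a clip, assuming a centered STFT (an assumption)."""
    n_samples = round(duration_s * SAMPLE_RATE)
    return n_samples // HOP_LENGTH + 1


print(hop_ms())                     # 10.0
print(fft_bin_hz())                 # 31.25
print((N_MELS, num_mel_frames(5)))  # (128, 501)
```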

EnumeratedShapes (Encoder Buckets)

The encoder supports 4 input duration buckets for optimized ANE scheduling:

Bucket   Duration
1        5 seconds
2        10 seconds
3        15 seconds
4        30 seconds
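
One way a caller might route audio to these buckets is to pick the smallest bucket that fits and pad up to it. The sketch below does this on raw samples. Zero-padding, the rule for audio longer than 30 seconds, and the helper names are assumptions rather than documented behavior of this model.

```python
import bisect

BUCKETS_S = (5, 10, 15, 30)  # bucket durations in seconds, from the table above
SAMPLE_RATE = 16_000


def pick_bucket(duration_s):
    """Smallest bucket that fits the clip, or None if it exceeds 30 s."""
    i = bisect.bisect_left(BUCKETS_S, duration_s)
    return BUCKETS_S[i] if i < len(BUCKETS_S) else None


def pad_to_bucket(samples):
    """Zero-pad a list of samples to its bucket length (padding is an assumption)."""
    bucket = pick_bucket(len(samples) / SAMPLE_RATE)
    if bucket is None:
        raise ValueError("audio longer than 30 s; split it into chunks first")
    target = bucket * SAMPLE_RATE
    return list(samples) + [0.0] * (target - len(samples)), bucket


padded, bucket = pad_to_bucket([0.1] * (12 * SAMPLE_RATE))  # a 12 s clip
print(bucket, len(padded))  # 15 240000
```

Because the encoder only accepts the enumerated shapes, a clip that falls between two buckets pays for the larger one, so chunking long recordings close to a bucket boundary keeps padding overhead low.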

Size

~1.2 GB total (vs ~2.3 GB MLX float32)

Requirements

  • macOS 15+ (Sequoia)
  • Apple Silicon (M1+, ANE recommended)
  • CoreML framework

License

This model is licensed under CC-BY-4.0. Original model by NVIDIA; attribution required.

See nvidia/parakeet-tdt-0.6b-v3 for the original model.

Source

Converted from nvidia/parakeet-tdt-0.6b-v3 via MLX intermediate format using oriloq-mlx.

Conversion chain: NeMo (.nemo) → MLX (safetensors) → CoreML (.mlmodelc)
