Parakeet TDT 0.6B v3 - CoreML
CoreML conversion of nvidia/parakeet-tdt-0.6b-v3 for Apple Silicon (ANE + GPU).
Architecture
Split architecture optimized for Apple Neural Engine:
- Encoder (encoder.mlmodelc): Conformer encoder compiled for ANE/GPU (~1.18 GB)
- Predictor + Joint (predictor_joint.safetensors): LSTM predictor + Joint network as float32 safetensors (~69 MB)
The encoder runs on ANE/GPU via CoreML, while the predictor and joint networks run on CPU via Accelerate for optimal performance.
TDT (Token-and-Duration Transducer)
- Conformer encoder: 24 layers, 512 hidden, 8 attention heads
- LSTM predictor: unidirectional (it conditions only on previously emitted tokens), 640 hidden
- Joint network: 640 hidden, 5 duration classes (0–4)
- Vocabulary: 8192 SentencePiece tokens
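To make the role of the duration classes concrete, here is a minimal sketch of greedy TDT decoding in plain Python. It is not the decoder shipped with this model: the `joint` callable, its `(frame_index, last_token)` signature, and the choice of index 8192 for the blank token are assumptions made for illustration, and the LSTM predictor state is reduced to the last emitted token. The idea it shows is that at each step the joint network predicts both a token and how many encoder frames to skip.

```python
from typing import Callable, List, Sequence, Tuple

DURATIONS = (0, 1, 2, 3, 4)  # the 5 duration classes listed above
VOCAB_SIZE = 8192
BLANK_ID = VOCAB_SIZE        # assumption: blank is the index after the last token

# Hypothetical joint signature: (frame_index, last_token) -> (token_logits, duration_logits)
JointFn = Callable[[int, int], Tuple[Sequence[float], Sequence[float]]]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def tdt_greedy_decode(num_frames: int, joint: JointFn,
                      max_symbols_per_frame: int = 10) -> List[int]:
    tokens: List[int] = []
    last_token = BLANK_ID  # stand-in for the start-of-sequence predictor state
    t = 0
    emitted_here = 0       # tokens emitted without advancing in time
    while t < num_frames:
        token_logits, duration_logits = joint(t, last_token)
        token = argmax(token_logits)
        skip = DURATIONS[argmax(duration_logits)]
        if token != BLANK_ID:
            tokens.append(token)
            last_token = token
            emitted_here += 1
        # Guarantee progress: a blank with duration 0, or too many tokens
        # on one frame, forces a move to the next frame.
        if skip == 0 and (token == BLANK_ID or emitted_here >= max_symbols_per_frame):
            skip = 1
        if skip > 0:
            emitted_here = 0
        t += skip
    return tokens


def toy_joint(t: int, last_token: int) -> Tuple[List[float], List[float]]:
    """Toy stand-in for the real joint network, used only to exercise the loop."""
    token_logits = [0.0] * (VOCAB_SIZE + 1)
    duration_logits = [0.0] * len(DURATIONS)
    if t == 0 and last_token == BLANK_ID:
        token_logits[7] = 1.0          # emit token 7, stay on the same frame
        duration_logits[0] = 1.0
    elif t == 0:
        token_logits[9] = 1.0          # emit token 9, then skip 2 frames
        duration_logits[2] = 1.0
    else:
        token_logits[BLANK_ID] = 1.0   # blank, skip 4 frames
        duration_logits[4] = 1.0
    return token_logits, duration_logits


print(tdt_greedy_decode(6, toy_joint))  # [7, 9]
```

Because a non-zero duration lets the loop jump over several encoder frames at once, the joint network is evaluated fewer times than in a frame-by-frame transducer.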
Audio Specifications
| Parameter | Value |
|---|---|
| Sample rate | 16,000 Hz |
| FFT size | 512 |
| Hop length | 160 (10 ms) |
| Mel bins | 128 |
| Max frequency | 8,000 Hz |
| Window | Hann |
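The values in the table can be collected into a small configuration object. The sketch below is illustrative only: the `AudioConfig` class and its field names are hypothetical, and the frame count assumes centered framing (one frame per hop, plus one), which may differ from the preprocessing this model actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioConfig:
    sample_rate: int = 16_000  # Hz
    n_fft: int = 512
    hop_length: int = 160      # samples
    n_mels: int = 128
    f_max: int = 8_000         # Hz, equal to the Nyquist frequency at 16 kHz

    @property
    def hop_ms(self) -> float:
        """Time between consecutive frames, in milliseconds."""
        return 1000.0 * self.hop_length / self.sample_rate

    @property
    def n_freq_bins(self) -> int:
        """Number of bins in a one-sided FFT of size n_fft."""
        return self.n_fft // 2 + 1

    def num_frames(self, duration_s: float) -> int:
        """Mel frames for a clip, assuming centered framing (an assumption)."""
        n_samples = int(round(duration_s * self.sample_rate))
        return 1 + n_samples // self.hop_length


cfg = AudioConfig()
print(cfg.hop_ms)           # 10.0
print(cfg.n_freq_bins)      # 257
print(cfg.num_frames(5.0))  # 501
```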
EnumeratedShapes (Encoder Buckets)
The encoder supports 4 input duration buckets for optimized ANE scheduling:
| Bucket | Duration |
|---|---|
| 1 | 5 seconds |
| 2 | 10 seconds |
| 3 | 15 seconds |
| 4 | 30 seconds |
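One way a caller might work with fixed buckets is to pick the smallest bucket that fits the clip and zero-pad up to it. The helper names below are hypothetical, and the sketch assumes the buckets are measured on raw 16 kHz audio; the real encoder input may instead be mel features with a matching number of frames.

```python
from typing import List

BUCKETS_S = (5, 10, 15, 30)  # bucket durations from the table above, in seconds
SAMPLE_RATE = 16_000         # Hz


def select_bucket(n_samples: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Return the smallest bucket duration (seconds) that fits the audio."""
    for bucket in BUCKETS_S:
        if n_samples <= bucket * sample_rate:
            return bucket
    raise ValueError("audio is longer than the largest bucket; split it into chunks first")


def pad_to_bucket(samples: List[float], sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Zero-pad the clip on the right up to its bucket length."""
    target = select_bucket(len(samples), sample_rate) * sample_rate
    return samples + [0.0] * (target - len(samples))


clip = [0.1] * (7 * SAMPLE_RATE)  # a 7-second clip
print(select_bucket(len(clip)))   # 10
print(len(pad_to_bucket(clip)))   # 160000
```

Clips longer than 30 seconds do not fit any bucket, so they would need to be split before encoding.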
Size
~1.2 GB total (vs ~2.3 GB MLX float32)
Requirements
- macOS 15+ (Sequoia)
- Apple Silicon (M1+, ANE recommended)
- CoreML framework
License
This model is licensed under CC-BY-4.0. Original model by NVIDIA; attribution required.
See nvidia/parakeet-tdt-0.6b-v3 for the original model.
Source
Converted from nvidia/parakeet-tdt-0.6b-v3 via MLX intermediate format using oriloq-mlx.
Conversion chain: NeMo (.nemo) → MLX (safetensors) → CoreML (.mlmodelc)