---
license: cc-by-4.0
tags:
- speech-to-text
- parakeet
- coreml
- ane
- biasing
base_model: nvidia/parakeet-tdt-0.6b-v3
---

# Parakeet TDT 0.6B v3: CoreML with raw-logits single-step joint

CoreML export of NVIDIA's `nvidia/parakeet-tdt-0.6b-v3` for the Apple Neural Engine (ANE) on Apple Silicon.
Derived from [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml),
with an additional `JointSingleStep.mlmodelc` that exposes the raw token and duration
logits at each decoding step.

`JointSingleStep` lets host-side code (e.g. FluidAudio's
`ParakeetBooster`) add log-prob offsets to the token logits before the argmax.
This is the shallow-fusion step needed for trie-based context/vocabulary biasing.
The standard `JointDecision.mlmodelc` performs the argmax inside the CoreML graph,
so its output cannot be biased.

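The biasing step described above can be sketched in a few lines of Python. This is a minimal illustration, not FluidAudio's implementation: `TokenTrie`, `biased_argmax`, `advance`, and the `boost` value are hypothetical names and parameters chosen for this example, and `token_logits` stands in for the raw output of `JointSingleStep`.

```python
class TokenTrie:
    """Prefix tree over token-id sequences of the phrases to boost (hypothetical helper)."""

    def __init__(self):
        self.children = {}

    def insert(self, token_ids):
        node = self
        for token in token_ids:
            node = node.children.setdefault(token, TokenTrie())


def biased_argmax(token_logits, node, root, boost=2.0):
    """Add `boost` to every token that continues or starts a boosted phrase, then argmax.

    Adding a constant to a logit shifts its log-probability by the same amount
    relative to the other tokens, so the argmax matches shallow fusion on log-probs.
    """
    candidates = set(node.children) | set(root.children)
    scores = [
        logit + (boost if token in candidates else 0.0)
        for token, logit in enumerate(token_logits)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def advance(node, root, token):
    """Follow the trie if `token` continues the current phrase, else restart."""
    return node.children.get(token) or root.children.get(token) or root


# Boost the (made-up) token sequence [3, 1].
root = TokenTrie()
root.insert([3, 1])
first, _ = biased_argmax([0.0, 1.0, 0.5, 0.2], root, root)
state = advance(root, root, first)
second, _ = biased_argmax([2.5, 1.0, 0.0, 0.0], state, root)
```

Without the offset, the plain argmax would pick token 1 and then token 0. With it, the decoder follows the boosted sequence. A real booster would also need to decide how to treat blank tokens and partial matches, which this sketch leaves out.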

## Files

| File | Purpose |
|---|---|
| `Preprocessor.mlmodelc` | Log-mel frontend, 15 s window, 16 kHz mono |
| `Encoder.mlmodelc` | FastConformer encoder |
| `Decoder.mlmodelc` | RNN-T prediction network (LSTM, U=1 streaming) |
| `JointDecision.mlmodelc` | Single-step joint + argmax (token_id, prob, duration) |
| `JointSingleStep.mlmodelc` | Single-step joint returning **raw `token_logits` + `duration_logits`** (for biasing) |
| `parakeet_vocab.json` / `parakeet_v3_vocab.json` | SentencePiece vocabulary |
| `config.json` | Metadata |

All components use a fixed 15 s audio window (240 000 samples @ 16 kHz).
Deployment target: iOS 17 / macOS 14.

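Because the window is fixed, host code has to pad short audio and split long audio before calling the models. The sketch below shows the arithmetic with a simple non-overlapping split. The function name `split_into_windows` is hypothetical, and returning the count of valid samples alongside each window is an assumption of this example, not something this bundle specifies. FluidAudio's own sliding-window logic may overlap windows.

```python
SAMPLE_RATE = 16_000                            # Hz, mono
WINDOW_SECONDS = 15
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 240_000


def split_into_windows(samples):
    """Split 16 kHz mono samples into fixed 240 000-sample windows.

    The last window is zero-padded. Each entry is (window, valid_count) so the
    caller can tell real audio from padding.
    """
    windows = []
    for start in range(0, max(len(samples), 1), WINDOW_SAMPLES):
        chunk = list(samples[start:start + WINDOW_SAMPLES])
        valid = len(chunk)
        chunk.extend([0.0] * (WINDOW_SAMPLES - valid))
        windows.append((chunk, valid))
    return windows


# 250 000 samples (about 15.6 s) need two windows.
windows = split_into_windows([0.1] * 250_000)
```

Here the second window holds 10 000 real samples followed by 230 000 zeros.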

## Usage with FluidAudio

Use this bundle through the `boosting` branch of
[github.com/vietanh125/FluidAudio](https://github.com/vietanh125/FluidAudio),
which downloads it by default for `AsrModelVersion.v3`. See
`Sources/FluidAudio/ASR/Parakeet/SlidingWindow/CustomVocabulary/Boosting/`
for `ParakeetBooster` / `TokenBoostTrie`.

## Source

`JointSingleStep` was generated with the script at
[mobius/models/stt/parakeet-tdt-v3-0.6b/coreml](https://github.com/FluidInference/mobius).
The wrapper adds a `JointSingleStep` class that returns `token_logits` and
`duration_logits` directly, rather than an argmax decision.

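To illustrate how a host might consume the two outputs, here is a sketch of one greedy TDT decision in Python. The name `tdt_greedy_step` is hypothetical, the duration bins `(0, 1, 2, 3, 4)` are an assumption about the checkpoint's configuration, and `blank_id` must come from the vocabulary in use. Any biasing of `token_logits` would happen before this function is called.

```python
def tdt_greedy_step(token_logits, duration_logits, blank_id, durations=(0, 1, 2, 3, 4)):
    """Pick a token and a frame skip from one JointSingleStep-style output.

    `durations` maps each duration-logit index to a number of encoder frames
    to advance (assumed bins, check the model's config before relying on them).
    """
    token = max(range(len(token_logits)), key=token_logits.__getitem__)
    duration_index = max(range(len(duration_logits)), key=duration_logits.__getitem__)
    skip = durations[duration_index]
    # A blank with zero duration would stall the decoder, so force progress.
    if token == blank_id and skip == 0:
        skip = 1
    return token, skip


# Non-blank token 1 with the duration bin at index 2 (advance 2 frames).
emitted = tdt_greedy_step([0.1, 2.0, 0.3], [0.0, 0.5, 3.0, 0.1, 0.2], blank_id=2)
# Blank token with duration 0 is bumped to a skip of 1.
stalled = tdt_greedy_step([0.1, 0.2, 5.0], [9.0, 0.0, 0.0, 0.0, 0.0], blank_id=2)
```

The caller emits the token when it is not blank, feeds it to the prediction network, and moves the encoder frame index forward by the returned skip.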

## License

CC-BY-4.0, matching the upstream NVIDIA NeMo Parakeet checkpoint.