---
license: cc-by-4.0
tags:
- speech-to-text
- parakeet
- coreml
- ane
- biasing
base_model: nvidia/parakeet-tdt-0.6b-v3
---
# Parakeet TDT 0.6B v3 — CoreML with raw-logits single-step joint
CoreML export of NVIDIA's `nvidia/parakeet-tdt-0.6b-v3` for Apple Silicon (ANE).
Derived from [FluidInference/parakeet-tdt-0.6b-v3-coreml](https://huggingface.co/FluidInference/parakeet-tdt-0.6b-v3-coreml),
with an additional `JointSingleStep.mlmodelc` that exposes raw token + duration
logits per step.
`JointSingleStep` lets host-side code (e.g. FluidAudio's
`ParakeetBooster`) add log-prob offsets to the token logits before taking the argmax.
This is the shallow-fusion step needed for trie-based context/vocabulary biasing.
The standard `JointDecision.mlmodelc` performs the argmax inside the CoreML graph, so
its output can't be biased by host code.
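The biasing step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not FluidAudio's `ParakeetBooster`: the helper names (`build_trie`, `biased_argmax`), the nested-dict trie, the token ids, and the boost value are all assumptions made for the example. Because log-softmax only subtracts a constant shared by all tokens, adding an offset to raw logits selects the same argmax as adding it to log-probs.

```python
def build_trie(phrases):
    """Build a nested-dict trie from token-id sequences (hypothetical helper)."""
    root = {}
    for ids in phrases:
        node = root
        for tok in ids:
            node = node.setdefault(tok, {})
    return root


def biased_argmax(token_logits, trie_node, boost=2.0):
    """Add `boost` to each token that continues a boosted phrase, then argmax."""
    scores = list(token_logits)
    for tok in trie_node:          # children of the current trie node
        scores[tok] += boost
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two boosted phrases that both start with token 3 (ids are made up).
trie = build_trie([[3, 1], [3, 4]])
logits = [0.1, 0.5, 0.2, 1.9, 0.0, 2.0]

plain = max(range(len(logits)), key=logits.__getitem__)
best, _ = biased_argmax(logits, trie, boost=0.5)
```

Without the offset the highest-scoring token is 5; with a 0.5 boost on token 3 the biased choice becomes 3. A real decoder would then descend to `trie[best]` when the chosen token continues a phrase, and reset to the root otherwise.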
## Files
| File | Purpose |
|---|---|
| `Preprocessor.mlmodelc` | Log-mel frontend, 15 s window, 16 kHz mono |
| `Encoder.mlmodelc` | FastConformer encoder |
| `Decoder.mlmodelc` | RNN-T prediction network (LSTM, U=1 streaming) |
| `JointDecision.mlmodelc` | Single-step joint + argmax (token_id, prob, duration) |
| `JointSingleStep.mlmodelc` | Single-step joint — **raw token_logits + duration_logits** (for biasing) |
| `parakeet_vocab.json` / `parakeet_v3_vocab.json` | SentencePiece vocabulary |
| `config.json` | Metadata |
All components use a fixed 15 s audio window (240 000 samples @ 16 kHz).
Deployment target: iOS 17 / macOS 14.
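The fixed window size above follows from 15 s × 16 000 samples/s = 240 000 samples. As a rough sketch of what that implies for callers, the snippet below splits mono audio into windows of that length and zero-pads the last one. The function name and the non-overlapping split are assumptions for illustration; a real pipeline may overlap windows or pad differently.

```python
SAMPLE_RATE = 16_000
WINDOW_SECONDS = 15
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS  # 240_000


def to_fixed_windows(samples):
    """Split mono samples into 240 000-sample windows, zero-padding the last."""
    windows = []
    for start in range(0, len(samples), WINDOW_SAMPLES):
        chunk = list(samples[start:start + WINDOW_SAMPLES])
        chunk += [0.0] * (WINDOW_SAMPLES - len(chunk))  # pad short final chunk
        windows.append(chunk)
    return windows


# 250 000 samples (about 15.6 s) need two windows; the second is mostly padding.
windows = to_fixed_windows([0.1] * 250_000)
```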
## Usage with FluidAudio
Use the `boosting` branch of
[github.com/vietanh125/FluidAudio](https://github.com/vietanh125/FluidAudio),
which downloads this bundle by default for `AsrModelVersion.v3`. The
`ParakeetBooster` and `TokenBoostTrie` implementations live in
`Sources/FluidAudio/ASR/Parakeet/SlidingWindow/CustomVocabulary/Boosting/`.
## Source
`JointSingleStep` was generated with the script at
[mobius/models/stt/parakeet-tdt-v3-0.6b/coreml](https://github.com/FluidInference/mobius).
The wrapper adds a `JointSingleStep` class that returns `token_logits` and
`duration_logits` directly.
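To show how a host might consume the two outputs, here is a hedged sketch of one greedy TDT step in plain Python. The function name, the `durations` table, and `blank_id` are assumptions for the example; the actual duration bins and blank index should be taken from the model's configuration. Forcing a one-frame advance when blank is paired with a zero duration is a common safeguard against stalling, not something this model card specifies.

```python
def tdt_step(token_logits, duration_logits, durations, blank_id):
    """Pick a token and a frame skip from one joint step (illustrative only)."""
    token = max(range(len(token_logits)), key=token_logits.__getitem__)
    dur_idx = max(range(len(duration_logits)), key=duration_logits.__getitem__)
    skip = durations[dur_idx]
    # Assumed safeguard: a blank with zero duration would never advance time.
    if token == blank_id and skip == 0:
        skip = 1
    return token, skip


# Toy values: 3 tokens (id 2 is blank here) and duration bins 0, 1, 2.
emit = tdt_step([0.1, 2.0, 0.3], [0.0, 0.1, 3.0], durations=[0, 1, 2], blank_id=2)
stall = tdt_step([0.1, 0.2, 5.0], [4.0, 0.1, 0.0], durations=[0, 1, 2], blank_id=2)
```

Any biasing offsets would be added to `token_logits` before this step, leaving `duration_logits` untouched.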
## License
CC-BY-4.0 — matches the upstream NVIDIA NeMo Parakeet checkpoint.