Parakeet TDT-CTC 0.6B Japanese - Core ML (Streaming ASR)

Core ML conversion of nvidia/parakeet-tdt_ctc-0.6b-ja for SlidingWindow streaming ASR on iOS/macOS (TranslateBlue / FluidAudio).

Streaming ASR contract

This repo is not a batch-only file transcription bundle. It targets live microphone ASR:

| Aspect | Specification |
| --- | --- |
| Architecture | SlidingWindow pseudo-streaming (stateless 15s encoder + overlapping windows) |
| Runtime config | SlidingWindowAsrConfig.streaming: 11s chunk, 2s left/right context |
| Decoder | Decoderv2 with U=1 LSTM state I/O (state carried across windows) |
| Jointer | Jointerv2: TDT duration bins (max 4 frames for ja) |
| Encoder window | Fixed 15s mel input ([1, 80, 1501]). Short windows (e.g. 5s) are unsupported. |
| Vocab | 3072 BPE tokens, blank id 3072 |
| CTC | CtcDecoder.mlpackage is tier-2 failover only (raw logits, no log_softmax) |
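
The numbers in the contract fit together: 11 s chunk + 2 s left + 2 s right context gives the fixed 15 s encoder window. The sketch below shows one way to plan such windows over a recording and checks that 15 s of 16 kHz audio maps to 1501 mel frames. The 10 ms hop and the `1 + n // hop` frame formula are assumptions (the source only gives the resulting shape), and `plan_windows` / `mel_frames` are hypothetical helpers, not FluidAudio API.

```python
SAMPLE_RATE = 16_000                   # from the Preprocessor row: 16 kHz mono
HOP = 160                              # assumption: 10 ms hop, not stated in the source
CHUNK_S, LEFT_S, RIGHT_S = 11, 2, 2    # from SlidingWindowAsrConfig.streaming
WINDOW_S = CHUNK_S + LEFT_S + RIGHT_S  # 15 s encoder window


def mel_frames(num_samples, hop=HOP):
    """Frame count under the assumed centered-STFT convention."""
    return 1 + num_samples // hop


def plan_windows(total_samples):
    """Return (window_start, window_end, chunk_start, chunk_end) in samples.

    Window bounds may fall outside the audio (negative start, end past the
    last sample); the caller is expected to zero-pad those regions so every
    window is exactly WINDOW_S long.
    """
    chunk = CHUNK_S * SAMPLE_RATE
    left = LEFT_S * SAMPLE_RATE
    right = RIGHT_S * SAMPLE_RATE
    plans = []
    start = 0
    while start < total_samples:
        chunk_end = min(start + chunk, total_samples)
        plans.append((start - left, start + chunk + right, start, chunk_end))
        start += chunk
    return plans
```

Only the tokens that fall inside each chunk's span would be kept; the left and right context exists to give the encoder acoustic context at the chunk edges.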

Not included: cache-aware true streaming (Parakeet EOU / Nemotron), fused FullPipeline / MelEncoder batch paths.
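
What is included is a per-window TDT decode whose decoder state survives from one window to the next. A minimal sketch of greedy TDT decoding under that contract follows. The `decoder_step` and `joint_step` callables are hypothetical stand-ins for the Decoderv2 and Jointerv2 models, the duration bins `0..4` are an assumption read off "max 4 frames", and the per-frame symbol cap is an added safety guard rather than something the source specifies.

```python
BLANK_ID = 3072               # from the contract: 3072 BPE tokens, blank id 3072
DURATIONS = [0, 1, 2, 3, 4]   # assumption: bins 0..4 ("max 4 frames for ja")
MAX_SYMBOLS_PER_FRAME = 10    # assumption: guard against zero-duration loops


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def tdt_greedy(encoder_frames, decoder_step, joint_step, state,
               last_token=BLANK_ID):
    """Greedy TDT decode of one window.

    decoder_step(token, state) -> (dec_out, new_state)            (hypothetical)
    joint_step(enc_frame, dec_out) -> (token_logits, dur_logits)  (hypothetical)

    Returns (tokens, state, last_token); pass the last two into the next
    window's call to carry the decoder state across windows.
    """
    tokens = []
    t = 0
    emitted_here = 0
    dec_out, next_state = decoder_step(last_token, state)
    while t < len(encoder_frames):
        token_logits, dur_logits = joint_step(encoder_frames[t], dec_out)
        token = argmax(token_logits)
        skip = DURATIONS[argmax(dur_logits)]
        if token != BLANK_ID:
            # Commit the state only when a real token is emitted.
            tokens.append(token)
            state, last_token = next_state, token
            dec_out, next_state = decoder_step(last_token, state)
            emitted_here += 1
        # A blank, or too many symbols on one frame, must advance time.
        if skip == 0 and (token == BLANK_ID
                          or emitted_here >= MAX_SYMBOLS_PER_FRAME):
            skip = 1
        if skip > 0:
            t += skip
            emitted_here = 0
    return tokens, state, last_token
```

Because the predicted duration can skip several encoder frames at once, the loop usually runs fewer joint steps than a frame-by-frame transducer decode.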

Artifacts (aoiandroid lowercase layout)

| File | FluidAudio name | Role |
| --- | --- | --- |
| preprocessor.mlpackage | Preprocessor | 16 kHz mono → mel |
| encoder.mlpackage | Encoder | mel → encoder_output |
| decoder.mlpackage | Decoderv2 | LSTM decoder with state |
| joint.mlpackage | Jointerv2 | TDT joint step |
| vocab_ja.json | vocab.json | SentencePiece vocabulary |
| CtcDecoder.mlpackage | CtcDecoder | CTC tier-2 failover |
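
Since CtcDecoder.mlpackage emits raw logits without log_softmax, a consumer of the failover path has to normalize them itself whenever it needs log-probabilities, for example for a confidence score. The sketch below applies a numerically stable log_softmax per frame and performs the standard greedy CTC collapse (merge repeats, drop blanks). The function names are hypothetical and the logits are assumed to arrive as one list of floats per frame.

```python
import math

BLANK_ID = 3072  # from the contract: blank id 3072


def log_softmax(logits):
    """Numerically stable log_softmax over one frame of raw logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def ctc_greedy(frame_logits, blank_id=BLANK_ID):
    """Return (token_ids, total_log_prob) for the best path."""
    tokens = []
    score = 0.0
    prev = blank_id
    for logits in frame_logits:
        log_probs = log_softmax(logits)
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        score += log_probs[best]
        # Emit only on a change to a non-blank label.
        if best != blank_id and best != prev:
            tokens.append(best)
        prev = best
    return tokens, score
```

The argmax is identical on raw logits and on log-probabilities, so normalization matters only for the score, not for the decoded tokens.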

Conversion

Pipeline: FluidInference/mobius
Script: models/stt/parakeet-ctc-0.6b-ja/coreml/conversion/export-tdt-ja-streaming.py

cd mobius/models/stt/parakeet-ctc-0.6b-ja/coreml
uv sync
uv run python conversion/export-tdt-ja-streaming.py --output-dir ./build
uv run python conversion/export-full-pipeline.py --output-dir ./build --no-fused
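
After the export, a quick sanity check can confirm that the output directory holds every file from the Artifacts table. The sketch below assumes the scripts write the lowercase layout listed there directly into `--output-dir`; the source does not confirm this, and `missing_artifacts` is a hypothetical helper.

```python
from pathlib import Path

# Lowercase file name -> FluidAudio name, copied from the Artifacts table.
ARTIFACTS = {
    "preprocessor.mlpackage": "Preprocessor",
    "encoder.mlpackage": "Encoder",
    "decoder.mlpackage": "Decoderv2",
    "joint.mlpackage": "Jointerv2",
    "vocab_ja.json": "vocab.json",
    "CtcDecoder.mlpackage": "CtcDecoder",
}


def missing_artifacts(build_dir):
    """Return the sorted names of expected artifacts absent from build_dir."""
    root = Path(build_dir)
    return sorted(name for name in ARTIFACTS if not (root / name).exists())


if __name__ == "__main__":
    missing = missing_artifacts("./build")
    print("all artifacts present" if not missing else f"missing: {missing}")
```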

Validation

  • Streaming: fluidaudiocli transcribe <wav> --streaming --model-version tdt-ja --model-dir ./build
  • Accuracy: JSUT TDT CER via fluidaudiocli ja-benchmark --dataset jsut --samples 500
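
For reference, CER is the character-level edit distance between hypothesis and reference, summed over the test set and divided by the total number of reference characters. The sketch below implements that textbook definition. Any text normalization that `fluidaudiocli ja-benchmark` applies before scoring (punctuation, whitespace, script variants) is not described in the source and is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level character error rate."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars if chars else 0.0


print(cer(["こんにちは"], ["こんばんは"]))  # 2 substitutions over 5 characters -> 0.4
```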

License

Converted weights follow the upstream NVIDIA model license. Model card metadata: CC-BY-4.0.
