metadata
license: cc-by-4.0
library_name: mlx
tags:
- mlx
- nemo
- parakeet
- speech-recognition
- automatic-speech-recognition
- fp16
- apple-silicon
- ios
- nvidia
- rnnt
- tdt
language:
- bg
- cs
- da
- de
- el
- en
- es
- et
- fi
- fr
- hr
- hu
- it
- lt
- lv
- mt
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sv
- uk
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v3
Parakeet TDT 0.6B v3 - MLX FP16
This is the NVIDIA Parakeet TDT 0.6B v3 model converted to MLX format at FP16 precision for inference on Apple Silicon.
Parakeet TDT is an ASR model based on the Token-and-Duration Transducer (TDT) architecture, which predicts a duration alongside each token so that the decoder can skip frames. It is trained on large-scale speech data. The v3 variant extends the English-only v2 to 25 European languages, English included.
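As a rough illustration of how a duration head changes decoding, the sketch below runs a greedy TDT-style loop over encoder frames. It is not the decoding code shipped with this model or with NeMo: `joint_fn`, `blank_id`, and `max_symbols_per_frame` are hypothetical names, and the toy joint function stands in for the real predictor and joint networks. Only the duration set `[0, 1, 2, 3, 4]` comes from this card.

```python
from typing import Callable, List, Sequence, Tuple

# Duration set taken from the Model Details table of this card.
DURATIONS = [0, 1, 2, 3, 4]


def tdt_greedy_decode(
    num_frames: int,
    joint_fn: Callable[[int, Tuple[int, ...]], Tuple[int, int]],
    blank_id: int,
    durations: Sequence[int] = DURATIONS,
    max_symbols_per_frame: int = 10,
) -> List[Tuple[int, int]]:
    """Greedy TDT-style decoding sketch.

    joint_fn(t, history) is a hypothetical callable returning the argmax
    (token_id, duration_index) for encoder frame t given the tokens so far.
    Returns a list of (token_id, frame_index) pairs.
    """
    t = 0
    history: Tuple[int, ...] = ()
    output: List[Tuple[int, int]] = []
    symbols_here = 0  # tokens emitted without advancing the frame pointer

    while t < num_frames:
        token, dur_idx = joint_fn(t, history)
        skip = durations[dur_idx]

        if token != blank_id:
            output.append((token, t))
            history = history + (token,)
        elif skip == 0:
            # A blank that does not advance would loop forever; force one step.
            skip = 1

        if skip == 0:
            symbols_here += 1
            if symbols_here >= max_symbols_per_frame:
                skip = 1  # safety cap on emissions at a single frame
        if skip > 0:
            symbols_here = 0
        t += skip

    return output


def toy_joint(t: int, history: Tuple[int, ...]) -> Tuple[int, int]:
    """Hand-written stand-in for the real networks (token 0 is blank)."""
    if t == 0:
        return (5, 2)  # emit token 5, jump 2 frames
    if t == 3 and len(history) < 2:
        return (7, 0)  # emit token 7, stay on the same frame
    if t == 3:
        return (0, 4)  # blank, jump 4 frames
    return (0, 1)      # blank, advance one frame


result = tdt_greedy_decode(num_frames=6, joint_fn=toy_joint, blank_id=0)
print(result)  # [(5, 0), (7, 3)]
```

The point of the sketch is the `t += skip` step: a conventional transducer advances one frame per blank, while a TDT decoder can jump several frames at once when the duration head says so.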
Model Details
| Property | Value |
|---|---|
| Base Model | nvidia/parakeet-tdt-0.6b-v3 |
| Parameters | ~600M |
| Format | MLX SafeTensors (FP16) |
| Model Size | 1,196.08 MB |
| Sample Rate | 16,000 Hz |
| Architecture | FastConformer + TDT |
| Encoder Hidden | 1024 |
| Predictor Hidden | 640 |
| Joint Hidden | 640 |
| TDT Durations | [0, 1, 2, 3, 4] |
| Tokenizer | BPE |
| Languages | 25 European languages (BG, CS, DA, DE, EL, EN, ES, ET, FI, FR, HR, HU, IT, LT, LV, MT, NL, PL, PT, RO, RU, SK, SL, SV, UK) |
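As a quick sanity check on the table, the listed file size can be related to the parameter count, since FP16 stores each weight in 2 bytes. The sketch below is only arithmetic on the figures above. Whether "MB" here means 10^6 or 2^20 bytes is not stated on this card, so both readings are computed, and the small overhead of the SafeTensors header is ignored.

```python
BYTES_PER_FP16 = 2
MODEL_SIZE_MB = 1196.08  # "Model Size" row of the table


def params_from_size(size_mb: float, bytes_per_mb: int) -> float:
    """Approximate parameter count implied by an FP16 file of the given size."""
    return size_mb * bytes_per_mb / BYTES_PER_FP16


decimal_params = params_from_size(MODEL_SIZE_MB, 1_000_000)  # MB = 10^6 bytes
binary_params = params_from_size(MODEL_SIZE_MB, 1024 ** 2)   # MB = 2^20 bytes

print(f"decimal MB: ~{decimal_params / 1e6:.0f}M parameters")
print(f"binary MB:  ~{binary_params / 1e6:.0f}M parameters")
```

Both readings land close to the ~600M figure in the Parameters row.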
Intended Use
This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It supports 25 European languages: BG, CS, DA, DE, EL, EN, ES, ET, FI, FR, HR, HU, IT, LT, LV, MT, NL, PL, PT, RO, RU, SK, SL, SV, UK.
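Because the model expects 16,000 Hz input, audio captured at other rates needs to be converted before transcription. The following is a minimal sketch using NumPy only. The function name `to_mono_16k` and the `(num_samples, num_channels)` layout are assumptions for illustration, and linear interpolation is used for brevity: it applies no anti-aliasing filter, so a proper resampler is preferable for production audio.

```python
import numpy as np

TARGET_SR = 16_000  # sample rate listed in Model Details


def to_mono_16k(samples, source_sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz by linear interpolation.

    Assumes `samples` is 1-D (mono) or 2-D with shape
    (num_samples, num_channels).
    """
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channels
    if source_sr == TARGET_SR:
        return audio

    n_out = int(round(audio.shape[0] * TARGET_SR / source_sr))
    src_times = np.arange(audio.shape[0]) / source_sr
    dst_times = np.arange(n_out) / TARGET_SR
    # np.interp returns float64, so cast back to float32.
    return np.interp(dst_times, src_times, audio).astype(np.float32)


# One second of silent 48 kHz stereo becomes 16,000 mono samples.
stereo_48k = np.zeros((48_000, 2), dtype=np.float32)
mono_16k = to_mono_16k(stereo_48k, source_sr=48_000)
print(mono_16k.shape, mono_16k.dtype)
```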
Files
- `config.json` - Model configuration
- `model.safetensors` - Model weights in SafeTensors format (FP16)
- `tokenizer.model` - SentencePiece tokenizer model
- `tokenizer.vocab` - Tokenizer vocabulary
- `vocab.txt` - Vocabulary text file
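To check a downloaded weights file without loading it into memory, the SafeTensors header can be read directly: the file starts with an 8-byte little-endian length, followed by that many bytes of JSON describing each tensor's dtype, shape, and byte offsets. The sketch below uses only the Python standard library. The helper names are hypothetical, and the local path in the usage note assumes the file sits in the working directory.

```python
import json
import struct
from math import prod
from typing import Dict, Tuple


def read_safetensors_header(path: str) -> dict:
    """Return the tensor index from a SafeTensors file without reading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header: dict) -> Tuple[int, Dict[str, int]]:
    """Return (total element count, element count per dtype)."""
    per_dtype: Dict[str, int] = {}
    total = 0
    for info in header.values():
        n = prod(info["shape"])  # an empty shape is a scalar, so prod is 1
        total += n
        per_dtype[info["dtype"]] = per_dtype.get(info["dtype"], 0) + n
    return total, per_dtype
```

For example, `summarize(read_safetensors_header("model.safetensors"))` should report a total near 600M elements, with the weights stored under the `F16` dtype if the file matches the description above.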
Original Model
- Paper: Parakeet: A Toolkit for Training and Evaluating ASR Models
- Authors: NVIDIA NeMo Team
- License: CC-BY-4.0