metadata
license: cc-by-4.0
language:
- en
tags:
- asr
- speech
- coreml
- nemo
- parakeet
- nvidia
library_name: coremltools
pipeline_tag: automatic-speech-recognition
base_model: nvidia/parakeet-tdt-0.6b-v3
parakeet-tdt-0.6b-v3-coreml
CoreML conversion of nvidia/parakeet-tdt-0.6b-v3.
| Property | Value |
|---|---|
| Architecture | TDT (Token-and-Duration Transducer) |
| Language | English |
| Sample rate | 16000 Hz |
| Max audio | 15.0s |
| Vocab size | 8192 |
| Framework | NVIDIA NeMo → CoreML (coremltools) |
Components
| File | Component | Best compute |
|---|---|---|
| parakeet_mel_encoder.mlpackage | mel_encoder | ANE / GPU |
| parakeet_decoder.mlpackage | decoder | CPU only |
| parakeet_joint_decision_single_step.mlpackage | joint_decision_single_step | ANE / GPU |
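
For readers loading the packages directly with coremltools instead of through the plugin, the "Best compute" column could be applied roughly as follows. This is a minimal sketch under stated assumptions: `compute_unit_name` and `load_component` are hypothetical helpers, and mapping "ANE / GPU" to `ComputeUnit.ALL` (which lets Core ML pick among CPU, GPU, and Neural Engine) is one plausible reading of the table, not a documented choice of this conversion.

```python
import os

# Preferred compute per component, copied from the table above.
BEST_COMPUTE = {
    "parakeet_mel_encoder.mlpackage": "ANE / GPU",
    "parakeet_decoder.mlpackage": "CPU only",
    "parakeet_joint_decision_single_step.mlpackage": "ANE / GPU",
}


def compute_unit_name(best_compute):
    """Map the table's wording to a coremltools ComputeUnit member name.

    Assumption: anything other than "CPU only" is left to Core ML via ALL.
    """
    return "CPU_ONLY" if best_compute.strip().lower() == "cpu only" else "ALL"


def load_component(path):
    """Load one .mlpackage with the compute unit suggested by the table."""
    # Imported lazily: predictions with coremltools require macOS.
    import coremltools as ct

    name = compute_unit_name(BEST_COMPUTE[os.path.basename(path)])
    return ct.models.MLModel(path, compute_units=getattr(ct.ComputeUnit, name))
```

Calling `load_component("parakeet_decoder.mlpackage")` would then pin the decoder to the CPU while leaving the other two components free to use the Neural Engine or GPU.
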
Usage
pip install ovos-stt-plugin-coreml
from ovos_stt_plugin_coreml import CoremlSTT
from ovos_plugin_manager.utils.audio import AudioFile
stt = CoremlSTT(config={"metadata": "metadata.json"})
with AudioFile("speech.wav") as f:
    audio = f.read()
print(stt.execute(audio))