# Arabic Parakeet TDT — UAE Dialect

> 🚧 **Work in Progress** — This model is under active development. Results will improve.

## Model Description
Fine-tuned from `nvidia/parakeet-tdt-1.1b` (an English-only FastConformer + TDT model) for Arabic UAE dialect speech recognition via cross-lingual transfer learning.
## Training Details
- Base model: nvidia/parakeet-tdt-1.1b (1.1B params, FastConformer encoder + TDT decoder)
- Training data: 22k Arabic UAE dialect samples (39 hours)
- Tokenizer: SentencePiece Unigram (1024 vocab) trained on Arabic text
- Strategy: Encoder frozen for 10 epochs, then unfrozen with differential LR (encoder 1e-5, decoder 3e-4)
- Text normalization: Diacritics removed, alef/teh marbuta normalized, punctuation stripped
- Epochs: 50
- Best val WER: 0.641
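The text normalization steps listed above can be sketched as follows. The regexes and character sets here are illustrative assumptions; the exact rules applied at training time may differ (e.g. whether teh marbuta maps to heh).

```python
import re

# Assumed character classes for the normalization recipe above
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel + dagger alef
ALEFS = re.compile(r"[\u0622\u0623\u0625]")        # alef variants -> bare alef
PUNCT = re.compile(r"[^\w\s]")                      # any non-word, non-space char

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)          # remove diacritics
    text = ALEFS.sub("\u0627", text)         # normalize alef forms
    text = text.replace("\u0629", "\u0647")  # teh marbuta -> heh (assumed convention)
    text = PUNCT.sub("", text)               # strip punctuation
    return " ".join(text.split())
```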
## Current Results
| Metric | Value |
|---|---|
| Val WER | 0.641 |
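A WER of 0.641 means that roughly 64 word-level edits (substitutions, insertions, deletions) are needed per 100 reference words. A minimal implementation of the standard edit-distance definition (not the NeMo evaluation code):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # DP row for the empty reference prefix
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (rw != hw),  # substitution (0 cost if words match)
            ))
        prev = cur
    return prev[-1] / len(r)
```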
## Usage
```python
import nemo.collections.asr as nemo_asr

# Load the fine-tuned checkpoint and transcribe an audio file
model = nemo_asr.models.ASRModel.restore_from("arabic-parakeet-tdt-uae.nemo")
transcriptions = model.transcribe(["audio.wav"])
print(transcriptions)
```
## Limitations
- WER is still high (~64%) — cross-lingual transfer from English to Arabic is challenging with limited data
- Repetition artifacts in longer utterances (common RNNT issue)
- Trained on synthetic/generated Arabic speech data
- Not suitable for production use yet
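A crude stopgap for the repetition artifacts noted above is to collapse runs of identical consecutive words in the transcript. This is a hypothetical post-filter, not part of the model or its decoding:

```python
def collapse_repeats(text: str, max_run: int = 1) -> str:
    """Collapse runs of identical consecutive words down to `max_run` copies."""
    out = []
    for w in text.split():
        # Skip this word if the last `max_run` emitted words are all identical to it
        if len(out) >= max_run and all(x == w for x in out[-max_run:]):
            continue
        out.append(w)
    return " ".join(out)
```

Note that this also destroys legitimate repetitions, so it trades one error type for another; fixing the decoder (see below) is the real solution.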
## Next Steps
- Pre-train on large Arabic dataset (MGB-2, 1200 hours) before dialect fine-tuning
- Address decoder repetition issues
- Evaluate on more diverse test sets
## License
Apache 2.0 (same as base model)