Arabic Parakeet TDT β€” UAE Dialect

🚧 Work in Progress - This model is under active development. Results will improve.

Model Description

A fine-tune of nvidia/parakeet-tdt-1.1b (an English-only FastConformer encoder with a TDT decoder) for Emirati Arabic (UAE dialect) speech recognition via cross-lingual transfer learning.

Training Details

  • Base model: nvidia/parakeet-tdt-1.1b (1.1B params, FastConformer encoder + TDT decoder)
  • Training data: 22k Arabic UAE dialect samples (39 hours)
  • Tokenizer: SentencePiece Unigram (1024 vocab) trained on Arabic text
  • Strategy: Encoder frozen for 10 epochs, then unfrozen with differential LR (encoder 1e-5, decoder 3e-4)
  • Text normalization: Diacritics removed, alef/teh marbuta normalized, punctuation stripped
  • Epochs: 50
  • Best val WER: 0.641
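
The freeze-then-unfreeze schedule with differential learning rates can be sketched in plain PyTorch. This is illustrative only: the `model.encoder`/`model.decoder` attribute names and the choice of AdamW are assumptions, not the actual training script.

```python
import torch
import torch.nn as nn

def set_encoder_frozen(model: nn.Module, frozen: bool = True) -> None:
    # Phase 1 (epochs 1-10): encoder frozen, only the decoder trains.
    # Phase 2 (epochs 11-50): call with frozen=False to unfreeze.
    for p in model.encoder.parameters():
        p.requires_grad = not frozen

def make_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    # Differential learning rates: small LR for the pretrained English
    # encoder, larger LR for the decoder learning Arabic targets.
    return torch.optim.AdamW([
        {"params": model.encoder.parameters(), "lr": 1e-5},
        {"params": model.decoder.parameters(), "lr": 3e-4},
    ])
```

The small encoder LR preserves the acoustic representations learned from English while the decoder adapts to the new tokenizer and language.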

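The text normalization steps (diacritic removal, alef/teh-marbuta normalization, punctuation stripping) can be sketched as follows; the exact Unicode ranges and replacement rules used in training are assumptions.

```python
import re

# Arabic harakat (fathatan..sukun) plus the dagger alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Hamza-carrying alef variants collapse to bare alef.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")
# Anything outside the core Arabic letter block becomes a space.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

def normalize(text: str) -> str:
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)   # -> bare alef
    text = text.replace("\u0629", "\u0647")    # teh marbuta -> heh
    text = NON_ARABIC.sub(" ", text)           # strip punctuation/digits
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing the targets this way shrinks the effective vocabulary, which matters at 39 hours of training data.
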
Current Results

Metric    Value
Val WER   0.641
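
For context, WER is word-level Levenshtein distance divided by the number of reference words, so 0.641 means roughly 64 word errors (substitutions, deletions, or insertions) per 100 reference words. A minimal reference implementation (not the evaluation code used for this model):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    # One-row dynamic-programming table for Levenshtein distance.
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                       # deletion
                       d[j - 1] + 1,                   # insertion
                       prev + (r[i - 1] != h[j - 1]))  # substitution
            prev = cur
    return d[-1] / len(r)
```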

Usage

# Requires the NeMo toolkit: pip install "nemo_toolkit[asr]"
import nemo.collections.asr as nemo_asr

# Load the fine-tuned checkpoint
model = nemo_asr.models.ASRModel.restore_from("arabic-parakeet-tdt-uae.nemo")

# Transcribe one or more audio files (16 kHz mono WAV)
transcriptions = model.transcribe(["audio.wav"])
print(transcriptions)

Limitations

  • WER is still high (~64%): cross-lingual transfer from English to Arabic is challenging with limited data
  • Repetition artifacts in longer utterances (common RNNT issue)
  • Trained on synthetic/generated Arabic speech data
  • Not suitable for production use yet

Next Steps

  • Pre-train on large Arabic dataset (MGB-2, 1200 hours) before dialect fine-tuning
  • Address decoder repetition issues
  • Evaluate on more diverse test sets

License

Apache 2.0 (same as base model)
