Parakeet TDT 0.6B v3 - Basque (Euskara)

Basque ASR model obtained by fine-tuning nvidia/parakeet-tdt-0.6b-v3 on asierhv/composite_corpus_eu_v2.1.

What this model is

  • Base architecture: FastConformer RNNT-TDT (Parakeet TDT 0.6B v3)
  • Language: Basque (eu)
  • Target task: Automatic speech recognition (16 kHz speech)
  • Framework: NVIDIA NeMo
  • Artifact format: .nemo

Training data provenance

Fine-tuning used the composite Basque corpus asierhv/composite_corpus_eu_v2.1, which aggregates Basque speech/transcript data from:

  • Mozilla Common Voice (Basque)
  • Basque Parliament
  • OpenSLR (Basque)

Evaluation summary

WER was computed on held-out test splits created from the same composite source family.

Split       Baseline WER (base model)   Fine-tuned WER   WER reduction (points)
test_cv     108.47%                     6.92%            101.55
test_parl   107.61%                     4.36%            103.25
test_oslr   108.52%                     14.52%           94.00

Notes:

  • Baseline is the original multilingual/English-oriented base model, which performs poorly on Basque (a WER above 100% is expected, since insertion errors can push the edit count past the number of reference words).
  • Fine-tuned metrics above are from the final selected checkpoint/model export.
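For reference, the WER figures above are edit-distance based. A minimal self-contained implementation is sketched below; this is not necessarily the exact scorer used to produce the table, and it also shows how WER can exceed 100% when the hypothesis contains many insertions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Iterative Levenshtein distance over word tokens.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev = d[0]          # d[i-1][0]
        d[0] = i
        for j in range(1, len(hyp) + 1):
            cur = d[j]       # d[i-1][j], saved before overwrite
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[j] = min(d[j] + 1,      # deletion
                       d[j - 1] + 1,  # insertion
                       prev + cost)   # substitution or match
            prev = cur
    return d[len(hyp)] / len(ref)
```

Note that `wer("a", "a b c")` returns 2.0, i.e. 200%: two insertions against a one-word reference.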

Quick start

1) Load from local .nemo

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.restore_from("parakeet-tdt-0.6b-v3-basque.nemo")
asr_model = asr_model.cuda().eval()  # move to GPU; omit .cuda() on CPU-only machines

pred = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)

# pred may be list[str] or list[Hypothesis], depending on settings/version
if len(pred) and hasattr(pred[0], "text"):
    print(pred[0].text)
else:
    print(pred[0])

2) Load from Hugging Face repo

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="xezpeleta/parakeet-tdt-0.6b-v3-basque"
)

out = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)

print(out[0].text if hasattr(out[0], "text") else out[0])

3) Batch transcription snippet

audio_paths = [
    "/path/a.wav",
    "/path/b.wav",
    "/path/c.wav",
]

outs = asr_model.transcribe(
    audio=audio_paths,
    use_lhotse=False,
    batch_size=16,
    num_workers=4,
    verbose=True,
)

texts = [o.text if hasattr(o, "text") else o for o in outs]
for path, txt in zip(audio_paths, texts):
    print(path, "=>", txt)

Recommended runtime settings

  • Use 16 kHz mono WAV when possible.
  • On NeMo 2.7 environments, pass use_lhotse=False to transcribe(...) if you observe inference/runtime issues in constrained container setups.
  • Increase batch_size gradually to match available GPU memory.
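To enforce the 16 kHz mono recommendation, you can screen WAV files before transcription. The helper below is a sketch using only the standard-library wave module; the function name is illustrative. Files it flags can be converted with a tool of your choice (e.g. ffmpeg with -ac 1 -ar 16000).

```python
import wave

def needs_conversion(path: str, target_rate: int = 16000) -> bool:
    """Return True if a WAV file is not mono at target_rate and should be resampled."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() != target_rate or wf.getnchannels() != 1
```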

Fine-tuning recipe (high-level)

  • Full-model fine-tuning from nvidia/parakeet-tdt-0.6b-v3
  • Optimizer: AdamW
  • LR: 1e-4
  • Scheduler: CosineAnnealing with warmup
  • Effective batch size: 64 (batch_size=8, gradient accumulation=8)
  • Precision: BF16 mixed
  • Hardware used: NVIDIA L40 (48 GB)
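The effective batch size follows from batch_size × gradient accumulation (8 × 8 = 64). The learning-rate schedule named above (warmup then cosine annealing) can be sketched as follows; this is an illustrative standalone function, not the exact NeMo scheduler configuration used in training.

```python
import math

def lr_at(step: int, total_steps: int, warmup_steps: int,
          peak_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At the end of warmup the rate hits the peak of 1e-4 and then decays smoothly toward min_lr over the remaining steps.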

Limitations

  • Performance can degrade on strong code-switching, noisy far-field audio, or domains far from training data.
  • This model is optimized for Basque ASR; behavior on other languages is not the target.
  • Proper text normalization and punctuation/casing post-processing may still be needed for production use.
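As a starting point for the post-processing mentioned above, a minimal text normalizer might look like the sketch below. The function name and rules are illustrative assumptions, not the normalization used during training; production systems usually need language-specific handling of numbers, abbreviations, and punctuation.

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    """Hypothetical minimal normalizer: NFC form, lowercase,
    punctuation stripped, whitespace collapsed."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s']", " ", text)   # drop punctuation, keep apostrophes
    return re.sub(r"\s+", " ", text).strip()
```

Because Python's \w is Unicode-aware, letters such as ñ are preserved.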

Citation and acknowledgements

If you use this model, please cite/credit:

  1. Base model: nvidia/parakeet-tdt-0.6b-v3
  2. Training dataset: asierhv/composite_corpus_eu_v2.1
  3. Underlying source collections: Mozilla Common Voice, Basque Parliament corpus, OpenSLR Basque resources

License

This derivative model follows the base model and dataset terms. Keep attribution and license obligations from all upstream assets.

