# Parakeet TDT 0.6B v3 - Basque (Euskara)
Basque ASR model obtained by fine-tuning nvidia/parakeet-tdt-0.6b-v3 on asierhv/composite_corpus_eu_v2.1.
## What this model is
- Base architecture: FastConformer RNNT-TDT (Parakeet TDT 0.6B v3)
- Language: Basque (`eu`)
- Target task: Automatic speech recognition (16 kHz speech)
- Framework: NVIDIA NeMo
- Artifact format: `.nemo`
## Training data provenance
Fine-tuning used the composite Basque corpus `asierhv/composite_corpus_eu_v2.1`, which aggregates Basque speech/transcript data from:
- Mozilla Common Voice (Basque)
- Basque Parliament
- OpenSLR (Basque)
Reference datasets:
- Mozilla Common Voice: https://commonvoice.mozilla.org/
- Basque Parliament corpus (as included in the composite dataset above)
- OpenSLR Basque resources: https://www.openslr.org/
## Evaluation summary
WER was computed on held-out test splits created from the same composite source family.
| Split | Baseline WER (base model) | Fine-tuned WER | Absolute gain (WER points) |
|---|---|---|---|
| test_cv | 108.47% | 6.92% | +101.55 |
| test_parl | 107.61% | 4.36% | +103.25 |
| test_oslr | 108.52% | 14.52% | +94.00 |
Notes:
- Baseline is the original multilingual/English-oriented base model, which performs poorly on Basque; WER above 100% is possible because insertion errors can exceed the number of reference words.
- Fine-tuned metrics above are from the final selected checkpoint/model export.
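The WER figures above are the standard word-level edit-distance metric. As a minimal, dependency-free sketch of how such a score is computed (not the exact evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("kaixo mundu ederra", "kaixo mundu"))  # one deletion over three reference words
```

Production evaluations typically also normalize text (casing, punctuation) before scoring.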
## Quick start
1) Load from a local `.nemo` file

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.restore_from("parakeet-tdt-0.6b-v3-basque.nemo")
asr_model = asr_model.cuda().eval()  # optional, if a GPU is available

pred = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)

# pred may be list[str] or list[Hypothesis], depending on settings/version
if len(pred) and hasattr(pred[0], "text"):
    print(pred[0].text)
else:
    print(pred[0])
```
2) Load from the Hugging Face Hub

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="xezpeleta/parakeet-tdt-0.6b-v3-basque"
)

out = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)
print(out[0].text if hasattr(out[0], "text") else out[0])
```
3) Batch transcription

```python
audio_paths = [
    "/path/a.wav",
    "/path/b.wav",
    "/path/c.wav",
]
outs = asr_model.transcribe(
    audio=audio_paths,
    use_lhotse=False,
    batch_size=16,
    num_workers=4,
    verbose=True,
)
texts = [o.text if hasattr(o, "text") else o for o in outs]
for path, txt in zip(audio_paths, texts):
    print(path, "=>", txt)
```
## Recommended runtime settings

- Use 16 kHz mono WAV input when possible.
- In NeMo 2.7 environments, pass `use_lhotse=False` to `transcribe(...)` if you observe inference/runtime issues in constrained container setups.
- Increase `batch_size` gradually to match available GPU memory.
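To get arbitrary input into the recommended 16 kHz mono WAV format, a common route is ffmpeg. A small sketch that only builds the command line (the file names are placeholders, and ffmpeg itself is assumed to be installed separately):

```python
import subprocess


def ffmpeg_resample_cmd(src: str, dst: str) -> list:
    """Build an ffmpeg command converting src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite output without prompting
        "-i", src,
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM WAV encoding
        dst,
    ]


# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(ffmpeg_resample_cmd("talk.mp3", "talk_16k.wav"), check=True)
```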
## Fine-tuning recipe (high-level)

- Full-model fine-tuning from `nvidia/parakeet-tdt-0.6b-v3`
- Optimizer: AdamW
- Learning rate: `1e-4`
- Scheduler: CosineAnnealing with warmup
- Effective batch size: 64 (`batch_size=8`, gradient accumulation = 8)
- Precision: BF16 mixed
- Hardware used: NVIDIA L40 (48 GB)
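The recipe above can be expressed as a NeMo-style optimizer config. The learning rate and scheduler name come from the recipe; the betas, weight decay, warmup steps, and minimum LR below are illustrative assumptions, not values reported for this model:

```python
# Sketch of a NeMo-style optim config matching the recipe above.
optim_config = {
    "name": "adamw",
    "lr": 1e-4,                   # from the recipe
    "betas": [0.9, 0.98],         # assumed, not reported
    "weight_decay": 1e-3,         # assumed, not reported
    "sched": {
        "name": "CosineAnnealing",  # from the recipe
        "warmup_steps": 1000,       # assumed, not reported
        "min_lr": 1e-6,             # assumed, not reported
    },
}

# Typically applied to a loaded model before training, e.g.:
# asr_model.setup_optimization(optim_config)
```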
## Limitations
- Performance can degrade on strong code-switching, noisy far-field audio, or domains far from training data.
- This model is optimized for Basque ASR; behavior on other languages is not the target.
- Proper text normalization and punctuation/casing post-processing may still be needed for production use.
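As an example of the kind of text normalization mentioned above, a minimal sketch (lowercasing, punctuation stripping, whitespace collapsing). Real pipelines may additionally need number expansion and Basque-specific rules; this is an illustrative baseline only:

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Basic ASR text normalization: NFC, lowercase, strip punctuation, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation except apostrophes
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("Kaixo, mundua!"))  # -> kaixo mundua
```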
## Citation and acknowledgements
If you use this model, please cite/credit:
- Base model: nvidia/parakeet-tdt-0.6b-v3
- Training dataset: asierhv/composite_corpus_eu_v2.1
- Underlying source collections: Mozilla Common Voice, Basque Parliament corpus, OpenSLR Basque resources
## License
This derivative model follows the license terms of the base model and the training dataset. Preserve attribution and license obligations from all upstream assets.