# Parakeet TDT 0.6B v3 - Basque (Euskara)
Basque ASR model obtained by fine-tuning nvidia/parakeet-tdt-0.6b-v3 on asierhv/composite_corpus_eu_v2.1.
## What this model is
- Base architecture: FastConformer RNNT-TDT (Parakeet TDT 0.6B v3)
- Language: Basque (`eu`)
- Target task: Automatic speech recognition (16 kHz speech)
- Framework: NVIDIA NeMo
- Artifact format: `.nemo`
## Training data provenance
Fine-tuning used the composite Basque corpus `asierhv/composite_corpus_eu_v2.1`, which aggregates Basque speech/transcript data from:
- Mozilla Common Voice (Basque)
- Basque Parliament
- OpenSLR (Basque)
Reference datasets:
- Mozilla Common Voice: https://commonvoice.mozilla.org/
- Basque Parliament corpus (as included in the composite dataset above)
- OpenSLR Basque resources: https://www.openslr.org/
## Evaluation summary
WER was computed on held-out test splits created from the same composite source family.
| Split | Baseline WER (base model) | Fine-tuned WER | Absolute gain (WER points) |
|---|---|---|---|
| test_cv | 108.47% | 6.92% | +101.55 |
| test_parl | 107.61% | 4.36% | +103.25 |
| test_oslr | 108.52% | 14.52% | +94.00 |
Notes:
- Baseline is the original multilingual/English-oriented base model, which performs poorly on Basque; WER above 100% is possible because insertion errors can exceed the number of reference words.
- Fine-tuned metrics above are from the final selected checkpoint/model export.
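The WER figures above are the standard word-level edit-distance metric. As a minimal, dependency-free sketch of how such a score is computed (not the exact evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("kaixo mundu ederra", "kaixo mundu"))  # one deletion over three reference words
```

Production evaluations typically also normalize text (casing, punctuation) before scoring.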
## Quick start
1) Load from a local `.nemo` file

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.restore_from("parakeet-tdt-0.6b-v3-basque.nemo")
asr_model = asr_model.cuda().eval()  # optional, if a GPU is available

pred = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)

# pred may be list[str] or list[Hypothesis], depending on settings/version
if len(pred) and hasattr(pred[0], "text"):
    print(pred[0].text)
else:
    print(pred[0])
```
2) Load from the Hugging Face Hub

```python
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    model_name="xezpeleta/parakeet-tdt-0.6b-v3-basque"
)

out = asr_model.transcribe(
    audio=["/path/to/audio.wav"],
    use_lhotse=False,
    batch_size=1,
    num_workers=0,
    verbose=False,
)
print(out[0].text if hasattr(out[0], "text") else out[0])
```
3) Batch transcription

```python
audio_paths = [
    "/path/a.wav",
    "/path/b.wav",
    "/path/c.wav",
]
outs = asr_model.transcribe(
    audio=audio_paths,
    use_lhotse=False,
    batch_size=16,
    num_workers=4,
    verbose=True,
)
texts = [o.text if hasattr(o, "text") else o for o in outs]
for path, txt in zip(audio_paths, texts):
    print(path, "=>", txt)
```
## Recommended runtime settings

- Use 16 kHz mono WAV input when possible.
- In NeMo 2.7 environments, pass `use_lhotse=False` to `transcribe(...)` if you observe inference/runtime issues in constrained container setups.
- Increase `batch_size` gradually to match available GPU memory.
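To get arbitrary input into the recommended 16 kHz mono WAV format, a common route is ffmpeg. A small sketch that only builds the command line (the file names are placeholders, and ffmpeg itself is assumed to be installed separately):

```python
import subprocess


def ffmpeg_resample_cmd(src: str, dst: str) -> list:
    """Build an ffmpeg command converting src to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite output without prompting
        "-i", src,
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM WAV encoding
        dst,
    ]


# To actually run the conversion (requires ffmpeg on PATH):
# subprocess.run(ffmpeg_resample_cmd("talk.mp3", "talk_16k.wav"), check=True)
```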
## Fine-tuning recipe (high-level)

- Full-model fine-tuning from `nvidia/parakeet-tdt-0.6b-v3`
- Optimizer: AdamW
- Learning rate: `1e-4`
- Scheduler: CosineAnnealing with warmup
- Effective batch size: 64 (`batch_size=8`, gradient accumulation = 8)
- Precision: BF16 mixed
- Hardware used: NVIDIA L40 (48 GB)
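The recipe above can be expressed as a NeMo-style optimizer config. The learning rate and scheduler name come from the recipe; the betas, weight decay, warmup steps, and minimum LR below are illustrative assumptions, not values reported for this model:

```python
# Sketch of a NeMo-style optim config matching the recipe above.
optim_config = {
    "name": "adamw",
    "lr": 1e-4,                   # from the recipe
    "betas": [0.9, 0.98],         # assumed, not reported
    "weight_decay": 1e-3,         # assumed, not reported
    "sched": {
        "name": "CosineAnnealing",  # from the recipe
        "warmup_steps": 1000,       # assumed, not reported
        "min_lr": 1e-6,             # assumed, not reported
    },
}

# Typically applied to a loaded model before training, e.g.:
# asr_model.setup_optimization(optim_config)
```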
## Limitations
- Performance can degrade on strong code-switching, noisy far-field audio, or domains far from training data.
- This model is optimized for Basque ASR; behavior on other languages is not the target.
- Proper text normalization and punctuation/casing post-processing may still be needed for production use.
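As an example of the kind of text normalization mentioned above, a minimal sketch (lowercasing, punctuation stripping, whitespace collapsing). Real pipelines may additionally need number expansion and Basque-specific rules; this is an illustrative baseline only:

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Basic ASR text normalization: NFC, lowercase, strip punctuation, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # drop punctuation except apostrophes
    return re.sub(r"\s+", " ", text).strip()


print(normalize_transcript("Kaixo, mundua!"))  # -> kaixo mundua
```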
## Citation and acknowledgements
If you use this model, please cite/credit:
- Base model: nvidia/parakeet-tdt-0.6b-v3
- Training dataset: asierhv/composite_corpus_eu_v2.1
- Underlying source collections: Mozilla Common Voice, Basque Parliament corpus, OpenSLR Basque resources
## License
This derivative model follows the license terms of the base model and the training dataset. Preserve attribution and license obligations from all upstream assets.