A fine-tuned openai/whisper-large-v2 model for Welsh and English automatic speech recognition, with Welsh-to-English speech translation capability.
| Task | Description |
|---|---|
| Welsh transcription | Welsh audio → Welsh text |
| English transcription | English audio (UK/Irish accents) → English text |
| Welsh→English translation | Welsh audio → English text |

Welsh transcription results (WER and CER in %; lower is better):

| Test Set | WER | CER |
|---|---|---|
| cymen-arfor/lleisiau-arfor (spontaneous) | 29.79 | 11.67 |
| techiaith/banc-trawsgrifiadau-bangor (mixed) | 27.65 | 9.81 |
| techiaith/commonvoice-23-0-cy (read) | 14.97 | 4.26 |

English transcription results (WER and CER in %; lower is better):

| Test Set | WER | CER |
|---|---|---|
| techiaith/commonvoice-23-0-en/GB-IE (read, UK/Irish) | 9.92 | 3.47 |

Welsh→English translation results (BLEU and chrF; higher is better):

| Test Set | BLEU | chrF |
|---|---|---|
| techiaith/commonvoice-23-0-cy-en | 18.17 | 38.13 |
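
For readers unfamiliar with the transcription metrics, the following is a minimal sketch of how WER and CER are conventionally defined: edit distance over words (or characters) divided by the reference length. The function names are illustrative, and no text normalisation is applied here; the casing and punctuation handling used to produce the scores above is not stated in this card, so this sketch should not be expected to reproduce them exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference word count."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# One wrong word out of four, one missing character out of twenty
print(wer("mae hi'n braf heddiw", "mae hin braf heddiw"))  # 25.0
print(cer("mae hi'n braf heddiw", "mae hin braf heddiw"))  # 5.0
```

A single dropped apostrophe costs a whole word under WER but only one character under CER, which is one reason the CER column is consistently much lower than the WER column.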

Total training data: ~177 hours across 153,066 clips, broken down as follows.

| Dataset | Language | Duration | Clips | Description |
|---|---|---|---|---|
| techiaith/banc-trawsgrifiadau-bangor | Welsh | 52:45h | 48,569 | Mixed spontaneous & read speech |
| techiaith/corpws-clllc-wlga | Welsh | 32:59h | 26,216 | Local government meetings |
| cymen-arfor/lleisiau-arfor | Welsh | 33:54h | 33,614 | Spontaneous conversational speech |
| techiaith/commonvoice_23_0_cy | Welsh | 31:11h | 20,018 | Read speech (CommonVoice 23.0) |
| techiaith/commonvoice_vad_cy | Welsh | 3:27h | 8,209 | VAD-segmented clips |
| techiaith/commonvoice_23_0_en__GB_IE | English | 22:26h | 16,440 | Read speech, UK/Irish accents (10% sample) |
Validation: techiaith/banc-trawsgrifiadau-bangor validation split (4:00h, 3,895 clips).
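
The quoted totals can be cross-checked against the per-dataset rows of the training table. The sketch below copies the figures from that table and assumes the duration column is written as hours:minutes (so `52:45h` is 52 hours 45 minutes); the variable and function names are illustrative.

```python
# (duration, clips) per training dataset, copied from the table above
TRAINING_SETS = {
    "techiaith/banc-trawsgrifiadau-bangor": ("52:45h", 48_569),
    "techiaith/corpws-clllc-wlga": ("32:59h", 26_216),
    "cymen-arfor/lleisiau-arfor": ("33:54h", 33_614),
    "techiaith/commonvoice_23_0_cy": ("31:11h", 20_018),
    "techiaith/commonvoice_vad_cy": ("3:27h", 8_209),
    "techiaith/commonvoice_23_0_en__GB_IE": ("22:26h", 16_440),
}


def to_minutes(duration):
    """Convert an 'H:MMh' string to minutes (assumed hours:minutes format)."""
    hours, minutes = duration.rstrip("h").split(":")
    return int(hours) * 60 + int(minutes)


total_minutes = sum(to_minutes(d) for d, _ in TRAINING_SETS.values())
total_clips = sum(clips for _, clips in TRAINING_SETS.values())
english_minutes = to_minutes(TRAINING_SETS["techiaith/commonvoice_23_0_en__GB_IE"][0])

print(f"{total_minutes // 60}:{total_minutes % 60:02d}h, {total_clips:,} clips")
# 176:42h, 153,066 clips
print(f"English share of audio: {100 * english_minutes / total_minutes:.1f}%")
# English share of audio: 12.7%
```

Under that reading the rows sum to 176:42h, consistent with the rounded ~177 hours, and the clip counts sum exactly to 153,066.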

Fine-tuning hyperparameters:

| Parameter | Value |
|---|---|
| Base model | openai/whisper-large-v2 |
| Learning rate | 1e-5 |
| LR scheduler | cosine |
| Warmup steps | 500 |
| Max steps | 8,000 |
| Weight decay | 0.05 |
| Batch size | 16 × 2 accumulation × 2 GPUs = 64 effective |
| FP16 | True |
| SpecAugment | False |
| Early stopping patience | 5 |
| Best checkpoint | step 7,000 |
| Best eval WER | 28.0% |
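
To make the schedule and batch-size rows concrete, here is a small sketch of the learning rate at a given step. It assumes the common form of a cosine scheduler with warmup (linear ramp to the peak, then a half-cosine decay to zero at the final step), which is what `get_cosine_schedule_with_warmup` in `transformers` implements; the card does not state which implementation was used, and the function name below is illustrative.

```python
import math

PEAK_LR = 1e-5       # "Learning rate" row
WARMUP_STEPS = 500   # "Warmup steps" row
MAX_STEPS = 8_000    # "Max steps" row


def learning_rate(step):
    """Learning rate at `step`, assuming linear warmup then half-cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Effective batch size: per-device batch x gradient accumulation x GPUs
effective_batch = 16 * 2 * 2
# Upper bound on clips processed if training runs to the step limit
clips_seen = effective_batch * MAX_STEPS

for step in (0, 250, 500, 4_250, 7_000, 8_000):
    print(step, f"{learning_rate(step):.2e}")
print(effective_batch, clips_seen)
```

Under these assumptions the best checkpoint at step 7,000 falls late in the decay, where the learning rate is already a small fraction of its peak.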

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="techiaith/whisper-large-ft-cy-en",
)

# Welsh transcription
result = pipe("welsh_audio.wav", generate_kwargs={"language": "cy", "task": "transcribe"})

# English transcription
result = pipe("english_audio.wav", generate_kwargs={"language": "en", "task": "transcribe"})

# Welsh to English translation
result = pipe("welsh_audio.wav", generate_kwargs={"language": "cy", "task": "translate"})
```

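
Since the three tasks differ only in their `generate_kwargs`, one possible convenience is a small wrapper that selects them by name. This is a sketch only: the mode names and the `run` helper are hypothetical and not part of the model or of `transformers`. It relies on the speech recognition pipeline returning a dict with a `"text"` key for a single input.

```python
# Hypothetical mode names mapped to the generate_kwargs used in the calls above
MODES = {
    "welsh": {"language": "cy", "task": "transcribe"},
    "english": {"language": "en", "task": "transcribe"},
    "welsh-to-english": {"language": "cy", "task": "translate"},
}


def run(pipe, audio_path, mode):
    """Run `pipe` on one audio file in the given mode and return the text."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    # Pass a copy so the shared MODES entries are never mutated downstream
    result = pipe(audio_path, generate_kwargs=dict(MODES[mode]))
    return result["text"]
```

With the `pipe` object created above, `run(pipe, "welsh_audio.wav", "welsh-to-english")` would return the English translation as a plain string.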
A CTranslate2 (int8 quantised) version is available at techiaith/whisper-large-ft-cy-en-ct2 for faster inference.
Developed by Uned Technolegau Iaith, Prifysgol Bangor / Language Technologies Unit, Bangor University.
Funded by the Welsh Government.