w2v-BERT 2.0 (Galician Fine-Tuned, CTC)

This model is a fine-tuned version of facebook/w2v-bert-2.0 for automatic speech recognition (ASR) in Galician (gl), trained using a CTC objective.

The model is optimised for Galician speech and evaluated across multiple domains, including read speech, broadcast-style audio and conversational content.

Training Data

The model was trained on a combined Galician ASR dataset built from several public and curated corpora.
All audio was normalised to 16 kHz, and all transcripts were standardised to a homogeneous text format.

Datasets Included

Common Voice v23 (Galician)
OpenSLR Speech Translation GL-EN (Galician side)
FLEURS GL-EN (Galician side)
FalAI (20% of validated split)
Transcrispeech (Galician)
RG-Podcast (Galician)

These datasets cover clean read speech, semi-spontaneous speech, and more challenging acoustic conditions.

Dataset Preparation

Audio resampled to 16 kHz
Removal of empty, corrupt or invalid audio
Minimum audio duration: 1 second
Text normalisation:
- Lowercasing
- Unicode normalisation
- Removal of punctuation
- Removal of empty transcripts
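
The text normalisation steps above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script: the function names are hypothetical, NFC is assumed as the Unicode normalisation form (the card does not name one), and punctuation is assumed to mean any character in a Unicode punctuation category.

```python
import unicodedata


def normalise_transcript(text: str) -> str:
    """Apply Unicode NFC (assumed form), lowercase, strip punctuation."""
    text = unicodedata.normalize("NFC", text).lower()
    # Drop every character whose Unicode category is punctuation (P*).
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Collapse runs of whitespace left behind by the removals.
    return " ".join(text.split())


def prepare(transcripts):
    """Normalise each transcript and drop those that end up empty."""
    cleaned = (normalise_transcript(t) for t in transcripts)
    return [t for t in cleaned if t]
```

Note that removing punctuation outright also joins hyphenated forms into a single word; a pipeline that needs to keep them separate would replace hyphens with spaces instead.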

Tokenization and Vocabulary

A character-level CTC vocabulary was constructed specifically for Galician.

Supported characters:
abcdefghijklmnopqrstuvwxyzáéíóúñç
Word boundaries are represented by the | token
Special tokens:
- [UNK]
- [PAD]

The final vocabulary is stored in vocab.json.
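
To make the vocabulary layout concrete, here is a small sketch that builds a character-level mapping from the symbols listed above and decodes a sequence of frame-level ids greedily. The token ids are hypothetical (the real vocab.json may order tokens differently), and [PAD] is assumed to double as the CTC blank, which is the usual convention with Wav2Vec2CTCTokenizer. The decoder is a simplified illustration, not the tokenizer's own implementation.

```python
import json

# Characters listed in the model card.
CHARS = "abcdefghijklmnopqrstuvwxyzáéíóúñç"


def build_vocab(chars: str = CHARS) -> dict:
    """Map each character, the word delimiter and the special tokens to an id."""
    tokens = sorted(set(chars)) + ["|", "[UNK]", "[PAD]"]
    return {tok: idx for idx, tok in enumerate(tokens)}


def greedy_ctc_decode(ids, vocab) -> str:
    """Collapse repeated ids, drop the blank ([PAD]), turn '|' into spaces."""
    id_to_tok = {idx: tok for tok, idx in vocab.items()}
    blank_id = vocab["[PAD]"]
    out = []
    prev = None
    for idx in ids:
        # A token is emitted only when it differs from the previous frame
        # and is not the blank; a blank between two equal ids keeps both.
        if idx != prev and idx != blank_id:
            out.append(id_to_tok[idx])
        prev = idx
    return "".join(out).replace("|", " ").strip()


vocab = build_vocab()
# Serialised form, comparable in shape to a vocab.json file.
vocab_json = json.dumps(vocab, ensure_ascii=False, indent=2)
```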

Training Procedure

Fine-tuning was performed using the 🤗 Transformers Trainer with a CTC loss.

Base model: facebook/w2v-bert-2.0
Architecture: Wav2Vec2BertForCTC
Adapters enabled: Yes

Training Configuration

Effective batch size: 16
Per-device batch size: 8
Gradient accumulation steps: 2
Learning rate: 5e-6
Training epochs: 5
Warmup ratio: 0.1
Precision: FP16
Gradient checkpointing: Enabled
Max gradient norm: 1.0
Evaluation & checkpointing: Every 2000 steps
Checkpoint limit: 2
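
As a rough guide, the configuration above maps onto keyword arguments of transformers.TrainingArguments as sketched below. The values come from the list; output_dir is a hypothetical path, and the effective batch size of 16 is assumed to imply training on a single device.

```python
# Keyword arguments mirroring the listed configuration. The key names follow
# transformers.TrainingArguments; output_dir is a hypothetical path.
training_kwargs = {
    "output_dir": "w2v-bert-2.0-gl-ctc",
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-6,
    "num_train_epochs": 5,
    "warmup_ratio": 0.1,
    "fp16": True,
    "gradient_checkpointing": True,
    "max_grad_norm": 1.0,
    "save_strategy": "steps",
    "save_steps": 2000,
    "eval_steps": 2000,
    "save_total_limit": 2,
}


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Per-device batch size x gradient accumulation steps x device count."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )
```

Step-based evaluation also needs the evaluation strategy set to "steps"; the name of that argument has changed between Transformers releases, so it is left out of the dictionary here.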

Audio features were extracted using SeamlessM4TFeatureExtractor, and text was tokenized with a custom Wav2Vec2CTCTokenizer.

Evaluation Results

Evaluation was performed on held-out splits for each corpus and on a combined test set.
Metrics are reported as WER (Word Error Rate) and CER (Character Error Rate).
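
For reference, both metrics are edit distances normalised by reference length: WER counts word-level substitutions, deletions and insertions, and CER does the same over characters. The sketch below is a plain-Python illustration, not the evaluation script behind the reported numbers. It assumes corpus-level aggregation (total errors divided by total reference length) and counts spaces as characters for CER.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit: str = "word") -> float:
    """Corpus-level WER (unit='word') or CER (unit='char')."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total
```

For example, error_rate(["ola mundo"], ["ola mondo"]) gives a WER of 0.5, because one of the two reference words is wrong, while the same pair with unit="char" gives a CER of 1/9.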

Fine-Tuned Model Results

Per-corpus results

Corpus	N	WER	CER
FalAI	4776	0.0445	0.0099
CommonVoice	14563	0.0628	0.0124
OpenSLR	282	0.1340	0.0406
FLEURS	212	0.1330	0.0447
Transcrispeech	1710	0.1410	0.0481
RG-Podcast	2015	0.1692	0.0654

Combined test set

Dataset	N	WER	CER
TOTAL	23558	0.1163	0.0383

Comparison with Whisper

WER comparison against Whisper-based models evaluated on the same datasets:

Corpus	w2v-BERT WER	Whisper WER
FalAI	0.0445	0.0097
CommonVoice	0.0628	0.0688
OpenSLR	0.1340	0.0808
FLEURS	0.1330	0.1980
Transcrispeech	0.1410	0.2097
RG-Podcast	0.1692	—

Intended Use and Limitations

This model is intended for Galician ASR research and transcription pipelines, particularly in CTC-based or streaming-friendly setups.

Performance may degrade on highly spontaneous speech or extremely noisy audio.
The model is monolingual (Galician-only) and not intended for multilingual ASR or speech translation.
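
A transcription call might look like the sketch below. It assumes that the repository ships processor files loadable through AutoProcessor, that torch and transformers are installed, and that the input is a mono waveform already resampled to 16 kHz. The transcribe helper is hypothetical and uses plain greedy decoding.

```python
MODEL_ID = "proxectonos/w2v-bert-2.0-gl"
TARGET_SAMPLE_RATE = 16_000


def transcribe(waveform, sampling_rate=TARGET_SAMPLE_RATE, model_id=MODEL_ID):
    """Greedy CTC transcription of a mono float waveform (1-D array)."""
    if sampling_rate != TARGET_SAMPLE_RATE:
        raise ValueError("resample the audio to 16 kHz before transcribing")

    # Imported lazily so the helper can be defined without loading torch.
    import torch
    from transformers import AutoProcessor, Wav2Vec2BertForCTC

    processor = AutoProcessor.from_pretrained(model_id)
    model = Wav2Vec2BertForCTC.from_pretrained(model_id).eval()

    # The processor turns the waveform into the model's input features.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame; the tokenizer collapses repeats.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

The helper reloads the model on every call for brevity; a real pipeline would load the processor and model once and reuse them.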

Contact information

For further information, send an email to proxecto.nos@usc.gal

Licensing information

Apache License, Version 2.0

Acknowledgements

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública (Funded by EU – NextGenerationEU) within the framework of the project Desarrollo de Modelos ALIA. (This publication of the Desarrollo de Modelos ALIA project is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the Plan de Recuperación, Transformación y Resiliencia, funded by the European Union – NextGenerationEU.)

Thanks also to Balidea for the technical development of this model.

Citation

@misc{proxectonos2026w2v-bert-2.0-gl,
  author       = {{Proxecto Nós}},
  title        = {{w2v-BERT 2.0} (Galician Fine-Tuned, CTC)},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/proxectonos/w2v-bert-2.0-gl/}},
}
