catalan-verification-model-pkt-b
Model Summary
We define verification models as ASR models specifically designed to assess the reliability of transcriptions. These models are particularly useful when no reference transcription is available, as they can generate hypotheses with a certain degree of confidence.
The core idea behind verification models is to train or fine-tune two or more ASR models on different datasets. If these models produce identical transcriptions for the same audio input, the result is likely to be accurate. Furthermore, if a verification model agrees with an existing reference transcription, this agreement can also be interpreted as a signal of reliability.
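The two agreement signals described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. The text normalization is an assumption: lower-casing, mapping the typographic apostrophe to ', and dropping punctuation other than apostrophes and the Catalan middle dot. The function names models_agree and agrees_with_reference are hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFC, lower-case, unify apostrophes, drop punctuation."""
    text = unicodedata.normalize("NFC", text).lower().replace("\u2019", "'")
    # Keep word characters, whitespace, apostrophes and the middle dot used in "l·l".
    text = re.sub(r"[^\w\s'·]", " ", text)
    return " ".join(text.split())


def models_agree(hyp_a: str, hyp_b: str) -> bool:
    """Hypothetical helper: True when models A and B give the same normalized transcription."""
    return normalize(hyp_a) == normalize(hyp_b)


def agrees_with_reference(hyp: str, reference: str) -> bool:
    """Hypothetical helper: True when one verification model matches an existing reference."""
    return normalize(hyp) == normalize(reference)


print(models_agree("Bon dia, com estàs?", "bon dia com estàs"))  # True
print(models_agree("bon dia", "bona dia"))  # False
```

Exact match after normalization is deliberately strict: a single differing word is enough to withhold the "reliable" label.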
In this model card, we present Verification Model B for Catalan, available as "catalan-verification-model-pkt-b". This acoustic model is based on "nvidia/parakeet-rnnt-1.1b" and is designed for Automatic Speech Recognition in Catalan. It is intended to be used in tandem with Verification Model A, "catalan-verification-model-pkt-a", to enable cross-verification and boost transcription confidence in unannotated or weakly supervised scenarios.
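The datasets used to train models A and B were partitioned between the two models by alternating recordings: even-indexed recordings went to model A and odd-indexed recordings to model B, as in the following Python snippet (where training_corpus is the ordered list of training recordings):
# Alternate recordings between the two training sets.
dataset_A = []  # recordings for Verification Model A
dataset_B = []  # recordings for Verification Model B
for index, recording in enumerate(training_corpus):
    if index % 2 == 0:
        dataset_A.append(recording)  # even index -> model A
    else:
        dataset_B.append(recording)  # odd index -> model B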
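Because the loop alternates recordings, the same partition can also be expressed with Python list slicing. The sketch below uses a hypothetical toy corpus of file names and checks that the two subsets are disjoint and together cover the corpus (the disjointness check assumes each recording appears only once).

```python
def split_even_odd(training_corpus: list) -> tuple:
    """Even positions -> dataset_A (model A), odd positions -> dataset_B (model B)."""
    return training_corpus[0::2], training_corpus[1::2]


# Hypothetical toy corpus of recording identifiers.
corpus = [f"rec_{i:03d}.wav" for i in range(7)]
dataset_A, dataset_B = split_even_odd(corpus)

# The two subsets share no recording and together cover the whole corpus.
assert not set(dataset_A) & set(dataset_B)
assert sorted(dataset_A + dataset_B) == sorted(corpus)
print(len(dataset_A), len(dataset_B))  # 4 3
```

Splitting by position rather than at random keeps the two halves roughly equal in size and makes the partition reproducible from the corpus order alone.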
Intended Uses and Limitations
This model is designed for the following scenarios:
Verification of transcriptions: When two or more verification models produce the same output for a given audio segment, the transcription can be considered highly reliable. This is particularly useful in low-resource or weakly supervised settings.
Transcription without references: In situations where no reference transcription exists, this model can still produce a hypothesis that, when corroborated by a second verification model, may be considered trustworthy.
Data filtering and quality control: It can be used to automatically detect and retain high-confidence segments in large-scale speech datasets (e.g., for training or evaluation purposes).
Human-in-the-loop workflows: These models can assist human annotators by flagging reliable transcriptions, helping reduce manual verification time.
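A minimal sketch of the data-filtering and human-in-the-loop uses, assuming the hypotheses from models A and B are already available as dictionaries keyed by a segment ID. The names filter_segments and FilterResult and the normalization rule are hypothetical choices for illustration.

```python
import re
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Assumed normalization: lower-case, unify apostrophes, drop punctuation."""
    text = text.lower().replace("\u2019", "'")
    return " ".join(re.sub(r"[^\w\s'·]", " ", text).split())


@dataclass
class FilterResult:
    accepted: dict = field(default_factory=dict)      # segment id -> agreed transcription
    needs_review: list = field(default_factory=list)  # segment ids to send to annotators

    @property
    def agreement_rate(self) -> float:
        total = len(self.accepted) + len(self.needs_review)
        return len(self.accepted) / total if total else 0.0


def filter_segments(hyps_a: dict, hyps_b: dict) -> FilterResult:
    """Keep segments where both models agree; flag the rest for human review."""
    result = FilterResult()
    for seg_id, hyp_a in hyps_a.items():  # segments missing from hyps_b are flagged
        hyp_b = hyps_b.get(seg_id)
        if hyp_b is not None and normalize(hyp_a) == normalize(hyp_b):
            result.accepted[seg_id] = normalize(hyp_a)
        else:
            result.needs_review.append(seg_id)
    return result


res = filter_segments(
    {"seg1": "Bon dia.", "seg2": "fa molt de fred"},
    {"seg1": "bon dia", "seg2": "fa molta fred"},
)
print(res.accepted)        # {'seg1': 'bon dia'}
print(res.needs_review)    # ['seg2']
print(res.agreement_rate)  # 0.5
```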
As limitations, we identify the following:
No ground-truth guarantee: Agreement between models does not guarantee correctness; it only increases the likelihood of reliability.
Domain sensitivity: The accuracy and agreement rate may drop if used on speech data that differs significantly from the training domain (e.g., different accents, topics, or recording conditions).
Designed for pairwise comparison: This model is intended to work in conjunction with at least one other verification model. Using it in isolation does not provide verification benefits.
Language and model-specific: This particular model is optimized for Catalan and based on the Parakeet RNNT architecture. Performance in other languages, or with verification models built on other acoustic models, may vary significantly.
How to Get Started with the Model
For an up-to-date, working version of this code, please refer to NVIDIA's official NeMo repository.
Installation
To use this model, you may install the NVIDIA NeMo Framework:
Create a virtual environment:
python -m venv /path/to/venv
Activate the environment:
source /path/to/venv/bin/activate
Install the modules:
BRANCH=main
python -m pip install "git+https://github.com/NVIDIA/NeMo.git@${BRANCH}#egg=nemo_toolkit[all]"
For Inference
To transcribe audio in Catalan using this model, you can follow this example:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="BSC-LT/catalan-verification-model-pkt-b")
output = asr_model.transcribe(['YOUR_WAV_FILE.wav'])
print(output[0].text)
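Since this model is meant to be paired with Verification Model A, the following sketch transcribes the same files with both models and marks each file as verified when the two outputs are identical. It assumes that model A is published on the Hugging Face Hub as BSC-LT/catalan-verification-model-pkt-a (the card only gives the name catalan-verification-model-pkt-a) and loads both with the same NeMo class as the example above. The helper names are hypothetical. NeMo is imported inside load_verification_models so that cross_verify works with any object exposing a transcribe method.

```python
def load_verification_models():
    """Load Verification Models A and B from the Hugging Face Hub (requires NeMo)."""
    import nemo.collections.asr as nemo_asr  # lazy import: only needed for real inference

    # The repository ID for model A is an assumption based on the naming of model B.
    model_a = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name="BSC-LT/catalan-verification-model-pkt-a")
    model_b = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
        model_name="BSC-LT/catalan-verification-model-pkt-b")
    return model_a, model_b


def _as_texts(outputs):
    """Extract plain strings from transcribe() output."""
    # Some NeMo versions return (best_hypotheses, all_hypotheses) for RNNT models.
    if isinstance(outputs, tuple):
        outputs = outputs[0]
    # Recent versions return Hypothesis objects with a .text attribute; strings pass through.
    return [getattr(h, "text", h).strip() for h in outputs]


def cross_verify(audio_paths, model_a, model_b):
    """Transcribe with both models and flag files whose transcriptions match exactly."""
    texts_a = _as_texts(model_a.transcribe(audio_paths))
    texts_b = _as_texts(model_b.transcribe(audio_paths))
    return [
        {"audio": path, "hyp_a": a, "hyp_b": b, "verified": a == b}
        for path, a, b in zip(audio_paths, texts_a, texts_b)
    ]


# Usage (requires NeMo and real audio files):
#   model_a, model_b = load_verification_models()
#   for row in cross_verify(["YOUR_WAV_FILE.wav"], model_a, model_b):
#       print(row)
```

The comparison here is an exact string match after stripping surrounding whitespace; whether to normalize case and punctuation first is a design choice that trades strictness for a higher agreement rate.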
Training Details
Training data
The specific datasets used to create the model are:
- Mozilla Common Voice 17.0 (Catalan)
- 3CatParla (soon to be published)
- Corts Valencianes
Training procedure
This model is the result of fine-tuning the model "nvidia/parakeet-rnnt-1.1b" by following this tutorial.
Training Hyperparameters
- language: Catalan
- hours of training audio: 1799
- learning rate: 2e-4
- devices=4
- num_nodes=8
- batch_size=8
- accelerator=accelerator
- strategy="ddp"
- max_epochs=20
- enable_checkpointing=True
- logger=False
- log_every_n_steps=100
- check_val_every_n_epoch=1
- precision='bf16-mixed'
- callbacks=[checkpoint_callback]
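As a worked example of what these settings imply, the sketch below collects the Trainer arguments in a dictionary and computes the number of GPUs and the effective global batch size. The card does not state three of the inputs, so they are assumptions: batch_size is taken to be per device, no gradient accumulation is used, and "gpu" stands in for the unspecified accelerator value.

```python
# Trainer arguments as listed above. accelerator="gpu" is an assumption: the card
# only shows a variable name. The checkpoint callback is left out of this sketch.
trainer_kwargs = {
    "devices": 4,
    "num_nodes": 8,
    "accelerator": "gpu",
    "strategy": "ddp",
    "max_epochs": 20,
    "enable_checkpointing": True,
    "logger": False,
    "log_every_n_steps": 100,
    "check_val_every_n_epoch": 1,
    "precision": "bf16-mixed",
}
per_device_batch_size = 8  # assumed to be the per-GPU batch size


def global_batch_size(kwargs: dict, per_device: int, accumulate_grad_batches: int = 1) -> int:
    """Utterances consumed per optimizer step under data-parallel (DDP) training."""
    return kwargs["devices"] * kwargs["num_nodes"] * per_device * accumulate_grad_batches


total_gpus = trainer_kwargs["devices"] * trainer_kwargs["num_nodes"]
print(total_gpus)                                                # 32
print(global_batch_size(trainer_kwargs, per_device_batch_size))  # 256
```

With Lightning 2.x, these keyword arguments should be accepted directly by lightning.pytorch.Trainer(**trainer_kwargs), together with callbacks=[checkpoint_callback].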
Citation
If this model contributes to your research, please cite the work:
@misc{bsc-catvermodel-pkt-b-2025,
title={Catalan Verification Model Parakeet B},
author={Hernandez Mena, Carlos Daniel and Messaoudi, Abir and España-Bonet, Cristina},
organization={Barcelona Supercomputing Center},
url={https://huggingface.co/langtech-veu/catalan-verification-model-pkt-b},
year={2025}
}
Additional Information
Author
The fine-tuning process was performed in June 2025 at the Language Technologies Laboratory of the Barcelona Supercomputing Center by Carlos Daniel Hernández Mena, under the supervision of Cristina España-Bonet. The model was validated by Abir Messaoudi.
Contact
For further information, please email bsc-lt@bsc.es.
Copyright
Copyright (c) 2025 by Language Technologies Laboratory, Barcelona Supercomputing Center.
License
Funding
This work/research has been promoted and financed by the Government of Catalonia through the Aina project.
The training of the model was possible thanks to the computing time provided by Barcelona Supercomputing Center through MareNostrum 5.
We acknowledge the EuroHPC Joint Undertaking for awarding us access to MareNostrum 5, hosted by BSC, Spain.
Evaluation results
- WER on Common Voice 17.0 Catalan (test set, self-reported): 3.735
- WER on Common Voice 17.0 Catalan (dev set, self-reported): 3.409
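For reference, WER is the word-level edit distance (substitutions, deletions and insertions) between hypothesis and reference, divided by the number of reference words. The scores above are presumably reported as percentages. The sketch below computes WER for a single utterance. The card does not state which tool or text normalization produced the reported numbers, so this sketch is not guaranteed to reproduce them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    # Guard against division by zero for an empty reference.
    return prev[len(hyp)] / max(len(ref), 1)


print(word_error_rate("bon dia a tothom", "bon dia tothom"))  # 0.25
```

Over a whole test set, WER is normally computed by summing the edit distances of all utterances and dividing by the total number of reference words, rather than by averaging per-utterance scores.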