SBPN Multilingual Base

Model architecture | Model size | Languages: en, yo, ig, pcm, ha

Description.

SBPN_multilingual_base belongs to a family of models trained to accurately transcribe utterances in Yoruba, Nigerian Pidgin, Hausa, Igbo, and Nigerian English. SBPN models currently outperform other open-source models on these Nigerian languages. This is the SBPN base (120M-parameter) version. The large (600M-parameter) version is available here: SBPN_multilingual_large.

NVIDIA NeMo: Training

To train, fine-tune, or experiment with the model, you will need to install NVIDIA NeMo. We recommend installing it after you have installed the latest PyTorch version.

```
pip install "nemo_toolkit[all]"
```

How to Use this Model

The model is available for use in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another language, especially other Nigerian languages.

Automatically instantiate the model

```
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(model_name="ogunlao/SBPN_multilingual_base")
```
### Transcribing using Python
First, let's get a sample
```
audio_path = "audio_sample_in_pidgin_english.wav"
```
Then simply do:
```
output = asr_model.transcribe([audio_path])
```

Remove the language tag
```
import re

prediction = output[0].text
# Drop anything in angle brackets (the language tag), then trim whitespace
prediction = re.sub(r'<.*?>', '', prediction).strip()
```

### For language identification
Take the prediction and get the first token
```
prediction = output[0].text
language_id = prediction.split()[0]
```
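
The two snippets above can be combined into one helper that returns the language tag and the cleaned transcript together. This is only a minimal sketch: `split_language_tag` is a hypothetical name, and it assumes the tag is a single leading token wrapped in angle brackets, as the regular expression above suggests. The `<pcm>` tag in the example is illustrative and is not a confirmed model output.

```python
import re
from typing import Optional, Tuple

# Assumption: the language tag is one leading "<...>" token with no spaces inside.
_TAG_PATTERN = re.compile(r"^\s*(<[^<>\s]+>)\s*")


def split_language_tag(prediction: str) -> Tuple[Optional[str], str]:
    """Split a raw prediction into (language_tag, transcript).

    Returns (None, text) when no leading tag is found.
    """
    match = _TAG_PATTERN.match(prediction)
    if match is None:
        return None, prediction.strip()
    return match.group(1), prediction[match.end():].strip()


# Illustrative input only; the real tag strings depend on the model's tokenizer.
tag, text = split_language_tag("<pcm> how you dey")
print(tag, "|", text)
```

With the model loaded, the same helper could be applied to `output[0].text`.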

Input

This model accepts 16000 Hz mono-channel audio (wav files) as input.
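
Before transcribing, it can help to confirm that a file matches this format. The following is a minimal sketch that uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. The function name `check_wav_format` and the constants are hypothetical and are not part of NeMo or of this model's code.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, as stated in the Input section
EXPECTED_CHANNELS = 1         # mono


def check_wav_format(path: str) -> list:
    """Return a list of human-readable problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz"
        )
    if channels != EXPECTED_CHANNELS:
        problems.append(
            f"file has {channels} channels, expected {EXPECTED_CHANNELS}"
        )
    return problems
```

Calling `check_wav_format(audio_path)` on a file that reports problems suggests resampling or downmixing it with an audio tool of your choice before passing it to the model.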

Output

This model provides transcribed speech (including a language id tag) as a string for a given audio sample.

Training

See the written report, which will soon be available on arXiv.

Training Datasets

The model was trained on the following datasets:
- NaijaVoices
- SLR86
- GigaSpeech-L
- AfriSpeech-200
- Nigerian Pidgin Dataset
- Fleurs datasets in each language
- Igbo-asr
- BibleTTS

Performance

The SBPN family of models outperforms state-of-the-art monolingual and multilingual ASR models on Nigerian languages across major benchmarks, including Common Voice and Fleurs. SBPN was evaluated using the Word Error Rate (WER) metric.
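
For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below implements that standard definition with whitespace tokenization. It is not the authors' evaluation script, and any text normalization they applied (casing, punctuation, tag removal) is not described here and is therefore not reproduced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# One substituted word and one missing word against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.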

SBPN_multilingual_base may also perform well on standard English and on other variants of Pidgin English, especially closely related languages such as Ghanaian Pidgin and Cameroonian Pidgin; however, this was not evaluated.

Limitations

This model should not be used to transcribe speech in languages outside those it was trained on, especially in real-world applications. The model is also released for research purposes only, because it was trained on datasets with varying licenses.

License

The SBPN family of models has been trained on datasets with different licenses, so it cannot be released as a fully open-source model. Use of this model is therefore covered by the CC BY-NC-SA 4.0 license. In short, it is free for research (non-commercial) use. By downloading the publicly released version of the model, you accept the terms and conditions of the CC BY-NC-SA 4.0 license.

References

[1] NVIDIA NeMo Toolkit
