---
license: cc-by-4.0
datasets:
- Ugiat/voxlingua107_IberLang
language:
- ca
- es
- oc
- gl
- eu
metrics:
- accuracy
base_model:
- openai/whisper-medium
pipeline_tag: audio-classification
tags:
- Language Recognition
---
# IberLang Classifier
## Model Overview
The IberLang classifier is a fine-tuned version of the Whisper Medium model, developed specifically for spoken-language recognition across the Iberian linguistic spectrum. It is trained to identify Spanish, Catalan, Galician, Basque (Euskera), and Occitan, extending Whisper’s multilingual capabilities to regional language identification tasks.
The pre-trained base model used for fine-tuning is [openai/whisper-medium](https://huggingface.co/openai/whisper-medium).
## Quickstart
```py
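import librosa
import torch
from transformers import pipeline

# Load the classifier on the first GPU if available, otherwise on CPU.
classifier = pipeline(
    "audio-classification",
    model="Ugiat/IberLang",
    device=0 if torch.cuda.is_available() else -1,
)

# Whisper models expect 16 kHz audio; librosa resamples to mono 16 kHz on load.
audio_path = "sample.wav"
audio, _ = librosa.load(audio_path, sr=16000)

# The pipeline returns candidates sorted by score; print the top label.
prediction = classifier(audio)
print(prediction[0]["label"])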
```
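
The pipeline output is a list of dictionaries with `label` and `score` keys, so the confidence of the top candidate can be checked before the label is trusted. The following is a minimal sketch of that idea, not part of the released model: the helper name `top_language`, the 0.5 threshold, and the label strings and scores in `example` are all hypothetical, and the actual label names depend on the model's configuration.

```python
from typing import Optional


def top_language(predictions: list[dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it falls below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


# Hypothetical output, in the shape the audio-classification pipeline returns.
example = [
    {"label": "ca", "score": 0.91},
    {"label": "oc", "score": 0.06},
    {"label": "es", "score": 0.03},
]

print(top_language(example))                  # confident enough: a label is returned
print(top_language(example, min_score=0.95))  # below threshold: None
```

Returning `None` for low-confidence clips lets a caller route them to a fallback instead of silently accepting a weak guess.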
## Performance Evaluation
We evaluated the fine-tuned IberLang classifier against Whisper Large V3 using a held-out subset of our custom [VoxLingua107 IberLang](https://huggingface.co/datasets/Ugiat/voxlingua107_IberLang) dataset containing 1,200 audio clips. IberLang matches Whisper Large V3 on Catalan and scores higher on the other four languages, with the largest gains on Galician and Occitan.
| Model            | Catalan | Basque | Galician | Occitan | Spanish |
|------------------|---------|--------|----------|---------|---------|
| IberLang         | 0.902   | 0.96   | 0.915    | 0.655   | 1.0     |
| Whisper-Large-V3 | 0.902   | 0.68   | 0.188    | 0.0     | 0.978   |
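
Per-language scores like those in the table can be computed by grouping predictions by their reference language. The sketch below assumes the table reports per-language accuracy (the card metadata lists accuracy as the metric); the function name and the toy labels are illustrative and are not taken from the actual evaluation.

```python
from collections import defaultdict


def per_language_accuracy(references, predictions):
    """Fraction of clips of each reference language that were labelled correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ref, pred in zip(references, predictions):
        total[ref] += 1
        if ref == pred:
            correct[ref] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy data for illustration only, not real evaluation results.
refs = ["ca", "ca", "gl", "gl", "oc"]
preds = ["ca", "es", "gl", "gl", "ca"]

scores = per_language_accuracy(refs, preds)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: {acc:.3f}")
```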
## Fine-Tuning Process
The fine-tuning process followed a structured approach, including dataset preparation, model training, and optimization:
- **Data Splitting:** The dataset was shuffled and split into training (90%) and testing (10%) subsets.
- **Training Setup:**
  - Batch size: 4
  - Gradient accumulation steps: 8
  - Epochs: 3
  - Learning rate: 1e-5
  - Scheduler: Linear
  - Evaluation frequency: Every 300 steps
  - Checkpointing: Every 300 steps
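
The settings above can be sketched in code. This is an illustration under stated assumptions, not the authors' training script: `shuffle_split` and its seed are hypothetical, the dictionary keys follow the parameter names of `transformers.TrainingArguments`, and the effective batch size assumes training on a single device.

```python
import random


def shuffle_split(items, test_fraction=0.1, seed=0):
    """Shuffle items and split them into (train, test); the seed is an assumption."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
# Step-based evaluation also needs the evaluation strategy set to "steps"; the
# name of that argument differs between transformers versions.
training_kwargs = {
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 3,
    "learning_rate": 1e-5,
    "lr_scheduler_type": "linear",
    "eval_steps": 300,
    "save_steps": 300,
}

# Gradients are accumulated over 8 batches of 4 before each optimizer step.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)

train, test = shuffle_split(range(100))
print(len(train), len(test), effective_batch_size)
```

With a batch size of 4 and 8 accumulation steps, each optimizer update sees 32 examples on one device, which keeps memory use low while smoothing the gradient estimate.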
## License
This model, **IberLang**, is a fine-tuned version of [Whisper Medium](https://huggingface.co/openai/whisper-medium)
by **OpenAI**, licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Fine-tuning and additional modifications were performed by **Ugiat Technologies** to improve
multilingual language identification for **Catalan, Galician, Basque, Spanish, and Occitan**.
The resulting model and associated documentation are released under the
[Creative Commons Attribution 4.0 International License (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/).
When using this model, please cite both the original Whisper project and this fine-tuned version as appropriate.