---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
- CSTR-Edinburgh/vctk
base_model:
- openai/whisper-small
---

# Model Card for Amirjab21/accent-classifier

**Model name:** Amirjab21/accent-classifier

**Task:** Accent classification (audio → accent label)

**Supported input:** 16 kHz mono audio waveform (float32 or int16) as a NumPy array
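For reference, a minimal sketch of a waveform in that format (synthetic silence, purely illustrative):

```python
import numpy as np

# Three seconds of 16 kHz mono audio as float32 — the shape and dtype the
# classifier expects. Substitute real speech for meaningful predictions.
waveform = np.zeros(16000 * 3, dtype=np.float32)
```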

## Model Details

- **Developed by:** Amir Jabarivasal
- **Finetuned from model:** openai/whisper-small

### Model Sources

- **Repository:** https://github.com/Amirjab21/accents
- **Blog post:** https://amirjab21.github.io/?blog=0
- **Demo:** https://accentgame.xyz

## Uses

Classify a speaker's accent from audio into one of the 21 accent labels listed in `ID_TO_ACCENT` below.

## How to Get Started with the Model

Use the code below to get started with the model. The example assumes that `model` (the fine-tuned classifier defined in the repository linked above) has already been instantiated.
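The checkpoint itself can be fetched from the Hub with `huggingface_hub`; this is a minimal sketch, and `model.pth` is a hypothetical placeholder filename — check the repository's file listing for the actual name:

```python
from huggingface_hub import hf_hub_download

# Download the classifier checkpoint from the Hugging Face Hub.
# NOTE: "model.pth" is a hypothetical filename; consult the repo's file
# listing for the real checkpoint name before running this.
checkpoint_path = hf_hub_download(
    repo_id="Amirjab21/accent-classifier",
    filename="model.pth",
)
```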

```python
import soundfile as sf
import torch
from scipy import signal
from transformers import WhisperProcessor

ID_TO_ACCENT = {
    0: "Scottish",
    1: "English",
    2: "Indian",
    3: "Irish",
    4: "Welsh",
    5: "New Zealand",
    6: "Australian",
    7: "South African",
    8: "Canadian",
    9: "NorthernIrish",
    10: "American",
    11: "South East Asia",
    12: "Eastern Europe",
    13: "East Asia",
    14: "Nordic",
    15: "France",
    16: "Southern Europe",
    17: "Germany",
    18: "West Indies",
    19: "Western Africa",
    20: "South Asia",
}

# Feature extraction matches the base model (openai/whisper-small).
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# `model` is the fine-tuned classifier; see the repository above for the
# model class and checkpoint-loading code (not shown here).

audio_path = "speech.wav"  # path to your audio file

# Load the audio, downmix to mono, and resample to 16 kHz if necessary.
audio_array, sr = sf.read(audio_path)
if audio_array.ndim > 1:
    audio_array = audio_array.mean(axis=1)
if sr != 16000:
    audio_array = signal.resample(audio_array, int(len(audio_array) * 16000 / sr))

# Convert the waveform to Whisper input features and run the classifier.
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features
with torch.no_grad():
    output, pooled_embed = model(input_features)

# Softmax over the 21 accent classes; pick the most likely label.
probabilities = torch.nn.functional.softmax(output, dim=1)
predictions = torch.argmax(probabilities, dim=1)
predicted_accent = ID_TO_ACCENT[predictions.item()]
accent_probabilities = {ID_TO_ACCENT[i]: prob.item() for i, prob in enumerate(probabilities[0])}
```
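Beyond the top label, the full `accent_probabilities` dictionary can be ranked; a short usage sketch (assumes the example above has run):

```python
# Show the three most probable accents with their scores.
top3 = sorted(accent_probabilities.items(), key=lambda kv: kv[1], reverse=True)[:3]
for accent, prob in top3:
    print(f"{accent}: {prob:.3f}")
```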