---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
- CSTR-Edinburgh/vctk
base_model:
- openai/whisper-small
---
# Model Card for Amirjab21/accent-classifier

Model name: Amirjab21/accent-classifier

Task: Accent classification (audio → accent label)

Supported input: 16 kHz mono audio waveform (float32 or int16) as a NumPy array
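
If your audio arrives as int16 samples or with multiple channels, it can be normalised to a float32 mono waveform before inference. A minimal sketch (the sample values here are made up for illustration):

```python
import numpy as np

# Hypothetical int16 stereo clip; replace with your own decoded audio.
int16_stereo = np.array([[16384, 0], [-16384, 0]], dtype=np.int16)

# Scale int16 samples into the float32 range [-1, 1].
mono = int16_stereo.astype(np.float32) / 32768.0

# Down-mix multi-channel audio to mono by averaging channels.
if mono.ndim > 1:
    mono = mono.mean(axis=1)
```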

- **Developed by:** Amir Jabarivasal
- **Finetuned from model:** openai/whisper-small

### Model Sources

- **Repository:** https://github.com/Amirjab21/accents
- **Paper:** https://amirjab21.github.io/?blog=0
- **Demo:** https://accentgame.xyz

## Uses

Classifies the speaker's accent in an English-language audio clip into one of the 21 accent categories listed in `ID_TO_ACCENT` below.


## How to Get Started with the Model

Use the code below to get started with the model.
```python
import soundfile as sf
import torch
from scipy import signal
from transformers import WhisperProcessor

ID_TO_ACCENT = {
    0: "Scottish",
    1: "English",
    2: "Indian",
    3: "Irish",
    4: "Welsh",
    5: "New Zealand",
    6: "Australian",
    7: "South African",
    8: "Canadian",
    9: "NorthernIrish",
    10: "American",
    11: "South East Asia",
    12: "Eastern Europe",
    13: "East Asia",
    14: "Nordic",
    15: "France",
    16: "Southern Europe",
    17: "Germany",
    18: "West Indies",
    19: "Western Africa",
    20: "South Asia",
}

# Feature extraction uses the processor of the Whisper base model.
# Loading the classifier itself (`model` below) is defined in the
# repository: https://github.com/Amirjab21/accents
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

audio_path = "speech.wav"  # path to your audio file
audio_array, sr = sf.read(audio_path)

# Down-mix to mono and resample to the expected 16 kHz.
if audio_array.ndim > 1:
    audio_array = audio_array.mean(axis=1)
if sr != 16000:
    audio_array = signal.resample(audio_array, int(len(audio_array) * 16000 / sr))

input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features

# The model returns classification logits and a pooled embedding.
with torch.no_grad():
    output, pooled_embed = model(input_features)

probabilities = torch.nn.functional.softmax(output, dim=1)
prediction = torch.argmax(probabilities, dim=1)
predicted_accent = ID_TO_ACCENT[prediction.item()]
accent_probabilities = {ID_TO_ACCENT[i]: prob.item() for i, prob in enumerate(probabilities[0])}
```
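
The `accent_probabilities` dictionary produced above can be ranked to show the most likely accents rather than only the argmax. A small sketch, using made-up probability values in place of real model output:

```python
# Illustrative values only; in practice this dict comes from the model.
accent_probabilities = {"Scottish": 0.62, "Irish": 0.21, "English": 0.12, "Welsh": 0.05}

# Sort accents by descending probability and keep the top three.
top3 = sorted(accent_probabilities.items(), key=lambda kv: kv[1], reverse=True)[:3]
for accent, prob in top3:
    print(f"{accent}: {prob:.2%}")
```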