---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
- CSTR-Edinburgh/vctk
base_model:
- openai/whisper-small
---

# Model Card for Amirjab21/accent-classifier

- **Model name:** Amirjab21/accent-classifier
- **Task:** accent classification (audio → accent label)
- **Supported input:** 16 kHz mono audio waveform (float32 or int16) as a NumPy array

- **Developed by:** Amir Jabarivasal
- **Finetuned from model:** openai/whisper-small

### Model Sources

- **Repository:** https://github.com/Amirjab21/accents
- **Paper:** https://amirjab21.github.io/?blog=0
- **Demo:** Accentgame.xyz

## Uses

Classifies a speaker's accent from an audio clip into one of the 21 accent labels listed in `ID_TO_ACCENT` below.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import soundfile as sf
import torch
from scipy import signal

ID_TO_ACCENT = {
    0: "Scottish", 1: "English", 2: "Indian", 3: "Irish", 4: "Welsh",
    5: "New Zealand", 6: "Australian", 7: "South African", 8: "Canadian",
    9: "NorthernIrish", 10: "American", 11: "South East Asia",
    12: "Eastern Europe", 13: "East Asia", 14: "Nordic", 15: "France",
    16: "Southern Europe", 17: "Germany", 18: "West Indies",
    19: "Western Africa", 20: "South Asia",
}

# `processor` (Whisper feature extractor) and `model` (the fine-tuned accent
# classifier) must already be loaded; see the loading sketch further below.
audio_path = "sample.wav"  # path to your audio file

# Load the audio, downmix to mono, and resample to 16 kHz if necessary.
audio_array, sr = sf.read(audio_path)
if audio_array.ndim > 1:
    audio_array = audio_array.mean(axis=1)
if sr != 16000:
    audio_array = signal.resample(audio_array, int(len(audio_array) * 16000 / sr))

# Extract log-Mel input features and run the classifier.
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features
output, pooled_embed = model(input_features)

# Convert logits to probabilities and pick the most likely accent.
probabilities = torch.nn.functional.softmax(output, dim=1)
predictions = torch.argmax(probabilities, dim=1)
predicted_accent = ID_TO_ACCENT[predictions.item()]
accent_probabilities = {ID_TO_ACCENT[i]: prob.item() for i, prob in enumerate(probabilities[0])}
```
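The snippet above assumes that `processor` and `model` are already in scope, but this card does not show how to load them. The sketch below is a minimal, hedged example of what that loading step could look like, assuming the classifier is a Whisper-small encoder with mean pooling and a linear head over 21 classes, and that the fine-tuned weights are stored as `pytorch_model.bin` in this repository. The class name `AccentClassifier`, the head architecture, and the checkpoint filename are assumptions for illustration; the authoritative model definition lives in the linked GitHub repository.

```python
# Hypothetical loading sketch: the real model class is defined in
# https://github.com/Amirjab21/accents; the names below are assumptions.
import torch
from huggingface_hub import hf_hub_download
from transformers import WhisperModel, WhisperProcessor

class AccentClassifier(torch.nn.Module):
    """Assumed architecture: Whisper-small encoder + mean pooling + linear head."""
    def __init__(self, num_classes: int = 21):
        super().__init__()
        self.encoder = WhisperModel.from_pretrained("openai/whisper-small").encoder
        self.head = torch.nn.Linear(self.encoder.config.d_model, num_classes)

    def forward(self, input_features):
        hidden = self.encoder(input_features).last_hidden_state  # (batch, frames, dim)
        pooled = hidden.mean(dim=1)                              # (batch, dim)
        return self.head(pooled), pooled                         # logits, pooled embedding

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = AccentClassifier()

# The checkpoint filename is an assumption; check this repository's file list.
checkpoint = hf_hub_download("Amirjab21/accent-classifier", "pytorch_model.bin")
model.load_state_dict(torch.load(checkpoint, map_location="cpu"), strict=False)
model.eval()
```

If the GitHub repository ships its own model definition and loading utilities, prefer importing that class directly instead of re-declaring it as in this sketch.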