Wav2Vec2 Accent Classifier

Overview

This model is a fine-tuned version of facebook/wav2vec2-base for accent classification. It was trained on the Speech Accent Archive for 14 epochs and reached ~97% accuracy on the validation set. The model is reported to generalize well to unseen data and classifies speech into 11 accents.

Model Repository

Repository: vrund1346/wav2vec2_accent_classification_v2
Developer: Vrund Dobariya

Supported Accents

  • Arabic, Dutch, English, French, German, Korean, Mandarin, Portuguese, Russian, Spanish, Turkish

Label to Accent Mappings

Label Accent
0 Arabic
1 Dutch
2 English
3 French
4 German
5 Korean
6 Mandarin
7 Portuguese
8 Russian
9 Spanish
10 Turkish

Usage

You can use this model to classify accents in spoken audio. Given an input speech waveform, it outputs confidence scores for each accent along with the most probable accent prediction.
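As a rough illustration of how per-accent confidence scores and a top prediction can be derived from the classifier's raw outputs, the sketch below applies a softmax to a list of 11 logits and looks up the accent name. It assumes the confidence scores are softmax probabilities, which this card does not state explicitly; the helper name accent_scores and the example logits are hypothetical and not part of the model's API.

```python
import math

# Label order taken from the "Label to Accent Mappings" table in this card.
ACCENTS = ["Arabic", "Dutch", "English", "French", "German", "Korean",
           "Mandarin", "Portuguese", "Russian", "Spanish", "Turkish"]


def accent_scores(logits):
    """Turn one raw logit per accent into (scores_by_accent, top_accent).

    Hypothetical helper: assumes confidence scores are softmax probabilities.
    """
    if len(logits) != len(ACCENTS):
        raise ValueError(f"expected {len(ACCENTS)} logits, got {len(logits)}")
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scores = {name: e / total for name, e in zip(ACCENTS, exps)}
    # The most probable accent is the one with the highest score.
    top = max(scores, key=scores.get)
    return scores, top


# Made-up logits for illustration only (not real model output).
example_logits = [0.1, -1.2, 3.4, 0.0, 0.5, -0.3, 1.1, -2.0, 0.2, 0.9, -0.7]
scores, top = accent_scores(example_logits)
print(top, round(scores[top], 3))
```

With real model output, the list passed in would be the model's logits for a single clip, for example logits[0].tolist() on the logits tensor.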

Performance

  • Training Data: Speech Accent Archive
  • Epochs: 14
  • Accuracy: ~97-98%
  • Model Architecture: Wav2Vec2
  • Generalization: Works well on unseen data

Installation & Inference

To use this model, install the necessary dependencies:

pip install torch librosa transformers

Then, run inference using transformers:

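from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification
import torch
import librosa

# Accent names in label order (see "Label to Accent Mappings")
ACCENTS = ["Arabic", "Dutch", "English", "French", "German", "Korean",
           "Mandarin", "Portuguese", "Russian", "Spanish", "Turkish"]

# Load the fine-tuned model and the base feature extractor.
# The base checkpoint is pretrained without a tokenizer, so the feature
# extractor is loaded directly instead of a full Wav2Vec2Processor.
model_name = "vrund1346/wav2vec2_accent_classification_v2"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
model.eval()

# Load audio as mono, resampled to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")

# Predict: softmax turns the logits into per-accent confidence scores
with torch.no_grad():
    logits = model(inputs.input_values).logits
probs = torch.softmax(logits, dim=-1)[0]
predicted_label = int(torch.argmax(probs))
print("Predicted Accent:", ACCENTS[predicted_label], f"({probs[predicted_label].item():.2%})")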

Applications

  • Speech Analysis: Identify accents for linguistic studies
  • Personalization: Adapt speech systems to user accents

License

This project is open-source and available under the Apache License 2.0.
