metadata
language:
- en
tags:
- audio
- language-identification
- speech
- indian-languages
datasets:
- hmsolanki/indian-languages-audio-dataset
metrics:
- accuracy
- f1
Indian Language Identification Model
This model identifies the language spoken in an audio clip from a set of 10 Indian languages.
Model Details
- Model Type: Audio Language Classifier
- Languages Supported: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
- Framework: PyTorch
- Training Dataset: Indian Languages Audio Dataset (hmsolanki/indian-languages-audio-dataset)
- Audio Sampling Rate: 16 kHz
Performance
- Accuracy: 0.8465
- Precision: 0.8457
- Recall: 0.8465
- F1 Score: 0.8452
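The card does not say how precision, recall, and F1 were averaged across the 10 classes. Recall being identical to accuracy (0.8465) is consistent with support-weighted averaging, since weighted recall always equals accuracy, but that remains an assumption. The sketch below shows one way such figures could be computed; the function name `weighted_metrics` and the toy labels are illustrative only and are not the model's evaluation data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    precision = recall = f1 = 0.0
    for label in labels:
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support[label] / total  # class share of the true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = ["Hindi", "Hindi", "Tamil", "Urdu", "Urdu", "Urdu"]
y_pred = ["Hindi", "Urdu", "Tamil", "Urdu", "Urdu", "Hindi"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With weighted averaging, `scores["recall"]` and `scores["accuracy"]` agree up to floating-point rounding, which matches the pattern in the reported numbers.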
Usage
import torchaudio
from transformers import pipeline
# Load the model as an audio-classification pipeline
pipe = pipeline("audio-classification", model="prithvirajjadhav2266/indian-language-identifier")
# Load the audio and resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load("path/to/audio.wav")
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
# The pipeline expects single-channel audio: average the channels and convert to NumPy
audio = waveform.mean(dim=0).numpy()
# Get prediction (results are sorted by score, highest first)
prediction = pipe({"raw": audio, "sampling_rate": 16000})
print(f"Detected language: {prediction[0]['label']}")
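The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries, so a caller may want to reject low-confidence results instead of always trusting the top label. The helper below is a minimal sketch of that idea. The name `pick_language`, the 0.5 threshold, and the example scores are assumptions for illustration, and the label strings this model actually emits may differ.

```python
def pick_language(predictions, min_score=0.5):
    """Return the highest-scoring label, or None if it scores below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None


# Hand-written example in the pipeline's output format (not real model output).
example = [
    {"label": "Hindi", "score": 0.62},
    {"label": "Urdu", "score": 0.31},
]
print(pick_language(example))
print(pick_language(example, min_score=0.9))
```

A suitable threshold depends on the application and would need to be tuned on held-out audio.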
Limitations
- Works best on clear recordings with little or no background noise
- Expects audio sampled at 16 kHz; resample other rates before inference for best results
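Before running inference it can help to check whether a file already meets the 16 kHz expectation. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper name `check_wav` and the returned keys are hypothetical.

```python
import wave

TARGET_RATE = 16000  # sampling rate stated in this card


def check_wav(path, target_rate=TARGET_RATE):
    """Report the sampling rate, channel count, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        "needs_resampling": rate != target_rate,  # resample before inference
        "needs_downmix": channels != 1,           # average channels to mono
    }
```

Calling `check_wav("path/to/audio.wav")` returns a dictionary whose `needs_resampling` and `needs_downmix` flags indicate whether preprocessing is required.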
Training Details
This model was trained on the Indian Languages Audio Dataset. The architecture uses CNN layers to extract features from the audio, followed by transformer layers whose output is used to classify the language.
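The card gives no layer counts, dimensions, or input representation, so the following is only a rough PyTorch sketch of what a CNN-plus-transformer classifier can look like. The class name `CnnTransformerClassifier`, the raw-waveform input, every kernel size, stride, and width, the mean pooling, and the omission of positional encodings are assumptions and do not describe the released model.

```python
import torch
import torch.nn as nn

NUM_LANGUAGES = 10  # the card lists 10 supported languages


class CnnTransformerClassifier(nn.Module):
    """Hypothetical CNN + transformer classifier over raw 16 kHz waveforms."""

    def __init__(self, num_classes=NUM_LANGUAGES, dim=64, num_heads=4, num_layers=2):
        super().__init__()
        # Strided 1-D convolutions turn the waveform into a shorter frame sequence.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, dim, kernel_size=10, stride=5),
            nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=8, stride=4),
            nn.GELU(),
            nn.Conv1d(dim, dim, kernel_size=4, stride=2),
            nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, waveform):
        # waveform: (batch, samples)
        x = self.cnn(waveform.unsqueeze(1))  # (batch, dim, frames)
        x = x.transpose(1, 2)                # (batch, frames, dim)
        x = self.encoder(x)                  # (batch, frames, dim)
        return self.head(x.mean(dim=1))      # (batch, num_classes) logits


model = CnnTransformerClassifier()
model.eval()
with torch.no_grad():
    logits = model(torch.randn(2, 16000))  # two random one-second clips
print(logits.shape)
```

Mean pooling over frames is one simple way to get a clip-level prediction; the actual model may pool or classify differently.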
