Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,68 @@
---
language:
- en
tags:
- audio
- language-identification
- speech
- indian-languages
datasets:
- hmsolanki/indian-languages-audio-dataset
metrics:
- accuracy
- f1
---

# Indian Language Identification Model

This model identifies the language spoken in an audio clip from a set of 10 Indian languages.

## Model Details

- **Model Type:** Audio Language Classifier
- **Languages Supported:** Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu, Urdu
- **Framework:** PyTorch
- **Training Dataset:** [Indian Languages Audio Dataset](https://www.kaggle.com/datasets/hmsolanki/indian-languages-audio-dataset/)
- **Audio Sampling Rate:** 16 kHz

## Performance

- **Accuracy:** 0.8465
- **Precision:** 0.8457
- **Recall:** 0.8465
- **F1 Score:** 0.8452

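
The card does not say how precision, recall, and F1 were averaged across the 10 classes. Recall matching accuracy exactly (0.8465) is what support-weighted averaging produces, since weighted recall always equals accuracy. A minimal sketch of that computation, assuming weighted averaging and using made-up labels (the function name `weighted_metrics` is hypothetical and not taken from the model's evaluation code):

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)          # number of true samples per class
    predicted = Counter(y_pred)        # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n             # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only; these are not the model's predictions.
y_true = ["Hindi", "Hindi", "Tamil", "Tamil", "Urdu", "Urdu"]
y_pred = ["Hindi", "Urdu", "Tamil", "Tamil", "Urdu", "Hindi"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

On any input, the `recall` and `accuracy` values returned here agree up to floating-point rounding, which is the pattern seen in the reported numbers.
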
## Usage

```python
import torchaudio
from transformers import pipeline

# Load the model as an audio-classification pipeline
pipe = pipeline("audio-classification", model="prithvirajjadhav2266/indian-language-identifier")

# Load the audio and resample it to the 16 kHz rate the model expects
waveform, sample_rate = torchaudio.load("path/to/audio.wav")
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)

# The pipeline expects single-channel audio: average the channels and
# pass a 1-D NumPy array together with its sampling rate
audio = waveform.mean(dim=0).numpy()

# Get prediction (results are sorted by score, highest first)
prediction = pipe({"raw": audio, "sampling_rate": 16000})
print(f"Detected language: {prediction[0]['label']}")
```

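
The audio-classification pipeline returns a list of dictionaries with `label` and `score` keys. When a clip may be noisy or in a language outside the 10 supported ones, it can help to reject low-confidence predictions. A minimal sketch, assuming a hypothetical helper `pick_language` and an arbitrary threshold of 0.5 that would need tuning on real data; the label strings shown are also an assumption, since they depend on the model's configuration:

```python
def pick_language(predictions, min_score=0.5):
    """Return the top label, or None if its score is below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda item: item["score"])
    return best["label"] if best["score"] >= min_score else None

# Example output in the shape the pipeline returns; the scores are made up.
example = [
    {"label": "Hindi", "score": 0.81},
    {"label": "Urdu", "score": 0.12},
    {"label": "Punjabi", "score": 0.04},
]
print(pick_language(example))                 # Hindi
print(pick_language(example, min_score=0.9))  # None
```
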
## Limitations

- Works best on clear audio with little or no background noise
- Audio should be sampled at 16 kHz for optimal performance

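
Since the model expects 16 kHz input, it can be worth checking a file before running inference. A minimal sketch using Python's standard `wave` module, so it only covers uncompressed WAV files; the helper names `needs_resampling` and `write_silence` are hypothetical, and the silent clips exist only to make the example runnable:

```python
import os
import tempfile
import wave

TARGET_RATE = 16000  # sampling rate stated in the model card

def needs_resampling(path, target_rate=TARGET_RATE):
    """Return True if the WAV file at `path` is not sampled at target_rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() != target_rate

def write_silence(path, rate, seconds=1):
    """Write a mono 16-bit WAV of silence, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * rate * seconds)

demo_dir = tempfile.mkdtemp()
path_16k = os.path.join(demo_dir, "clip_16k.wav")
path_44k = os.path.join(demo_dir, "clip_44k.wav")
write_silence(path_16k, 16000)
write_silence(path_44k, 44100)

print(needs_resampling(path_16k))  # False
print(needs_resampling(path_44k))  # True
```
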
## Training Details

This model was trained on a dataset of Indian language audio samples. The model architecture combines CNN layers for feature extraction with transformer layers for classification.

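
The card gives no layer sizes or input features, so the following is only an illustrative sketch of the general pattern described above: convolutional layers extract local features from a spectrogram, and a transformer encoder models the resulting sequence before a classification head. Every name and hyperparameter here (`CnnTransformerClassifier`, 64 mel bins, 2 encoder layers, mean pooling) is an assumption, not the trained model's actual configuration, and positional encodings are omitted for brevity.

```python
import torch
from torch import nn

class CnnTransformerClassifier(nn.Module):
    """Illustrative CNN + transformer classifier over log-mel spectrograms."""

    def __init__(self, n_mels=64, d_model=128, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        # 1-D convolutions over time treat mel bins as input channels
        # and halve the time resolution twice.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, spectrogram):
        # spectrogram: (batch, n_mels, time)
        features = self.cnn(spectrogram)       # (batch, d_model, reduced time)
        features = features.transpose(1, 2)    # (batch, reduced time, d_model)
        encoded = self.encoder(features)
        pooled = encoded.mean(dim=1)           # average over time steps
        return self.head(pooled)               # (batch, n_classes) logits

model = CnnTransformerClassifier()
dummy = torch.randn(2, 64, 100)  # random input standing in for 2 spectrograms
logits = model(dummy)
print(logits.shape)
```

The output has one logit per supported language for each clip in the batch; applying a softmax over the last dimension would turn these into class probabilities.
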
## Confusion Matrix

![Confusion Matrix](confusion_matrix.png)
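
A confusion matrix counts, for each true language, how often each language was predicted; the off-diagonal cells show which pairs get mixed up. A minimal sketch of building those counts and listing the most frequent confusions, using made-up labels, not the model's real predictions (the function names are hypothetical):

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def top_confusions(counts, k=3):
    """Return the k most frequent misclassifications as ((true, pred), count)."""
    errors = Counter({pair: n for pair, n in counts.items() if pair[0] != pair[1]})
    return errors.most_common(k)

# Toy labels for illustration only.
y_true = ["Hindi", "Hindi", "Hindi", "Urdu", "Urdu", "Tamil"]
y_pred = ["Hindi", "Urdu", "Urdu", "Urdu", "Hindi", "Tamil"]

counts = confusion_counts(y_true, y_pred)
print(top_confusions(counts, k=2))
```
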