---
language:
- en
- hi
- or
- bn
- ta
- te
- kn
- ml
- mr
- gu
- pa
- as
license: apache-2.0
pipeline_tag: audio-classification
library_name: transformers
tags:
- language-identification
- indian-languages
- multilingual
- speech
- asr-preprocessing
- callcenter-ai
- speech-analytics
- audio-classification
- wav2vec2
- transformers
- pytorch
- huggingface
---

# **Vakgyata**

**Language Identification for Indian Languages from Speech**

---

## **Model Overview**

`vakgyata` is an open-source language identification model that classifies Indian languages from raw speech audio. It is built on the pretrained [`Harveenchadha/wav2vec2-pretrained-clsril-23-10k`](https://huggingface.co/Harveenchadha/wav2vec2-pretrained-clsril-23-10k) checkpoint, with added **Layer Normalization** to improve stability and performance on audio classification.

---

## **Variants and Model Sizes**

| Variant          | Parameters | Accuracy |
| ---------------- | ---------- | -------- |
| `vakgyata-base`  | 95M        | 95.88%   |
| `vakgyata-small` | 52M        | 95.06%   |
| `vakgyata-mini`  | 38M        | 95.06%   |
| `vakgyata-tiny`  | 24M        | 93.63%   |
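
When choosing a variant for a constrained deployment, the table can be turned into a small lookup. The snippet below is a minimal sketch: `VARIANTS` copies the figures from the table, and `pick_variant` is a hypothetical helper (not part of this model's API) that returns the most accurate variant within a parameter budget, preferring the smaller model on ties.

```python
# Parameter counts (in millions) and accuracies (%) copied from the table above.
VARIANTS = {
    "vakgyata-base": (95, 95.88),
    "vakgyata-small": (52, 95.06),
    "vakgyata-mini": (38, 95.06),
    "vakgyata-tiny": (24, 93.63),
}


def pick_variant(max_params_millions):
    """Return the most accurate variant within the budget (hypothetical helper).

    Ties in accuracy are broken in favor of the smaller model.
    """
    candidates = [
        (accuracy, -params, name)
        for name, (params, accuracy) in VARIANTS.items()
        if params <= max_params_millions
    ]
    if not candidates:
        raise ValueError("no variant fits within the given parameter budget")
    return max(candidates)[2]


print(pick_variant(60))  # vakgyata-mini
```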

---

## **Supported Languages**

| Language        | Code  |
| --------------- | ----- |
| English (India) | en-IN |
| Hindi           | hi-IN |
| Odia            | or-IN |
| Bengali         | bn-IN |
| Tamil           | ta-IN |
| Telugu          | te-IN |
| Kannada         | kn-IN |
| Malayalam       | ml-IN |
| Marathi         | mr-IN |
| Gujarati        | gu-IN |
| Punjabi         | pa-IN |
| Assamese        | as-IN |
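
For display purposes, a predicted code can be mapped back to a language name. The sketch below assumes the model's labels look like the codes in the table (for example `hi-IN`) or like bare two-letter codes (`hi`); the exact strings stored in `model.config.id2label` are not documented here, so treat the normalization as an assumption. `language_name` is a hypothetical helper.

```python
# Codes and names copied from the table above.
LANGUAGE_NAMES = {
    "en-IN": "English (India)",
    "hi-IN": "Hindi",
    "or-IN": "Odia",
    "bn-IN": "Bengali",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "kn-IN": "Kannada",
    "ml-IN": "Malayalam",
    "mr-IN": "Marathi",
    "gu-IN": "Gujarati",
    "pa-IN": "Punjabi",
    "as-IN": "Assamese",
}


def language_name(label):
    """Map a label such as 'hi-IN' or 'hi' to a language name (hypothetical helper).

    Unknown labels are returned unchanged.
    """
    # Assumption: the language part precedes an optional '-' region suffix.
    key = label.strip().split("-")[0].lower() + "-IN"
    return LANGUAGE_NAMES.get(key, label)
```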

---

## **Specifications**

* **Supported Sampling Rate:** 16,000 Hz
* **Recommended Audio Format:** 16 kHz, 16-bit PCM (mono)
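
One way to confirm that a WAV file already matches the recommended format is to read its header with Python's standard `wave` module. This is an illustrative sketch only: `check_wav_format` is a hypothetical helper, not part of this model's API, and `wave` only reads uncompressed PCM WAV files.

```python
import wave


def check_wav_format(path):
    """Return a list of mismatches against 16 kHz, 16-bit, mono PCM.

    An empty list means the file matches the recommended format.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000 Hz")
        if wav.getsampwidth() != 2:  # sample width in bytes; 2 bytes = 16 bits
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels found, expected mono")
    return problems
```

Files that fail the check can be downmixed and resampled at load time rather than converted on disk.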

---

## **Installation**

```bash
pip install torch torchaudio transformers
```

---

## **Usage**

```python
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "onecxi/vakgyata-tiny"

processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)
```

---

## **Inference Example**

```python
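import torchaudio

# Load the audio; torchaudio returns a tensor of shape (channels, samples)
audio, sr = torchaudio.load("path/to/audio.wav")

# Downmix to mono and resample to 16 kHz if the file uses another rate
audio = audio.mean(dim=0)
if sr != 16000:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
    sr = 16000

# Preprocess
inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt").to(device)

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax to get probabilities for the single input clip
probs = logits.softmax(dim=-1)[0]

# Predicted language (cast to a plain int before the id2label lookup)
pred_id = int(probs.argmax())
language = model.config.id2label[pred_id]
print("Predicted Language:", language)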
```
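
When a single top prediction is not enough, for example to flag low-confidence clips, the logits can be ranked into a short list of candidates. The following is a minimal, dependency-free sketch; `top_k_languages` is a hypothetical helper, and it expects the logits as a plain Python list plus an `id2label` mapping with integer keys.

```python
import math


def top_k_languages(logits, id2label, k=3):
    """Apply softmax to raw logits and return the k most probable (label, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]
```

With the inference example above, it could be called as `top_k_languages(logits[0].tolist(), model.config.id2label)`.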

---

## **Citation**

If you use this model in your research or application, please consider citing the model and its base source:

```bibtex
@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/vakgyata-tiny}
}
```

---