---
language:
- en
- hi
- or
- bn
- ta
- te
- kn
- ml
- mr
- gu
- pa
- as
license: apache-2.0
pipeline_tag: audio-classification
library_name: transformers
tags:
- language-identification
- indian-languages
- multilingual
- speech
- asr-preprocessing
- callcenter-ai
- speech-analytics
- audio-classification
- wav2vec2
- transformers
- pytorch
- huggingface
---

# **Vakgyata**

**Language Identification for Indian Languages from Speech**

---

## **Model Overview**

`vakgyata` is an open-source language identification model that classifies Indian languages from raw speech audio. It is built on the pretrained [`Harveenchadha/wav2vec2-pretrained-clsril-23-10k`](https://huggingface.co/Harveenchadha/wav2vec2-pretrained-clsril-23-10k) model, with additional **Layer Normalization** integrated to improve stability and performance on audio classification tasks.

---

## **Variants and Model Sizes**

| Variant          | Parameters | Accuracy |
| ---------------- | ---------- | -------- |
| `vakgyata-base`  | 95M        | 95.88%   |
| `vakgyata-small` | 52M        | 95.06%   |
| `vakgyata-mini`  | 38M        | 95.06%   |
| `vakgyata-tiny`  | 24M        | 93.63%   |

---

## **Supported Languages**

| Language        | Code  |
| --------------- | ----- |
| English (India) | en-IN |
| Hindi           | hi-IN |
| Odia            | or-IN |
| Bengali         | bn-IN |
| Tamil           | ta-IN |
| Telugu          | te-IN |
| Kannada         | kn-IN |
| Malayalam       | ml-IN |
| Marathi         | mr-IN |
| Gujarati        | gu-IN |
| Punjabi         | pa-IN |
| Assamese        | as-IN |

---

## **Specifications**

* **Supported Sampling Rate:** 16000 Hz
* **Recommended Audio Format:** 16 kHz, 16-bit PCM (mono)

---

## **Installation**

```bash
pip install transformers torchaudio
```

---

## **Usage**

```python
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch

# Run on GPU when one is available
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "onecxi/vakgyata-tiny"

# Load the feature extractor and the classification model
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)
model.eval()
```

---

## **Inference Example**

```python
import torchaudio

# Reuses `torch`, `processor`, `model`, and `device` from the Usage section.

# Load the audio and convert it to 16 kHz mono, the format the model expects
audio, sr = torchaudio.load("path/to/audio.wav")
audio = audio.mean(dim=0)  # average the channels into a 1-D mono waveform
if sr != 16000:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=16000)
    sr = 16000

# Preprocess
inputs = processor(audio.numpy(), sampling_rate=sr, return_tensors="pt").to(device)

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax to get probabilities
probs = logits.softmax(dim=-1)

# Predicted language (cast the index to a plain int for the id2label lookup)
language = model.config.id2label[int(probs.argmax(dim=-1))]
print("Predicted Language:", language)
```

---

## **Citation**

If you use this model in your research or application, please consider citing the model and its base source:

```
@misc{vakgyata2024,
  title={vakgyata: Language Identification for Indian Speech},
  author={OneCXI},
  year={2024},
  url={https://huggingface.co/onecxi/vakgyata-tiny}
}
```

---
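
## **Checking Audio Format**

The Specifications section recommends 16 kHz, 16-bit PCM mono input. The sketch below shows one way to check a file against those values before running inference. It is a minimal example under stated assumptions: the input is an uncompressed PCM WAV file that Python's standard `wave` module can read, and the helper name `check_wav_format` is hypothetical, not part of this model or of `transformers`.

```python
import wave

# Expected values, taken from the Specifications section of this card
EXPECTED_RATE_HZ = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def check_wav_format(path):
    """Return a list of mismatches against the recommended format.

    Hypothetical helper for illustration. An empty list means the file
    already matches 16 kHz, 16-bit PCM mono.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bits, expected 16 bits")
    return problems
```

Calling `check_wav_format("path/to/audio.wav")` returns an empty list for a conforming file and one message per mismatch otherwise. The `wave` module raises `wave.Error` for files it cannot parse (for example, compressed or floating-point WAV), so callers may want to catch that and convert such files with a separate tool.

---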