	---
	language:
	- en
	- hi
	- or
	- bn
	- ta
	- te
	- kn
	- ml
	- mr
	- gu
	- pa
	- as
	license: apache-2.0
	pipeline_tag: audio-classification
	library_name: transformers
	tags:
	- language-identification
	- indian-languages
	- multilingual
	- speech
	- asr-preprocessing
	- callcenter-ai
	- speech-analytics
	- audio-classification
	- wav2vec2
	- transformers
	- pytorch
	- huggingface
	---

	# Vakgyata

	Language Identification for Indian Languages from Speech

	---

	## Model Overview

	`vakgyata` is an open-source language identification model that classifies Indian languages from raw speech audio. It is built on the pretrained [`Harveenchadha/wav2vec2-pretrained-clsril-23-10k`](https://huggingface.co/Harveenchadha/wav2vec2-pretrained-clsril-23-10k) checkpoint, with additional layer normalization added to improve stability and performance on audio classification tasks.

	---

	## Variants and Model Sizes

	| Variant          | Parameters | Accuracy |
	| ---------------- | ---------- | -------- |
	| `vakgyata-base`  | 95M        | 95.88%   |
	| `vakgyata-small` | 52M        | 95.06%   |
	| `vakgyata-mini`  | 38M        | 95.06%   |
	| `vakgyata-tiny`  | 24M        | 93.63%   |

	---

	## Supported Languages

	| Language        | Code  |
	| --------------- | ----- |
	| English (India) | en-IN |
	| Hindi           | hi-IN |
	| Odia            | or-IN |
	| Bengali         | bn-IN |
	| Tamil           | ta-IN |
	| Telugu          | te-IN |
	| Kannada         | kn-IN |
	| Malayalam       | ml-IN |
	| Marathi         | mr-IN |
	| Gujarati        | gu-IN |
	| Punjabi         | pa-IN |
	| Assamese        | as-IN |

	---

	## Specifications

	* Supported sampling rate: 16000 Hz
	* Recommended audio format: 16 kHz, 16-bit PCM (mono)

	---

	## Installation

	```bash
	pip install transformers torchaudio
	```

	---

	## Usage

	```python
	from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
	import torch

	device = "cuda" if torch.cuda.is_available() else "cpu"

	model_id = "onecxi/vakgyata-tiny"

	processor = AutoFeatureExtractor.from_pretrained(model_id)
	model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).to(device)
	```

	---

	## Inference Example

	```python
	import torchaudio

	# Load the audio (ensure it's 16kHz mono)
	audio, sr = torchaudio.load("path/to/audio.wav")

	# Preprocess
	inputs = processor(audio.squeeze(), sampling_rate=sr, return_tensors="pt").to(device)

	# Inference
	with torch.no_grad():
	    logits = model(**inputs).logits

	# Softmax to get probabilities
	probs = logits.softmax(dim=-1).cpu().numpy()

	# Predicted language
	language = model.config.id2label[int(probs.argmax())]
	print("Predicted Language:", language)
	```
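
When the top prediction is uncertain, it can help to look at several candidates instead of only the argmax. The sketch below ranks the most likely languages from a logits tensor. The helper `top_k_languages` is a hypothetical name, and the demo logits and label mapping are stand-in values so the snippet runs without downloading the model; the real label strings in `model.config.id2label` may differ.

```python
import torch


def top_k_languages(logits: torch.Tensor, id2label: dict, k: int = 3):
    """Return the k most likely (label, probability) pairs for one utterance.

    Hypothetical helper; `logits` is expected to have shape (1, num_labels).
    """
    # Convert logits to probabilities and drop the batch dimension.
    probs = logits.softmax(dim=-1).squeeze(0)
    # Never ask for more candidates than there are labels.
    k = min(k, probs.numel())
    top = torch.topk(probs, k)
    return [(id2label[int(i)], float(p)) for p, i in zip(top.values, top.indices)]


# Stand-in values (assumptions); in practice pass `logits` and
# `model.config.id2label` from the inference example above.
demo_logits = torch.tensor([[2.0, 0.5, 1.0]])
demo_labels = {0: "hi-IN", 1: "ta-IN", 2: "bn-IN"}
for label, prob in top_k_languages(demo_logits, demo_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

A small gap between the first and second probabilities is a reasonable signal to treat the prediction with caution.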

	---

	## Citation

	If you use this model in your research or application, please consider citing the model and its base source:

	```bibtex
	@misc{vakgyata2024,
	  title={vakgyata: Language Identification for Indian Speech},
	  author={OneCXI},
	  year={2024},
	  url={https://huggingface.co/onecxi/vakgyata-tiny}
	}
	```

	---