# w2vindia
w2vindia is a self-supervised speech representation model based on the Wav2Vec 2.0 Base architecture, trained from scratch on a multilingual corpus of Indian languages.
This model serves as a foundation acoustic model and does not generate text directly. It is intended for fine-tuning on downstream speech tasks such as ASR, phoneme recognition, or language identification.
## Load the model

```python
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# A Wav2Vec2Processor also needs a tokenizer, which a pretraining-only
# checkpoint does not define; for feature extraction, the base feature
# extractor (audio normalization only) is sufficient.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("balaragavesh/w2vindia")
```
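Once loaded, the model maps raw 16 kHz audio to frame-level hidden representations (one 768-dimensional vector per ~20 ms frame). A minimal, self-contained sketch, using silence as a stand-in for real speech:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("balaragavesh/w2vindia")
model.eval()

# One second of 16 kHz audio; replace with a real waveform.
speech = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(speech, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch, frames, hidden_size) — hidden_size is 768 for the Base architecture.
print(outputs.last_hidden_state.shape)
```

These frame-level features are what a downstream head (e.g. a CTC layer for ASR) would be trained on.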
## Model Description
- Architecture: Wav2Vec 2.0 Base
- Training Type: Self-supervised pretraining
- Languages: Hindi, Tamil, Bengali, Marathi, Telugu, Kannada, Malayalam, Gujarati (and others)
- Sampling Rate: 16 kHz
- Framework: PyTorch / Hugging Face Transformers
Unlike language-specific models, this model was trained on a pooled mixture of Indian languages without language identifiers, allowing it to learn phonetic and acoustic representations shared across languages.
## Dataset
The model was pre-trained on the IndicTTS dataset collection released by SPRING Lab, available on Hugging Face.
- Source: SPRINGLab / IndicTTS datasets
- Total Duration: ~200 hours
- Data Characteristics: mixed Indian languages (Hindi, Tamil, Marathi, Bengali, etc.)
- Preprocessing:
  - Audio resampled to 16 kHz.
  - Audio files filtered to between 2 s and 15 s in length to keep self-attention compute and memory manageable.
  - All languages combined into a unified training set to encourage cross-lingual acoustic transfer.
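The duration filter above can be sketched as a simple predicate over clip length. This is an illustrative reconstruction, not the training pipeline's actual code; the function name and constants are hypothetical:

```python
# Keep only clips between 2 s and 15 s at 16 kHz, as described in Preprocessing.
MIN_SECONDS = 2.0
MAX_SECONDS = 15.0
SAMPLE_RATE = 16000

def keep_clip(num_samples: int, sample_rate: int = SAMPLE_RATE) -> bool:
    """Return True if the clip's duration falls inside the allowed window."""
    duration = num_samples / sample_rate
    return MIN_SECONDS <= duration <= MAX_SECONDS

print(keep_clip(1 * SAMPLE_RATE))   # 1 s: too short
print(keep_clip(5 * SAMPLE_RATE))   # 5 s: kept
print(keep_clip(20 * SAMPLE_RATE))  # 20 s: too long
```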
## Intended Use
### Supported Tasks
- Automatic Speech Recognition (ASR)
- Phoneme Recognition
- Low-resource language modeling
- Cross-lingual transfer learning
### Not Intended For
- Direct speech-to-text inference without fine-tuning
- Speaker identification without adaptation
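Because this checkpoint is a pretraining-only encoder, using it for ASR means attaching and fine-tuning a task head. A minimal sketch with a CTC head follows; the vocabulary size of 64 is a placeholder for whatever your target-language tokenizer defines, and the CTC head itself is randomly initialized until fine-tuned:

```python
from transformers import Wav2Vec2ForCTC

# Load the pretrained encoder and attach a fresh CTC head.
# vocab_size is a placeholder; set it to your tokenizer's vocabulary size.
model = Wav2Vec2ForCTC.from_pretrained(
    "balaragavesh/w2vindia",
    vocab_size=64,
    ctc_loss_reduction="mean",
)

# Common practice: keep the convolutional feature encoder frozen
# during fine-tuning and train only the transformer and CTC head.
model.freeze_feature_encoder()
```

From here, training proceeds as for any CTC-based Wav2Vec 2.0 fine-tune (labeled audio, a character- or subword-level tokenizer, and a standard training loop or `Trainer`).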
## Links

- GitHub Repository: [balaragavesh/w2vindia](https://github.com/balaragavesh/w2vindia)