w2vindia

w2vindia is a self-supervised speech representation model based on the Wav2Vec 2.0 Base architecture, trained from scratch on a multilingual corpus of Indian languages.

This model serves as a foundation acoustic model and does not generate text directly. It is intended for fine-tuning on downstream speech tasks such as ASR, phoneme recognition, or language identification.

Load the model

```python
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# A Wav2Vec2Processor bundles a tokenizer, which this pretrained-only
# checkpoint does not ship; a feature extractor alone is enough, and the
# base-model defaults match the 16 kHz input this model expects.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("balaragavesh/w2vindia")
```
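The encoder does not emit one vector per audio sample: the convolutional feature extractor downsamples the 16 kHz waveform to roughly 49 frames per second (a ~20 ms hop), and in the Base architecture each frame is a 768-dimensional vector. A small sketch of that arithmetic, using the standard Wav2Vec 2.0 Base conv-stack kernels and strides (architecture defaults, not values read from this checkpoint):

```python
# Kernel sizes and strides of the Wav2Vec 2.0 Base convolutional
# feature extractor (standard architecture defaults).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)

def num_output_frames(num_samples: int) -> int:
    """Number of encoder frames produced for a raw 16 kHz waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard conv output-length formula with no padding.
        length = (length - kernel) // stride + 1
    return length

# One second of 16 kHz audio collapses to 49 frames (~20 ms each).
print(num_output_frames(16000))  # 49
```

So for a clip of `n` samples, `model(...).last_hidden_state` has shape `(batch, num_output_frames(n), 768)`.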

πŸ” Model Description

  • Architecture: Wav2Vec 2.0 Base (~95M parameters)
  • Training Type: Self-supervised pretraining
  • Languages: Hindi, Tamil, Bengali, Marathi, Telugu, Kannada, Malayalam, Gujarati (and others)
  • Sampling Rate: 16 kHz
  • Framework: PyTorch / Hugging Face Transformers

Unlike language-specific models, this model was trained on a blind mixture of Indian languages without language identifiers, allowing it to learn shared phonetic and acoustic representations across languages.
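Because the model expects mono 16 kHz input, audio recorded at other rates should be resampled before feature extraction. A minimal sketch using SciPy's polyphase resampler (SciPy is an assumption here; torchaudio or librosa work equally well):

```python
import numpy as np
from scipy.signal import resample_poly

def to_16k(waveform: np.ndarray, orig_sr: int) -> np.ndarray:
    """Resample a mono waveform to the 16 kHz this model expects."""
    if orig_sr == 16000:
        return waveform
    # resample_poly takes an integer up/down ratio; reduce it first.
    g = np.gcd(orig_sr, 16000)
    return resample_poly(waveform, up=16000 // g, down=orig_sr // g)

# One second at 44.1 kHz becomes one second at 16 kHz.
one_sec = np.random.randn(44100)
print(len(to_16k(one_sec, 44100)))  # 16000
```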

πŸ“š Dataset

The model was pre-trained on the IndicTTS dataset collection released by SPRING Lab, available on Hugging Face.

  • Source: SPRINGLab / IndicTTS Datasets
  • Total Duration: ~200 hours
  • Data Characteristics: Mixed Indian languages (Hindi, Tamil, Marathi, Bengali, etc.)
  • Preprocessing:
    • Sampling Rate: 16 kHz
    • Audio files were filtered to durations between 2 s and 15 s, keeping self-attention sequence lengths (and memory use) manageable during pretraining.
    • Combined into a unified training set to encourage cross-lingual acoustic transfer.
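The 2–15 s duration filter from the preprocessing steps above is straightforward to reproduce; a sketch assuming each clip is described by its sample count and sampling rate:

```python
MIN_SEC, MAX_SEC = 2.0, 15.0

def keep_clip(num_samples: int, sampling_rate: int = 16000) -> bool:
    """True if a clip falls inside the 2-15 s window used for pretraining."""
    duration = num_samples / sampling_rate
    return MIN_SEC <= duration <= MAX_SEC

# A 1 s clip is dropped; a 10 s clip is kept.
print(keep_clip(16000), keep_clip(160000))  # False True
```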

πŸš€ Intended Use

βœ… Supported Tasks

  • Automatic Speech Recognition (ASR)
  • Phoneme Recognition
  • Low-resource language modeling
  • Cross-lingual transfer learning
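For the ASR and phoneme-recognition tasks above, fine-tuning typically adds a CTC head (e.g. `Wav2Vec2ForCTC`), and per-frame argmax predictions are turned into a label sequence by collapsing repeats and dropping the blank token. A minimal greedy CTC decode, with blank id 0 as an illustrative assumption:

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame predictions and remove blanks (greedy CTC)."""
    decoded, prev = [], None
    for t in frame_ids:
        # A repeated label counts once unless separated by a blank.
        if t != prev and t != blank_id:
            decoded.append(t)
        prev = t
    return decoded

# Labels "1 1 2" emitted over 8 frames: blanks and repeats collapse away.
print(ctc_greedy_decode([1, 1, 0, 1, 0, 0, 2, 2]))  # [1, 1, 2]
```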

❌ Not Intended For

  • Direct speech-to-text inference without fine-tuning
  • Speaker identification without adaptation
