w2vindia

w2vindia is a self-supervised speech representation model based on the Wav2Vec 2.0 Base architecture, trained from scratch on a multilingual corpus of Indian languages.

This model serves as a foundation acoustic model and does not generate text directly. It is intended for fine-tuning on downstream speech tasks such as ASR, phoneme recognition, or language identification.

πŸ“₯ Load the Model

from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# This is a pretrained (not fine-tuned) checkpoint, so it ships no tokenizer;
# load a feature extractor rather than a full Wav2Vec2Processor.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("balaragavesh/w2vindia")
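Downstream code often needs to know how many latent frames the encoder emits for a clip. A minimal sketch, assuming the standard Wav2Vec 2.0 Base convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2, i.e. roughly 320Γ— downsampling):

```python
# (kernel size, stride) per conv layer of the Wav2Vec 2.0 Base
# feature encoder; overall downsampling is roughly 320x.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]

def num_output_frames(num_samples: int) -> int:
    """Number of latent frames the feature encoder produces
    for a clip of `num_samples` raw audio samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length

# One second of 16 kHz audio -> 49 frames (about 20 ms per frame).
print(num_output_frames(16000))
```

This is why the 2–15 s clip-length filter below translates directly into bounded Transformer sequence lengths.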

πŸ” Model Description

  • Architecture: Wav2Vec 2.0 Base
  • Training Type: Self-supervised pretraining
  • Languages: Hindi, Tamil, Bengali, Marathi, Telugu, Kannada, Malayalam, Gujarati (and others)
  • Sampling Rate: 16 kHz
  • Framework: PyTorch / Hugging Face Transformers

Unlike language-specific models, this model was trained on a blind mixture of Indian languages without language identifiers, allowing it to learn shared phonetic and acoustic representations across languages.
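That language-agnostic setup can be mimicked at data-preparation time by pooling clips from per-language corpora into one shuffled manifest and discarding the language labels before training. A minimal sketch; the corpus names and file paths are invented for illustration:

```python
import random

def build_blind_manifest(corpora: dict[str, list[str]], seed: int = 0) -> list[str]:
    """Pool audio paths from per-language corpora into one shuffled
    list, dropping the language keys so the model never sees them."""
    paths = [path for clips in corpora.values() for path in clips]
    random.Random(seed).shuffle(paths)
    return paths

# Hypothetical per-language corpora.
corpora = {
    "hindi": ["hi_0001.wav", "hi_0002.wav"],
    "tamil": ["ta_0001.wav"],
}
manifest = build_blind_manifest(corpora)  # flat list, no language tags
```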

πŸ“š Dataset

The model was pre-trained on the IndicTTS dataset collection released by SPRING Lab, available on Hugging Face.

  • Source: SPRINGLab / IndicTTS Datasets
  • Total Duration: ~200 hours
  • Data Characteristics: Mixed Indian languages (Hindi, Tamil, Marathi, Bengali, etc.)
  • Preprocessing:
    • Sampling Rate: 16 kHz
    • Audio files were filtered to between 2 s and 15 s in length, keeping sequence lengths manageable for the Transformer's self-attention.
    • All languages were combined into a single unified training set to encourage cross-lingual acoustic transfer.
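The 2–15 s duration filter above can be reproduced with a small helper. A sketch assuming clip durations are already known (e.g. read beforehand with soundfile or torchaudio); the file names are hypothetical:

```python
MIN_SEC, MAX_SEC = 2.0, 15.0

def keep_clip(duration_sec: float) -> bool:
    """True if a clip falls inside the 2-15 s training window."""
    return MIN_SEC <= duration_sec <= MAX_SEC

def filter_clips(durations: dict[str, float]) -> list[str]:
    """Return the paths of clips whose duration is within the window."""
    return [path for path, sec in durations.items() if keep_clip(sec)]

# Hypothetical (path -> duration in seconds) mapping.
clips = {"a.wav": 1.2, "b.wav": 6.8, "c.wav": 15.0, "d.wav": 21.4}
kept = filter_clips(clips)  # -> ["b.wav", "c.wav"]
```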

πŸš€ Intended Use

βœ… Supported Tasks

  • Automatic Speech Recognition (ASR)
  • Phoneme Recognition
  • Low-resource language modeling
  • Cross-lingual transfer learning
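For ASR fine-tuning, a character-level CTC vocabulary is typically built from the target-language transcripts before wrapping the checkpoint in a CTC head (e.g. Wav2Vec2ForCTC with a Wav2Vec2CTCTokenizer). A minimal sketch of that vocabulary step; the special-token names follow the common Wav2Vec 2.0 fine-tuning recipe, and the transcripts are invented:

```python
def build_ctc_vocab(transcripts: list[str]) -> dict[str, int]:
    """Map every character in the transcripts (plus CTC special
    tokens) to an id, replacing spaces with a word delimiter."""
    chars = sorted({ch for text in transcripts for ch in text if ch != " "})
    vocab = {ch: i for i, ch in enumerate(chars)}
    vocab["|"] = len(vocab)      # word delimiter used in place of " "
    vocab["[UNK]"] = len(vocab)  # unknown-character token
    vocab["[PAD]"] = len(vocab)  # padding token, also the CTC blank
    return vocab

# Hypothetical romanized transcripts; a real setup would use
# native-script text (e.g. Devanagari, Tamil).
vocab = build_ctc_vocab(["namaste duniya", "vanakkam"])
```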

❌ Not Intended For

  • Direct speech-to-text inference without fine-tuning
  • Speaker identification without adaptation

  • Model size: 95.1M params
  • Tensor type: F32 (Safetensors)