IndicConformer Hindi ASR (HuggingFace Pipeline)

A HuggingFace-compatible conversion of the AI4Bharat IndicConformer model for Hindi ASR. It lets you load and run the model with standard HuggingFace Transformers patterns instead of NeMo.

Installation

pip install transformers torch torchaudio sentencepiece librosa
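
To confirm that the install succeeded before loading the model, a short check along these lines may help. It is a minimal sketch that only tests whether each package from the pip command above can be found by Python; the helper name `missing_packages` is made up for this example, and the check does not verify versions.

```python
import importlib.util

# Package names taken from the pip command above.
REQUIRED = ["transformers", "torch", "torchaudio", "sentencepiece", "librosa"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```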

Quick Start (Pipeline API)

from huggingface_hub import snapshot_download
import sys

# Download model
model_path = snapshot_download("Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf")
sys.path.insert(0, model_path)

# Import pipeline and model
from pipeline_indicconformer import IndicConformerASRPipeline
from modeling_indicconformer import IndicConformerForCTC

# Create pipeline
model = IndicConformerForCTC.from_pretrained(model_path)
pipe = IndicConformerASRPipeline(model=model, model_path=model_path)

# Transcribe
result = pipe("audio.wav")
print(result["text"])
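
To transcribe several files, the pipeline call above can be wrapped in a loop. The sketch below assumes only what the Quick Start shows: that `pipe` accepts a file path and returns a dict with a `"text"` key. The helper name `transcribe_folder` and the `*.wav` pattern are illustrative and not part of the model repository.

```python
from pathlib import Path


def transcribe_folder(pipe, folder, pattern="*.wav"):
    """Run `pipe` on every file in `folder` matching `pattern`.

    Returns a dict mapping each file name to its transcript. `pipe` is
    assumed to behave like the Quick Start pipeline: it takes a path
    string and returns a dict containing a "text" entry.
    """
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        output = pipe(str(path))
        results[path.name] = output["text"]
    return results
```

With the `pipe` object from the Quick Start, `transcribe_folder(pipe, "recordings")` would return one transcript per WAV file in a `recordings` directory (a hypothetical folder name).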

Alternative: Auto Classes

import torch
import torchaudio
from transformers import AutoModel, AutoTokenizer, AutoFeatureExtractor

# Load components
model = AutoModel.from_pretrained("Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf", trust_remote_code=True)
model.eval()

feature_extractor = AutoFeatureExtractor.from_pretrained("Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf", trust_remote_code=True)

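# Load audio, downmix to mono, and resample to the 16 kHz rate passed to the feature extractor
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# Extract features and transcribe
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")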

with torch.no_grad():
    predicted_ids = model.generate(
        input_features=inputs['input_features'],
        language='hi',
        decoder_mode='rnnt'
    )
    text = tokenizer.decode(predicted_ids[0])

print(text)
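
The feature extractor call above is given `sampling_rate=16000`, so it can be useful to check a file's actual sample rate and channel count before transcribing it. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper names `wav_info` and `needs_conversion` are made up for this example.

```python
import wave

TARGET_SAMPLE_RATE = 16000  # the rate passed to the feature extractor above


def wav_info(path):
    """Return (sample_rate, channel_count) read from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnchannels()


def needs_conversion(path, target_rate=TARGET_SAMPLE_RATE):
    """Return True if the file is not mono or not at `target_rate`."""
    rate, channels = wav_info(path)
    return rate != target_rate or channels != 1
```

If `needs_conversion("audio.wav")` returns True, the audio should be downmixed and resampled before it reaches the feature extractor.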

GPU Inference

# Reuses model, inputs, and tokenizer from the Auto Classes example above
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

input_features = inputs['input_features'].to(device)

with torch.no_grad():
    predicted_ids = model.generate(input_features=input_features, language='hi', decoder_mode='rnnt')
    text = tokenizer.decode(predicted_ids[0])

CTC Decoding

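# Reuses model, inputs, and tokenizer from the Auto Classes example above.
# If the model was moved to a GPU, move inputs['input_features'] to the same device first.
with torch.no_grad():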
    predicted_ids = model.generate(
        input_features=inputs['input_features'],
        language='hi',
        decoder_mode='ctc'
    )
    text = tokenizer.decode(predicted_ids[0], use_ctc=True)
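
For readers unfamiliar with CTC, the generic greedy decoding rule is: take the most likely token per frame, merge consecutive repeats, then drop the blank token. The sketch below illustrates that rule on plain Python lists. It is not taken from this repository, and the model's own `generate` and `decode(..., use_ctc=True)` may be implemented differently; the function name and the `blank_id` value are hypothetical.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id):
    """Apply the standard CTC collapse rule to per-frame token ids.

    Consecutive duplicates are merged first, then blank tokens are removed,
    so a blank between two identical tokens keeps them as separate outputs.
    """
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


# Example with a hypothetical blank id of 0:
frames = [0, 7, 7, 0, 7, 3, 3, 0]
print(ctc_greedy_collapse(frames, blank_id=0))  # [7, 7, 3]
```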

Supported Languages

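| Code | Language | Code | Language |
|------|----------|------|-----------|
| hi   | Hindi    | te   | Telugu    |
| bn   | Bengali  | ta   | Tamil     |
| gu   | Gujarati | ml   | Malayalam |
| mr   | Marathi  | kn   | Kannada   |
| pa   | Punjabi  | or   | Odia      |
| as   | Assamese | ur   | Urdu      |
| ne   | Nepali   | sa   | Sanskrit  |
| sd   | Sindhi   | kok  | Konkani   |
| doi  | Dogri    | mai  | Maithili  |
| mni  | Manipuri | brx  | Bodo      |
| sat  | Santali  | ks   | Kashmiri  |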
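
Since the examples above pass a language code as the `language` argument of `model.generate`, it can help to validate the code before calling the model. The sketch below copies the codes from the table; the names `SUPPORTED_LANGUAGES` and `validate_language` are illustrative and not part of the model repository.

```python
# Language codes and names copied from the table above.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi", "te": "Telugu", "bn": "Bengali", "ta": "Tamil",
    "gu": "Gujarati", "ml": "Malayalam", "mr": "Marathi", "kn": "Kannada",
    "pa": "Punjabi", "or": "Odia", "as": "Assamese", "ur": "Urdu",
    "ne": "Nepali", "sa": "Sanskrit", "sd": "Sindhi", "kok": "Konkani",
    "doi": "Dogri", "mai": "Maithili", "mni": "Manipuri", "brx": "Bodo",
    "sat": "Santali", "ks": "Kashmiri",
}


def validate_language(code):
    """Return the normalized language code, or raise ValueError if unlisted."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {known}")
    return normalized
```

For example, `validate_language("hi")` returns `"hi"`, which can then be passed as `language=` in the calls shown earlier.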

Citation

@misc{indicconformer2024,
  title={IndicConformer: Conformer-based ASR for Indian Languages},
  author={AI4Bharat},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large}
}

License

Apache 2.0
