IndicConformer Hindi ASR (HuggingFace Pipeline)
A Hugging Face-compatible conversion of the AI4Bharat IndicConformer hybrid CTC/RNNT large model for Hindi. It lets you load and run the model with standard Hugging Face Transformers patterns instead of NVIDIA NeMo.
Installation
```bash
pip install transformers torch torchaudio sentencepiece librosa
```
Quick Start (Pipeline API)
```python
from huggingface_hub import snapshot_download
import sys

# Download the model repository (weights plus custom code)
model_path = snapshot_download("Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf")
sys.path.insert(0, model_path)

# Import the custom pipeline and model classes shipped in the repository
from pipeline_indicconformer import IndicConformerASRPipeline
from modeling_indicconformer import IndicConformerForCTC

# Create the pipeline
model = IndicConformerForCTC.from_pretrained(model_path)
pipe = IndicConformerASRPipeline(model=model, model_path=model_path)

# Transcribe
result = pipe("audio.wav")
print(result["text"])
```
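To transcribe several files, build the pipeline once and reuse it so the model weights are loaded only once. The sketch below assumes only what the Quick Start shows: pipe can be called with a file path and returns a dict with a "text" key. The helper name transcribe_files is hypothetical and is not part of the repository.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(pipe: Callable[[str], dict], paths: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: run one already-constructed pipeline over many audio files.

    Assumes pipe(path) returns a dict with a "text" key, as in the Quick Start example.
    """
    transcripts: Dict[str, str] = {}
    for path in paths:
        # Reuse the same pipe object so the model is not reloaded per file
        transcripts[path] = pipe(path)["text"]
    return transcripts


if __name__ == "__main__":
    # Stand-in for IndicConformerASRPipeline so the sketch runs without the model
    fake_pipe = lambda path: {"text": f"transcript of {path}"}
    print(transcribe_files(fake_pipe, ["a.wav", "b.wav"]))
```

With the real pipeline, call transcribe_files(pipe, ["clip1.wav", "clip2.wav"]).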
Alternative: Auto Classes
```python
import torch
import torchaudio
from transformers import AutoModel, AutoTokenizer, AutoFeatureExtractor

MODEL_ID = "Anshul1212/indicconformer-hi-hybrid-rnnt-large-hf"
TARGET_SR = 16000  # sampling rate passed to the feature extractor

# Load components (the repository ships custom code, so trust_remote_code=True is needed)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()
feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Load audio, downmix to mono, and resample to 16 kHz if the file uses another rate
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0)  # (channels, samples) -> (samples,)
if sample_rate != TARGET_SR:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=TARGET_SR)

# Extract features and transcribe with the RNNT decoder
inputs = feature_extractor(waveform.numpy(), sampling_rate=TARGET_SR, return_tensors="pt")
with torch.no_grad():
    predicted_ids = model.generate(
        input_features=inputs["input_features"],
        language="hi",
        decoder_mode="rnnt",
    )

text = tokenizer.decode(predicted_ids[0])
print(text)
```
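decoder_mode='rnnt' selects the RNN-Transducer decoder. As a conceptual aid, here is a minimal greedy RNNT decoding loop in plain Python: at each encoder frame the joint network scores the vocabulary plus a blank symbol; a non-blank prediction is emitted and fed back while staying on the same frame, and a blank advances to the next frame. The joint callable, the blank index, seeding the decoder with blank, and the max_symbols_per_step cap are illustrative assumptions; this is not the repository's implementation.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: joint(frame_index, last_token) -> scores over vocab + blank.
# In a real model, frame_index selects an encoder output and last_token drives the
# prediction network state; both are abstracted away here.
JointFn = Callable[[int, int], Sequence[float]]


def rnnt_greedy_decode(
    joint: JointFn,
    num_frames: int,
    blank_id: int,
    max_symbols_per_step: int = 10,
) -> List[int]:
    """Greedy RNNT decoding over num_frames encoder frames (illustrative sketch)."""
    hypothesis: List[int] = []
    last_token = blank_id  # this sketch seeds the prediction network with blank
    for t in range(num_frames):
        # Cap emissions per frame so a misbehaving joint cannot loop forever
        for _ in range(max_symbols_per_step):
            scores = joint(t, last_token)
            best = max(range(len(scores)), key=lambda k: scores[k])
            if best == blank_id:
                break  # blank: advance to the next encoder frame
            hypothesis.append(best)  # non-blank: emit it and stay on this frame
            last_token = best
    return hypothesis


def toy_joint(t: int, last_token: int) -> List[float]:
    """Toy joint over token ids 0..2 plus blank id 3, returning one-hot scores."""
    # Frame 0 emits token 1, frame 1 emits token 2, everything else is blank
    table = {(0, 3): 1, (1, 1): 2}
    best = table.get((t, last_token), 3)
    return [1.0 if k == best else 0.0 for k in range(4)]


print(rnnt_greedy_decode(toy_joint, num_frames=3, blank_id=3))  # → [1, 2]
```

Unlike CTC, the transducer can emit several tokens on one frame, which is why the inner loop exists.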
GPU Inference
```python
# Continues from the Auto Classes example: move the model and inputs to the same device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
input_features = inputs["input_features"].to(device)

with torch.no_grad():
    predicted_ids = model.generate(input_features=input_features, language="hi", decoder_mode="rnnt")

text = tokenizer.decode(predicted_ids[0])
```
CTC Decoding
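The hybrid checkpoint also includes a CTC decoder. To use it instead of the RNNT decoder, pass decoder_mode='ctc' to generate and use_ctc=True to decode:
```python
# Continues from the Auto Classes example
with torch.no_grad():
    predicted_ids = model.generate(
        input_features=inputs["input_features"],
        language="hi",
        decoder_mode="ctc",
    )

text = tokenizer.decode(predicted_ids[0], use_ctc=True)
```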
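Conceptually, greedy CTC decoding takes the most likely id per encoder frame, merges consecutive repeats, then drops blanks. The sketch below shows that collapse rule; the blank id of 0 is an assumption for illustration, and this is not necessarily what the repository's generate or use_ctc=True does internally.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Collapse per-frame argmax ids into a CTC output sequence.

    1. Merge runs of identical consecutive ids.
    2. Remove the blank symbol.
    A blank between two identical ids keeps them as separate tokens.
    """
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank_id: int = 0) -> List[int]:
    """Pick the highest-scoring id in each frame, then apply the CTC collapse rule."""
    frame_ids = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    return ctc_greedy_collapse(frame_ids, blank_id)


# Frames "a a _ a b b _ _ c" with 0 as blank
print(ctc_greedy_collapse([5, 5, 0, 5, 7, 7, 0, 0, 9]))  # → [5, 5, 7, 9]
```

Because each frame yields at most one token, CTC decoding needs no inner loop, which is part of why it is usually the cheaper of the two decoders.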
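Supported Languages
The table lists language codes for the 22 scheduled languages of India. This checkpoint was converted from the Hindi (hi) IndicConformer model, so the examples pass language='hi'.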
| Code | Language | Code | Language |
|---|---|---|---|
| hi | Hindi | te | Telugu |
| bn | Bengali | ta | Tamil |
| gu | Gujarati | ml | Malayalam |
| mr | Marathi | kn | Kannada |
| pa | Punjabi | or | Odia |
| as | Assamese | ur | Urdu |
| ne | Nepali | sa | Sanskrit |
| sd | Sindhi | kok | Konkani |
| doi | Dogri | mai | Maithili |
| mni | Manipuri | brx | Bodo |
| sat | Santali | ks | Kashmiri |
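If you build tooling around the language argument, the table can be kept as a plain dictionary and used to reject unknown codes before calling generate. This is a hypothetical helper, not part of the repository; which codes a given checkpoint actually transcribes well depends on the data it was trained on.

```python
from typing import Dict

# Language codes from the table above
LANGUAGES: Dict[str, str] = {
    "hi": "Hindi", "bn": "Bengali", "gu": "Gujarati", "mr": "Marathi",
    "pa": "Punjabi", "as": "Assamese", "ne": "Nepali", "sd": "Sindhi",
    "doi": "Dogri", "mni": "Manipuri", "sat": "Santali", "te": "Telugu",
    "ta": "Tamil", "ml": "Malayalam", "kn": "Kannada", "or": "Odia",
    "ur": "Urdu", "sa": "Sanskrit", "kok": "Konkani", "mai": "Maithili",
    "brx": "Bodo", "ks": "Kashmiri",
}


def validate_language(code: str) -> str:
    """Hypothetical helper: normalize a language code and fail early if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in LANGUAGES:
        supported = ", ".join(sorted(LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {supported}")
    return normalized


print(validate_language("HI"), LANGUAGES[validate_language("hi")])  # → hi Hindi
```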
Citation
```bibtex
@misc{indicconformer2024,
  title={IndicConformer: Conformer-based ASR for Indian Languages},
  author={AI4Bharat},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large}
}
```
License
Apache 2.0