# MWirelabs/ne-asr

A multilingual automatic speech recognition (ASR) model for eight Northeast Indian languages, fine-tuned from openai/whisper-medium.
## Languages
| Language | Code | Test WER |
|---|---|---|
| Khasi | kha | 16.89% |
| Garo | grt | 9.31% |
| Mizo | lus | 23.85% |
| Nagamese | nag | 49.13% |
| Kokborok | trp | 44.79% |
| Assamese | asm | 20.98% |
| Chakma | ccp | 54.25% |
| Wancho | wao | 68.37% |
| **Overall** | — | 36.06% |
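The figures above are word error rates: substitutions, insertions, and deletions divided by the number of reference words. A minimal pure-Python sketch of the standard word-level Levenshtein formulation (library tools such as `jiwer` compute the same quantity):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)
```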
## Training Data
- Vaani (ARTPARK-IISc/Vaani-transcription-part): 121,960 training samples across 7 languages
- Proprietary MWire corpus: 28,524 training samples across 5 languages (Khasi, Garo, Mizo, Nagamese, Kokborok)
- Total: 150,484 training samples
## Training Details
- Base model: openai/whisper-medium
- Learning rate: 1e-5 with 500 warmup steps
- Steps: 8,000
- Batch size: 16 (gradient accumulation 2, effective 32)
- Mixed precision: fp16
- Language token: Whisper's `welsh` token is used as a proxy for all languages except Assamese, which uses its own Assamese token
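The hyperparameters listed above map onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch, not the exact training script; the output directory is a placeholder:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="./ne-asr",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,   # effective batch size 32
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=8000,
    fp16=True,
)
```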
## Usage
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import soundfile as sf
import torch

processor = WhisperProcessor.from_pretrained("MWirelabs/ne-asr")
model = WhisperForConditionalGeneration.from_pretrained("MWirelabs/ne-asr")

# Load audio (must be 16 kHz mono; resample first if needed)
audio, sr = sf.read("audio.wav")
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Force language token (use "welsh" for all languages except Assamese)
forced_ids = processor.get_decoder_prompt_ids(language="welsh", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

transcription = processor.decode(predicted_ids[0], skip_special_tokens=True)
print(transcription)
```
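Whisper expects 16 kHz mono input, but `soundfile` returns audio at whatever rate the file was stored at. A minimal resampling sketch using NumPy linear interpolation (for production use, a dedicated resampler such as `librosa.resample` or `torchaudio` is preferable):

```python
import numpy as np

def resample_to_16k(audio: np.ndarray, sr: int, target_sr: int = 16000) -> np.ndarray:
    """Linearly interpolate a mono signal to the target sample rate."""
    if sr == target_sr:
        return audio
    duration = len(audio) / sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)
```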
## Citation
Paper forthcoming. If you use this model, please cite:
```bibtex
@misc{mwirelabs2026nearsr,
  title={NE-MultiSpeech: Multilingual ASR for Northeast Indian Languages},
  author={MWire Labs},
  year={2026}
}
```
## License
CC-BY-4.0. Developed by MWire Labs, Shillong, Meghalaya.