Indic Conformer ONNX (Multi-Indic ASR - Sherpa-ONNX)

This is an ONNX conversion of AI4Bharat's Indic Conformer Large model, optimized for mobile deployment using Sherpa-ONNX.

Original Model: ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large
License: MIT (allows conversion, modification, and redistribution)
Converted Format: ONNX + INT8 quantized for mobile devices
Languages: 8 Indian languages (Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada, Kashmiri, Marathi)

🎯 Use Cases

React Native/Expo mobile apps
On-device multilingual Indian speech recognition
8 Indian languages supported (see below)
No internet required - runs entirely offline
Low latency - real-time transcription

📦 Model Files

File	Size	Description	Use Case
`model.onnx`	470 MB	Full precision ONNX model	Maximum accuracy
`model.int8.onnx`	188 MB	INT8 quantized	Mobile deployment (recommended)
`tokens.txt`	~100 KB	Multi-Indic vocabulary (5633 tokens)	Required for decoding

🚀 Quick Start

Python (Sherpa-ONNX)

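import sherpa_onnx
import soundfile as sf  # assumption: any loader that yields float32 samples works

# Create recognizer. The conversion script below exports a NeMo CTC model,
# so this uses the NeMo CTC loader instead of a transducer config
# (a transducer needs separate encoder, decoder, and joiner files).
recognizer = sherpa_onnx.OfflineRecognizer.from_nemo_ctc(
    model="model.int8.onnx",
    tokens="tokens.txt",
    num_threads=2,
)

# Load 16 kHz mono audio as float32 samples in [-1, 1]
# ("audio.wav" is a placeholder path)
samples, sample_rate = sf.read("audio.wav", dtype="float32")

# Transcribe audio
stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)
print(stream.result.text)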

React Native / Expo

import { SherpaONNX } from 'react-native-sherpa-onnx';

const config = {
  modelPath: 'model.int8.onnx',
  tokensPath: 'tokens.txt',
  sampleRate: 16000
};

const recognizer = await SherpaONNX.createRecognizer(config);
const result = await recognizer.transcribe(audioBuffer);
console.log(result.text); // Output in respective Indian language

C++ (Mobile Native)

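#include "sherpa-onnx/csrc/offline-recognizer.h"

// Assumes the exported file is a NeMo CTC model, as in the conversion script
sherpa_onnx::OfflineRecognizerConfig config;
config.model_config.nemo_ctc.model = "model.int8.onnx";
config.model_config.tokens = "tokens.txt";
config.model_config.num_threads = 2;

sherpa_onnx::OfflineRecognizer recognizer(config);
auto stream = recognizer.CreateStream();
// ... feed audio with stream->AcceptWaveform(16000, samples, n),
// then call recognizer.DecodeStream(stream.get()) and read stream->GetResult().text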

📊 Performance

Metric	Value	Notes
Languages	8 Indian languages	Multi-Indic model
WER	~8-12%	Clean speech
Latency	<100ms	On mobile (INT8)
Model Size (FP32)	470 MB	Full precision
Model Size (INT8)	188 MB	Quantized
Vocabulary	5633 tokens	Multi-Indic scripts
Sample Rate	16kHz	Required input
Real-time Factor	0.1-0.3	Mobile devices
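
The real-time factor in the table is processing time divided by audio duration, so values below 1.0 mean faster than real time. A minimal sketch of how it could be measured follows; `real_time_factor` and `measure_rtf` are hypothetical helper names, and `transcribe` stands in for whatever callable wraps your recognizer.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, samples, sample_rate: int = 16000) -> float:
    """Time one call to `transcribe(samples)` and return its RTF.

    `transcribe` is a placeholder for any callable that runs the recognizer.
    """
    start = time.perf_counter()
    transcribe(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# 3 s of processing for 10 s of audio gives an RTF of 0.3
print(real_time_factor(3.0, 10.0))
```

Averaging over several utterances on the target device gives a more representative number than a single run.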

🔄 Conversion Process

This model was converted from the original .nemo format:

Export from NeMo: Used NeMo's ONNX export functionality
Vocabulary Extraction: Extracted tokens from CTC decoder
INT8 Quantization: Applied post-training quantization for mobile
Validation: Tested accuracy preservation after conversion
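
One way to check accuracy preservation in the validation step is to compare the word error rate (WER) of the FP32 and INT8 models on the same reference transcripts. The sketch below is a plain word-level edit distance, not the exact script used for this conversion; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a 4-word reference
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Running this over the same test set for `model.onnx` and `model.int8.onnx` shows how much accuracy the quantization costs.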

Conversion Script

import nemo.collections.asr as nemo_asr

# Load original .nemo model
model = nemo_asr.models.EncDecCTCModel.restore_from(
    "IndicConformer-600M-Multi.nemo"
)

# Export to ONNX
model.export('model.onnx')

# Extract vocabulary
with open('tokens.txt', 'w', encoding='utf-8') as f:
    for i, token in enumerate(model.decoder.vocabulary):
        f.write(f"{token} {i}\n")
    f.write(f"<blk> {len(model.decoder.vocabulary)}\n")
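
The script above writes one `token id` pair per line and appends `<blk>` as the last entry. A small sanity check on the generated file could look like the following sketch; `load_tokens` and `check_tokens` are hypothetical helpers, not part of NeMo or Sherpa-ONNX.

```python
def load_tokens(path: str) -> dict:
    """Parse a tokens.txt file with one 'token id' pair per line."""
    id_to_token = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the LAST space so tokens containing spaces still parse
            token, sep, idx = line.rpartition(" ")
            if not sep or not idx.isdigit():
                raise ValueError(f"line {line_no}: expected 'token id'")
            if int(idx) in id_to_token:
                raise ValueError(f"line {line_no}: duplicate id {idx}")
            id_to_token[int(idx)] = token
    return id_to_token


def check_tokens(id_to_token: dict, blank: str = "<blk>") -> None:
    """Verify ids are 0..N-1 with no gaps and the blank token comes last."""
    n = len(id_to_token)
    if n == 0:
        raise ValueError("no tokens found")
    if sorted(id_to_token) != list(range(n)):
        raise ValueError("token ids are not contiguous from 0")
    if id_to_token[n - 1] != blank:
        raise ValueError(f"last token is {id_to_token[n - 1]!r}, expected {blank!r}")
```

Calling `check_tokens(load_tokens("tokens.txt"))` after conversion catches a misaligned vocabulary before it shows up as garbled transcripts.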

🎯 Supported Languages

This model supports 8 Indian languages across multiple scripts:

Language	Script	ISO Code	Example
Assamese	Bengali	`as`	আইবো
Bengali	Bengali	`bn`	আমি
Bodo	Devanagari	`brx`	अं
Gujarati	Gujarati	`gu`	હું
Hindi	Devanagari	`hi`	मैं
Kannada	Kannada	`kn`	ನಾನು
Kashmiri	Arabic	`ks`	اَس
Marathi	Devanagari	`mr`	मी

Total Vocabulary: 5633 tokens across all supported scripts

📱 Mobile Integration

React Native Setup

Install Sherpa-ONNX bindings:

npm install react-native-sherpa-onnx

Download model files to app assets
Initialize recognizer with model paths
Start recording and transcribing

iOS/Android Native

Add Sherpa-ONNX to your project
Bundle model files with app
Initialize with model paths
Use native audio APIs for recording

⚡ Optimization Tips

For Mobile Devices

✅ Use model.int8.onnx (about 2.5x smaller: 188 MB vs. 470 MB, with minimal accuracy loss)
✅ Set num_threads=2 for balance between speed and battery
✅ Use streaming mode for real-time transcription
✅ Consider voice activity detection (VAD) to reduce processing
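
To illustrate the VAD tip, the sketch below gates frames by RMS energy so that silent frames can be skipped before recognition. This is a deliberately simple stand-in, not a neural VAD; the function names and the 0.01 threshold are assumptions chosen for illustration, and samples are expected as floats in [-1, 1].

```python
import math


def frame_is_speech(frame, threshold: float = 0.01) -> bool:
    """Return True when the frame's RMS energy reaches the threshold."""
    if not frame:
        return False
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms >= threshold


def speech_mask(samples, frame_size: int = 512, threshold: float = 0.01):
    """Classify each full, non-overlapping frame as speech (True) or silence."""
    return [
        frame_is_speech(samples[i:i + frame_size], threshold)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


# One silent frame followed by one loud frame
print(speech_mask([0.0] * 512 + [0.5] * 512))  # [False, True]
```

A fixed energy threshold misfires in noisy environments, so a trained VAD model is usually the better choice in production.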

For Cloud/Server

✅ Use model.onnx for maximum accuracy
✅ Set num_threads=4 or higher
✅ Batch-process multiple files
✅ Enable GPU acceleration with ONNX Runtime

🛠️ Technical Details

Model Architecture

Type: Conformer Hybrid (CTC + RNNT)
Model Size: 470 MB (FP32), 188 MB (INT8)
Training Data: AI4Bharat's Indic Voices dataset
Architecture: Conformer blocks + CTC/RNNT decoder
Languages: 8 Indian languages (Multilingual)

Input Requirements

Sample Rate: 16kHz (mono)
Format: 16-bit PCM
Frame Size: 512 samples recommended
Hop Length: 160 samples
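
The input requirements above can be checked in code before audio reaches the recognizer. The following sketch uses only the standard library to read a 16-bit PCM mono WAV file, scale it to floats in [-1, 1], and count analysis frames for the stated frame size and hop; `read_pcm16_mono` and `num_frames` are hypothetical helper names.

```python
import array
import sys
import wave


def read_pcm16_mono(path: str, expected_rate: int = 16000) -> list:
    """Read a 16-bit PCM mono WAV file and return floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        if w.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz audio")
        raw = w.readframes(w.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return [s / 32768.0 for s in pcm]


def num_frames(num_samples: int, frame_size: int = 512, hop: int = 160) -> int:
    """Number of full frames when sliding a window of frame_size by hop."""
    if num_samples < frame_size:
        return 0
    return 1 + (num_samples - frame_size) // hop


# A hop of 160 samples at 16 kHz corresponds to a 10 ms frame shift
print(num_frames(16000))  # 97 full frames in one second of audio
```

Audio at other sample rates or with multiple channels needs resampling or downmixing first, which this sketch leaves out.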

Output Format

Type: String (UTF-8)
Scripts: Bengali, Devanagari, Gujarati, Kannada, Arabic (for Kashmiri)
Tokens: 5633 multi-Indic tokens
Languages: Outputs in the detected Indian language
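
Because the output can arrive in any of the listed scripts, downstream code may want to know which script a transcript uses, for example to pick a font. The sketch below classifies text by Unicode block; `dominant_script` is a hypothetical helper. It identifies the script only, not the language, since Hindi, Marathi, and Bodo all share Devanagari.

```python
from collections import Counter

# Unicode block ranges for the scripts listed above
SCRIPT_RANGES = {
    "Arabic": (0x0600, 0x06FF),
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gujarati": (0x0A80, 0x0AFF),
    "Kannada": (0x0C80, 0x0CFF),
}


def dominant_script(text: str):
    """Return the script with the most characters in text, or None."""
    counts = Counter()
    for ch in text:
        code_point = ord(ch)
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[name] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dominant_script("ನಾನು"))  # Kannada
print(dominant_script("मैं"))    # Devanagari
```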

📜 License & Attribution

Original Model

Created by: AI4Bharat
Original Model: ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large
Framework: NVIDIA NeMo
License: MIT License
Training: Supported by Ministry of Electronics and IT, Govt. of India

This Conversion

License: MIT (same as original - allows commercial use and redistribution)
Format: ONNX (FP32 + INT8 quantized)
Purpose: Enable mobile deployment via Sherpa-ONNX
Compatibility: Sherpa-ONNX runtime (C++, Python, React Native)
Legal Status: ✅ Authorized under MIT License terms

Note: If you use this model, please cite the original AI4Bharat work and acknowledge their contribution to Indian language ASR.

🙏 Acknowledgments

Special thanks to:

AI4Bharat team for training and releasing the original model
NVIDIA NeMo for the ASR framework and export tools
Sherpa-ONNX (k2-fsa) for the mobile inference runtime
Government of India for supporting the AI4Bharat initiative

📖 Citation

@misc{ai4bharat2023indicconformer,
  title={IndicConformer: A Conformer-based Speech Recognition System for Indian Languages},
  author={AI4Bharat},
  year={2023},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large}}
}

🔗 Links

Sherpa-ONNX: https://github.com/k2-fsa/sherpa-onnx
NeMo Framework: https://github.com/NVIDIA/NeMo
AI4Bharat: https://ai4bharat.org/
Original Model: https://huggingface.co/ai4bharat/indicconformer_stt_hi_hybrid_ctc_rnnt_large

🐛 Issues & Support

For issues related to:

Model accuracy: Contact AI4Bharat or check the original model card
ONNX conversion: Open an issue on the converter repo
Sherpa-ONNX usage: Check Sherpa-ONNX documentation
Mobile integration: Refer to React Native / native SDK docs

📝 Changelog

Version 1.0.0

Initial ONNX conversion from NeMo format
INT8 quantization for mobile deployment
Vocabulary extraction and validation
Tested on iOS and Android devices

Made with ❤️ for the Indian language NLP community
