# IndicTrans2 - INT8 Quantized ONNX Models

Quantized version of AI4Bharat's IndicTrans2 for efficient on-device Indian language translation.

## Model Details
- Original Model: ai4bharat/indictrans2-indic-en-1B
- Quantization: INT8 (via ONNX Runtime quantization)
- Framework: ONNX Runtime
- Task: Translation from 22 Indian languages to English
- Use Case: Offline translation on mobile/edge devices
## Files Included

### Core Models

- `encoder_int8.onnx` (116 MB) - Quantized encoder
  - Input: Token IDs `[batch, seq_len]`
  - Output: Hidden states `[batch, seq_len, 1024]`
- `decoder_int8.onnx` (92 MB) - Quantized decoder with self-attention
  - Input: Decoder token IDs + encoder hidden states
  - Output: Logits `[batch, seq_len, vocab_size]`

### Supporting Files

- `vocab_src.json` - Source vocabulary (Indian languages)
- `vocab_tgt.json` - Target vocabulary (English)
- `special_tokens.json` - Special tokens mapping
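A small helper for loading the vocabulary files. It assumes each JSON maps token strings to integer IDs (the key layout is an assumption, not verified against this repo), so detokenization needs the inverted target vocabulary:

```python
import json


def load_vocab(path):
    """Load a token->id vocabulary JSON and return it with its inverse.

    Assumes the file maps token strings to integer IDs; adjust if this
    repo's vocab files use a different layout.
    """
    with open(path, encoding="utf-8") as f:
        token_to_id = json.load(f)
    id_to_token = {i: t for t, i in token_to_id.items()}
    return token_to_id, id_to_token
```

With `vocab_tgt.json`, the inverted `id_to_token` dict is what a detokenization step would index into.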
## Compression Stats
| Model | Original (FP32) | Quantized (INT8) | Reduction |
|---|---|---|---|
| Encoder | ~464 MB | 116 MB | ~75% |
| Decoder | ~368 MB | 92 MB | ~75% |
| Total | ~832 MB | 208 MB | ~75% |
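The ~75% reduction follows directly from storing weights as 1-byte INT8 instead of 4-byte FP32, plus a small per-tensor scale. A minimal sketch of symmetric per-tensor quantization, illustrative rather than ONNX Runtime's exact scheme:

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate FP32 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes // q.nbytes)  # 4x smaller, matching the table above
```

Rounding error is bounded by half a quantization step (`0.5 * scale`), which is why INT8 weights translate nearly as well as FP32 for this model size.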
## Quick Start

### Python (ONNX Runtime)
```python
import json

import numpy as np
import onnxruntime as ort

# Load models
encoder_session = ort.InferenceSession("encoder_int8.onnx")
decoder_session = ort.InferenceSession("decoder_int8.onnx")

# Load vocabularies
with open("vocab_src.json") as f:
    src_vocab = json.load(f)
with open("vocab_tgt.json") as f:
    tgt_vocab = json.load(f)

# Tokenize input (add language tags).
# `tokenize`/`detokenize` are placeholders for your tokenizer.
text = "नमस्ते दुनिया"  # "Hello world" in Hindi
tokens = tokenize(f"<2en> <hin> {text}", src_vocab)
input_ids = np.array([tokens], dtype=np.int64)

# Run encoder
encoder_output = encoder_session.run(
    ["hidden_states"],
    {"input_ids": input_ids},
)[0]

# Autoregressive (greedy) decoding
generated_ids = [2]  # Start token
max_length = 50
for _ in range(max_length):
    decoder_ids = np.array([generated_ids], dtype=np.int64)
    logits = decoder_session.run(
        ["logits"],
        {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_output,
        },
    )[0]
    next_token = np.argmax(logits[0, -1, :])
    if next_token == 2:  # EOS token
        break
    generated_ids.append(int(next_token))

# Decode output
translation = detokenize(generated_ids, tgt_vocab)
print(translation)  # "Hello world"
```
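The snippet above leaves `tokenize` and `detokenize` undefined. IndicTrans2 uses SentencePiece subwords, so the whitespace-split version below is only an illustrative stand-in, assuming the vocab JSONs map token strings to IDs and that IDs below 4 are special tokens:

```python
UNK_ID = 3  # assumed unknown-token id


def tokenize(text, vocab):
    """Whitespace tokenizer stand-in; real use needs SentencePiece."""
    return [vocab.get(tok, UNK_ID) for tok in text.split()]


def detokenize(ids, vocab):
    """Map ids back to tokens, skipping special tokens (< 4 by assumption)."""
    id_to_token = {i: t for t, i in vocab.items()}
    return " ".join(id_to_token[i] for i in ids if i >= 4)


# Toy vocabulary for illustration only
vocab = {"<s>": 2, "<unk>": 3, "Hello": 4, "world": 5}
print(tokenize("Hello world", vocab))        # [4, 5]
print(detokenize([2, 4, 5], vocab))          # "Hello world"
```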
## Supported Languages

Translation from any of these 22 languages to English:
Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu
## Model Architecture

```
Input Text (+ lang tags)
        ↓
   Tokenization
        ↓
Encoder (Transformer, 1B params)
        ↓ [hidden_states]
Decoder (Autoregressive)
        ↓
  Output Tokens
        ↓
English Translation
```
## Performance

Tested on Android (Pixel 7):
- Encoder: ~50-150ms
- Decoder (per token): ~20-40ms
- Total (20 tokens output): ~600ms-1s
- Memory: ~400MB peak
## Language Tags

Prefix the input with the target tag followed by a source-language tag:

- `<2en>` - Translate to English
- `<hin>` - Hindi
- `<ben>` - Bengali
- `<tam>` - Tamil
- `<tel>` - Telugu
- etc.

Example: `<2en> <hin> यह एक परीक्षण है` ("This is a test")
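Building the tagged input can be wrapped in a tiny helper (tag names taken from the list above; this card's short tags are a repo-specific convention):

```python
def tag_input(text, src_lang):
    """Prefix text with the <2en> target tag and a source-language tag,
    following the tag convention listed above."""
    return f"<2en> <{src_lang}> {text}"


print(tag_input("यह एक परीक्षण है", "hin"))
# <2en> <hin> यह एक परीक्षण है
```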
## Citation

```bibtex
@article{gala2023indictrans,
  title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
  author={Gala, Jay and others},
  journal={Transactions on Machine Learning Research},
  year={2023}
}
```
## Original Creators
AI4Bharat - IIT Madras
Original model: https://huggingface.co/ai4bharat/indictrans2-indic-en-1B
## License
MIT License (same as original model)
## Related
- Original FP32 model: ai4bharat/indictrans2-indic-en-1B
- ASR model: Indic Conformer INT8
---

*Quantized for mobile deployment | January 2026*