
IndicTrans2 - INT8 Quantized ONNX Models

Quantized version of AI4Bharat's IndicTrans2 for efficient on-device Indian language translation.

📋 Model Details

  • Original Model: ai4bharat/indictrans2-indic-en-1B
  • Quantization: INT8 (via ONNX Runtime quantization)
  • Framework: ONNX Runtime
  • Task: Translation from 22 Indian languages to English
  • Use Case: Offline translation on mobile/edge devices

๐Ÿ—‚๏ธ Files Included

Core Models

  • encoder_int8.onnx (116 MB) - Quantized encoder
    • Input: Token IDs [batch, seq_len]
    • Output: Hidden states [batch, seq_len, 1024]
  • decoder_int8.onnx (92 MB) - Quantized decoder with self-attention
    • Input: Decoder token IDs + encoder hidden states
    • Output: Logits [batch, seq_len, vocab_size]
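The documented tensor interfaces can be sanity-checked before loading the models. A minimal sketch of the expected input/output shapes and dtypes (batch and sequence sizes are arbitrary example values; the names follow the Quick Start below):

```python
import numpy as np

batch, seq_len, hidden = 1, 8, 1024  # hidden size 1024 as documented above

# Encoder input: token IDs as int64 [batch, seq_len]
input_ids = np.zeros((batch, seq_len), dtype=np.int64)

# Encoder output: hidden states [batch, seq_len, 1024]
hidden_states = np.zeros((batch, seq_len, hidden), dtype=np.float32)

print(input_ids.shape, input_ids.dtype)  # (1, 8) int64
print(hidden_states.shape)               # (1, 8, 1024)
```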

Supporting Files

  • vocab_src.json - Source vocabulary (Indian languages)
  • vocab_tgt.json - Target vocabulary (English)
  • special_tokens.json - Special tokens mapping
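The vocabulary files map token strings to integer IDs. The `tokenize`/`detokenize` helpers used in the Quick Start below are not shipped with these files; here is a whitespace-level sketch over a toy vocabulary (real IndicTrans2 tokenization is subword-based, so this is illustrative only):

```python
# Toy vocabularies; the real vocab_src.json / vocab_tgt.json are far larger,
# and the real tokenizer is subword-based. This sketch is word-level.
src_vocab = {"<2en>": 0, "<hin>": 1, "<unk>": 3, "नमस्ते": 4, "दुनिया": 5}
tgt_vocab = {2: "</s>", 6: "Hello", 7: "world"}  # inverted (id -> token) for decoding

def tokenize(text, vocab):
    """Map whitespace-separated tokens to IDs, falling back to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

def detokenize(ids, vocab):
    """Map IDs back to strings, skipping any ID not in the vocab."""
    return " ".join(vocab[i] for i in ids if i in vocab)

print(tokenize("<2en> <hin> नमस्ते दुनिया", src_vocab))  # [0, 1, 4, 5]
print(detokenize([6, 7], tgt_vocab))                     # Hello world
```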

📊 Compression Stats

Model     Original (FP32)   Quantized (INT8)   Reduction
Encoder   ~464 MB           116 MB             ~75%
Decoder   ~368 MB           92 MB              ~75%
Total     ~832 MB           208 MB             ~75%
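A sketch of how such INT8 models can be produced with ONNX Runtime's dynamic quantization. The exact recipe used for these files is not documented here; weight-only INT8 via `quantize_dynamic` is the common default and matches the ~75% size reduction above:

```python
def quantize_model(fp32_path, int8_path):
    """Quantize an FP32 ONNX model's weights to INT8 (dynamic quantization).
    Activations stay in float at runtime; weights shrink roughly 4x."""
    # Imported here so the sketch can be read without onnxruntime installed.
    from onnxruntime.quantization import quantize_dynamic, QuantType
    quantize_dynamic(fp32_path, int8_path, weight_type=QuantType.QInt8)

# Hypothetical FP32 file names; only the *_int8.onnx outputs ship in this repo:
# quantize_model("encoder_fp32.onnx", "encoder_int8.onnx")
# quantize_model("decoder_fp32.onnx", "decoder_int8.onnx")
```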

🚀 Quick Start

Python (ONNX Runtime)

import onnxruntime as ort
import json
import numpy as np

# Load models
encoder_session = ort.InferenceSession("encoder_int8.onnx")
decoder_session = ort.InferenceSession("decoder_int8.onnx")

# Load vocabularies
with open("vocab_src.json") as f:
    src_vocab = json.load(f)
with open("vocab_tgt.json") as f:
    tgt_vocab = json.load(f)

# Tokenize input (add language tags).
# tokenize() is not bundled with these files; build it from vocab_src.json.
text = "नमस्ते दुनिया"  # Hello world in Hindi
tokens = tokenize(f"<2en> <hin> {text}", src_vocab)
input_ids = np.array([tokens], dtype=np.int64)

# Run encoder
encoder_output = encoder_session.run(
    ["hidden_states"],
    {"input_ids": input_ids}
)[0]

# Autoregressive decoding
generated_ids = [2]  # Start token
max_length = 50

for _ in range(max_length):
    decoder_ids = np.array([generated_ids], dtype=np.int64)
    logits = decoder_session.run(
        ["logits"],
        {
            "input_ids": decoder_ids,
            "encoder_hidden_states": encoder_output
        }
    )[0]

    next_token = np.argmax(logits[0, -1, :])
    if next_token == 2:  # EOS token
        break
    generated_ids.append(int(next_token))

# Decode output (detokenize() must likewise be built from vocab_tgt.json)
translation = detokenize(generated_ids, tgt_vocab)
print(translation)  # "Hello world"

🎯 Supported Languages

Translation from any of these 22 languages to English:

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu

๐Ÿ“ Model Architecture

Input Text (+ lang tags)
    ↓
Tokenization
    ↓
Encoder (Transformer, 1B params)
    ↓ [hidden_states]
Decoder (Autoregressive)
    ↓
Output Tokens
    ↓
English Translation
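The flow above can be sketched with stub encoder/decoder functions (random arrays standing in for the real ONNX sessions) to make the tensor shapes at each stage concrete; the vocabulary size here is a toy value:

```python
import numpy as np

VOCAB_SIZE, HIDDEN = 32, 1024  # hidden size documented above; vocab size is a toy value

def encoder_stub(input_ids):
    # Real encoder: [batch, seq_len] int64 -> [batch, seq_len, 1024] hidden states
    b, s = input_ids.shape
    return np.random.rand(b, s, HIDDEN).astype(np.float32)

def decoder_stub(decoder_ids, encoder_hidden):
    # Real decoder: decoder IDs + encoder states -> [batch, seq_len, vocab] logits
    b, s = decoder_ids.shape
    return np.random.rand(b, s, VOCAB_SIZE).astype(np.float32)

input_ids = np.array([[0, 1, 4, 5]], dtype=np.int64)  # tokenized source text
hidden = encoder_stub(input_ids)
logits = decoder_stub(np.array([[2]], dtype=np.int64), hidden)
print(hidden.shape, logits.shape)  # (1, 4, 1024) (1, 1, 32)
```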

โš™๏ธ Performance

Tested on Android (Pixel 7):

  • Encoder: ~50-150ms
  • Decoder (per token): ~20-40ms
  • Total (20 tokens output): ~600ms-1s
  • Memory: ~400MB peak
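The per-stage numbers compose into a rough end-to-end estimate: one encoder pass plus one decoder pass per generated token. A back-of-envelope sketch (not a benchmark harness), using midpoints of the ranges above as defaults:

```python
def estimate_latency_ms(n_output_tokens, encoder_ms=100, per_token_ms=30):
    """Rough latency: one encoder pass plus one decoder pass per output token.
    Defaults are midpoints of the Pixel 7 ranges measured above."""
    return encoder_ms + n_output_tokens * per_token_ms

print(estimate_latency_ms(20))  # 700 (ms), inside the observed ~600ms-1s
```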

๐Ÿ“ Language Tags

Use these tags for source language:

  • <2en> - Translate to English
  • <hin> - Hindi
  • <ben> - Bengali
  • <tam> - Tamil
  • <tel> - Telugu
  • etc.

Example: <2en> <hin> यह एक परीक्षण है
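A small helper for building tagged inputs. Only `<hin>`, `<ben>`, `<tam>`, and `<tel>` are confirmed by this card; the other entries below are assumptions based on ISO 639-2 codes and should be checked against vocab_src.json before use:

```python
# <hin>, <ben>, <tam>, <tel> appear in this card; the rest are ISO 639-2
# guesses, not verified against the shipped vocabulary.
LANG_TAGS = {
    "Hindi": "hin", "Bengali": "ben", "Tamil": "tam", "Telugu": "tel",
    "Gujarati": "guj", "Malayalam": "mal", "Marathi": "mar", "Kannada": "kan",
}

def build_input(language, text):
    """Prefix text with the target (<2en>) and source language tags."""
    return f"<2en> <{LANG_TAGS[language]}> {text}"

print(build_input("Hindi", "यह एक परीक्षण है"))
# <2en> <hin> यह एक परीक्षण है
```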

๐Ÿ“ Citation

@article{gala2023indictrans,
  title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
  author={Gala, Jay and others},
  journal={Transactions on Machine Learning Research},
  year={2023}
}

๐Ÿ—๏ธ Original Creators

AI4Bharat - IIT Madras
Original model: https://huggingface.co/ai4bharat/indictrans2-indic-en-1B

📄 License

MIT License (same as original model)

Quantized for mobile deployment | January 2026
