metadata
license: apache-2.0
language:
  - ne
  - en
tags:
  - translation
  - nepali
  - english
  - multilingual
  - code-mixed
  - romanized
  - devanagari
  - onnx
pipeline_tag: translation
widget:
  - text: mero name ramesh ho
    example_title: Romanized Nepali
  - text: सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।
    example_title: Devanagari Nepali
  - text: what is your nam
    example_title: Informal English
model-index:
  - name: SETU
    results:
      - task:
          type: translation
          name: Translation
        dataset:
          type: custom
          name: Nepali-English Mixed Dataset
        metrics:
          - type: bleu
            value: 49.5
            name: BLEU
library_name: transformers
SETU - Script-agnostic English Translation Unifier
SETU is a neural translation model that unifies multiscript, multilingual, and informal text into clean, formal English.
Model Description
The SETU model can handle:
- Romanized Nepali to English translation
- Devanagari Nepali to English translation
- Code-mixed text to English translation
- Informal/slang to formal English translation
Try It Out
🚀 Interactive Demo: Try SETU in Google Colab: https://colab.research.google.com/drive/1KdLiLtAKGK8_XLyFlEwSqGFPZZqGwl4n?usp=sharing
Installation
Ensure that you have transformers and onnxruntime installed:
```bash
pip install transformers onnxruntime
```
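Before loading the model, you may want to confirm that these packages are importable. The sketch below uses the standard library's `importlib.util.find_spec`; `missing_packages` and `REQUIRED` are hypothetical names, not part of SETU. Because SETU uses SentencePiece for tokenization, its remote code may also need the `sentencepiece` package (an assumption; the card does not say), which you can add to `REQUIRED` if loading fails with an import error.

```python
import importlib.util

# Import names of the packages from the pip command above.
# Assumption: add "sentencepiece" here if the model's remote code requires it.
REQUIRED = ("transformers", "onnxruntime")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```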
Usage
```python
from transformers import AutoModel

# Load the model. trust_remote_code=True is required because SETU ships its own
# modeling code (modeling_setu_translation.py) in the repository.
model = AutoModel.from_pretrained("santoshdahal/setu", trust_remote_code=True)

# Translate Romanized Nepali
result = model("mero name ramesh ho")
print("Translation:", result)
# Output: "My name is Ramesh."

# Works with Devanagari script too
result = model("सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।")
print("Translation:", result)
# Output: "Social media and reality are different."

# Handles informal text
result = model("what is your nam")
print("Translation:", result)
# Output: "what's your name"
```
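To translate several inputs, you can loop over them with the same call. The helper below is a minimal sketch, assuming `model` is callable on a single string as in the examples above; `translate_all` is a hypothetical name, and blank inputs are skipped rather than sent to the model.

```python
from typing import Any, Callable, Iterable, List


def translate_all(translate: Callable[[str], Any], texts: Iterable[str]) -> List[Any]:
    """Translate each input in order; blank inputs map to an empty string."""
    results = []
    for text in texts:
        cleaned = text.strip()  # drop surrounding whitespace before translating
        results.append(translate(cleaned) if cleaned else "")
    return results
```

For example, `translate_all(model, ["mero name ramesh ho", "what is your nam"])` returns one translation per input, in the same order as the inputs.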
Model Details
- Model Type: Neural Machine Translation
- Architecture: Transformer
- Vocabulary Size: 40,253 tokens
- Languages Supported: Nepali (Romanized & Devanagari), English, Code-mixed text
- Model Format: ONNX for efficient inference
Technical Implementation
The model uses:
- ONNX Runtime for efficient inference
- SentencePiece for tokenization
- Beam search decoding with configurable beam size
- Separate encoder and decoder ONNX models
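The decoding loop implied by the list above (one encoder pass, then repeated decoder calls scored with beam search) can be sketched in plain Python. This is an illustrative sketch, not the code in modeling_setu_translation.py: the card does not document the ONNX input/output names, special-token ids, or default beam size, so the two ONNX sessions are stood in for by hypothetical `encode` and `decode_step` callables, and a toy decoder supplies the probabilities.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple


def beam_search(
    encode: Callable[[Sequence[int]], Any],
    decode_step: Callable[[Any, List[int]], Dict[int, float]],
    src_ids: Sequence[int],
    bos_id: int,
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 32,
) -> List[int]:
    """Return the highest-scoring target token ids (BOS/EOS stripped)."""
    memory = encode(src_ids)  # encoder runs once per input sentence
    beams: List[Tuple[float, List[int]]] = [(0.0, [bos_id])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            # decoder returns next-token log-probabilities given the prefix
            for token, logp in decode_step(memory, seq).items():
                candidates.append((score + logp, seq + [token]))
        # keep only the beam_size best extensions across all beams
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            (finished if seq[-1] == eos_id else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses still count at max_len
    _, best = max(finished, key=lambda c: c[0])
    return [t for t in best[1:] if t != eos_id]


# Toy decoder over ids {0: BOS, 1: EOS, 2, 3}; probabilities depend on the last token.
TOY_PROBS = {
    0: {2: 0.6, 3: 0.4},
    2: {1: 0.4, 2: 0.3, 3: 0.3},
    3: {1: 0.9, 2: 0.05, 3: 0.05},
}


def toy_decode_step(memory: Any, seq: List[int]) -> Dict[int, float]:
    return {tok: math.log(p) for tok, p in TOY_PROBS[seq[-1]].items()}


print(beam_search(lambda s: s, toy_decode_step, [7, 8], 0, 1, beam_size=1))  # [2]
print(beam_search(lambda s: s, toy_decode_step, [7, 8], 0, 1, beam_size=2))  # [3]
```

With `beam_size=1` the search reduces to greedy decoding and returns `[2]` (probability 0.6 × 0.4 = 0.24), while `beam_size=2` keeps the lower-scoring first token alive and finds `[3]` (0.4 × 0.9 = 0.36). The sketch omits length normalization, which real decoders often add so that longer outputs are not unduly penalized.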
Files Included
- encoder.onnx: ONNX encoder model
- decoder.onnx: ONNX decoder model
- spm.model: SentencePiece tokenizer model
- spm.vocab: SentencePiece vocabulary
- config.json: Model configuration
- modeling_setu_translation.py: Model implementation
- configuration_setu_translation.py: Configuration class
Citation
If you use this model, please cite:
@misc{setu2025,
title={SETU: Script-agnostic English Translation Unifier},
author={Santosh Dahal},
year={2025}
}