---
license: apache-2.0
language:
- ne
- en
tags:
- translation
- nepali
- english
- multilingual
- code-mixed
- romanized
- devanagari
- onnx
pipeline_tag: translation
widget:
- text: "mero name ramesh  ho"
  example_title: "Romanized Nepali"
- text: "सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।"
  example_title: "Devanagari Nepali"
- text: "what is your nam"
  example_title: "Informal English"
model-index:
- name: SETU
  results:
  - task:
      type: translation
      name: Translation
    dataset:
      type: custom
      name: Nepali-English Mixed Dataset
    metrics:
    - type: bleu
      value: 49.5
      name: BLEU
library_name: transformers
---

# SETU - Script-agnostic English Translation Unifier

SETU is a neural translation model that unifies multiscript, multilingual, and informal text into clean, formal English.

## Model Description

The SETU model can handle:
- Romanized Nepali to English translation
- Devanagari Nepali to English translation  
- Code-mixed text to English translation
- Informal/slang to formal English translation
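
The input categories above differ mainly in which script the text uses. As a rough illustration only (SETU itself does not require the caller to identify the script, and `detect_script` is a hypothetical helper, not part of the model), the following sketch labels a string by checking for characters in the Devanagari Unicode block (U+0900 to U+097F) and for ASCII letters:

```python
def detect_script(text: str) -> str:
    """Label text as 'devanagari', 'latin', 'mixed', or 'other'.

    Hypothetical helper for illustration; not part of SETU.
    """
    # Devanagari occupies the Unicode block U+0900..U+097F.
    has_devanagari = any("\u0900" <= ch <= "\u097f" for ch in text)
    # Romanized Nepali and English are both written with ASCII letters.
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)

    if has_devanagari and has_latin:
        return "mixed"
    if has_devanagari:
        return "devanagari"
    if has_latin:
        return "latin"
    return "other"


print(detect_script("mero name ramesh ho"))
print(detect_script("सामाजिक मिडिया"))
print(detect_script("mero नाम ramesh ho"))
```

A check like this can be useful for bucketing evaluation data by input type. It cannot tell Romanized Nepali apart from English, since both use the Latin alphabet.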

## Try It Out

🚀 **Interactive Demo**: Try SETU in Google Colab: [https://colab.research.google.com/drive/1KdLiLtAKGK8_XLyFlEwSqGFPZZqGwl4n?usp=sharing](https://colab.research.google.com/drive/1KdLiLtAKGK8_XLyFlEwSqGFPZZqGwl4n?usp=sharing)

## Installation

Install `transformers` and `onnxruntime` (the ONNX Runtime package used for inference):

```bash
pip install transformers onnxruntime
```

## Usage

```python
from transformers import AutoModel

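# Load the model. trust_remote_code=True lets transformers run the custom
# model class shipped in the repository (modeling_setu_translation.py).
model = AutoModel.from_pretrained("santoshdahal/setu", trust_remote_code=True)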

# Translate text
result = model("mero name ramesh  ho")
print("Translation:", result)
# Output: "My name is Ramesh."

# Works with Devanagari script too
result = model("सामाजिक मिडिया र ग्राउण्ड वास्तविकता फरक छ।")
print("Translation:", result)
# Output: "Social media and reality are different."

# Handles informal text
result = model("what is your nam")
print("Translation:", result)
# Output: "what's your name"
```

## Model Details

- **Model Type**: Neural Machine Translation
- **Architecture**: Transformer
- **Vocabulary Size**: 40,253 tokens
- **Languages Supported**: Nepali (Romanized & Devanagari), English, Code-mixed text
- **Model Format**: ONNX for efficient inference

## Technical Implementation

The model uses:
- ONNX Runtime for efficient inference
- SentencePiece for tokenization
- Beam search decoding with configurable beam size
- Separate encoder and decoder ONNX models
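
The beam search step listed above can be sketched independently of the ONNX models. This is a minimal illustration of the general technique, not SETU's actual implementation: `step_fn`, `toy_step`, and the token ids are hypothetical stand-ins for a real decoder step that would return next-token log-probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[int]]  # (cumulative log-probability, token ids)


def beam_search(
    step_fn: Callable[[List[int]], Dict[int, float]],
    bos_id: int,
    eos_id: int,
    beam_size: int = 4,
    max_len: int = 20,
) -> List[int]:
    """Return the highest-scoring token sequence found by beam search.

    step_fn is a hypothetical stand-in for one decoder step: given the
    tokens generated so far, it returns {next_token: log_probability}.
    """
    beams: List[Hypothesis] = [(0.0, [bos_id])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token.
        candidates = [
            (score + logp, tokens + [token])
            for score, tokens in beams
            for token, logp in step_fn(tokens).items()
        ]
        if not candidates:
            break
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            (finished if hyp[1][-1] == eos_id else beams).append(hyp)
        if not beams:
            break
    # Fall back to unfinished hypotheses if max_len was reached.
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])[1]


# Toy next-token table keyed on the last token (0 = BOS, 1 = EOS).
_TABLE = {
    0: {2: math.log(0.6), 3: math.log(0.4)},
    2: {1: math.log(0.55), 3: math.log(0.45)},
    3: {1: math.log(0.9), 2: math.log(0.1)},
}


def toy_step(tokens: List[int]) -> Dict[int, float]:
    return _TABLE[tokens[-1]]


print(beam_search(toy_step, bos_id=0, eos_id=1, beam_size=1))  # greedy
print(beam_search(toy_step, bos_id=0, eos_id=1, beam_size=2))
```

In this toy table, a beam size of 1 commits to the locally best first token and ends with total probability 0.33, while a beam size of 2 keeps the second-best first token alive and finds a sequence with probability 0.36. That trade-off between search width and cost is why the beam size is exposed as a configurable setting. The sketch omits length normalization, which production decoders often add.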

## Files Included

- `encoder.onnx`: ONNX encoder model
- `decoder.onnx`: ONNX decoder model  
- `spm.model`: SentencePiece tokenizer model
- `spm.vocab`: SentencePiece vocabulary
- `config.json`: Model configuration
- `modeling_setu_translation.py`: Model implementation
- `configuration_setu_translation.py`: Configuration class
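
When working from a local copy of the repository, it can help to confirm that every file listed above is present before loading the model. The following is a small sketch under that assumption; `missing_files` and the directory argument are hypothetical names introduced for illustration.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
EXPECTED_FILES = [
    "encoder.onnx",
    "decoder.onnx",
    "spm.model",
    "spm.vocab",
    "config.json",
    "modeling_setu_translation.py",
    "configuration_setu_translation.py",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "." is a placeholder; point this at the local model directory.
    absent = missing_files(".")
    print("Missing files:", absent if absent else "none")
```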

## Citation

If you use this model, please cite:

```bibtex
@misc{setu2025,
  title={SETU: Script-agnostic English Translation Unifier},
  author={Santosh Dahal},
  year={2025}
}
```