A compact, production-ready multilingual neural machine translation model supporting 20 languages (190 language pairs). Trained on curated OPUS-100 data with synthetic augmentation, knowledge distillation, and neural quality filtering.
| Component | Configuration |
|---|---|
| Encoder | 6-layer Transformer, 512 hidden dim, 8 heads |
| Decoder | 8-layer Transformer, 768 hidden dim, 12 heads |
| Vocab | 32K tokens, script-grouped (latin, cjk, arabic, devanagari, cyrillic, thai) |
| Params / size | ~150M parameters total, ~40 MB (compact) |
| Precision | BF16 mixed-precision training |
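
The encoder and decoder rows above can be captured as a small config object. This is only an illustrative sketch: `TransformerStackConfig` and its field names are hypothetical and are not taken from the released code, while the numeric values come straight from the table. It also checks the per-head dimension implied by each row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerStackConfig:
    """Hypothetical container for one row of the architecture table."""

    num_layers: int
    hidden_dim: int
    num_heads: int

    @property
    def head_dim(self) -> int:
        # The hidden dimension must split evenly across attention heads.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_heads")
        return self.hidden_dim // self.num_heads


# Values copied from the table; the names ENCODER and DECODER are illustrative.
ENCODER = TransformerStackConfig(num_layers=6, hidden_dim=512, num_heads=8)
DECODER = TransformerStackConfig(num_layers=8, hidden_dim=768, num_heads=12)

if __name__ == "__main__":
    print(ENCODER.head_dim, DECODER.head_dim)  # 64 64
```

Both stacks work out to 64 dimensions per attention head. The widths differ (512 vs 768), so some projection between encoder outputs and decoder cross-attention is presumably involved; the table does not say how that is done.
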
| Group | Languages |
|---|---|
| Latin | en, es, fr, de, it, pt, nl, sv, pl, id, vi, tr |
| CJK | zh, ja, ko |
| Arabic | ar |
| Devanagari | hi |
| Cyrillic | ru, uk |
| Thai | th |
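
As a quick check of the language counts, the table above can be written as a mapping from script group to language codes. This is a sketch for illustration: `SCRIPT_GROUPS`, `LANG_TO_GROUP`, and `script_group` are hypothetical names, not the library's own lookup. It shows that 20 languages give 190 unordered pairs, which corresponds to 380 translation directions if each pair is counted in both directions.

```python
from itertools import combinations, permutations

# Language codes grouped by script, copied from the table above.
SCRIPT_GROUPS = {
    "latin": ["en", "es", "fr", "de", "it", "pt", "nl", "sv", "pl", "id", "vi", "tr"],
    "cjk": ["zh", "ja", "ko"],
    "arabic": ["ar"],
    "devanagari": ["hi"],
    "cyrillic": ["ru", "uk"],
    "thai": ["th"],
}

# Reverse index: language code -> script group.
LANG_TO_GROUP = {
    lang: group for group, langs in SCRIPT_GROUPS.items() for lang in langs
}


def script_group(lang: str) -> str:
    """Return the script group for a language code, or raise if unsupported."""
    try:
        return LANG_TO_GROUP[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None


languages = sorted(LANG_TO_GROUP)
unordered_pairs = list(combinations(languages, 2))  # e.g. ("en", "es") only once
directions = list(permutations(languages, 2))  # ("en", "es") and ("es", "en")

if __name__ == "__main__":
    print(len(languages), len(unordered_pairs), len(directions))  # 20 190 380
    print(script_group("hi"))  # devanagari
```
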
```bash
# Start the translation server
uts serve --config config/base.yaml

# Translate a sentence
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "source": "en", "target": "es"}'
```
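
The same request can be sent from Python using only the standard library. The sketch below mirrors the curl command: the URL and the `text`, `source`, and `target` fields come from that command, while the helper names `build_translate_request` and `translate` are hypothetical. The response schema is not documented here, so the code assumes only that the server replies with JSON and returns the parsed body as-is.

```python
import json
import urllib.request
from typing import Any

# Endpoint from the curl example; adjust host and port for your deployment.
TRANSLATE_URL = "http://localhost:8000/translate"


def build_translate_request(
    text: str, source: str, target: str, url: str = TRANSLATE_URL
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text: str, source: str, target: str, timeout: float = 30.0) -> Any:
    """Send the request and return the decoded JSON body.

    Assumption: the server responds with JSON. No specific field is
    extracted because the response format is not specified here.
    """
    request = build_translate_request(text, source, target)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `translate("Hello world", "en", "es")` sends the same payload as the curl command. Building the request separately from sending it makes the payload easy to inspect or test without a live server.
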
```python
from runtime.encoder.universal_encoder import UniversalEncoder
from runtime.cloud_decoder import OptimizedUniversalDecoder

# Load the encoder and decoder weights from the Hugging Face Hub
encoder = UniversalEncoder.from_pretrained("code-with-zeeshan/Universal-Translation-System")
decoder = OptimizedUniversalDecoder.from_pretrained("code-with-zeeshan/Universal-Translation-System")
# See docs/API.md for full inference examples
```

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the model and tokenizer through the Transformers Auto classes
model = AutoModelForSeq2SeqLM.from_pretrained("code-with-zeeshan/Universal-Translation-System")
tokenizer = AutoTokenizer.from_pretrained("code-with-zeeshan/Universal-Translation-System")
```
The model was trained using the Universal Translation System pipeline.
| Metric | Score |
|---|---|
| BLEU (average across 190 pairs) | Coming soon |
| COMET (average) | Coming soon |
- `encoder/`: Universal encoder weights
- `decoder/`: Optimized decoder weights
- `vocab/`: Script-grouped vocabulary packs
- `config.yaml`: Training configuration used for this model

License: Apache 2.0