Universal Translation System

A compact, production-ready multilingual neural machine translation model supporting 20 languages (190 language pairs). Trained on curated OPUS-100 data with synthetic augmentation, knowledge distillation, and neural quality filtering.

Model Architecture

Component	Configuration
Encoder	6-layer Transformer, 512 hidden dim, 8 heads
Decoder	8-layer Transformer, 768 hidden dim, 12 heads
Vocab	32K tokens, script-grouped (latin, cjk, arabic, devanagari, cyrillic, thai)
Params / size	~40 MB (compact), ~150M parameters total
Precision	BF16 mixed-precision training

Supported Languages

Group	Languages
Latin	en, es, fr, de, it, pt, nl, sv, pl, id, vi, tr
CJK	zh, ja, ko
Arabic	ar
Devanagari	hi
Cyrillic	ru, uk
Thai	th
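
The pair count quoted in the summary follows from this table: 20 languages give 20 × 19 / 2 = 190 unordered pairs, or 380 translation directions if each direction is counted separately. The sketch below encodes the table and checks that arithmetic. The names `SCRIPT_GROUPS` and `script_group` are illustrative only and are not part of the project's API.

```python
from itertools import combinations

# Language codes per script group, copied from the table above.
SCRIPT_GROUPS = {
    "latin": ["en", "es", "fr", "de", "it", "pt", "nl", "sv", "pl", "id", "vi", "tr"],
    "cjk": ["zh", "ja", "ko"],
    "arabic": ["ar"],
    "devanagari": ["hi"],
    "cyrillic": ["ru", "uk"],
    "thai": ["th"],
}


def script_group(lang):
    """Return the script group for a language code (hypothetical helper)."""
    for group, langs in SCRIPT_GROUPS.items():
        if lang in langs:
            return group
    raise ValueError(f"unsupported language: {lang}")


# Flatten the table and enumerate every unordered pair of languages.
languages = [lang for langs in SCRIPT_GROUPS.values() for lang in langs]
unordered_pairs = list(combinations(languages, 2))

print(len(languages))            # 20
print(len(unordered_pairs))      # 190
print(len(unordered_pairs) * 2)  # 380 translation directions
print(script_group("uk"))        # cyrillic
```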

Usage

Via the CLI (`uts`)

# Start the translation server
uts serve --config config/base.yaml

# Translate a sentence (from a second terminal, once the server is running)
curl -X POST http://localhost:8000/translate \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "source": "en", "target": "es"}'
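
The same request can be sent from Python with only the standard library. This is a minimal sketch that mirrors the curl call above. The function names `build_translate_request` and `translate` are made up for illustration. The response schema is not documented here, so the sketch assumes the server replies with JSON and returns the parsed body unchanged.

```python
import json
import urllib.request


def build_translate_request(text, source, target, base_url="http://localhost:8000"):
    """Build the POST request that the curl command above sends."""
    payload = json.dumps({"text": text, "source": source, "target": target})
    return urllib.request.Request(
        f"{base_url}/translate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, source, target, base_url="http://localhost:8000", timeout=30):
    """Send the request and return the decoded JSON body.

    Assumption: the server answers with a JSON document. Its fields are
    not specified in this README, so nothing is extracted from it here.
    """
    request = build_translate_request(text, source, target, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running `uts serve` instance):
# print(translate("Hello world", "en", "es"))
```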

Via Python

from runtime.encoder.universal_encoder import UniversalEncoder
from runtime.cloud_decoder import OptimizedUniversalDecoder

encoder = UniversalEncoder.from_pretrained("code-with-zeeshan/Universal-Translation-System")
decoder = OptimizedUniversalDecoder.from_pretrained("code-with-zeeshan/Universal-Translation-System")
# See docs/API.md for full inference examples

Via Hugging Face Hub

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("code-with-zeeshan/Universal-Translation-System")
tokenizer = AutoTokenizer.from_pretrained("code-with-zeeshan/Universal-Translation-System")
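
Once the model and tokenizer are loaded, inference would typically follow the usual tokenize, generate, decode pattern from `transformers`. The helper below is a sketch under two assumptions: that this checkpoint loads through the Auto classes as shown above, and that it exposes the standard `generate` interface. How the source and target languages are selected is not described in this README, so the sketch leaves that out. The function name `translate` and the `max_new_tokens` default are illustrative.

```python
def translate(model, tokenizer, text, max_new_tokens=64):
    """Tokenize one sentence, generate a translation, and decode it.

    Assumes `model` and `tokenizer` behave like standard Hugging Face
    seq2seq objects. Language selection is not handled here.
    """
    # Encode the input text as PyTorch tensors.
    inputs = tokenizer(text, return_tensors="pt")
    # Generate output token ids, capped at `max_new_tokens` new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence, dropping special tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Usage, with `model` and `tokenizer` loaded as shown above:
# print(translate(model, tokenizer, "Hello world"))
```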

Training

The model was trained using the Universal Translation System pipeline:

Data pipeline — OPUS-100 download, sampling, augmentation (false friends, idioms, backtranslation), COMET quality filtering
Knowledge distillation — NLLB-3.3B teacher → compact student
Vocabulary — Script-grouped SentencePiece tokenizer (32K per group)
Training — BF16 mixed-precision, dynamic batch sizing, gradient checkpointing. ~10 epochs with cosine LR schedule.
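
For reference, a cosine learning-rate schedule of the kind mentioned in the last step can be written in a few lines. This is a generic sketch rather than the project's training code. The peak learning rate, the minimum learning rate, and the optional linear warmup are assumptions, since the README states only that a cosine schedule was used over roughly 10 epochs.

```python
import math


def cosine_lr(step, total_steps, peak_lr=5e-4, min_lr=0.0, warmup_steps=0):
    """Learning rate at `step`: optional linear warmup, then cosine decay.

    All default values are placeholders, not the values used to train
    this model.
    """
    # Linear warmup from peak_lr / warmup_steps up to peak_lr.
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    # Half-cosine from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Print the schedule at a few points of a 1000-step run with 100 warmup steps.
for step in (0, 99, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000, warmup_steps=100))
```

The rate starts at the peak once warmup ends, passes through the midpoint between the peak and the minimum halfway through the decay phase, and reaches the minimum at the final step.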

Evaluation

Metric	Score
BLEU (average across 190 pairs)	Coming soon
COMET (average)	Coming soon

Files

encoder/ — Universal encoder weights
decoder/ — Optimized decoder weights
vocab/ — Script-grouped vocabulary packs
config.yaml — Training configuration used for this model

License

Apache 2.0
