# MoE Multilingual Translator - Stage 2 Fine-tuned
A Mixture-of-Experts (MoE) transformer fine-tuned for translating French, Hindi, and Bengali to English.
## Quick Info

Supports: French → English | Hindi → English | Bengali → English

Base Model: arka7/moe-multilingual-translator
## Performance
| Metric | Value |
|---|---|
| Validation Loss | 3.8833 |
| Token Accuracy | 35.95% |
| Perplexity | 48.58 |
| Training Loss | 3.9530 |
| Epochs | 3 |
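As a sanity check, the reported perplexity is simply the exponential of the validation loss:

```python
import math

# perplexity = exp(cross-entropy loss)
print(math.exp(3.8833))  # ≈ 48.58, matching the table
```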
### Training History

```json
{
  "train_loss": [5.081450140173895, 4.325329969776386, 3.95300766737378],
  "val_loss": [4.531953684556713, 4.124982544608208, 3.8832832201203304],
  "perplexity": [92.93997192382812, 61.86671829223633, 48.583457946777344],
  "accuracy": [29.0423772315063, 33.302914504078025, 35.949352649289914],
  "epochs": [1, 2, 3]
}
```
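If you want to work with this history programmatically, here is a minimal sketch that pulls `training_metrics.json` (listed under Files below) from the Hub, assuming it contains the JSON shown above:

```python
import json
from huggingface_hub import hf_hub_download

# Fetch the training history file from the model repo
metrics_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="training_metrics.json",
)
with open(metrics_path) as f:
    metrics = json.load(f)

# Print a per-epoch summary
for epoch, loss, ppl in zip(metrics["epochs"], metrics["val_loss"], metrics["perplexity"]):
    print(f"epoch {epoch}: val_loss={loss:.4f}, perplexity={ppl:.2f}")
```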
## Architecture
- Type: Encoder-Decoder Transformer with MoE routing
- Vocabulary: 32,000 tokens (SentencePiece)
- Model Dimension: 512
- Attention Heads: 8
- Layers: 6 encoder + 6 decoder
- Experts: 4 (in encoder)
- Max Sequence: 256 tokens
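For orientation, these hyperparameters should correspond to entries in `config.json`; the sketch below mirrors the list, but the key names are assumptions, so inspect the downloaded file for the real schema:

```python
# Assumed config layout; key names are illustrative, not the repo's actual schema
expected_config = {
    "vocab_size": 32000,       # SentencePiece vocabulary
    "d_model": 512,            # model dimension
    "num_heads": 8,            # attention heads
    "num_encoder_layers": 6,
    "num_decoder_layers": 6,
    "num_experts": 4,          # MoE experts in the encoder
    "max_seq_len": 256,
}
```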
## Usage

### Installation

```bash
pip install torch sentencepiece huggingface_hub
```

### Load Model
```python
import torch
import sentencepiece as spm
from huggingface_hub import hf_hub_download
import json

# Download model, tokenizer, and config from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="pytorch_model.pt",
)
tokenizer_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="tokenizer.model",
)
config_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="config.json",
)

# Load the SentencePiece tokenizer
sp = spm.SentencePieceProcessor()
sp.load(tokenizer_path)

# Load the model configuration
with open(config_path) as f:
    cfg = json.load(f)

# Load the checkpoint weights
checkpoint = torch.load(model_path, map_location="cpu")

# You need to define the model architecture first.
# See https://huggingface.co/arka7/moe-multilingual-translator for the architecture code.
```
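Once the architecture code from the base model is in scope, loading reduces to a few lines. This is a sketch: the class name `MoETranslator` and the checkpoint layout are assumptions, so adapt them to the base model's code:

```python
# Hypothetical class name from the base model's architecture code
model = MoETranslator(**cfg)

# The checkpoint may be a raw state dict or wrapped under a key such as
# "model_state_dict" (assumption); handle both cases
state_dict = checkpoint.get("model_state_dict", checkpoint)
model.load_state_dict(state_dict)
model.eval()
```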
### Translate Text
```python
# After loading the model (see the architecture in the base model repo)
def translate(text, src_lang='fr'):
    # Prepend the source-language token
    input_text = f"<{src_lang}> {text}"
    # Encode with the SentencePiece tokenizer
    input_ids = sp.encode(input_text)
    # Generate translation (greedy decoding)
    # ... model inference code ...
    return translation

# Examples
translate("Bonjour, comment allez-vous?", "fr")
# → "Hello, how are you?"
translate("नमस्ते, आप कैसे हैं?", "hi")
# → "Hello, how are you?"
translate("আপনি কেমন আছেন?", "bn")
# → "How are you?"
```
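To make the elided inference step concrete, here is one way greedy decoding could look. Everything about the model interface here is assumed: a `forward(src_ids, tgt_ids)` that returns per-position logits, and standard SentencePiece BOS/EOS ids; check the base model's code for the real signatures:

```python
import torch

def translate_greedy(model, sp, text, src_lang="fr", max_len=256):
    # Prepend the source-language token, as above
    src_ids = torch.tensor([sp.encode(f"<{src_lang}> {text}")])

    # Assumed special tokens; verify against the tokenizer configuration
    bos_id, eos_id = sp.bos_id(), sp.eos_id()
    out_ids = [bos_id]

    with torch.no_grad():
        for _ in range(max_len):
            tgt = torch.tensor([out_ids])
            logits = model(src_ids, tgt)              # assumed signature
            next_id = logits[0, -1].argmax().item()   # greedy pick
            if next_id == eos_id:
                break
            out_ids.append(next_id)

    return sp.decode(out_ids[1:])
```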
## Training

### Stage 1: Pre-training
- Self-supervised language modeling
- Wikipedia data (4 languages)
- Learned multilingual representations
### Stage 2: Translation Fine-tuning ⭐

- This model: fine-tuned on parallel translation data
- ~150K translation pairs (50K per language)
- Languages: French, Hindi, Bengali → English
- Datasets: OPUS-100 parallel corpora
## Model Architecture Code
```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, d_model, num_experts=4):
        super().__init__()
        self.num_experts = num_experts
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Linear(d_model, d_model)
            for _ in range(num_experts)
        ])
        self.balance_loss = 0.0

    def forward(self, x):
        # Sequence-level routing: one expert mixture per sequence
        seq_repr = x.mean(dim=1)                  # (batch, d_model)
        logits = self.router(seq_repr)            # (batch, num_experts)
        weights = torch.softmax(logits, dim=-1)
        # Run every expert and mix their outputs by the routing weights
        expert_outputs = torch.stack(
            [exp(x) for exp in self.experts], dim=-1
        )                                         # (batch, seq, d_model, num_experts)
        out = torch.einsum('bsde,be->bsd', expert_outputs, weights)
        # Auxiliary loss pushing expert usage toward the uniform distribution
        usage = weights.mean(dim=0)
        self.balance_loss = ((usage - 1 / self.num_experts) ** 2).sum()
        return out

# See the base model repo for the full architecture
```
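Note that the routing here is sequence-level: one softmax mixture over experts per sequence, not per token. The auxiliary `balance_loss` should be folded into the training objective so no expert collapses; a quick usage sketch (the 0.01 weight is an illustrative choice):

```python
moe = MoE(d_model=512, num_experts=4)
x = torch.randn(2, 10, 512)       # (batch, seq, d_model)
out = moe(x)                      # same shape as x

task_loss = out.pow(2).mean()     # stand-in for the real translation loss
total_loss = task_loss + 0.01 * moe.balance_loss
```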
## ⚠️ Limitations
- Only translates TO English (not FROM English)
- Best on general-domain text
- May struggle with:
- Technical/specialized vocabulary
- Very long sentences (>256 tokens)
- Code-mixed text
- Rare dialects
## Improvements

To get better performance:
- Train longer (more epochs)
- Larger model (increase d_model, layers)
- More data (additional parallel corpora)
- Beam search decoding
- Learning rate scheduling (see the sketch below)
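For the last item, a minimal warmup schedule with PyTorch's built-in `LambdaLR` (the warmup length is an arbitrary illustrative value, and `model` is assumed to be loaded as above):

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps = 1000  # illustrative; tune to your dataset size
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min((step + 1) / warmup_steps, 1.0),  # linear warmup, then flat
)

# In the training loop, call optimizer.step() then scheduler.step()
```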
## Files

- `pytorch_model.pt` - Trained model weights
- `tokenizer.model` - SentencePiece tokenizer
- `tokenizer.vocab` - Vocabulary
- `config.json` - Configuration
- `training_metrics.json` - Training history
## Citation

```bibtex
@misc{moe_translator_stage2,
  author = {arka7},
  title = {MoE Multilingual Translator - Stage 2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/moe-multilingual-translator-stage2}
}
```
## License

MIT License
## Links
- This Model: https://huggingface.co/arka7/moe-multilingual-translator-stage2
- Base Model (Stage 1): https://huggingface.co/arka7/moe-multilingual-translator
- Dataset: OPUS-100
Built with PyTorch • Trained for 3 epochs • Ready for translation!