Helsinki-NLP/opus-100
A Mixture-of-Experts (MoE) transformer fine-tuned for translating French, Hindi, and Bengali to English.
Supports: French → English | Hindi → English | Bengali → English
Base Model: arka7/moe-multilingual-translator
| Metric | Value |
|---|---|
| Validation Loss | 3.8833 |
| Token Accuracy | 35.95% |
| Perplexity | 48.58 |
| Training Loss | 3.9530 |
| Epochs | 3 |
{
  "train_loss": [
    5.081450140173895,
    4.325329969776386,
    3.95300766737378
  ],
  "val_loss": [
    4.531953684556713,
    4.124982544608208,
    3.8832832201203304
  ],
  "perplexity": [
    92.93997192382812,
    61.86671829223633,
    48.583457946777344
  ],
  "accuracy": [
    29.0423772315063,
    33.302914504078025,
    35.949352649289914
  ],
  "epochs": [
    1,
    2,
    3
  ]
}
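The perplexity values in this history appear to be the exponential of the validation loss at each epoch, which is the usual definition when the loss is a mean cross-entropy in nats. A minimal check in plain Python, using values rounded from the history above:

```python
import math

# Values copied from the training history above, rounded to 4 decimals.
val_loss = [4.5320, 4.1250, 3.8833]
reported_perplexity = [92.94, 61.87, 48.58]

# Perplexity is conventionally exp(mean cross-entropy loss in nats).
computed = [math.exp(loss) for loss in val_loss]

for epoch, (ppl, ref) in enumerate(zip(computed, reported_perplexity), start=1):
    print(f"epoch {epoch}: exp(val_loss) = {ppl:.2f}, reported = {ref:.2f}")
```

Both series fall together from epoch 1 to epoch 3, so the two metrics carry the same information on different scales.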
pip install torch sentencepiece huggingface_hub
import torch
import sentencepiece as spm
from huggingface_hub import hf_hub_download
import json
# Download files
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="pytorch_model.pt",
)
tokenizer_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="tokenizer.model",
)
config_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="config.json",
)
# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load(tokenizer_path)
# Load config
with open(config_path) as f:
    cfg = json.load(f)
# Load checkpoint
checkpoint = torch.load(model_path, map_location='cpu')
# You need to define the model architecture first
# See: https://huggingface.co/arka7/moe-multilingual-translator for architecture code
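The internal layout of `pytorch_model.pt` is not documented here, so it may be a bare state dict or a dict that wraps one. The sketch below shows one way to handle both cases. The helper name `extract_state_dict` and the candidate keys (`model_state_dict`, `state_dict`, `model`) are assumptions based on common PyTorch conventions, not something this repository confirms.

```python
def extract_state_dict(checkpoint):
    """Return the weights mapping from a loaded checkpoint.

    Assumption: the checkpoint is either a bare state dict or a dict that
    nests the state dict under one of a few commonly used keys.
    """
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict, got {type(checkpoint).__name__}")
    for key in ("model_state_dict", "state_dict", "model"):
        if key in checkpoint and isinstance(checkpoint[key], dict):
            return checkpoint[key]
    return checkpoint  # already a bare state dict


# Stand-in for the object returned by torch.load(...); not real weights.
fake_checkpoint = {"epoch": 3, "model_state_dict": {"router.weight": [0.1, 0.2]}}
state_dict = extract_state_dict(fake_checkpoint)
print(sorted(state_dict))
```

Printing the top-level keys of the real checkpoint first is the safest way to find out which case applies before calling `load_state_dict` on the model.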
# After loading model (see architecture in base model)
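def translate(text, src_lang='fr'):
    # Add language token (one of 'fr', 'hi', 'bn')
    input_text = f"<{src_lang}> {text}"
    # Encode to SentencePiece token ids
    input_ids = sp.encode(input_text)
    # Generate translation (greedy decoding)
    # ... model inference code ...
    # `translation` is the decoded English string produced by the step above
    return translation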
# Examples
translate("Bonjour, comment allez-vous?", "fr")
# → "Hello, how are you?"
translate("नमस्ते, आप कैसे हैं?", "hi")
# → "Hello, how are you?"
translate("আপনি কেমন আছেন?", "bn")
# → "How are you?"
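The `translate` stub leaves the greedy decoding step elided. As a rough illustration of what that step usually looks like, here is a model-agnostic sketch in plain Python. The `next_token_scores` callable is a hypothetical stand-in for the model's forward pass, and `toy_scores` is a toy scorer used only to exercise the loop; neither comes from this repository.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int], Sequence[int]], Sequence[float]],
    input_ids: Sequence[int],
    bos_id: int,
    eos_id: int,
    max_len: int = 64,
) -> List[int]:
    """Greedy decoding: repeatedly append the highest-scoring next token.

    `next_token_scores(input_ids, output_ids)` is a hypothetical stand-in for
    the model's forward pass; it returns one score per vocabulary id.
    """
    output_ids = [bos_id]
    for _ in range(max_len):
        scores = next_token_scores(input_ids, output_ids)
        # Pick the argmax over the vocabulary.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        output_ids.append(next_id)
    return output_ids[1:]  # drop the BOS token


# Toy scorer (not the real model): copies the source ids, then emits EOS.
def toy_scores(src, out, vocab_size=10, eos_id=2):
    step = len(out) - 1
    target = src[step] if step < len(src) else eos_id
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


decoded = greedy_decode(toy_scores, input_ids=[5, 7, 9], bos_id=1, eos_id=2)
print(decoded)
```

With the real model, `input_ids` would come from `sp.encode(...)` and the returned ids would be turned back into text with `sp.decode(...)`; the BOS and EOS ids depend on how the tokenizer was trained.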
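import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, d_model, num_experts=4):
        super().__init__()
        self.num_experts = num_experts
        # Router scores each expert from a pooled sequence representation
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Linear(d_model, d_model)
            for _ in range(num_experts)
        ])
        self.balance_loss = 0.0

    def forward(self, x):
        # x: (batch, seq_len, d_model); route once per sequence, not per token
        seq_repr = x.mean(dim=1)
        logits = self.router(seq_repr)
        weights = torch.softmax(logits, dim=-1)  # (batch, num_experts)
        # Run every expert and stack: (batch, seq_len, d_model, num_experts)
        expert_outputs = torch.stack(
            [exp(x) for exp in self.experts], dim=-1
        )
        # Weighted sum over experts
        out = torch.einsum('bsde,be->bsd', expert_outputs, weights)
        # Penalize deviation of mean expert usage from the uniform share
        usage = weights.mean(dim=0)
        self.balance_loss = ((usage - 1/self.num_experts) ** 2).sum()
        return out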
# See base model for full architecture
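The `balance_loss` term in the MoE layer is the sum of squared deviations of mean expert usage from the uniform share `1/num_experts`. A small worked example in plain Python (no PyTorch needed) shows how it behaves; the usage vectors are made-up illustrative numbers, not measurements from this model.

```python
def balance_loss(usage):
    """Sum of squared deviations of per-expert usage from the uniform share 1/E."""
    num_experts = len(usage)
    return sum((u - 1.0 / num_experts) ** 2 for u in usage)


# Illustrative routing weights averaged over a batch (each sums to 1).
uniform = [0.25, 0.25, 0.25, 0.25]  # every expert used equally
skewed = [0.70, 0.10, 0.10, 0.10]   # one expert dominates

print(balance_loss(uniform))
print(balance_loss(skewed))
```

The loss is zero when usage is perfectly uniform and grows as routing collapses onto a single expert (here 0.45² + 3 × 0.15² ≈ 0.27), so adding it to the training objective nudges the router toward spreading load across experts.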
To get better performance:
- pytorch_model.pt - Trained model weights
- tokenizer.model - SentencePiece tokenizer
- tokenizer.vocab - Vocabulary
- config.json - Configuration
- training_metrics.json - Training history

@misc{moe_translator_stage2,
author = {arka7},
title = {MoE Multilingual Translator - Stage 2},
year = {2024},
publisher = {Hugging Face},
url = {https://huggingface.co/arka7/moe-multilingual-translator-stage2}
}
MIT License
Built with PyTorch • Trained for 3 epochs • Ready for translation!