---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---

# Multilingual MoE Transformer

A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.

## Model Details

- **Architecture**: Encoder-Decoder Transformer with MoE routing
- **Languages**: English, French, Hindi, Bengali
- **Vocabulary Size**: 32,000 tokens
- **Model Dimension**: 512
- **Number of Experts**: 4
- **Number of Layers**: 6
- **Attention Heads**: 8

## Training

- **Stage**: Self-supervised pre-training (Stage 1)
- **Task**: Next-token prediction (language modeling)
- **Dataset**: Wikipedia text in all four languages
- **Final Loss**: 2.0218

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# The checkpoint stores weights only, so you'll need to define the
# architecture first (a minimal sketch is given at the end of this card)
model = MoETransformer()  # your MoE encoder-decoder definition
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

## Next Steps

This model is ready for Stage 2: fine-tuning on parallel translation data.

## Citation

If you use this model, please cite:

```
@misc{moe-multilingual-translator,
  author    = {arka7},
  title     = {Multilingual MoE Transformer},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```
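## Architecture Sketch

The checkpoint stores only a state dict, so loading it requires a model class whose module and parameter names match the training code, which is not reproduced on this card. The sketch below is a minimal, hypothetical reconstruction from the hyperparameters listed above; the `MoETransformer` and `MoEFeedForward` names, the top-1 routing scheme, and the feed-forward width of 2048 are all assumptions, so treat it as a starting point rather than the exact training definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Feed-forward block with top-1 (switch-style) expert routing.
    Assumption: the real model may use a different routing scheme."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # Route each token to its highest-scoring expert and scale the
        # expert output by the gate probability.
        scores = F.softmax(self.gate(x), dim=-1)   # (batch, seq, num_experts)
        top_score, top_idx = scores.max(dim=-1)    # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                    # tokens assigned to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * top_score[mask].unsqueeze(-1)
        return out

def causal_mask(size, device):
    # Upper-triangular -inf mask so decoder positions cannot attend ahead
    return torch.triu(torch.full((size, size), float("-inf"), device=device), diagonal=1)

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, num_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = MoEFeedForward(d_model, num_experts=num_experts)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        return self.norm2(x + self.moe(x))

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, num_experts=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = MoEFeedForward(d_model, num_experts=num_experts)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, memory):
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask(x.size(1), x.device))
        x = self.norm1(x + a)
        c, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + c)
        return self.norm3(x + self.moe(x))

class MoETransformer(nn.Module):
    """Hypothetical reconstruction; the real training code may differ."""

    def __init__(self, vocab_size=32_000, d_model=512, n_layers=6,
                 n_heads=8, num_experts=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.encoder = nn.ModuleList(
            EncoderLayer(d_model, n_heads, num_experts) for _ in range(n_layers))
        self.decoder = nn.ModuleList(
            DecoderLayer(d_model, n_heads, num_experts) for _ in range(n_layers))
        self.lm_head = nn.Linear(d_model, vocab_size)

    def _embed(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.embed(ids) + self.pos(positions)

    def forward(self, src_ids, tgt_ids):
        memory = self._embed(src_ids)
        for layer in self.encoder:
            memory = layer(memory)
        x = self._embed(tgt_ids)
        for layer in self.decoder:
            x = layer(x, memory)
        return self.lm_head(x)  # (batch, tgt_len, vocab_size) logits
```

If `load_state_dict` reports missing or unexpected keys, the module names or routing scheme here differ from the original training script; rename the modules (or load with `strict=False` and inspect the mismatches) to line the sketch up with the checkpoint.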