---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---
# Multilingual MoE Transformer
A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.
## Model Details
- **Architecture**: Encoder-Decoder Transformer with MoE routing
- **Languages**: English, French, Hindi, Bengali
- **Vocabulary Size**: 32,000 tokens
- **Model Dimension**: 512
- **Number of Experts**: 4
- **Number of Layers**: 6
- **Attention Heads**: 8
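For convenience, the hyperparameters above can be collected into a small configuration object. The sketch below is illustrative only; the `MoEConfig` name and its fields are assumptions, not something shipped with this repository:

```python
from dataclasses import dataclass

# Illustrative configuration mirroring the hyperparameters listed above.
# The class and field names are assumptions; they are not part of this repository.
@dataclass
class MoEConfig:
    vocab_size: int = 32_000  # shared multilingual vocabulary
    d_model: int = 512        # model (embedding) dimension
    num_experts: int = 4      # experts per MoE layer
    num_layers: int = 6       # encoder and decoder layers
    num_heads: int = 8        # attention heads per layer

config = MoEConfig()
```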
## Training
- **Stage**: Self-supervised pre-training (Stage 1)
- **Task**: Next-token prediction (language modeling)
- **Dataset**: Wikipedia text in all four languages
- **Final Loss**: 2.0218
## Usage
```python
import torch
from huggingface_hub import hf_hub_download

# Download the Stage 1 checkpoint from the Hub
model_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location="cpu")

# The checkpoint contains only weights: define the architecture yourself
# (matching the hyperparameters under "Model Details"), then load the state dict.
# model = ...  # your MoE encoder-decoder definition
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```
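Note that the repository ships only the checkpoint, not a model class. The architecture you instantiate must match the hyperparameters listed under Model Details; otherwise `load_state_dict` will fail with missing-key or shape-mismatch errors.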
## Next Steps
This model is ready for Stage 2: fine-tuning on parallel translation data.
## Citation
If you use this model, please cite:
```
@misc{moe-multilingual-translator,
  author    = {arka7},
  title     = {Multilingual MoE Transformer},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```