---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---
|
|
|
|
|
# Multilingual MoE Transformer |
|
|
|
|
|
A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Architecture**: Encoder-Decoder Transformer with MoE routing |
|
|
- **Languages**: English, French, Hindi, Bengali |
|
|
- **Vocabulary Size**: 32,000 tokens |
|
|
- **Model Dimension**: 512 |
|
|
- **Number of Experts**: 4 |
|
|
- **Number of Layers**: 6 |
|
|
- **Attention Heads**: 8 |
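
For reference, below is a minimal sketch of a single MoE feed-forward block with these dimensions (top-1 routing over 4 experts). The class name, the feed-forward width `d_ff=2048`, and the routing details are illustrative assumptions, not the exact code used to train the released checkpoint.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative MoE feed-forward block with top-1 token routing.

    Hyperparameters mirror the model card (d_model=512, 4 experts); the
    actual implementation behind the released weights may differ.
    """

    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_probs = torch.softmax(self.gate(x), dim=-1)   # (batch, seq_len, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                            # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out
```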
|
|
|
|
|
## Training |
|
|
|
|
|
- **Stage**: Self-supervised pre-training (Stage 1) |
|
|
- **Task**: Next-token prediction (language modeling) |
|
|
- **Dataset**: Wikipedia data for all 4 languages |
|
|
- **Final Loss**: 2.0218 |
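
The Stage 1 objective is standard next-token prediction. A minimal sketch of the loss computation, assuming the model returns logits of shape `(batch, seq_len, vocab_size)` and a padding index of 0 (both assumptions, not confirmed by the checkpoint), looks like this:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Cross-entropy between predictions at positions 0..T-2 and tokens at 1..T-1.

    logits: (batch, seq_len, vocab_size); token_ids: (batch, seq_len).
    pad_id is an assumed padding index and is ignored in the loss.
    """
    shifted_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    targets = token_ids[:, 1:].reshape(-1)
    return F.cross_entropy(shifted_logits, targets, ignore_index=pad_id)
```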
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# Re-create the architecture described in Model Details before loading the
# weights, e.g. model = MultilingualMoETransformer(...) (class name illustrative)
model.load_state_dict(checkpoint["model_state_dict"])
```
|
|
|
|
|
## Next Steps |
|
|
|
|
|
This model is ready for Stage 2: fine-tuning on parallel translation data. |
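
As a rough illustration, a Stage 2 step on a parallel sentence pair could look like the sketch below. It assumes a standard teacher-forced encoder-decoder forward pass `model(src_ids, tgt_in) -> logits` and a padding index of 0; both are assumptions, and the real forward signature of the released model may differ.

```python
import torch
import torch.nn.functional as F

def translation_finetune_step(model, optimizer, src_ids, tgt_ids, pad_id=0):
    """One teacher-forced fine-tuning step on a parallel batch.

    src_ids / tgt_ids: (batch, seq_len) token ids of a source/target sentence pair.
    """
    tgt_in, tgt_out = tgt_ids[:, :-1], tgt_ids[:, 1:]   # shift targets for teacher forcing
    logits = model(src_ids, tgt_in)                      # (batch, seq_len - 1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt_out.reshape(-1),
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```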
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
```bibtex
@misc{moe-multilingual-translator,
  author = {arka7},
  title = {Multilingual MoE Transformer},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```
|
|
|