---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---

# Multilingual MoE Transformer

A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.

## Model Details

- **Architecture**: Encoder-Decoder Transformer with MoE routing
- **Languages**: English, French, Hindi, Bengali
- **Vocabulary Size**: 32,000 tokens
- **Model Dimension**: 512
- **Number of Experts**: 4
- **Number of Layers**: 6
- **Attention Heads**: 8
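
For illustration, a minimal MoE feed-forward block matching the dimensions above (d_model=512, 4 experts) might look like the sketch below. The class name, hidden size, and top-1 routing are assumptions for clarity; the actual routing in this checkpoint may differ.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative top-1 MoE feed-forward block (hypothetical names;
    the checkpoint's real routing scheme may differ)."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=4):
        super().__init__()
        # Gating network: scores each token against every expert
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = torch.softmax(self.gate(x), dim=-1)  # routing probabilities
        top_p, top_idx = scores.max(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                       # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * top_p[mask].unsqueeze(-1)
        return out

moe = MoEFeedForward()
y = moe(torch.randn(2, 10, 512))
print(y.shape)  # torch.Size([2, 10, 512])
```

Each token is dispatched to the single expert with the highest gate score, so compute per token stays constant as the number of experts grows.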

## Training

- **Stage**: Self-supervised pre-training (Stage 1)
- **Task**: Next-token prediction (language modeling)
- **Dataset**: Wikipedia data for all 4 languages
- **Final Loss**: 2.0218

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# Instantiate the architecture first (see Model Details above),
# then load the trained weights:
model = ...  # your model class, built with the hyperparameters listed above
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

Note: the repository ships only the state dict, so you must define the model class yourself before loading the weights.

## Next Steps

This model is ready for Stage 2: fine-tuning on parallel translation data.

## Citation

If you use this model, please cite:
```
@misc{moe-multilingual-translator,
  author = {arka7},
  title = {Multilingual MoE Transformer},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```