arka7 committed on
Commit
baa5f24
·
verified ·
1 Parent(s): 52316a5

Stage 2 model - Loss: 3.8833, Acc: 35.95%

Browse files
Files changed (6)
  1. README.md +257 -0
  2. config.json +30 -0
  3. pytorch_model.pt +3 -0
  4. tokenizer.model +3 -0
  5. tokenizer.vocab +0 -0
  6. training_metrics.json +27 -0
README.md ADDED
@@ -0,0 +1,257 @@
+ ---
+ language:
+ - en
+ - fr
+ - hi
+ - bn
+ license: mit
+ tags:
+ - pytorch
+ - transformer
+ - mixture-of-experts
+ - multilingual
+ - translation
+ - french
+ - hindi
+ - bengali
+ datasets:
+ - Helsinki-NLP/opus-100
+ - musfiqdehan/opus100-Bengali-to-English
+ base_model: arka7/moe-multilingual-translator
+ metrics:
+ - accuracy
+ - perplexity
+ pipeline_tag: translation
+ ---
+
+ # MoE Multilingual Translator - Stage 2 Fine-tuned
+
+ A Mixture-of-Experts (MoE) transformer fine-tuned for translating French, Hindi, and Bengali to English.
+
+ ## 🎯 Quick Info
+
+ **Supports:** French → English | Hindi → English | Bengali → English
+
+ **Base Model:** [arka7/moe-multilingual-translator](https://huggingface.co/arka7/moe-multilingual-translator)
+
+ ## 📊 Performance
+
+ | Metric | Value |
+ |--------|-------|
+ | **Validation Loss** | **3.8833** |
+ | **Token Accuracy** | **35.95%** |
+ | **Perplexity** | **48.58** |
+ | **Training Loss** | 3.9530 |
+ | **Epochs** | 3 |
+
+ ### Training History
+
+ ```json
+ {
+   "train_loss": [
+     5.081450140173895,
+     4.325329969776386,
+     3.95300766737378
+   ],
+   "val_loss": [
+     4.531953684556713,
+     4.124982544608208,
+     3.8832832201203304
+   ],
+   "perplexity": [
+     92.93997192382812,
+     61.86671829223633,
+     48.583457946777344
+   ],
+   "accuracy": [
+     29.0423772315063,
+     33.302914504078025,
+     35.949352649289914
+   ],
+   "epochs": [
+     1,
+     2,
+     3
+   ]
+ }
+ ```
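+
+ For reference, the perplexity entries above match `exp(val_loss)`, which is easy to verify:
+
+ ```python
+ import math
+
+ val_loss = [4.531953684556713, 4.124982544608208, 3.8832832201203304]
+ print([round(math.exp(loss), 2) for loss in val_loss])
+ # [92.94, 61.87, 48.58]  (same as the "perplexity" list above)
+ ```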
+
+ ## 🏗️ Architecture
+
+ - **Type**: Encoder-Decoder Transformer with MoE routing
+ - **Vocabulary**: 32,000 tokens (SentencePiece)
+ - **Model Dimension**: 512
+ - **Attention Heads**: 8
+ - **Layers**: 6 encoder + 6 decoder
+ - **Experts**: 4 (in encoder)
+ - **Max Sequence**: 256 tokens
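+
+ As a rough illustration only (not the actual implementation, which lives in the base model repo), the hyperparameters listed above map onto a standard PyTorch encoder-decoder skeleton like this; in the real model, the encoder feed-forward sublayers are replaced by the MoE layer shown further down:
+
+ ```python
+ import torch.nn as nn
+
+ # Illustrative skeleton under the hyperparameters from the list above.
+ vocab_size, d_model, nhead, num_layers = 32000, 512, 8, 6
+
+ embedding = nn.Embedding(vocab_size, d_model)
+ transformer = nn.Transformer(
+     d_model=d_model,
+     nhead=nhead,
+     num_encoder_layers=num_layers,
+     num_decoder_layers=num_layers,
+     batch_first=True,
+ )
+ lm_head = nn.Linear(d_model, vocab_size)
+ ```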
+
+ ## 🚀 Usage
+
+ ### Installation
+
+ ```bash
+ pip install torch sentencepiece huggingface_hub
+ ```
+
+ ### Load Model
+
+ ```python
+ import torch
+ import sentencepiece as spm
+ from huggingface_hub import hf_hub_download
+ import json
+
+ # Download files
+ model_path = hf_hub_download(
+     repo_id="arka7/moe-multilingual-translator-stage2",
+     filename="pytorch_model.pt"
+ )
+ tokenizer_path = hf_hub_download(
+     repo_id="arka7/moe-multilingual-translator-stage2",
+     filename="tokenizer.model"
+ )
+ config_path = hf_hub_download(
+     repo_id="arka7/moe-multilingual-translator-stage2",
+     filename="config.json"
+ )
+
+ # Load tokenizer
+ sp = spm.SentencePieceProcessor()
+ sp.load(tokenizer_path)
+
+ # Load config
+ with open(config_path) as f:
+     cfg = json.load(f)
+
+ # Load checkpoint
+ checkpoint = torch.load(model_path, map_location='cpu')
+
+ # You need to define the model architecture first
+ # See: https://huggingface.co/arka7/moe-multilingual-translator for architecture code
+ ```
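+
+ Once the `MoETranslationModel` class (named in `config.json`) has been copied over from the base model repo, loading the weights might look like the sketch below. The constructor arguments and checkpoint layout are assumptions; adjust them to match the base repo's code.
+
+ ```python
+ # Assumes MoETranslationModel was copied from the base repo and that
+ # pytorch_model.pt stores either a plain state_dict or a dict with a
+ # "model_state_dict" key (adjust if the checkpoint layout differs).
+ model = MoETranslationModel(
+     vocab_size=cfg["vocab_size"],
+     d_model=cfg["d_model"],
+     nhead=cfg["nhead"],
+     num_layers=cfg["num_layers"],
+     num_experts=cfg["num_experts"],
+     max_seq_len=cfg["max_seq_len"],
+ )
+ state_dict = checkpoint.get("model_state_dict", checkpoint)
+ model.load_state_dict(state_dict)
+ model.eval()
+ ```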
+
+ ### Translate Text
+
+ ```python
+ # After loading model (see architecture in base model)
+
+ def translate(text, src_lang='fr'):
+     # Add language token
+     input_text = f"<{src_lang}> {text}"
+
+     # Encode
+     input_ids = sp.encode(input_text)
+
+     # Generate translation (greedy decoding)
+     # ... model inference code ... (a greedy-decoding sketch follows this block)
+
+     return translation
+
+ # Examples
+ translate("Bonjour, comment allez-vous?", "fr")
+ # → "Hello, how are you?"
+
+ translate("नमस्ते, आप कैसे हैं?", "hi")
+ # → "Hello, how are you?"
+
+ translate("আপনি কেমন আছেন?", "bn")
+ # → "How are you?"
+ ```
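+
+ The elided inference step depends on the model's forward signature. A minimal greedy-decoding sketch, assuming an encoder-decoder style `model(src_ids, tgt_ids)` that returns logits of shape `(batch, tgt_len, vocab_size)`, could look like this; adapt it to the real interface from the base repo if it differs.
+
+ ```python
+ # Continues from the loading snippet above (sp, model already defined).
+ import torch
+
+ def greedy_translate(text, src_lang="fr", max_len=256):
+     src_ids = torch.tensor([sp.encode(f"<{src_lang}> {text}")])
+     tgt_ids = torch.tensor([[sp.bos_id()]])   # decoder starts from BOS
+     with torch.no_grad():
+         for _ in range(max_len):
+             logits = model(src_ids, tgt_ids)              # assumed signature
+             next_id = logits[0, -1].argmax().item()       # most likely next token
+             if next_id == sp.eos_id():
+                 break
+             tgt_ids = torch.cat([tgt_ids, torch.tensor([[next_id]])], dim=1)
+     return sp.decode(tgt_ids[0, 1:].tolist())
+ ```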
+
+ ## 📚 Training
+
+ ### Stage 1: Pre-training
+ - Self-supervised language modeling
+ - Wikipedia data (4 languages)
+ - Learned multilingual representations
+
+ ### Stage 2: Translation Fine-tuning ⭐
+ - **This model** - fine-tuned on parallel translation data
+ - ~150K translation pairs (50K per language)
+ - Languages: French, Hindi, Bengali → English
+ - Datasets: OPUS-100 parallel corpora (see the loading sketch below)
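+
+ Comparable parallel data can be pulled from the Hub with the `datasets` library (`pip install datasets`); the config names below are assumptions based on OPUS-100's pair-naming scheme, so verify them on the dataset page:
+
+ ```python
+ from datasets import load_dataset
+
+ # Config names follow OPUS-100's "<lang1>-<lang2>" convention (alphabetical order).
+ fr_en = load_dataset("Helsinki-NLP/opus-100", "en-fr", split="train[:50000]")
+ hi_en = load_dataset("Helsinki-NLP/opus-100", "en-hi", split="train[:50000]")
+ bn_en = load_dataset("Helsinki-NLP/opus-100", "bn-en", split="train[:50000]")
+
+ print(fr_en[0]["translation"])  # {'en': '...', 'fr': '...'}
+ ```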
+
+ ## 🎓 Model Architecture Code
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class MoE(nn.Module):
+     def __init__(self, d_model, num_experts=4):
+         super().__init__()
+         self.num_experts = num_experts
+         self.router = nn.Linear(d_model, num_experts)
+         self.experts = nn.ModuleList([
+             nn.Linear(d_model, d_model)
+             for _ in range(num_experts)
+         ])
+         self.balance_loss = 0.0
+
+     def forward(self, x):
+         # x: (batch, seq_len, d_model)
+         seq_repr = x.mean(dim=1)                 # sequence-level summary for routing
+         logits = self.router(seq_repr)
+         weights = torch.softmax(logits, dim=-1)  # (batch, num_experts)
+         expert_outputs = torch.stack(
+             [exp(x) for exp in self.experts], dim=-1
+         )                                        # (batch, seq_len, d_model, num_experts)
+         out = torch.einsum('bsde,be->bsd', expert_outputs, weights)
+         usage = weights.mean(dim=0)
+         self.balance_loss = ((usage - 1/self.num_experts) ** 2).sum()
+         return out
+
+ # See base model for full architecture
+ ```
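+
+ A quick shape check makes the routing behaviour concrete: the layer mixes the four expert outputs with sequence-level weights and records a load-balancing penalty.
+
+ ```python
+ # Shape check for the MoE layer above (illustrative)
+ moe = MoE(d_model=512, num_experts=4)
+ x = torch.randn(2, 10, 512)   # (batch, seq_len, d_model)
+ out = moe(x)
+ print(out.shape)              # torch.Size([2, 10, 512])
+ print(moe.balance_loss)       # scalar penalty (typically added to the training loss)
+ ```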
+
+ ## ⚠️ Limitations
+
+ - Only translates **TO English** (not FROM English)
+ - Best on general domain text
+ - May struggle with:
+   - Technical/specialized vocabulary
+   - Very long sentences (>256 tokens)
+   - Code-mixed text
+   - Rare dialects
+
+ ## 🔮 Improvements
+
+ To get better performance:
+ - Train longer (more epochs)
+ - Larger model (increase d_model, layers)
+ - More data (additional parallel corpora)
+ - Beam search decoding
+ - Learning rate scheduling
+
+ ## 📄 Files
+
+ - `pytorch_model.pt` - Trained model weights
+ - `tokenizer.model` - SentencePiece tokenizer
+ - `tokenizer.vocab` - Vocabulary
+ - `config.json` - Configuration
+ - `training_metrics.json` - Training history
+
+ ## 📖 Citation
+
+ ```bibtex
+ @misc{moe_translator_stage2,
+   author = {arka7},
+   title = {MoE Multilingual Translator - Stage 2},
+   year = {2024},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/arka7/moe-multilingual-translator-stage2}
+ }
+ ```
+
+ ## 📜 License
+
+ MIT License
+
+ ## 🔗 Links
+
+ - **This Model**: https://huggingface.co/arka7/moe-multilingual-translator-stage2
+ - **Base Model (Stage 1)**: https://huggingface.co/arka7/moe-multilingual-translator
+ - **Dataset**: [OPUS-100](https://huggingface.co/datasets/Helsinki-NLP/opus-100)
+
+ ---
+
+ *Built with PyTorch • Trained on 3 epochs • Ready for translation!*
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "model_type": "moe_translation",
+   "task": "translation",
+   "architectures": [
+     "MoETranslationModel"
+   ],
+   "source_languages": [
+     "fr",
+     "hi",
+     "bn"
+   ],
+   "target_language": "en",
+   "vocab_size": 32000,
+   "d_model": 512,
+   "nhead": 8,
+   "num_experts": 4,
+   "num_layers": 6,
+   "max_seq_len": 256,
+   "training": {
+     "stage": "stage2_translation_finetuning",
+     "epochs_completed": 3,
+     "best_val_loss": 3.8832832201203304,
+     "train_loss": 3.95300766737378,
+     "token_accuracy": 35.949352649289914,
+     "perplexity": 48.583457946777344
+   },
+   "framework": "pytorch",
+   "tokenizer": "sentencepiece",
+   "base_model": "arka7/moe-multilingual-translator"
+ }
pytorch_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbde62cbfd8b667e8535837d0f0b660731763df589d1fef2dccaa0ed93ad39b5
+ size 1096733562
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2804e2016a4862e980034f2db6e99fe028e617503f1faea7f6ff7f2487bc3fe8
+ size 919076
tokenizer.vocab ADDED
The diff for this file is too large to render. See raw diff
 
training_metrics.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "train_loss": [
+     5.081450140173895,
+     4.325329969776386,
+     3.95300766737378
+   ],
+   "val_loss": [
+     4.531953684556713,
+     4.124982544608208,
+     3.8832832201203304
+   ],
+   "perplexity": [
+     92.93997192382812,
+     61.86671829223633,
+     48.583457946777344
+   ],
+   "accuracy": [
+     29.0423772315063,
+     33.302914504078025,
+     35.949352649289914
+   ],
+   "epochs": [
+     1,
+     2,
+     3
+   ]
+ }