Stage 2 model - Loss: 3.8833, Acc: 35.95%
- README.md +257 -0
- config.json +30 -0
- pytorch_model.pt +3 -0
- tokenizer.model +3 -0
- tokenizer.vocab +0 -0
- training_metrics.json +27 -0
README.md
ADDED
---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
- french
- hindi
- bengali
datasets:
- Helsinki-NLP/opus-100
- musfiqdehan/opus100-Bengali-to-English
base_model: arka7/moe-multilingual-translator
metrics:
- accuracy
- perplexity
pipeline_tag: translation
---

# MoE Multilingual Translator - Stage 2 Fine-tuned

A Mixture-of-Experts (MoE) transformer fine-tuned for translating French, Hindi, and Bengali to English.

## 🎯 Quick Info

**Supports:** French → English | Hindi → English | Bengali → English

**Base Model:** [arka7/moe-multilingual-translator](https://huggingface.co/arka7/moe-multilingual-translator)

## 📊 Performance

| Metric | Value |
|--------|-------|
| **Validation Loss** | **3.8833** |
| **Token Accuracy** | **35.95%** |
| **Perplexity** | **48.58** |
| **Training Loss** | 3.9530 |
| **Epochs** | 3 |

### Training History

```json
{
  "train_loss": [5.081450140173895, 4.325329969776386, 3.95300766737378],
  "val_loss": [4.531953684556713, 4.124982544608208, 3.8832832201203304],
  "perplexity": [92.93997192382812, 61.86671829223633, 48.583457946777344],
  "accuracy": [29.0423772315063, 33.302914504078025, 35.949352649289914],
  "epochs": [1, 2, 3]
}
```
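
As a sanity check, the perplexity values above are exactly `exp(val_loss)`, the standard relation for cross-entropy-trained models:

```python
import math

val_loss = [4.531953684556713, 4.124982544608208, 3.8832832201203304]
for epoch, loss in enumerate(val_loss, start=1):
    print(f"epoch {epoch}: exp({loss:.4f}) = {math.exp(loss):.2f}")
# epoch 1: exp(4.5320) = 92.94
# epoch 2: exp(4.1250) = 61.87
# epoch 3: exp(3.8833) = 48.58
```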

## 🏗️ Architecture

- **Type**: Encoder-Decoder Transformer with MoE routing
- **Vocabulary**: 32,000 tokens (SentencePiece)
- **Model Dimension**: 512
- **Attention Heads**: 8
- **Layers**: 6 encoder + 6 decoder
- **Experts**: 4 (in encoder)
- **Max Sequence Length**: 256 tokens

## 🚀 Usage

### Installation

```bash
pip install torch sentencepiece huggingface_hub
```

### Load Model

```python
import json

import sentencepiece as spm
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint, tokenizer, and config from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="pytorch_model.pt",
)
tokenizer_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="tokenizer.model",
)
config_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="config.json",
)

# Load the SentencePiece tokenizer
sp = spm.SentencePieceProcessor()
sp.load(tokenizer_path)

# Load the config
with open(config_path) as f:
    cfg = json.load(f)

# Load the checkpoint
checkpoint = torch.load(model_path, map_location="cpu")

# You need to define the model architecture first.
# See https://huggingface.co/arka7/moe-multilingual-translator for the architecture code.
```
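
With the architecture class in hand, the remaining step is a standard `load_state_dict`. A minimal sketch, assuming `MoETranslationModel` (the class name listed in `config.json`) has been copied from the base repo and that its constructor mirrors the config keys; the real signature may differ:

```python
# Assumed usage: MoETranslationModel comes from the base repo's architecture
# code, and the kwargs below mirror config.json keys.
model = MoETranslationModel(
    vocab_size=cfg["vocab_size"],
    d_model=cfg["d_model"],
    nhead=cfg["nhead"],
    num_layers=cfg["num_layers"],
    num_experts=cfg["num_experts"],
    max_seq_len=cfg["max_seq_len"],
)
state_dict = checkpoint.get("model_state_dict", checkpoint)  # handle either layout
model.load_state_dict(state_dict)
model.eval()
```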

### Translate Text

```python
# After loading the model (see the architecture code in the base model)

def translate(text, src_lang='fr'):
    # Prepend the source-language token
    input_text = f"<{src_lang}> {text}"

    # Encode
    input_ids = sp.encode(input_text)

    # Generate translation (greedy decoding)
    # ... model inference code ...

    return translation

# Examples
translate("Bonjour, comment allez-vous?", "fr")
# → "Hello, how are you?"

translate("नमस्ते, आप कैसे हैं?", "hi")
# → "Hello, how are you?"

translate("আপনি কেমন আছেন?", "bn")
# → "How are you?"
```
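
To make the skeleton above concrete, here is a hedged greedy-decoding sketch. It assumes an encoder-decoder forward of the form `model(src_ids, tgt_ids) -> logits` and a tokenizer trained with BOS/EOS pieces; adapt it to the actual architecture from the base repo:

```python
import torch

def greedy_translate(model, sp, text, src_lang="fr", max_len=256):
    # Encode "<lang> text" exactly as translate() does above
    src = torch.tensor([sp.encode(f"<{src_lang}> {text}")])
    tgt = torch.tensor([[sp.bos_id()]])            # assumes a BOS piece exists
    with torch.no_grad():
        for _ in range(max_len):
            logits = model(src, tgt)               # assumed shape: (1, tgt_len, vocab)
            next_id = logits[0, -1].argmax().item()
            if next_id == sp.eos_id():             # assumes an EOS piece exists
                break
            tgt = torch.cat([tgt, torch.tensor([[next_id]])], dim=1)
    return sp.decode(tgt[0, 1:].tolist())
```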

## 📚 Training

### Stage 1: Pre-training
- Self-supervised language modeling
- Wikipedia data (4 languages)
- Learned multilingual representations

### Stage 2: Translation Fine-tuning ⭐
- **This model**: fine-tuned on parallel translation data
- ~150K translation pairs (50K per language)
- Languages: French, Hindi, Bengali → English
- Datasets: OPUS-100 parallel corpora (see the loading sketch below)

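For reference, the OPUS-100 pairs can be pulled straight from the Hub. A sketch, assuming the `datasets` library and OPUS-100's alphabetical pair-naming convention ("en-fr", "en-hi", "bn-en"); check the dataset card before relying on the exact config names:

```python
from datasets import load_dataset

# One of the three language pairs; the 50K slice mirrors the per-language count above
opus_fr = load_dataset("Helsinki-NLP/opus-100", "en-fr", split="train[:50000]")
print(opus_fr[0]["translation"])  # {'en': '...', 'fr': '...'}
```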

## 🎓 Model Architecture Code

```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    """Sequence-level mixture-of-experts: one softmax routing over the experts."""

    def __init__(self, d_model, num_experts=4):
        super().__init__()
        self.num_experts = num_experts
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Linear(d_model, d_model)
            for _ in range(num_experts)
        ])
        self.balance_loss = 0.0

    def forward(self, x):
        # Route on the mean-pooled sequence representation
        seq_repr = x.mean(dim=1)                      # (batch, d_model)
        logits = self.router(seq_repr)                # (batch, num_experts)
        weights = torch.softmax(logits, dim=-1)
        # Run every expert, then mix their outputs with the routing weights
        expert_outputs = torch.stack(
            [exp(x) for exp in self.experts], dim=-1  # (batch, seq, d_model, experts)
        )
        out = torch.einsum('bsde,be->bsd', expert_outputs, weights)
        # Penalize uneven expert usage (load-balancing auxiliary loss)
        usage = weights.mean(dim=0)
        self.balance_loss = ((usage - 1 / self.num_experts) ** 2).sum()
        return out

# See the base model for the full architecture
```
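
A quick, self-contained sanity check of the block above: push a random batch through it and confirm the shapes (`MoE` is the class just defined):

```python
import torch

# Instantiate the MoE block defined above and run a random batch through it
moe = MoE(d_model=512, num_experts=4)
x = torch.randn(2, 10, 512)         # (batch, seq, d_model)
out = moe(x)
print(out.shape)                    # torch.Size([2, 10, 512]) - shape is preserved
print(float(moe.balance_loss))      # small when the router spreads load evenly
```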

## ⚠️ Limitations

- Only translates **to English** (not from English)
- Best on general-domain text
- May struggle with:
  - Technical or specialized vocabulary
  - Very long sentences (>256 tokens)
  - Code-mixed text
  - Rare dialects

## 🔮 Improvements

To get better performance:
- Train longer (more epochs)
- Use a larger model (increase d_model and the number of layers)
- Add more data (additional parallel corpora)
- Use beam search decoding
- Use learning rate scheduling (see the sketch below)

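For the last item, a minimal sketch of learning-rate scheduling with a stock PyTorch scheduler; the optimizer, scheduler choice, and hyperparameters are illustrative stand-ins, not this repo's actual training setup:

```python
import torch

model = torch.nn.Linear(512, 512)   # stand-in for the translator
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)

for epoch in range(3):
    # ... one training epoch (forward, backward, optimizer.step()) goes here ...
    scheduler.step()                 # decay the LR once per epoch
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.2e}")
```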

## 📄 Files

- `pytorch_model.pt` - Trained model weights
- `tokenizer.model` - SentencePiece tokenizer
- `tokenizer.vocab` - Vocabulary
- `config.json` - Model configuration
- `training_metrics.json` - Training history

## 📖 Citation

```bibtex
@misc{moe_translator_stage2,
  author    = {arka7},
  title     = {MoE Multilingual Translator - Stage 2},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/arka7/moe-multilingual-translator-stage2}
}
```

## 📜 License

MIT License

## 🔗 Links

- **This Model**: https://huggingface.co/arka7/moe-multilingual-translator-stage2
- **Base Model (Stage 1)**: https://huggingface.co/arka7/moe-multilingual-translator
- **Dataset**: [OPUS-100](https://huggingface.co/datasets/Helsinki-NLP/opus-100)

---

*Built with PyTorch • Trained for 3 epochs • Ready for translation!*
config.json
ADDED
{
  "model_type": "moe_translation",
  "task": "translation",
  "architectures": ["MoETranslationModel"],
  "source_languages": ["fr", "hi", "bn"],
  "target_language": "en",
  "vocab_size": 32000,
  "d_model": 512,
  "nhead": 8,
  "num_experts": 4,
  "num_layers": 6,
  "max_seq_len": 256,
  "training": {
    "stage": "stage2_translation_finetuning",
    "epochs_completed": 3,
    "best_val_loss": 3.8832832201203304,
    "train_loss": 3.95300766737378,
    "token_accuracy": 35.949352649289914,
    "perplexity": 48.583457946777344
  },
  "framework": "pytorch",
  "tokenizer": "sentencepiece",
  "base_model": "arka7/moe-multilingual-translator"
}
pytorch_model.pt
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:dbde62cbfd8b667e8535837d0f0b660731763df589d1fef2dccaa0ed93ad39b5
size 1096733562
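
Because the LFS pointer records the blob's SHA-256 and size, a download can be verified against it; a small sketch (note this fetches the full ~1.1 GB file):

```python
import hashlib
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="pytorch_model.pt",
)
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        h.update(chunk)
print(h.hexdigest() == "dbde62cbfd8b667e8535837d0f0b660731763df589d1fef2dccaa0ed93ad39b5")
```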
tokenizer.model
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:2804e2016a4862e980034f2db6e99fe028e617503f1faea7f6ff7f2487bc3fe8
size 919076
tokenizer.vocab
ADDED
The diff for this file is too large to render.
training_metrics.json
ADDED
{
  "train_loss": [
    5.081450140173895,
    4.325329969776386,
    3.95300766737378
  ],
  "val_loss": [
    4.531953684556713,
    4.124982544608208,
    3.8832832201203304
  ],
  "perplexity": [
    92.93997192382812,
    61.86671829223633,
    48.583457946777344
  ],
  "accuracy": [
    29.0423772315063,
    33.302914504078025,
    35.949352649289914
  ],
  "epochs": [
    1,
    2,
    3
  ]
}
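
The metrics file can also be consumed programmatically; a small sketch that prints a per-epoch summary:

```python
import json
from huggingface_hub import hf_hub_download

metrics_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator-stage2",
    filename="training_metrics.json",
)
with open(metrics_path) as f:
    m = json.load(f)
for i, epoch in enumerate(m["epochs"]):
    print(f"epoch {epoch}: train {m['train_loss'][i]:.3f} | "
          f"val {m['val_loss'][i]:.3f} | ppl {m['perplexity'][i]:.2f} | "
          f"acc {m['accuracy'][i]:.2f}%")
```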