Upload Stage 1 model - Loss: 2.0218
- README.md +66 -0
- config.json +18 -0
- pytorch_model.pt +3 -0
- tokenizer.model +3 -0
- tokenizer.vocab +0 -0
- training_log.txt +13 -0
README.md
ADDED
@@ -0,0 +1,66 @@
+---
+language:
+- en
+- fr
+- hi
+- bn
+license: mit
+tags:
+- pytorch
+- transformer
+- mixture-of-experts
+- multilingual
+- translation
+---
+
+# Multilingual MoE Transformer
+
+A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.
+
+## Model Details
+
+- **Architecture**: Encoder-Decoder Transformer with MoE routing (see the routing sketch after this list)
+- **Languages**: English, French, Hindi, Bengali
+- **Vocabulary Size**: 32,000 tokens
+- **Model Dimension**: 512
+- **Number of Experts**: 4
+- **Number of Layers**: 6
+- **Attention Heads**: 8
+
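+The routing mechanism is not documented beyond the hyperparameters above. As a rough illustration, a softmax-gated MoE feed-forward block over 4 experts might look like the sketch below; the class name, expert width, dense (non-top-k) mixture, and exact balance term are assumptions for clarity, not details recovered from this checkpoint.
+
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class MoEFeedForward(nn.Module):
+    """Illustrative softmax-gated MoE FFN; details assumed, not from this repo."""
+    def __init__(self, d_model=512, num_experts=4, d_ff=2048):
+        super().__init__()
+        self.gate = nn.Linear(d_model, num_experts)
+        self.experts = nn.ModuleList(
+            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
+            for _ in range(num_experts)
+        )
+
+    def forward(self, x):                            # x: (batch, seq, d_model)
+        probs = F.softmax(self.gate(x), dim=-1)      # per-token routing probabilities
+        # Dense mixture for readability; a production MoE would dispatch top-k tokens sparsely.
+        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)  # (B, S, E, D)
+        y = (probs.unsqueeze(-1) * expert_out).sum(dim=-2)
+        # Auxiliary term nudging mean usage toward uniform (cf. final_balance_loss in config.json)
+        balance_loss = ((probs.mean(dim=(0, 1)) - 1.0 / probs.size(-1)) ** 2).sum()
+        return y, balance_loss
+```
+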
+## Training
+
+- **Stage**: Self-supervised pre-training (Stage 1)
+- **Task**: Next-token prediction (language modeling); a minimal objective sketch follows this list
+- **Dataset**: Wikipedia text in all four languages
+- **Final Loss**: 2.0218
+
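+Concretely, next-token prediction minimizes cross-entropy between each position's prediction and the following token. A minimal, self-contained sketch of the objective, with a toy embedding-plus-linear stand-in for the actual MoE transformer (which this repo does not ship):
+
+```python
+import torch
+import torch.nn.functional as F
+
+# Shapes match this checkpoint's config: 32k vocab, 256-token context, d_model 512
+vocab_size, seq_len, batch = 32000, 256, 8
+tokens = torch.randint(0, vocab_size, (batch, seq_len))
+inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token
+
+embed = torch.nn.Embedding(vocab_size, 512)       # toy stand-in for the real model
+head = torch.nn.Linear(512, vocab_size)
+logits = head(embed(inputs))                      # (batch, seq_len-1, vocab_size)
+
+loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
+print(loss.item())  # untrained: ~ln(32000) ≈ 10.4, versus the reported 2.0218
+```
+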
+## Usage
+
+```python
+import torch
+from huggingface_hub import hf_hub_download
+
+# Download the checkpoint from the Hub
+model_path = hf_hub_download(
+    repo_id="arka7/moe-multilingual-translator",
+    filename="pytorch_model.pt",
+)
+checkpoint = torch.load(model_path, map_location="cpu")
+
+# This repo does not ship the architecture class: define a model matching
+# config.json first, then load the pre-trained weights into it.
+model = MoETransformer(vocab_size=32000, d_model=512, nhead=8,
+                       num_experts=4, num_layers=6)  # your own definition
+model.load_state_dict(checkpoint["model_state_dict"])
+```
+
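+To drive the architecture definition from the shipped config instead of hard-coding values, something like the following should work; `MoETransformer` is again a stand-in for whatever class you define, not code from this repo:
+
+```python
+import json
+from huggingface_hub import hf_hub_download
+
+# Fetch the hyperparameters stored alongside the weights
+config_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator",
+                              filename="config.json")
+with open(config_path) as f:
+    cfg = json.load(f)
+
+# Build the (user-defined) architecture from the stored values
+model = MoETransformer(
+    vocab_size=cfg["vocab_size"],    # 32000
+    d_model=cfg["d_model"],          # 512
+    nhead=cfg["nhead"],              # 8
+    num_experts=cfg["num_experts"],  # 4
+    num_layers=cfg["num_layers"],    # 6
+    max_seq_len=cfg["max_seq_len"],  # 256
+)
+```
+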
+## Next Steps
+
+This model is ready for Stage 2: fine-tuning on parallel translation data.
+
+## Citation
+
+If you use this model, please cite:
+```bibtex
+@misc{moe-multilingual-translator,
+  author    = {arka7},
+  title     = {Multilingual MoE Transformer},
+  year      = {2024},
+  publisher = {Hugging Face},
+  url       = {https://huggingface.co/arka7/moe-multilingual-translator}
+}
+```
config.json
ADDED
@@ -0,0 +1,18 @@
+{
+  "model_type": "moe_transformer",
+  "vocab_size": 32000,
+  "d_model": 512,
+  "nhead": 8,
+  "num_experts": 4,
+  "num_layers": 6,
+  "max_seq_len": 256,
+  "languages": [
+    "en",
+    "fr",
+    "hi",
+    "bn"
+  ],
+  "training_stage": "stage1_pretraining",
+  "final_loss": 2.02175643123963,
+  "final_balance_loss": 0.010806717754429852
+}
pytorch_model.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:79ff45ac2a932c916036f62782b57179cfd0a164c7c3eae39778069168bc6a41
+size 399190942
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2804e2016a4862e980034f2db6e99fe028e617503f1faea7f6ff7f2487bc3fe8
+size 919076
tokenizer.vocab
ADDED
The diff for this file is too large to render.
training_log.txt
ADDED
@@ -0,0 +1,13 @@
+Training Completed Successfully!
+
+Epoch: 1
+Total Batches: 3743
+Average Loss: 2.0218
+Average Balance Loss: 0.0108
+
+Expert Usage per Language:
+
+en: [[0.20985517 0.16751863 0.31998625 0.30264 ]]
+fr: [[0.24961634 0.21768875 0.26282057 0.26987436]]
+hi: [[0.21246533 0.14122878 0.33271343 0.31359246]]
+bn: [[0.24983221 0.22729187 0.25725418 0.26562175]]
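The per-language rows above are routing statistics: the mean gate probability assigned to each of the four experts over that language's tokens. A hedged sketch of how such a usage vector is typically computed (the function name and shapes are assumptions, not taken from this repo's code):

```python
import torch

def expert_usage(router_probs: torch.Tensor) -> torch.Tensor:
    """Mean routing probability per expert over a batch of tokens.

    router_probs: (num_tokens, num_experts) softmax outputs of the gate.
    Returns a (num_experts,) vector like the per-language rows in the log.
    """
    return router_probs.mean(dim=0)

# Near-uniform usage over 4 experts, similar to the fr/bn rows above
probs = torch.softmax(torch.randn(1000, 4) * 0.1, dim=-1)
print(expert_usage(probs))
```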