---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---

# Multilingual MoE Transformer

A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.

## Model Details

- **Architecture**: Encoder-Decoder Transformer with MoE routing
- **Languages**: English, French, Hindi, Bengali
- **Vocabulary Size**: 32,000 tokens
- **Model Dimension**: 512
- **Number of Experts**: 4
- **Number of Layers**: 6
- **Attention Heads**: 8

## Training

- **Stage**: Self-supervised pre-training (Stage 1)
- **Task**: Next-token prediction (language modeling)
- **Dataset**: Wikipedia text in all four languages
- **Final Loss**: 2.0218

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# The checkpoint stores weights only, so you'll need to define the
# architecture first (a minimal sketch is given at the end of this card)
model = MoETransformer()  # your MoE encoder-decoder definition
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
```

## Next Steps

This model is ready for Stage 2: fine-tuning on parallel translation data.

## Citation

If you use this model, please cite:

```
@misc{moe-multilingual-translator,
  author    = {arka7},
  title     = {Multilingual MoE Transformer},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```
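## Architecture Sketch

The checkpoint stores only a state dict, so loading it requires a model class whose module and parameter names match the training code, which is not reproduced on this card. The sketch below is a minimal, hypothetical reconstruction from the hyperparameters listed above; the `MoETransformer` and `MoEFeedForward` names, the top-1 routing scheme, and the feed-forward width of 2048 are all assumptions, so treat it as a starting point rather than the exact training definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Feed-forward block with top-1 (switch-style) expert routing.
    Assumption: the real model may use a different routing scheme."""

    def __init__(self, d_model=512, d_ff=2048, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # Route each token to its highest-scoring expert and scale the
        # expert output by the gate probability.
        scores = F.softmax(self.gate(x), dim=-1)   # (batch, seq, num_experts)
        top_score, top_idx = scores.max(dim=-1)    # (batch, seq)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                    # tokens assigned to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * top_score[mask].unsqueeze(-1)
        return out

def causal_mask(size, device):
    # Upper-triangular -inf mask so decoder positions cannot attend ahead
    return torch.triu(torch.full((size, size), float("-inf"), device=device), diagonal=1)

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, num_experts=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = MoEFeedForward(d_model, num_experts=num_experts)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        return self.norm2(x + self.moe(x))

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, num_experts=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.moe = MoEFeedForward(d_model, num_experts=num_experts)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, memory):
        a, _ = self.self_attn(x, x, x, attn_mask=causal_mask(x.size(1), x.device))
        x = self.norm1(x + a)
        c, _ = self.cross_attn(x, memory, memory)
        x = self.norm2(x + c)
        return self.norm3(x + self.moe(x))

class MoETransformer(nn.Module):
    """Hypothetical reconstruction; the real training code may differ."""

    def __init__(self, vocab_size=32_000, d_model=512, n_layers=6,
                 n_heads=8, num_experts=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.encoder = nn.ModuleList(
            EncoderLayer(d_model, n_heads, num_experts) for _ in range(n_layers))
        self.decoder = nn.ModuleList(
            DecoderLayer(d_model, n_heads, num_experts) for _ in range(n_layers))
        self.lm_head = nn.Linear(d_model, vocab_size)

    def _embed(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.embed(ids) + self.pos(positions)

    def forward(self, src_ids, tgt_ids):
        memory = self._embed(src_ids)
        for layer in self.encoder:
            memory = layer(memory)
        x = self._embed(tgt_ids)
        for layer in self.decoder:
            x = layer(x, memory)
        return self.lm_head(x)  # (batch, tgt_len, vocab_size) logits
```

If `load_state_dict` reports missing or unexpected keys, the module names or routing scheme here differ from the original training script; rename the modules (or load with `strict=False` and inspect the mismatches) to line the sketch up with the checkpoint.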