---
language:
- en
- fr
- hi
- bn
license: mit
tags:
- pytorch
- transformer
- mixture-of-experts
- multilingual
- translation
---
|
|
|
|
|
# Multilingual MoE Transformer |
|
|
|
|
|
A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Architecture**: Encoder-Decoder Transformer with MoE routing |
|
|
- **Languages**: English, French, Hindi, Bengali |
|
|
- **Vocabulary Size**: 32,000 tokens |
|
|
- **Model Dimension**: 512 |
|
|
- **Number of Experts**: 4 |
|
|
- **Number of Layers**: 6 |
|
|
- **Attention Heads**: 8 |
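
For reference, below is a minimal sketch of a single MoE feed-forward block with these dimensions (top-1 routing over 4 experts). The class name, the feed-forward width `d_ff=2048`, and the routing details are illustrative assumptions, not the exact code used to train the released checkpoint.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Illustrative MoE feed-forward block with top-1 token routing.

    Hyperparameters mirror the model card (d_model=512, 4 experts); the
    actual implementation behind the released weights may differ.
    """

    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # router logits per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_probs = torch.softmax(self.gate(x), dim=-1)   # (batch, seq_len, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                            # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask]) * top_prob[mask].unsqueeze(-1)
        return out
```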
|
|
|
|
|
## Training |
|
|
|
|
|
- **Stage**: Self-supervised pre-training (Stage 1) |
|
|
- **Task**: Next-token prediction (language modeling) |
|
|
- **Dataset**: Wikipedia data for all 4 languages |
|
|
- **Final Loss**: 2.0218 |
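
The Stage 1 objective is standard next-token prediction. A minimal sketch of the loss computation, assuming the model returns logits of shape `(batch, seq_len, vocab_size)` and a padding index of 0 (both assumptions, not confirmed by the checkpoint), looks like this:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Cross-entropy between predictions at positions 0..T-2 and tokens at 1..T-1.

    logits: (batch, seq_len, vocab_size); token_ids: (batch, seq_len).
    pad_id is an assumed padding index and is ignored in the loss.
    """
    shifted_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    targets = token_ids[:, 1:].reshape(-1)
    return F.cross_entropy(shifted_logits, targets, ignore_index=pad_id)
```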
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="arka7/moe-multilingual-translator",
    filename="pytorch_model.pt",
)
checkpoint = torch.load(model_path, map_location="cpu")

# Re-create the architecture described in Model Details before loading the
# weights, e.g. model = MultilingualMoETransformer(...) (class name illustrative)
model.load_state_dict(checkpoint["model_state_dict"])
```
|
|
|
|
|
## Next Steps |
|
|
|
|
|
This model is ready for Stage 2: fine-tuning on parallel translation data. |
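
As a rough illustration, a Stage 2 step on a parallel sentence pair could look like the sketch below. It assumes a standard teacher-forced encoder-decoder forward pass `model(src_ids, tgt_in) -> logits` and a padding index of 0; both are assumptions, and the real forward signature of the released model may differ.

```python
import torch
import torch.nn.functional as F

def translation_finetune_step(model, optimizer, src_ids, tgt_ids, pad_id=0):
    """One teacher-forced fine-tuning step on a parallel batch.

    src_ids / tgt_ids: (batch, seq_len) token ids of a source/target sentence pair.
    """
    tgt_in, tgt_out = tgt_ids[:, :-1], tgt_ids[:, 1:]   # shift targets for teacher forcing
    logits = model(src_ids, tgt_in)                      # (batch, seq_len - 1, vocab_size)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        tgt_out.reshape(-1),
        ignore_index=pad_id,
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```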
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
```bibtex
@misc{moe-multilingual-translator,
  author = {arka7},
  title = {Multilingual MoE Transformer},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/arka7/moe-multilingual-translator}
}
```
|
|
|