arka7 committed
Commit
780b318
·
verified ·
1 Parent(s): 63fc399

Upload Stage 1 model - Loss: 2.0218

Files changed (6)
  1. README.md +66 -0
  2. config.json +18 -0
  3. pytorch_model.pt +3 -0
  4. tokenizer.model +3 -0
  5. tokenizer.vocab +0 -0
  6. training_log.txt +13 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ language:
+ - en
+ - fr
+ - hi
+ - bn
+ license: mit
+ tags:
+ - pytorch
+ - transformer
+ - mixture-of-experts
+ - multilingual
+ - translation
+ ---
+
+ # Multilingual MoE Transformer
+
+ A Mixture-of-Experts (MoE) transformer trained on English, French, Hindi, and Bengali.
+
+ ## Model Details
+
+ - **Architecture**: Encoder-decoder Transformer with MoE routing (see the sketch below)
+ - **Languages**: English, French, Hindi, Bengali
+ - **Vocabulary Size**: 32,000 tokens
+ - **Model Dimension**: 512
+ - **Number of Experts**: 4
+ - **Number of Layers**: 6
+ - **Attention Heads**: 8
+
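+ The model code is not included in this repository, so the block below is only a minimal sketch of what an MoE feed-forward layer with these hyperparameters could look like; top-1 routing, the class name `MoEFeedForward`, and `d_ff=2048` (4 × d_model) are assumptions, not the actual implementation.
+
+ ```python
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ class MoEFeedForward(nn.Module):
+     """Hypothetical sketch: a router sends each token to one of 4 experts."""
+     def __init__(self, d_model=512, num_experts=4, d_ff=2048):
+         super().__init__()
+         self.router = nn.Linear(d_model, num_experts)
+         self.experts = nn.ModuleList([
+             nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
+             for _ in range(num_experts)
+         ])
+
+     def forward(self, x):  # x: (batch, seq, d_model)
+         gates = F.softmax(self.router(x), dim=-1)  # routing probabilities
+         top1 = gates.argmax(dim=-1)                # chosen expert per token
+         out = torch.zeros_like(x)
+         for i, expert in enumerate(self.experts):
+             mask = top1 == i
+             if mask.any():
+                 out[mask] = expert(x[mask])
+         # weight by the gate value so the router receives gradients
+         return out * gates.gather(-1, top1.unsqueeze(-1))
+ ```
+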
+ ## Training
+
+ - **Stage**: Self-supervised pre-training (Stage 1)
+ - **Task**: Next-token prediction (language modeling)
+ - **Dataset**: Wikipedia text in all four languages
+ - **Final Loss**: 2.0218
+
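+ For intuition, assuming the loss is the standard natural-log cross-entropy used for next-token prediction, a loss of 2.0218 corresponds to a per-token perplexity of exp(2.0218) ≈ 7.55.
+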
+ ## Usage
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+
+ # Download the checkpoint from the Hub
+ model_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="pytorch_model.pt")
+ checkpoint = torch.load(model_path, map_location="cpu")
+
+ # The architecture is not bundled with the checkpoint: instantiate your own
+ # implementation (hyperparameters are in config.json), then load the weights
+ model = ...  # your MoE transformer, built to match config.json
+ model.load_state_dict(checkpoint['model_state_dict'])
+ ```
+
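+ The `tokenizer.model` / `tokenizer.vocab` pair in this repository looks like a SentencePiece model; assuming it is one (the README does not say), it can be loaded like this:
+
+ ```python
+ import sentencepiece as spm
+ from huggingface_hub import hf_hub_download
+
+ # Assumption: tokenizer.model is a SentencePiece model file
+ tok_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="tokenizer.model")
+ sp = spm.SentencePieceProcessor(model_file=tok_path)
+
+ ids = sp.encode("Hello world", out_type=int)  # token ids for the model
+ text = sp.decode(ids)                         # round-trip back to text
+ ```
+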
+ ## Next Steps
+
+ This model is ready for Stage 2: fine-tuning on parallel translation data.
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{moe-multilingual-translator,
+   author = {arka7},
+   title = {Multilingual MoE Transformer},
+   year = {2024},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/arka7/moe-multilingual-translator}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,18 @@
+ {
+   "model_type": "moe_transformer",
+   "vocab_size": 32000,
+   "d_model": 512,
+   "nhead": 8,
+   "num_experts": 4,
+   "num_layers": 6,
+   "max_seq_len": 256,
+   "languages": [
+     "en",
+     "fr",
+     "hi",
+     "bn"
+   ],
+   "training_stage": "stage1_pretraining",
+   "final_loss": 2.02175643123963,
+   "final_balance_loss": 0.010806717754429852
+ }
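As a sketch of how these fields could drive model construction (the `MoETransformer` name in the comment is hypothetical; only the config keys above are real):

```python
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="config.json")
with open(cfg_path) as f:
    cfg = json.load(f)

# Pass the architecture fields to your own model class, e.g.:
# model = MoETransformer(vocab_size=cfg["vocab_size"], d_model=cfg["d_model"],
#                        nhead=cfg["nhead"], num_experts=cfg["num_experts"],
#                        num_layers=cfg["num_layers"], max_seq_len=cfg["max_seq_len"])
```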
pytorch_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79ff45ac2a932c916036f62782b57179cfd0a164c7c3eae39778069168bc6a41
+ size 399190942
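The Git LFS pointer records the file's SHA-256 and size (~399 MB), so a downloaded copy can be verified against it:

```python
import hashlib
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="arka7/moe-multilingual-translator", filename="pytorch_model.pt")
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
assert h.hexdigest() == "79ff45ac2a932c916036f62782b57179cfd0a164c7c3eae39778069168bc6a41"
```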
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2804e2016a4862e980034f2db6e99fe028e617503f1faea7f6ff7f2487bc3fe8
+ size 919076
tokenizer.vocab ADDED
The diff for this file is too large to render. See raw diff
 
training_log.txt ADDED
@@ -0,0 +1,13 @@
+ Training Completed Successfully!
+
+ Epoch: 1
+ Total Batches: 3743
+ Average Loss: 2.0218
+ Average Balance Loss: 0.0108
+
+ Expert Usage per Language:
+
+ en: [[0.20985517 0.16751863 0.31998625 0.30264   ]]
+ fr: [[0.24961634 0.21768875 0.26282057 0.26987436]]
+ hi: [[0.21246533 0.14122878 0.33271343 0.31359246]]
+ bn: [[0.24983221 0.22729187 0.25725418 0.26562175]]
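The per-language rows above can be sanity-checked for routing balance; with 4 experts the uniform share is 0.25 per expert. A small script (numbers copied from the log):

```python
import numpy as np

# Expert usage per language, copied from the log above
usage = {
    "en": [0.20985517, 0.16751863, 0.31998625, 0.30264],
    "fr": [0.24961634, 0.21768875, 0.26282057, 0.26987436],
    "hi": [0.21246533, 0.14122878, 0.33271343, 0.31359246],
    "bn": [0.24983221, 0.22729187, 0.25725418, 0.26562175],
}

for lang, u in usage.items():
    u = np.asarray(u)
    # largest deviation from the uniform share of 1/4
    print(f"{lang}: max |usage - 0.25| = {np.abs(u - 0.25).max():.4f}")
```

French and Bengali route almost uniformly, while English and Hindi lean toward experts 3 and 4, consistent with the small final balance loss of 0.0108.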