Upload folder using huggingface_hub

Files changed (11)

.gitattributes +1 -0
README.md +51 -0
history.csv +6 -0
lora_adapters.pt +3 -0
lora_moe_training.png +3 -0
metrics.json +69 -0
model.pt +3 -0
tokenizer/special_tokens_map.json +7 -0
tokenizer/tokenizer.json +0 -0
tokenizer/tokenizer_config.json +55 -0
tokenizer/vocab.txt +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+lora_moe_training.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,51 @@

+# Bonus 3: LoRA for MoE Experts
+## Model
+Parameter-efficient fine-tuning of a Mixture-of-Experts (MoE) model using **LoRA (Low-Rank Adaptation)**.
+## Architecture
+- 4 transformer layers with MoE
+- 8 experts per layer
+- Top-2 routing
+- LoRA rank: 16, alpha: 32
+## Parameter Efficiency
+- **Total Parameters**: 55,228,676
+- **Trainable (LoRA)**: 21,625,092 (39.16%)
+- **Frozen (Base)**: 33,603,584 (60.84%)
+- **Reduction**: ~2.6x fewer trainable parameters than full fine-tuning (55,228,676 / 21,625,092 ≈ 2.55)
+## Performance
+- **Validation Accuracy**: 0.6400
+- **Dataset**: XSum (topic classification)
+- **Training Samples**: 4,000
+## LoRA Benefits
+1. **Memory Efficient**: Only the small adapter matrices need to be stored per task
+2. **Fast Training**: Fewer parameters receive gradient updates
+3. **Task Switching**: Swap LoRA adapters to switch between tasks
+4. **Merge Friendly**: Adapters can be merged back into the base weights
+## Files
+- `model.pt`: Full model checkpoint
+- `lora_adapters.pt`: LoRA parameters only (~6.3 MB, vs. ~221 MB for `model.pt`)
+- `metrics.json`: Training metrics and config
+- `history.csv`: Training history
+## Usage
+```python
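+import torch  # `model` must be built from the same LoRA-MoE class (not included in this upload)
+checkpoint = torch.load('model.pt', map_location='cpu')  # full model checkpoint
+model.load_state_dict(checkpoint['model_state_dict'])
+# Or load only the LoRA adapters (base model weights must already be loaded)
+lora_checkpoint = torch.load('lora_adapters.pt', map_location='cpu')
+model.load_state_dict(lora_checkpoint['lora_state_dict'], strict=False)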
+```
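The README lists LoRA rank 16 and alpha 32 and says adapters can be merged back into the base weights. The sketch below illustrates the standard LoRA formulation, y = W x + (alpha / r) · B A x, in plain Python so it runs without PyTorch. It assumes the conventional alpha / r scaling and a zero-initialised B; the toy sizes and helper names (`lora_forward`, `merge`, `lora_param_count`) are hypothetical and not taken from the code that produced these checkpoints.

```python
import random

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen projection W x plus the scaled low-rank update (alpha / r) * B (A x)."""
    scale = alpha / r
    update = matvec(B, matvec(A, x))
    return [base + scale * u for base, u in zip(matvec(W, x), update)]

def merge(W, A, B, alpha, r):
    """Fold the adapter into the base weights: W' = W + (alpha / r) * B A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def lora_param_count(d_in, d_out, r):
    """Trainable values in one adapter: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

random.seed(0)
d_in, d_out, r, alpha = 4, 3, 2, 4  # toy sizes; alpha / r = 2.0, same ratio as 32 / 16
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
A = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]      # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                      # trainable, zero-init
x = [0.5, -1.0, 2.0, 0.25]

# With B = 0 the adapted layer starts out identical to the frozen base layer.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

# Pretend training moved B; the merged weight must reproduce the adapter output.
B = [[random.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]
unmerged = lora_forward(W, A, B, x, alpha, r)
merged = matvec(merge(W, A, B, alpha, r), x)
assert all(abs(u - m) < 1e-9 for u, m in zip(unmerged, merged))

# A hypothetical 512 x 512 projection (512 is hidden_dim in metrics.json) at rank 16:
print(lora_param_count(512, 512, 16), "adapter values vs", 512 * 512, "in the full matrix")
```

Merging is exact up to floating-point rounding, which is why a merged model needs no extra inference cost.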

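The README also specifies 8 experts per layer with top-2 routing. A minimal plain-Python sketch of one common top-k scheme: softmax over the router logits, keep the two most probable experts, renormalise their gates to sum to 1, and mix only those experts' outputs. Other variants exist (for example, gating with the full-softmax probabilities without renormalising), and nothing in this upload says which one the model uses; the toy experts and logits below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(logits, k=2):
    """Pick the k most probable experts; return [(expert_index, gate_weight), ...]
    with the k gate weights renormalised to sum to 1."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, experts, router_logits, k=2):
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for i, gate in top_k_route(router_logits, k):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out

# Eight toy experts: expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]  # one token's router scores

routing = top_k_route(logits, k=2)
print(routing)  # experts 1 and 4 are selected
print(moe_forward([1.0, 2.0], experts, logits))
```

Only k of the 8 experts run per token, which is what keeps an MoE layer's compute roughly constant as experts are added.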
history.csv ADDED

	@@ -0,0 +1,6 @@

+epoch,train_loss,train_accuracy,val_loss,val_accuracy
+1,0.8074325952529907,0.62525,0.8184478509426117,0.64
+2,0.7937552418708801,0.637,0.7908735847473145,0.64
+3,0.7901616661548615,0.6455,0.798002507686615,0.64
+4,0.7901241521835327,0.6365,0.8332968425750732,0.64
+5,0.7865016897916793,0.6465,0.7994629460573196,0.64
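A quick standard-library look at `history.csv`: the sketch embeds the five rows above so it runs standalone (reading the real file would swap the `io.StringIO` for `open("history.csv", newline="")`). It reports the epoch with the lowest validation loss and the set of distinct validation accuracies; `load_history` is a hypothetical helper name.

```python
import csv
import io

# The five rows of history.csv shown above.
HISTORY_CSV = """epoch,train_loss,train_accuracy,val_loss,val_accuracy
1,0.8074325952529907,0.62525,0.8184478509426117,0.64
2,0.7937552418708801,0.637,0.7908735847473145,0.64
3,0.7901616661548615,0.6455,0.798002507686615,0.64
4,0.7901241521835327,0.6365,0.8332968425750732,0.64
5,0.7865016897916793,0.6465,0.7994629460573196,0.64
"""

def load_history(text):
    """Parse the CSV into dicts, converting 'epoch' to int and every metric to float."""
    return [
        {key: int(val) if key == "epoch" else float(val) for key, val in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]

history = load_history(HISTORY_CSV)
best = min(history, key=lambda row: row["val_loss"])
val_accs = sorted({row["val_accuracy"] for row in history})
print(f"lowest val_loss: epoch {best['epoch']} ({best['val_loss']:.4f})")  # epoch 2 (0.7909)
print(f"distinct val_accuracy values: {val_accs}")                          # [0.64]
```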

lora_adapters.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1cc75c50d7fd0e92374fc126b34a569515c0b2dca5e282ccd8466ce563c41d31
+size 6334282
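The three `+` lines above are a Git LFS pointer, not the adapter weights; the ~6.3 MB payload lives in LFS storage and is fetched on checkout or download. A minimal sketch of parsing a v1 pointer (one `key value` line per field, `version` first) and checking a downloaded blob against its size and SHA-256. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any Git LFS tool.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS v1 pointer into {'algo', 'digest', 'size'}."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(data, info):
    """True if downloaded bytes match the pointer's size and SHA-256 digest."""
    return len(data) == info["size"] and hashlib.sha256(data).hexdigest() == info["digest"]

POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1cc75c50d7fd0e92374fc126b34a569515c0b2dca5e282ccd8466ce563c41d31\n"
    "size 6334282\n"
)

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])      # sha256 6334282
print(len(POINTER.encode()), "bytes")  # the pointer itself is only a few dozen bytes
```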

lora_moe_training.png ADDED

Git LFS Details

SHA256: 46131a0ccd7250ffecb578c2f6450d229741d14c7be5ed6de2863e6fa196b542
Pointer size: 131 Bytes
Size of remote file: 121 kB

metrics.json ADDED

	@@ -0,0 +1,69 @@

+{
+  "history": [
+    {
+      "epoch": 1,
+      "train_loss": 0.8074325952529907,
+      "train_accuracy": 0.62525,
+      "val_loss": 0.8184478509426117,
+      "val_accuracy": 0.64
+    },
+    {
+      "epoch": 2,
+      "train_loss": 0.7937552418708801,
+      "train_accuracy": 0.637,
+      "val_loss": 0.7908735847473145,
+      "val_accuracy": 0.64
+    },
+    {
+      "epoch": 3,
+      "train_loss": 0.7901616661548615,
+      "train_accuracy": 0.6455,
+      "val_loss": 0.798002507686615,
+      "val_accuracy": 0.64
+    },
+    {
+      "epoch": 4,
+      "train_loss": 0.7901241521835327,
+      "train_accuracy": 0.6365,
+      "val_loss": 0.8332968425750732,
+      "val_accuracy": 0.64
+    },
+    {
+      "epoch": 5,
+      "train_loss": 0.7865016897916793,
+      "train_accuracy": 0.6465,
+      "val_loss": 0.7994629460573196,
+      "val_accuracy": 0.64
+    }
+  ],
+  "config": {
+    "tokenizer": "bert-base-uncased",
+    "max_seq_len": 128,
+    "hidden_dim": 512,
+    "num_experts": 8,
+    "top_k": 2,
+    "lora_rank": 16,
+    "lora_alpha": 32,
+    "batch_size": 16,
+    "learning_rate": 0.001,
+    "num_epochs": 5,
+    "seed": 42,
+    "device": "cuda",
+    "hf_repo": "Deepu1965/bonus3-lora-moe"
+  },
+  "param_counts": {
+    "trainable": 21625092,
+    "frozen": 33603584,
+    "total": 55228676
+  },
+  "expert_usage": [
+    270.3500061035156,
+    583.625,
+    598.9650268554688,
+    359.67999267578125,
+    425.7900085449219,
+    603.489990234375,
+    1022.885009765625,
+    231.21499633789062
+  ]
+}
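`metrics.json` bundles the training history with the config, parameter counts, and per-expert usage. The sketch below (standard library only, values copied from the file above) recomputes the trainable share quoted in the README and summarises how unevenly the 8 experts were used. What `expert_usage` counts exactly is not documented here; its values sum to about 4096 = batch_size × max_seq_len × top_k (16 × 128 × 2), which suggests, but does not confirm, average token-to-expert assignments per batch.

```python
import json

# Excerpt of metrics.json; for the real file use
# `with open("metrics.json") as f: metrics = json.load(f)`.
metrics = json.loads("""
{
  "config": {"batch_size": 16, "max_seq_len": 128, "num_experts": 8, "top_k": 2},
  "param_counts": {"trainable": 21625092, "frozen": 33603584, "total": 55228676},
  "expert_usage": [270.3500061035156, 583.625, 598.9650268554688, 359.67999267578125,
                   425.7900085449219, 603.489990234375, 1022.885009765625, 231.21499633789062]
}
""")

counts = metrics["param_counts"]
assert counts["trainable"] + counts["frozen"] == counts["total"]
trainable_pct = 100 * counts["trainable"] / counts["total"]
reduction = counts["total"] / counts["trainable"]

usage = metrics["expert_usage"]
total_usage = sum(usage)
shares = [u / total_usage for u in usage]
busiest = max(range(len(usage)), key=lambda i: usage[i])
imbalance = max(usage) / (total_usage / len(usage))  # 1.0 would be perfectly balanced

cfg = metrics["config"]
slots = cfg["batch_size"] * cfg["max_seq_len"] * cfg["top_k"]
print(f"trainable share: {trainable_pct:.2f}%, reduction vs full fine-tuning: {reduction:.2f}x")
print(f"usage sum {total_usage:.1f} vs batch_size*max_seq_len*top_k = {slots}")
print(f"busiest expert {busiest}: {shares[busiest]:.1%} of assignments "
      f"(uniform would be {1 / cfg['num_experts']:.1%}); max/mean load {imbalance:.2f}")
```

A max/mean load near 2 means the busiest expert received roughly twice its fair share, the kind of imbalance an auxiliary load-balancing loss is usually meant to reduce.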

model.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:53be695e8fa32b0f3f871d66bcf00394b365fd99e0c747ad7a2db73979d059cd
+size 221009538
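The LFS sizes allow a rough consistency check on the parameter counts. Assuming (this is not stated in the upload) that tensors are stored as float32 and that serialization overhead is small, `model.pt` is only ~95 KB larger than 55,228,676 × 4 bytes, while `lora_adapters.pt` can hold at most ~1.58 M float32 values, well below the 21,625,092 parameters reported as trainable. That gap suggests the reported trainable count includes parameters that are not stored in `lora_adapters.pt`, though the files here cannot confirm it. The helper names are hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def checkpoint_bytes(num_params, dtype="float32"):
    """Raw tensor payload size, ignoring pickle/zip metadata."""
    return num_params * BYTES_PER_PARAM[dtype]

def max_params_in_file(file_bytes, dtype="float32"):
    """Upper bound on how many values of this dtype a file of this size can hold."""
    return file_bytes // BYTES_PER_PARAM[dtype]

MODEL_PT_BYTES = 221_009_538    # from the model.pt LFS pointer
LORA_PT_BYTES = 6_334_282       # from the lora_adapters.pt LFS pointer
TOTAL_PARAMS = 55_228_676       # metrics.json -> param_counts.total
TRAINABLE_PARAMS = 21_625_092   # metrics.json -> param_counts.trainable

overhead = MODEL_PT_BYTES - checkpoint_bytes(TOTAL_PARAMS)
lora_upper_bound = max_params_in_file(LORA_PT_BYTES)
print(f"model.pt bytes beyond the fp32 payload: {overhead:,}")
print(f"lora_adapters.pt holds at most {lora_upper_bound:,} fp32 values "
      f"({lora_upper_bound / TRAINABLE_PARAMS:.1%} of the reported trainable count)")
```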

tokenizer/special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer/tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer/tokenizer_config.json ADDED

	@@ -0,0 +1,55 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
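`special_tokens_map.json` and `tokenizer_config.json` should agree: every special token named in the map needs an ID in `added_tokens_decoder`. A standard-library sketch of that check, using excerpts of the two files above (IDs 0 and 100 to 103 are the standard `bert-base-uncased` ones). `special_token_ids` is a hypothetical helper; in practice the folder would normally be loaded with the `transformers` tokenizer classes instead.

```python
import json

SPECIAL_TOKENS_MAP = json.loads("""{
  "cls_token": "[CLS]", "mask_token": "[MASK]", "pad_token": "[PAD]",
  "sep_token": "[SEP]", "unk_token": "[UNK]"
}""")

# Excerpt of tokenizer_config.json: only the fields this check needs.
ADDED_TOKENS_DECODER = {
    "0": {"content": "[PAD]", "special": True},
    "100": {"content": "[UNK]", "special": True},
    "101": {"content": "[CLS]", "special": True},
    "102": {"content": "[SEP]", "special": True},
    "103": {"content": "[MASK]", "special": True},
}

def special_token_ids(token_map, decoder):
    """Map each special-token role (e.g. 'pad_token') to its integer ID."""
    content_to_id = {e["content"]: int(tid) for tid, e in decoder.items() if e["special"]}
    missing = [tok for tok in token_map.values() if tok not in content_to_id]
    if missing:
        raise KeyError(f"special tokens without an ID: {missing}")
    return {role: content_to_id[tok] for role, tok in token_map.items()}

ids = special_token_ids(SPECIAL_TOKENS_MAP, ADDED_TOKENS_DECODER)
print(ids)
```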

tokenizer/vocab.txt ADDED

The diff for this file is too large to render.