Hemanth-thunder/aimo-lora

Files changed (4)

README.md CHANGED

@@ -18,12 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- eval_loss: 0.3813
-- eval_runtime: 140.2175
-- eval_samples_per_second: 15.74
-- eval_steps_per_second: 7.873
-- epoch: 0.16
-- step: 400
+- Loss: 0.3525
 ## Model description
@@ -53,6 +48,29 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 3
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.3351        | 0.17  | 500  | 0.3755          |
+| 0.3312        | 0.34  | 1000 | 0.3644          |
+| 0.3079        | 0.51  | 1500 | 0.3597          |
+| 0.3195        | 0.68  | 2000 | 0.3577          |
+| 0.3218        | 0.85  | 2500 | 0.3557          |
+| 0.3034        | 1.02  | 3000 | 0.3553          |
+| 0.296         | 1.19  | 3500 | 0.3543          |
+| 0.3175        | 1.36  | 4000 | 0.3539          |
+| 0.3257        | 1.53  | 4500 | 0.3533          |
+| 0.3263        | 1.7   | 5000 | 0.3526          |
+| 0.3209        | 1.87  | 5500 | 0.3522          |
+| 0.3221        | 2.04  | 6000 | 0.3528          |
+| 0.2927        | 2.21  | 6500 | 0.3526          |
+| 0.2922        | 2.38  | 7000 | 0.3527          |
+| 0.2968        | 2.55  | 7500 | 0.3525          |
+| 0.2968        | 2.72  | 8000 | 0.3526          |
+| 0.3094        | 2.89  | 8500 | 0.3525          |
 ### Framework versions
 - PEFT 0.8.2
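
The training-results table added to README.md can be read programmatically to see where validation loss bottoms out. The sketch below is only an illustration: the rows are copied from the table, while the helper name `best_checkpoint` and the tie-breaking rule (earliest step wins) are assumptions, not part of the repository.

```python
# (training_loss, epoch, step, validation_loss), copied from the README table.
ROWS = [
    (0.3351, 0.17, 500, 0.3755),
    (0.3312, 0.34, 1000, 0.3644),
    (0.3079, 0.51, 1500, 0.3597),
    (0.3195, 0.68, 2000, 0.3577),
    (0.3218, 0.85, 2500, 0.3557),
    (0.3034, 1.02, 3000, 0.3553),
    (0.296, 1.19, 3500, 0.3543),
    (0.3175, 1.36, 4000, 0.3539),
    (0.3257, 1.53, 4500, 0.3533),
    (0.3263, 1.7, 5000, 0.3526),
    (0.3209, 1.87, 5500, 0.3522),
    (0.3221, 2.04, 6000, 0.3528),
    (0.2927, 2.21, 6500, 0.3526),
    (0.2922, 2.38, 7000, 0.3527),
    (0.2968, 2.55, 7500, 0.3525),
    (0.2968, 2.72, 8000, 0.3526),
    (0.3094, 2.89, 8500, 0.3525),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss (hypothetical helper).

    Ties are broken in favour of the earliest step.
    """
    return min(rows, key=lambda row: (row[3], row[2]))


best = best_checkpoint(ROWS)
last = ROWS[-1]
print(f"lowest validation loss: {best[3]} at step {best[2]} (epoch {best[1]})")
print(f"last logged validation loss: {last[3]} at step {last[2]}")
```

On these rows the minimum is 0.3522 at step 5500, marginally below the 0.3525 logged at step 8500 and reported as the final `Loss`; validation loss is essentially flat from about step 5000 onward.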

adapter_config.json CHANGED

@@ -15,17 +15,17 @@
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "r": 16,
+  "r": 32,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "gate_proj",
-    "q_proj",
     "v_proj",
-    "down_proj",
     "o_proj",
+    "down_proj",
+    "k_proj",
     "up_proj",
-    "k_proj"
+    "gate_proj",
+    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_rslora": false

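Most of the adapter_config.json diff is a reordering of `target_modules`. A minimal sketch for separating that noise from the substantive change is shown below. It assumes the order of `target_modules` carries no meaning (the names are treated as an unordered set), and the helper `normalize` is hypothetical, introduced only for this illustration.

```python
# Values copied from the removed and added sides of the adapter_config.json diff.
old_cfg = {
    "r": 16,
    "target_modules": [
        "gate_proj", "q_proj", "v_proj", "down_proj",
        "o_proj", "up_proj", "k_proj",
    ],
}
new_cfg = {
    "r": 32,
    "target_modules": [
        "v_proj", "o_proj", "down_proj", "k_proj",
        "up_proj", "gate_proj", "q_proj",
    ],
}


def normalize(value):
    """Compare lists as unordered sets so pure reordering is ignored (assumption)."""
    return frozenset(value) if isinstance(value, list) else value


# Keep only the keys whose normalized values actually differ.
semantic_changes = {
    key: (old_cfg[key], new_cfg[key])
    for key in old_cfg
    if normalize(old_cfg[key]) != normalize(new_cfg[key])
}
print(semantic_changes)
```

Under that assumption the only effective change in this hunk is the LoRA rank `r`, which goes from 16 to 32; the same seven module names are targeted before and after.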
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bb719aca714fb08f3f84d97e05fe34092e37e0ab7702ee346bb35c8f2df16d41
-size 17842848
+oid sha256:c72cc8322717038ed30a3b43fb773df23d987343b384623f55d7f661f69356ab
+size 35668720
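
The size change in the LFS pointer is consistent with the rank change in adapter_config.json: LoRA weight matrices have shapes proportional to `r`, so doubling `r` should roughly double the adapter file. The sketch below checks this with the two sizes from the pointer. It assumes a simple model, `size = overhead + per_rank * r`, with a constant overhead for the file header and metadata; that model is an approximation made for this example, not something stated in the repository.

```python
OLD_BYTES = 17_842_848  # adapter_model.safetensors before the commit (r = 16)
NEW_BYTES = 35_668_720  # adapter_model.safetensors after the commit (r = 32)
OLD_RANK, NEW_RANK = 16, 32

# How much the file grew compared with how much the rank grew.
size_ratio = NEW_BYTES / OLD_BYTES
rank_ratio = NEW_RANK / OLD_RANK

# Assumed model: size = overhead + per_rank * r, solved from the two data points.
per_rank_bytes = (NEW_BYTES - OLD_BYTES) // (NEW_RANK - OLD_RANK)
overhead_bytes = OLD_BYTES - per_rank_bytes * OLD_RANK

print(f"size ratio: {size_ratio:.4f} (rank ratio: {rank_ratio:.1f})")
print(f"bytes per unit of rank: {per_rank_bytes}")
print(f"implied fixed overhead: {overhead_bytes} bytes")
```

The file grows by a factor of about 1.999, just under the 2.0 rank ratio, and the implied fixed overhead is about 17 KB, which is small compared with the weight data.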

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:81f2706a6ada6e12e9c5df739346756516ee6933f736fae592267ec499850a9b
+oid sha256:69f1163156a6e093294761f498ed7b169cc46914b0048e784e7b099836d24e5a
 size 4856