jjovalle/llama3.1_8binstruct-summary-100s2

Files changed (4)

README.md CHANGED

@@ -20,7 +20,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.9178
+- Loss: 1.7064
 ## Model description
@@ -39,10 +39,12 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 1
+- learning_rate: 2e-05
+- train_batch_size: 2
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: constant
 - lr_scheduler_warmup_steps: 3
@@ -50,12 +52,12 @@ The following hyperparameters were used during training:
 ### Training results
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.4039        | 1.1364 | 25   | 1.3024          |
-| 0.5807        | 2.2727 | 50   | 1.5027          |
-| 0.3786        | 3.4091 | 75   | 1.6451          |
-| 0.1085        | 4.5455 | 100  | 1.9178          |
+| Training Loss | Epoch   | Step | Validation Loss |
+|:-------------:|:-------:|:----:|:---------------:|
+| 1.5991        | 9.0909  | 25   | 1.4645          |
+| 1.0686        | 18.1818 | 50   | 1.3393          |
+| 0.8047        | 27.2727 | 75   | 1.3950          |
+| 0.3302        | 36.3636 | 100  | 1.7064          |
 ### Framework versions
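
The Epoch column in both results tables follows from the batch settings. Below is a minimal sketch of that arithmetic. It assumes single-device training and a training split of 22 examples. The card does not state the dataset size; 22 is only inferred because it reproduces every Epoch value in both tables. The old card lists no gradient_accumulation_steps, so the sketch takes it as 1. The helper names are illustrative.

```python
# ASSUMPTION: single device and 22 training examples (inferred from the
# Epoch column, not stated in the model card).
NUM_TRAIN_EXAMPLES = 22


def total_train_batch_size(per_device: int, grad_accum: int = 1, num_devices: int = 1) -> int:
    """Examples consumed by one optimizer step."""
    return per_device * grad_accum * num_devices


def epoch_at_step(step: int, batch: int, num_examples: int = NUM_TRAIN_EXAMPLES) -> float:
    """Fractional epoch after `step` optimizer steps, rounded to 4 places like the card."""
    return round(step * batch / num_examples, 4)


old_batch = total_train_batch_size(per_device=1)                # before: train_batch_size 1
new_batch = total_train_batch_size(per_device=2, grad_accum=4)  # after: 2 x 4 = 8

for step in (25, 50, 75, 100):
    print(step, epoch_at_step(step, old_batch), epoch_at_step(step, new_batch))
```

Under these assumptions, each optimizer step in the new run consumes 8 examples instead of 1. The same 100 steps therefore span about 36 epochs rather than about 4.5. The sketch treats epochs as a continuous count of examples seen, which is enough to reproduce the logged values.
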

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "k_proj",
+    "down_proj",
     "v_proj",
-    "gate_proj",
-    "o_proj",
     "up_proj",
     "q_proj",
-    "down_proj",
-    "k_proj"
+    "o_proj",
+    "gate_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
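
The adapter_config.json hunk marks five lines as removed and five as added. However, both target_modules lists contain the same seven projections: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, and down_proj. Only the order differs. The stdlib sketch below compares the lists as sets. The lists are copied from the hunk, and the helper name is illustrative.

```python
from typing import List, Set, Tuple


def diff_target_modules(old: List[str], new: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) module names, ignoring list order."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


# Copied from the two sides of the adapter_config.json hunk.
old_modules = ["v_proj", "gate_proj", "o_proj", "up_proj", "q_proj", "down_proj", "k_proj"]
new_modules = ["k_proj", "down_proj", "v_proj", "up_proj", "q_proj", "o_proj", "gate_proj"]

added, removed = diff_target_modules(old_modules, new_modules)
print("added:", sorted(added), "removed:", sorted(removed))  # added: [] removed: []
print("order changed:", old_modules != new_modules)          # order changed: True
```

PEFT's LoraConfig converts a list-valued target_modules into a set, so the saved order is probably not stable between training runs. This is an inference; the commit itself does not say why the order changed.
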

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f38b45fbb374874d20b2b32e850aa43976e31a6912bfeeb0be68f3b2644c009b
+oid sha256:e8cd810fa28e2e74abbd3dcfc1c32d35aef08587620b3a262ee5227f63d34222
 size 167832240
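
The binary files are stored in the repository as three-line Git LFS pointer files containing version, oid, and size fields. Below is a minimal parser that assumes the simple `key value` layout shown here. It confirms that only the oid of adapter_model.safetensors changed, while its size stayed at 167832240 bytes. The function name is illustrative.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer file, one 'key value' pair per line."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the two sides of the adapter_model.safetensors hunk.
OLD = """version https://git-lfs.github.com/spec/v1
oid sha256:f38b45fbb374874d20b2b32e850aa43976e31a6912bfeeb0be68f3b2644c009b
size 167832240
"""
NEW = """version https://git-lfs.github.com/spec/v1
oid sha256:e8cd810fa28e2e74abbd3dcfc1c32d35aef08587620b3a262ee5227f63d34222
size 167832240
"""

old_ptr, new_ptr = parse_lfs_pointer(OLD), parse_lfs_pointer(NEW)
print("content changed:", old_ptr["oid"] != new_ptr["oid"])  # content changed: True
print("size changed:", old_ptr["size"] != new_ptr["size"])    # size changed: False
```
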

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f295726a12065896a0e97bb0e2aab7827f5fd0dc6dbbab4834b9c4cc08869c20
+oid sha256:717f5ae2bf8f1448b3b9f7e36907876c9b403f4e3b818d6aba9b90a95adf646f
 size 5496
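
In the Git LFS pointer format, `oid` is the SHA-256 of the file's contents and `size` is its length in bytes. Below is a sketch for checking a downloaded file against a pointer such as the one in the training_args.bin hunk. The helper names are assumptions, not part of any library.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def lfs_oid_and_size(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[str, int]:
    """Return the ('sha256:<hex>', byte count) pair an LFS pointer records for this content."""
    digest = hashlib.sha256()
    size = 0
    # Read in chunks so large files (e.g. the ~160 MiB adapter) need not fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return "sha256:" + digest.hexdigest(), size


def matches_pointer(path: str, expected_oid: str, expected_size: int) -> bool:
    """Check a local file against the oid and size lines of its LFS pointer."""
    with open(path, "rb") as fh:
        return lfs_oid_and_size(fh) == (expected_oid, expected_size)


# In-memory demo: the SHA-256 of b"abc" is a standard test vector.
print(lfs_oid_and_size(io.BytesIO(b"abc")))
```

For example, suppose the new training_args.bin has been downloaded to the working directory (an assumed path). Then `matches_pointer("training_args.bin", "sha256:717f5ae2bf8f1448b3b9f7e36907876c9b403f4e3b818d6aba9b90a95adf646f", 5496)` should return `True`.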