End of training

Files changed (5)

README.md CHANGED

@@ -51,7 +51,7 @@ fp16: false
 fsdp: null
 fsdp_config: null
 gradient_accumulation_steps: 4
-gradient_checkpointing: false
 group_by_length: true
 hub_model_id: baby-dev/test-default-01
 hub_repo: null
@@ -114,7 +114,7 @@ xformers_attention: null
 This model is a fine-tuned version of [HuggingFaceM4/tiny-random-LlamaForCausalLM](https://huggingface.co/HuggingFaceM4/tiny-random-LlamaForCausalLM) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 10.0461
 ## Model description
@@ -148,12 +148,12 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log        | 0.0017 | 1    | 10.3632         |
-| 10.1451       | 0.0846 | 50   | 10.1555         |
-| 10.0091       | 0.1693 | 100  | 10.0546         |
-| 10.0116       | 0.2539 | 150  | 10.0513         |
-| 10.0141       | 0.3386 | 200  | 10.0488         |
-| 10.0093       | 0.4232 | 250  | 10.0461         |
 ### Framework versions

 fsdp: null
 fsdp_config: null
 gradient_accumulation_steps: 4
+gradient_checkpointing: true
 group_by_length: true
 hub_model_id: baby-dev/test-default-01
 hub_repo: null
 This model is a fine-tuned version of [HuggingFaceM4/tiny-random-LlamaForCausalLM](https://huggingface.co/HuggingFaceM4/tiny-random-LlamaForCausalLM) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 10.0453
 ## Model description
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| No log        | 0.0017 | 1    | 10.3635         |
+| 10.1304       | 0.0846 | 50   | 10.1295         |
+| 10.0067       | 0.1693 | 100  | 10.0532         |
+| 10.012        | 0.2539 | 150  | 10.0511         |
+| 10.0145       | 0.3386 | 200  | 10.0490         |
+| 10.0083       | 0.4232 | 250  | 10.0453         |
 ### Framework versions
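
The two results tables above differ only slightly, and a per-step comparison makes that easier to see. The following is a minimal sketch: the loss values are copied from the removed and added table rows in this diff, and the variable names are illustrative. The diff alone does not establish whether the `gradient_checkpointing` change caused these differences or whether they are ordinary run-to-run variation.

```python
# Validation losses copied from the two README tables in this diff.
steps = [1, 50, 100, 150, 200, 250]
loss_before = [10.3632, 10.1555, 10.0546, 10.0513, 10.0488, 10.0461]  # gradient_checkpointing: false
loss_after = [10.3635, 10.1295, 10.0532, 10.0511, 10.0490, 10.0453]   # gradient_checkpointing: true

# Per-step change (after minus before), rounded to the 4 decimals the tables report.
deltas = [round(after - before, 4) for after, before in zip(loss_after, loss_before)]

# Step at which the two runs differ the most in absolute terms.
largest_step = max(zip(steps, deltas), key=lambda pair: abs(pair[1]))[0]

for step, delta in zip(steps, deltas):
    print(f"step {step:>3}: {delta:+.4f}")
```

The largest gap is at step 50; by step 250 the two runs are within 0.001 of each other.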

adapter_config.json CHANGED

@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
     "down_proj",
-    "o_proj",
-    "k_proj",
     "gate_proj",
-    "q_proj",
-    "up_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "q_proj",
     "down_proj",
     "gate_proj",
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

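The `target_modules` hunk above reorders entries without adding or removing any. A quick way to confirm this is to compare the two lists as sets. This is a small sketch with the module names copied from the diff; it assumes the order of `target_modules` carries no meaning for the adapter, which the diff itself does not state.

```python
# Module names copied from the removed and added sides of the adapter_config.json diff.
before = ["v_proj", "down_proj", "o_proj", "k_proj", "gate_proj", "q_proj", "up_proj"]
after = ["q_proj", "down_proj", "gate_proj", "up_proj", "k_proj", "v_proj", "o_proj"]

# Order-insensitive comparison: True means only the ordering changed.
same_modules = set(before) == set(after)

# Entries present on one side only (both empty when the change is a pure reorder).
only_before = sorted(set(before) - set(after))
only_after = sorted(set(after) - set(before))

print(same_modules, only_before, only_after)
```
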
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:173082152d06d9881a0ba836787780672922d74b240bc452817a1a9e1dd65a88
 size 104322

 version https://git-lfs.github.com/spec/v1
+oid sha256:3e75e617f9e6840d8288418829673005f2fe868ad7d77c870670479a4900aa5e
 size 104322

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:787c931a61dcef9a09f4865f585ca9a1676ee6c8557bc725f6b3d862616f39ad
 size 97728

 version https://git-lfs.github.com/spec/v1
+oid sha256:a44d204943dcb33adf4a57f7ef22094330b22960900c28db21b98ccba913b9f6
 size 97728

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:247807b9fbae7bb66e11564d2bbc1ddd55c45a3a7ad93a60eaea6cb9b7e34d32
 size 6776

 version https://git-lfs.github.com/spec/v1
+oid sha256:83d8db823c389ea7d7b23fe13e45ed7c229764879e8d6d9de4bcbc82529ea944
 size 6776
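
The three binary files above are stored as Git LFS pointers, so each diff shows only a new `oid` with an unchanged `size`. The sketch below parses the two pointer versions of `adapter_model.bin` (text copied from this diff) and checks which fields changed. The helper name `parse_lfs_pointer` is hypothetical and not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


# Pointer contents for adapter_model.bin, copied from the diff above.
old_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:173082152d06d9881a0ba836787780672922d74b240bc452817a1a9e1dd65a88
size 104322
"""
new_pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3e75e617f9e6840d8288418829673005f2fe868ad7d77c870670479a4900aa5e
size 104322
"""

old = parse_lfs_pointer(old_pointer)
new = parse_lfs_pointer(new_pointer)

# A different oid means the file contents differ; the byte size can still match.
content_changed = old["oid"] != new["oid"]
size_changed = int(old["size"]) != int(new["size"])

print(f"content changed: {content_changed}, size changed: {size_changed}")
```

The same check applies to `adapter_model.safetensors` and `training_args.bin`, whose pointers also show a new hash at an identical size.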