rbos/ACLED_DeepSeek_Fine-Tuned_Classifier-unsloth-DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit

Files changed (4)

README.md +18 -18
adapter_config.json +6 -6
adapter_model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.3521
 ## Model description
@@ -46,29 +46,29 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 32
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 10
 - num_epochs: 3
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.8251        | 0.1773 | 50   | 0.6503          |
-| 0.5281        | 0.3546 | 100  | 0.4812          |
-| 0.452         | 0.5319 | 150  | 0.4312          |
-| 0.4297        | 0.7092 | 200  | 0.4092          |
-| 0.4191        | 0.8865 | 250  | 0.3957          |
-| 0.3891        | 1.0638 | 300  | 0.3862          |
-| 0.3824        | 1.2411 | 350  | 0.3793          |
-| 0.3762        | 1.4184 | 400  | 0.3729          |
-| 0.3661        | 1.5957 | 450  | 0.3690          |
-| 0.3778        | 1.7730 | 500  | 0.3652          |
-| 0.3724        | 1.9504 | 550  | 0.3612          |
-| 0.351         | 2.1277 | 600  | 0.3591          |
-| 0.3407        | 2.3050 | 650  | 0.3581          |
-| 0.3485        | 2.4823 | 700  | 0.3547          |
-| 0.3411        | 2.6596 | 750  | 0.3532          |
-| 0.343         | 2.8369 | 800  | 0.3521          |
 ### Framework versions

 This model is a fine-tuned version of [unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit](https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.3463
 ## Model description
 - total_train_batch_size: 32
 - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 20
 - num_epochs: 3
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 0.9219        | 0.1773 | 50   | 0.6719          |
+| 0.529         | 0.3546 | 100  | 0.4809          |
+| 0.448         | 0.5319 | 150  | 0.4260          |
+| 0.4242        | 0.7092 | 200  | 0.4043          |
+| 0.4133        | 0.8865 | 250  | 0.3898          |
+| 0.3839        | 1.0638 | 300  | 0.3803          |
+| 0.3767        | 1.2411 | 350  | 0.3735          |
+| 0.3708        | 1.4184 | 400  | 0.3673          |
+| 0.3605        | 1.5957 | 450  | 0.3632          |
+| 0.3721        | 1.7730 | 500  | 0.3593          |
+| 0.3668        | 1.9504 | 550  | 0.3554          |
+| 0.3457        | 2.1277 | 600  | 0.3536          |
+| 0.3352        | 2.3050 | 650  | 0.3518          |
+| 0.3426        | 2.4823 | 700  | 0.3490          |
+| 0.3356        | 2.6596 | 750  | 0.3478          |
+| 0.3376        | 2.8369 | 800  | 0.3463          |
 ### Framework versions
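
The two Validation Loss columns in this hunk can be compared step by step. The following is a minimal sketch, with the values copied from the removed and added table rows above; the variable names are illustrative and not part of the repository.

```python
# Validation loss at each logged step, copied from the README.md diff.
steps = list(range(50, 801, 50))
old_val = [0.6503, 0.4812, 0.4312, 0.4092, 0.3957, 0.3862, 0.3793, 0.3729,
           0.3690, 0.3652, 0.3612, 0.3591, 0.3581, 0.3547, 0.3532, 0.3521]
new_val = [0.6719, 0.4809, 0.4260, 0.4043, 0.3898, 0.3803, 0.3735, 0.3673,
           0.3632, 0.3593, 0.3554, 0.3536, 0.3518, 0.3490, 0.3478, 0.3463]

# A negative delta means the new run (lr_scheduler_warmup_steps: 20)
# has a lower validation loss than the old run (warmup steps: 10).
deltas = [round(n - o, 4) for o, n in zip(old_val, new_val)]

for step, o, n, d in zip(steps, old_val, new_val, deltas):
    print(f"step {step:>3}: {o:.4f} -> {n:.4f} ({d:+.4f})")

final_delta = deltas[-1]
worse_steps = [s for s, d in zip(steps, deltas) if d > 0]
```

With these numbers the new run is worse only at the first logged step and ends 0.0058 lower. The diff shows no run-to-run variance, so it cannot establish that the longer warmup is the cause of the difference.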

adapter_config.json CHANGED

@@ -23,13 +23,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
-    "q_proj",
-    "k_proj",
-    "gate_proj",
-    "o_proj",
     "up_proj",
-    "down_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "up_proj",
+    "o_proj",
+    "gate_proj",
+    "k_proj",
+    "q_proj",
+    "down_proj",
+    "v_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
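
The `target_modules` hunk appears to reorder entries without adding or removing any. A quick sketch to check this, with both lists copied from the diff above; the assumption that ordering has no effect on which layers receive adapters is not stated in this commit and should be confirmed against the PEFT version in use.

```python
# target_modules before and after the commit, copied from adapter_config.json.
old_modules = ["v_proj", "q_proj", "k_proj", "gate_proj",
               "o_proj", "up_proj", "down_proj"]
new_modules = ["up_proj", "o_proj", "gate_proj", "k_proj",
               "q_proj", "down_proj", "v_proj"]

# Compare as sets so that ordering is ignored.
same_members = set(old_modules) == set(new_modules)
added = sorted(set(new_modules) - set(old_modules))
removed = sorted(set(old_modules) - set(new_modules))

print(f"same members: {same_members}, added: {added}, removed: {removed}")
```

If the sets match, the six changed lines in this file are serialization noise, not a change in which projections are adapted.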

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2ac29f9c910ce443e22b95c5d22065a4494d01b206e3b95a640628efdfd4b3a3
 size 83945296

 version https://git-lfs.github.com/spec/v1
+oid sha256:c8e43546f153b860ce564218512d310c5e1ade0db8a908613bf5f14aa7bb8068
 size 83945296

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:33d19787cd888e8059b7579e941ea460799765e929a9ff1b09b8607fcb61f2f4
 size 5304

 version https://git-lfs.github.com/spec/v1
+oid sha256:9b0a9ceb383c83e1a491d1dd6926482ad0e135551942eba2577aae492dbcdac4
 size 5304
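
The two binary files are stored as Git LFS pointers, so the diff shows only a hash and a byte size. The sketch below parses the `adapter_model.safetensors` pointers shown above and compares them; `parse_lfs_pointer` is a hypothetical helper written for this illustration, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


# Pointer contents before and after the commit, copied from the diff.
old_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:2ac29f9c910ce443e22b95c5d22065a4494d01b206e3b95a640628efdfd4b3a3
size 83945296
""")
new_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:c8e43546f153b860ce564218512d310c5e1ade0db8a908613bf5f14aa7bb8068
size 83945296
""")

content_changed = old_ptr["oid"] != new_ptr["oid"]
size_changed = old_ptr["size"] != new_ptr["size"]
print(f"content changed: {content_changed}, size changed: {size_changed}")
```

A different hash with an identical size is consistent with retrained adapter weights of the same shape, although the pointer alone does not confirm this.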