End of training

Files changed:
- README.md (+21 −6)
- adapter_model.bin (+1 −1)

README.md
```diff
@@ -26,6 +26,9 @@ load_in_8bit: false
 load_in_4bit: true
 strict: false
 
+data_seed: 42
+seed: 42
+
 datasets:
   - path: data/isaf_press_releases_ft.jsonl
     conversation: alpaca
@@ -64,7 +67,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 4
 micro_batch_size: 2
-num_epochs:
+num_epochs: 4
 optimizer: adamw_bnb_8bit
 lr_scheduler: cosine
 learning_rate: 0.0002
@@ -106,7 +109,7 @@ special_tokens:
 
 This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.0288
 
 ## Model description
 
@@ -137,16 +140,28 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 4
 
 ### Training results
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 1.3462        | 0.0292 | 1    | 1.3536          |
-| 0.
-| 0.
-| 0.
+| 0.1245        | 0.2628 | 9    | 0.0958          |
+| 0.0521        | 0.5255 | 18   | 0.0523          |
+| 0.0437        | 0.7883 | 27   | 0.0420          |
+| 0.0312        | 1.0292 | 36   | 0.0383          |
+| 0.0395        | 1.2920 | 45   | 0.0351          |
+| 0.0309        | 1.5547 | 54   | 0.0329          |
+| 0.0342        | 1.8175 | 63   | 0.0314          |
+| 0.0334        | 2.0511 | 72   | 0.0318          |
+| 0.0282        | 2.3139 | 81   | 0.0322          |
+| 0.0263        | 2.5766 | 90   | 0.0301          |
+| 0.0255        | 2.8394 | 99   | 0.0294          |
+| 0.021         | 3.0803 | 108  | 0.0289          |
+| 0.0236        | 3.3431 | 117  | 0.0289          |
+| 0.0196        | 3.6058 | 126  | 0.0288          |
+| 0.0228        | 3.8686 | 135  | 0.0288          |
 
 
 ### Framework versions
```
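As a side note, the schedule filled in above is internally consistent. A quick back-of-the-envelope check (this assumes a single device, since the diff does not state a device count):

```python
# Values taken directly from the config diff above.
micro_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 4

# Effective (optimizer-step) batch size on one device.
effective_batch = micro_batch_size * gradient_accumulation_steps
print(effective_batch)  # 8

# The results table logs epoch 0.0292 at step 1, so one optimizer step
# advances roughly 2.92% of an epoch -> about 34 steps per epoch.
steps_per_epoch = round(1 / 0.0292)
print(steps_per_epoch)  # 34

# num_epochs: 4 implies roughly 136 optimizer steps in total, which lines
# up with the last logged step in the table (135 at epoch 3.8686).
print(steps_per_epoch * num_epochs)  # 136
```

So the added `num_epochs: 4` matches the table: training runs just under four epochs of roughly 34 optimizer steps each.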
adapter_model.bin

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:226abc5664ceb9d1b6b0db5a67a7a5f11c76e51be8d38e8d47612048bff3da1c
 size 335706186
```
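The adapter_model.bin blob is stored via Git LFS, so the diff above is over the three-line pointer file rather than the ~335 MB binary itself: only the `oid` changes when the adapter weights are replaced. A minimal sketch of how such a pointer parses (pure stdlib; the oid and size are copied from the diff):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:226abc5664ceb9d1b6b0db5a67a7a5f11c76e51be8d38e8d47612048bff3da1c
size 335706186"""

info = parse_lfs_pointer(pointer)
print(info["size"])       # 335706186
print(info["oid"][:10])   # sha256:226
```

Because only the pointer lives in git history, the repository diff stays three lines no matter how large the adapter checkpoint grows.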