besimray/test

@@ -58,7 +58,7 @@ max_steps: 10
 micro_batch_size: 7
 mlflow_experiment_name: mhenrichsen/alpaca_2k_test
 model_type: LlamaForCausalLM
-num_epochs: 4
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_besimray
 pad_to_sequence_len: false
@@ -78,7 +78,7 @@ wandb_mode: online
 wandb_project: Public_TuningSN
 wandb_run: miner_id_24
 wandb_runid: 383a850e-bb15-45a2-8f4b-fc96eb001a74
-warmup_steps: 10
 weight_decay: 0.0
 xformers_attention: null
@@ -90,7 +90,7 @@ xformers_attention: null
 This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.2202
 ## Model description
@@ -117,7 +117,7 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 28
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 10
 - training_steps: 10
 ### Training results
@@ -125,15 +125,15 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 1.3327        | 0.0147 | 1    | 1.2694          |
-| 1.1887        | 0.0294 | 2    | 1.2705          |
-| 1.5717        | 0.0441 | 3    | 1.2656          |
-| 1.3113        | 0.0588 | 4    | 1.2619          |
-| 1.3671        | 0.0735 | 5    | 1.2536          |
-| 1.4151        | 0.0882 | 6    | 1.2436          |
-| 1.2607        | 0.1029 | 7    | 1.2301          |
-| 1.4189        | 0.1176 | 8    | 1.2256          |
-| 1.3843        | 0.1324 | 9    | 1.2237          |
-| 1.3753        | 0.1471 | 10   | 1.2202          |
 ### Framework versions

 micro_batch_size: 7
 mlflow_experiment_name: mhenrichsen/alpaca_2k_test
 model_type: LlamaForCausalLM
+num_epochs: 20
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_besimray
 pad_to_sequence_len: false
 wandb_project: Public_TuningSN
 wandb_run: miner_id_24
 wandb_runid: 383a850e-bb15-45a2-8f4b-fc96eb001a74
+warmup_steps: 100
 weight_decay: 0.0
 xformers_attention: null
 This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.2638
 ## Model description
 - total_train_batch_size: 28
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 100
 - training_steps: 10
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 1.3327        | 0.0147 | 1    | 1.2694          |
+| 1.1887        | 0.0294 | 2    | 1.2676          |
+| 1.5761        | 0.0441 | 3    | 1.2679          |
+| 1.3197        | 0.0588 | 4    | 1.2693          |
+| 1.3721        | 0.0735 | 5    | 1.2674          |
+| 1.4327        | 0.0882 | 6    | 1.2674          |
+| 1.2795        | 0.1029 | 7    | 1.2692          |
+| 1.4695        | 0.1176 | 8    | 1.2674          |
+| 1.4243        | 0.1324 | 9    | 1.2657          |
+| 1.4099        | 0.1471 | 10   | 1.2638          |
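
Note on the change above: `warmup_steps` goes from 10 to 100 while `training_steps` stays at 10, so the run ends while the scheduler is still warming up. The sketch below shows the learning-rate multiplier for a linear-warmup-plus-cosine schedule. It assumes the trainer uses the common shape (linear ramp to the peak rate, then cosine decay) with 0-indexed steps. `lr_multiplier` is a hypothetical helper, not the trainer's API, and the peak learning rate is not part of this diff, so only the fraction of peak is computed.

```python
import math


def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Fraction of the peak learning rate applied at a given 0-indexed step.

    Assumed shape: linear warmup from 0 to 1 over `warmup_steps`,
    then a half-cosine decay from 1 to 0 over the remaining steps.
    """
    if step < warmup_steps:
        # Still warming up: ramp linearly toward the peak rate.
        return step / max(1, warmup_steps)
    # Past warmup: cosine decay over whatever steps remain.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total_steps = 10  # training_steps in both versions of the config
    for warmup in (10, 100):  # old and new warmup_steps
        fractions = [lr_multiplier(s, warmup, total_steps) for s in range(total_steps)]
        print(f"warmup_steps={warmup}: largest fraction of peak LR = {max(fractions):.2f}")
```

Under these assumptions the old setting reaches about 90% of the peak rate by the last step, while the new one stays below 10%. That may account for the flatter validation loss in the new table (1.2694 to 1.2638, versus 1.2694 to 1.2202 before), though the diff alone does not confirm the cause.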
 ### Framework versions

adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c2295b41ac661bb1f048c5ea31fe90887943731c0624ee1b94ce0b0510bf55c3
 size 67713738

 version https://git-lfs.github.com/spec/v1
+oid sha256:339f69e146da45ed44b91d2e55f3389d74d840b0727be524af0a85b4f71ec13e
 size 67713738