End of training
- README.md +8 -8
- adapter_model.bin +1 -1
README.md
CHANGED
@@ -47,7 +47,7 @@ flash_attention: false
 fp16: true
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 24
 gradient_checkpointing: true
 group_by_length: false
 hub_model_id: error577/cb2f8889-7cc4-4506-8b3f-9bd75b18db3d
@@ -103,7 +103,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [NousResearch/Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.
+- Loss: 3.5350
 
 ## Model description
 
@@ -126,8 +126,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 24
+- total_train_batch_size: 24
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
@@ -137,10 +137,10 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.
-| 3.
-|
-| 3.
+| 3.3147        | 0.0018 | 1    | 3.6378          |
+| 3.391         | 0.0037 | 2    | 3.6362          |
+| 3.105         | 0.0073 | 4    | 3.6184          |
+| 3.5426        | 0.0110 | 6    | 3.5350          |
 
 
 ### Framework versions
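The README change above sets `gradient_accumulation_steps: 24` alongside a per-device `train_batch_size` of 1, which is why `total_train_batch_size` becomes 1 × 24 = 24. A minimal plain-Python sketch of that accumulation pattern follows; the names and the counting helper are illustrative, not the trainer's actual API:

```python
# Sketch of gradient accumulation, assuming the config in the diff above:
# micro-batches of size 1 are accumulated for 24 steps before each
# optimizer update. Names here are illustrative stand-ins.

MICRO_BATCH_SIZE = 1   # train_batch_size
ACCUM_STEPS = 24       # gradient_accumulation_steps

# Effective batch size seen by each optimizer update:
TOTAL_TRAIN_BATCH_SIZE = MICRO_BATCH_SIZE * ACCUM_STEPS  # 24

def optimizer_updates(num_micro_batches: int) -> int:
    """Count how many optimizer updates a run of micro-batches triggers."""
    updates = 0
    for step in range(1, num_micro_batches + 1):
        # loss.backward() would run here, accumulating into .grad buffers
        if step % ACCUM_STEPS == 0:
            updates += 1  # optimizer.step(); optimizer.zero_grad()
    return updates
```

This also explains the step/epoch granularity in the results table: each logged step corresponds to 24 micro-batches, not one.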
adapter_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:efdd3c2a88438634f09abfbbb9a9940eac8ca3674ad5002b0e66727e5b5e6b30
 size 42104138
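The `adapter_model.bin` entry above is a Git LFS pointer file rather than the weights themselves: three lines giving the spec version, a sha256 object id, and the byte size. A small sketch of how such a pointer is derived from a file's contents (standard-library `hashlib` only; the helper name is hypothetical):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS v1 pointer for `data`, in the same three-line
    format as the adapter_model.bin entry above."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
```

Note that this commit changes only the `oid` line: the retrained adapter happens to have the same size (42104138 bytes), but its contents differ, so the sha256 differs.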