End of training

- README.md +34 -21
- adapter_model.bin +1 -1
- adapter_model.safetensors +1 -1
README.md CHANGED

@@ -54,9 +54,9 @@ gradient_checkpointing: false
 group_by_length: true
 hub_model_id: baby-dev/test-09-01
 hub_repo: null
-hub_strategy:
+hub_strategy: checkpoint
 hub_token: null
-learning_rate: 0.
+learning_rate: 0.0001
 load_in_4bit: false
 load_in_8bit: false
 local_rank: null
@@ -67,7 +67,7 @@ lora_fan_in_fan_out: null
 lora_model_dir: null
 lora_r: 32
 lora_target_linear: true
-lr_scheduler:
+lr_scheduler: linear
 max_grad_norm: 1.0
 max_memory:
   0: 75GB
@@ -113,7 +113,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [peft-internal-testing/tiny-dummy-qwen2](https://huggingface.co/peft-internal-testing/tiny-dummy-qwen2) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 11.
+- Loss: 11.8994
 
 ## Model description
 
@@ -132,14 +132,14 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 0.
+- learning_rate: 0.0001
 - train_batch_size: 4
 - eval_batch_size: 4
 - seed: 42
 - gradient_accumulation_steps: 4
 - total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=adam_beta1=0.9,adam_beta2=0.95,adam_epsilon=1e-5
-- lr_scheduler_type:
+- lr_scheduler_type: linear
 - lr_scheduler_warmup_steps: 50
 - training_steps: 6007
 
@@ -148,21 +148,34 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-------:|:----:|:---------------:|
 | No log | 0.0083 | 1 | 11.9304 |
-| 12.
-| 11.
-| 11.
-| 11.
-| 12.
-| 11.
-| 11.
-| 11.
-| 12.
-| 11.
-| 11.
-| 11.
-| 12.
-| 11.
-| 11.
+| 12.1074 | 1.2474 | 150 | 11.9141 |
+| 11.917 | 2.4948 | 300 | 11.9077 |
+| 11.9081 | 3.7422 | 450 | 11.9052 |
+| 11.9026 | 4.9896 | 600 | 11.9038 |
+| 12.0859 | 6.2370 | 750 | 11.9026 |
+| 11.9028 | 7.4844 | 900 | 11.9019 |
+| 11.8998 | 8.7318 | 1050 | 11.9016 |
+| 11.9048 | 9.9792 | 1200 | 11.9015 |
+| 12.084 | 11.2266 | 1350 | 11.9014 |
+| 11.8994 | 12.4740 | 1500 | 11.9011 |
+| 11.8969 | 13.7214 | 1650 | 11.9008 |
+| 11.8969 | 14.9688 | 1800 | 11.9005 |
+| 12.0752 | 16.2162 | 1950 | 11.9004 |
+| 11.8995 | 17.4636 | 2100 | 11.9006 |
+| 11.9041 | 18.7110 | 2250 | 11.9004 |
+| 11.9008 | 19.9584 | 2400 | 11.9004 |
+| 12.0829 | 21.2058 | 2550 | 11.9002 |
+| 11.9013 | 22.4532 | 2700 | 11.8999 |
+| 11.9025 | 23.7006 | 2850 | 11.8999 |
+| 11.8988 | 24.9480 | 3000 | 11.8996 |
+| 12.0787 | 26.1954 | 3150 | 11.8996 |
+| 11.8966 | 27.4428 | 3300 | 11.8996 |
+| 11.8997 | 28.6902 | 3450 | 11.8996 |
+| 11.9017 | 29.9376 | 3600 | 11.8995 |
+| 12.0742 | 31.1850 | 3750 | 11.8995 |
+| 11.8992 | 32.4324 | 3900 | 11.8992 |
+| 11.9043 | 33.6798 | 4050 | 11.8994 |
+| 11.895 | 34.9272 | 4200 | 11.8994 |
 
 
 ### Framework versions
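The hyperparameters above compose in two simple ways: the effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 4 = 16, and the linear scheduler ramps the learning rate up over the 50 warmup steps, then decays it toward zero by step 6007. A minimal sketch of that schedule, assuming the usual transformers-style linear-with-warmup semantics (this is an illustration, not the trainer's actual code):

```python
def lr_at_step(step, base_lr=1e-4, warmup_steps=50, total_steps=6007):
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective (total) train batch size = per-device batch * accumulation steps.
effective_batch = 4 * 4  # matches total_train_batch_size: 16 above
```

At step 25 the rate is halfway up the ramp (5e-05), at step 50 it peaks at 1e-4, and by step 6007 it has decayed back to zero.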
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b4508921f2ccb5fd88d4646b57d46f9749b5d720c519b7e10c595067c7e6ded1
 size 55170
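The oid in an LFS pointer is the SHA-256 digest of the blob's contents, so a downloaded adapter file can be verified against the pointer by hashing it locally. A minimal sketch (the helper name `lfs_oid` is hypothetical):

```python
import hashlib

def lfs_oid(path):
    """Compute the sha256 digest an LFS pointer would record for this file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large adapter files don't load into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing `lfs_oid("adapter_model.bin")` with the `oid sha256:` field confirms the pointer and the blob match.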
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:24c671988f484779d1bc65950834eaef9e98c12954bf650fea29c88a72d70f6b
 size 48552
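The pointer files themselves are plain key/value text (one `key value` pair per line), so the recorded oid and size can be read programmatically. A small sketch using the safetensors pointer above (`parse_lfs_pointer` is a hypothetical helper):

```python
def parse_lfs_pointer(text):
    """Split an LFS pointer into its key/value fields (version, oid, size)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:24c671988f484779d1bc65950834eaef9e98c12954bf650fea29c88a72d70f6b
size 48552"""

fields = parse_lfs_pointer(pointer)
```

`fields["size"]` gives the blob size in bytes as a string, and `fields["oid"]` carries the `sha256:` prefix that names the hash algorithm.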