baby-dev committed
Commit d471f4f · verified · 1 Parent(s): 867b184

End of training

Files changed (3):
  1. README.md +34 -21
  2. adapter_model.bin +1 -1
  3. adapter_model.safetensors +1 -1
README.md CHANGED
@@ -54,9 +54,9 @@ gradient_checkpointing: false
  group_by_length: true
  hub_model_id: baby-dev/test-09-01
  hub_repo: null
- hub_strategy: end
+ hub_strategy: checkpoint
  hub_token: null
- learning_rate: 0.0002
+ learning_rate: 0.0001
  load_in_4bit: false
  load_in_8bit: false
  local_rank: null
@@ -67,7 +67,7 @@ lora_fan_in_fan_out: null
  lora_model_dir: null
  lora_r: 32
  lora_target_linear: true
- lr_scheduler: constant
+ lr_scheduler: linear
  max_grad_norm: 1.0
  max_memory:
    0: 75GB
@@ -113,7 +113,7 @@ xformers_attention: null
 
  This model is a fine-tuned version of [peft-internal-testing/tiny-dummy-qwen2](https://huggingface.co/peft-internal-testing/tiny-dummy-qwen2) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 11.8980
+ - Loss: 11.8994
 
  ## Model description
 
@@ -132,14 +132,14 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 0.0002
+ - learning_rate: 0.0001
  - train_batch_size: 4
  - eval_batch_size: 4
  - seed: 42
  - gradient_accumulation_steps: 4
  - total_train_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=adam_beta1=0.9,adam_beta2=0.95,adam_epsilon=1e-5
- - lr_scheduler_type: constant
+ - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 50
  - training_steps: 6007
 
@@ -148,21 +148,34 @@ The following hyperparameters were used during training:
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-------:|:----:|:---------------:|
  | No log | 0.0083 | 1 | 11.9304 |
- | 12.0987 | 1.2474 | 150 | 11.9076 |
- | 11.9124 | 2.4948 | 300 | 11.9027 |
- | 11.9043 | 3.7422 | 450 | 11.9013 |
- | 11.8992 | 4.9896 | 600 | 11.9006 |
- | 12.0835 | 6.2370 | 750 | 11.8999 |
- | 11.9004 | 7.4844 | 900 | 11.8995 |
- | 11.8977 | 8.7318 | 1050 | 11.8993 |
- | 11.9026 | 9.9792 | 1200 | 11.8991 |
- | 12.0817 | 11.2266 | 1350 | 11.8989 |
- | 11.8973 | 12.4740 | 1500 | 11.8987 |
- | 11.8949 | 13.7214 | 1650 | 11.8983 |
- | 11.8948 | 14.9688 | 1800 | 11.8980 |
- | 12.0731 | 16.2162 | 1950 | 11.8979 |
- | 11.8973 | 17.4636 | 2100 | 11.8981 |
- | 11.9018 | 18.7110 | 2250 | 11.8980 |
+ | 12.1074 | 1.2474 | 150 | 11.9141 |
+ | 11.917 | 2.4948 | 300 | 11.9077 |
+ | 11.9081 | 3.7422 | 450 | 11.9052 |
+ | 11.9026 | 4.9896 | 600 | 11.9038 |
+ | 12.0859 | 6.2370 | 750 | 11.9026 |
+ | 11.9028 | 7.4844 | 900 | 11.9019 |
+ | 11.8998 | 8.7318 | 1050 | 11.9016 |
+ | 11.9048 | 9.9792 | 1200 | 11.9015 |
+ | 12.084 | 11.2266 | 1350 | 11.9014 |
+ | 11.8994 | 12.4740 | 1500 | 11.9011 |
+ | 11.8969 | 13.7214 | 1650 | 11.9008 |
+ | 11.8969 | 14.9688 | 1800 | 11.9005 |
+ | 12.0752 | 16.2162 | 1950 | 11.9004 |
+ | 11.8995 | 17.4636 | 2100 | 11.9006 |
+ | 11.9041 | 18.7110 | 2250 | 11.9004 |
+ | 11.9008 | 19.9584 | 2400 | 11.9004 |
+ | 12.0829 | 21.2058 | 2550 | 11.9002 |
+ | 11.9013 | 22.4532 | 2700 | 11.8999 |
+ | 11.9025 | 23.7006 | 2850 | 11.8999 |
+ | 11.8988 | 24.9480 | 3000 | 11.8996 |
+ | 12.0787 | 26.1954 | 3150 | 11.8996 |
+ | 11.8966 | 27.4428 | 3300 | 11.8996 |
+ | 11.8997 | 28.6902 | 3450 | 11.8996 |
+ | 11.9017 | 29.9376 | 3600 | 11.8995 |
+ | 12.0742 | 31.1850 | 3750 | 11.8995 |
+ | 11.8992 | 32.4324 | 3900 | 11.8992 |
+ | 11.9043 | 33.6798 | 4050 | 11.8994 |
+ | 11.895 | 34.9272 | 4200 | 11.8994 |
 
 
  ### Framework versions
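The diff above swaps the constant schedule for a linear one with 50 warmup steps over 6007 training steps at a 1e-4 peak rate. As a rough sketch of that shape (illustrative function name; this mirrors the form of a linear warmup/decay schedule such as the one `transformers` provides, not the library's exact code):

```python
def linear_warmup_lr(step: int, base_lr: float = 1e-4,
                     warmup_steps: int = 50, total_steps: int = 6007) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramp 0 -> base_lr
    # decay base_lr -> 0 over the remaining steps
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / (total_steps - warmup_steps))
```

Under this schedule the rate peaks at step 50 and reaches zero at step 6007, whereas the previous `constant` schedule would have held its peak rate for the whole run after warmup.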
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:165aa5a7fe839e33187b0a24780193720ab54c01ee9cd9e7439bd62b11559d96
+ oid sha256:b4508921f2ccb5fd88d4646b57d46f9749b5d720c519b7e10c595067c7e6ded1
  size 55170
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:104d1f0f47cb1d033a95cc0751e04d58f04bd2a8195e48551100e3a832b5bf10
+ oid sha256:24c671988f484779d1bc65950834eaef9e98c12954bf650fea29c88a72d70f6b
  size 48552
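The two `adapter_model.*` entries are Git LFS pointer files: `oid` is the SHA-256 of the stored blob and `size` is its byte length. A minimal sketch of how such a pointer is rendered, per the git-lfs spec v1 (the helper name is illustrative):

```python
import hashlib

def lfs_pointer(blob: bytes) -> str:
    """Render a Git LFS pointer file (spec v1) for a blob: version, oid, size."""
    oid = hashlib.sha256(blob).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(blob)}\n"
    )
```

That both files keep their old `size` (55170 and 48552 bytes) while the `oid` changes means this commit altered the adapter weights' contents without changing their serialized length, as expected when only parameter values are updated.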