noeloco committed
Commit 08b6745 · verified · 1 Parent(s): e69ac6c

End of training

Files changed (2):
  1. README.md (+22 −22)
  2. adapter_model.bin (+2 −2)
README.md CHANGED
```diff
@@ -26,7 +26,7 @@ is_llama_derived_model: true
 hub_model_id: noeloco/camel-lora
 
 load_in_8bit: false
-load_in_4bit: false
+load_in_4bit: true
 strict: false
 
 datasets:
@@ -40,14 +40,14 @@ val_set_size: 0.05
 output_dir: ./lora-out
 chat_template: chatml
 
-sequence_len: 2048
+sequence_len: 4096
 sample_packing: false
 pad_to_sequence_len: true
 
-adapter: lora
+adapter: qlora
 lora_model_dir:
-lora_r: 16
-lora_alpha: 8
+lora_r: 32
+lora_alpha: 16
 lora_dropout: 0.05
 lora_target_linear: true
 lora_fan_in_fan_out:
@@ -67,9 +67,9 @@ learning_rate: 0.0002
 
 train_on_inputs: false
 group_by_length: false
-bf16: true
+bf16: auto
 fp16: false
-tf32: false
+tf32: true
 
 gradient_checkpointing: true
 early_stopping_patience:
@@ -100,7 +100,7 @@ special_tokens:
 
 This model is a fine-tuned version of [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0383
+- Loss: 0.0290
 
 ## Model description
 
@@ -134,20 +134,20 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.7285 | 0.06 | 1 | 2.5189 |
-| 1.8487 | 0.29 | 5 | 2.4980 |
-| 1.4443 | 0.57 | 10 | 1.9379 |
-| 0.7471 | 0.86 | 15 | 1.0148 |
-| 0.561 | 1.14 | 20 | 0.5721 |
-| 0.2245 | 1.43 | 25 | 0.3640 |
-| 0.3456 | 1.71 | 30 | 0.1683 |
-| 0.2138 | 2.0 | 35 | 0.1051 |
-| 0.1145 | 2.29 | 40 | 0.0834 |
-| 0.1193 | 2.57 | 45 | 0.0526 |
-| 0.1083 | 2.86 | 50 | 0.0436 |
-| 0.1388 | 3.14 | 55 | 0.0387 |
-| 0.1102 | 3.43 | 60 | 0.0385 |
-| 0.0628 | 3.71 | 65 | 0.0383 |
+| 1.7685 | 0.06 | 1 | 2.5524 |
+| 1.8762 | 0.29 | 5 | 2.4927 |
+| 1.215 | 0.57 | 10 | 1.4546 |
+| 0.484 | 0.86 | 15 | 0.7250 |
+| 0.3667 | 1.14 | 20 | 0.4146 |
+| 0.1638 | 1.43 | 25 | 0.2123 |
+| 0.2948 | 1.71 | 30 | 0.0980 |
+| 0.2003 | 2.0 | 35 | 0.0629 |
+| 0.0888 | 2.29 | 40 | 0.0577 |
+| 0.0918 | 2.57 | 45 | 0.0414 |
+| 0.0931 | 2.86 | 50 | 0.0363 |
+| 0.0982 | 3.14 | 55 | 0.0304 |
+| 0.0849 | 3.43 | 60 | 0.0289 |
+| 0.0511 | 3.71 | 65 | 0.0290 |
 
 
 ### Framework versions
```
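The substantive change in this commit is a switch from a plain LoRA run (`adapter: lora`, `lora_r: 16`, `lora_alpha: 8`) to a QLoRA run (`load_in_4bit: true`, `adapter: qlora`, `lora_r: 32`, `lora_alpha: 16`) with `sequence_len` raised from 2048 to 4096; note the `lora_alpha`/`lora_r` scaling ratio stays at 0.5. As a minimal loading sketch (not part of this repo, assuming current `transformers`, `peft`, and `bitsandbytes`), the published adapter would be attached to a 4-bit-quantized base to match the training setup:

```python
# Sketch only: loads the base model named in the README in 4-bit
# (matching load_in_4bit: true) and attaches the LoRA adapter from
# the hub_model_id given in the config.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: auto on recent GPUs
)

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",            # base model from the README
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# hub_model_id from the config; this commit is revision 08b6745
model = PeftModel.from_pretrained(base, "noeloco/camel-lora")
```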
adapter_model.bin CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b66e19266959041f6863e2ae34bc5aee2db2a88b45bac11641d5327b69dc9df
-size 80115914
+oid sha256:430885f7d5e76c656bb16d0c30097c03805f898ac3586031af9f6c6c1d88520a
+size 160069834
```
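The adapter binary roughly doubling (80,115,914 → 160,069,834 bytes) is consistent with the rank change: a LoRA adapter adds r·(d_in + d_out) parameters per targeted matrix, linear in r, so going from `lora_r: 16` to `lora_r: 32` should roughly double the weight count. A quick sanity check on the figures in the diff:

```python
# Sizes taken from the two adapter_model.bin LFS pointers in this commit.
# LoRA parameter count scales linearly with r, so doubling lora_r
# (16 -> 32) should roughly double the file size.
old_size, new_size = 80_115_914, 160_069_834
print(f"{new_size / old_size:.3f}x")  # ~1.998x
```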