error577 committed on
Commit 0048343 · verified · 1 parent: 04cf5d7

End of training

Files changed (2)
  1. README.md +13 -22
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -42,11 +42,11 @@ early_stopping_patience: null
  eval_max_new_tokens: 128
  eval_table_size: null
  evals_per_epoch: 4
- flash_attention: false
+ flash_attention: true
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 32
+ gradient_accumulation_steps: 8
  gradient_checkpointing: false
  group_by_length: false
  hub_model_id: error577/f296a352-5ef7-4927-bdf5-fbdb83a318df
@@ -58,18 +58,18 @@ load_in_4bit: true
  load_in_8bit: true
  local_rank: null
  logging_steps: 1
- lora_alpha: 16
+ lora_alpha: 32
  lora_dropout: 0.05
  lora_fan_in_fan_out: null
  lora_model_dir: null
  lora_r: 8
  lora_target_linear: true
  lr_scheduler: cosine
- max_steps: 25
+ max_steps: 10
  micro_batch_size: 1
  mlflow_experiment_name: /tmp/683387fc31d2cb3c_train_data.json
  model_type: AutoModelForCausalLM
- num_epochs: 4
+ num_epochs: 1
  optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
@@ -102,7 +102,7 @@ xformers_attention: null
 
  This model is a fine-tuned version of [unsloth/Llama-3.2-3B](https://huggingface.co/unsloth/Llama-3.2-3B) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: nan
+ - Loss: 0.7216
 
  ## Model description
 
@@ -125,30 +125,21 @@ The following hyperparameters were used during training:
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
- - gradient_accumulation_steps: 32
- - total_train_batch_size: 32
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 8
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - training_steps: 25
+ - training_steps: 10
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 97986.6328 | 0.0010 | 1 | nan |
- | 735943.375 | 0.0019 | 2 | nan |
- | 11801915.0 | 0.0039 | 4 | nan |
- | 57.2586 | 0.0058 | 6 | nan |
- | 2.3866 | 0.0078 | 8 | nan |
- | 500473.1562 | 0.0097 | 10 | nan |
- | 36475.6328 | 0.0117 | 12 | nan |
- | 550951.4375 | 0.0136 | 14 | nan |
- | 70.6405 | 0.0155 | 16 | nan |
- | 8.1094 | 0.0175 | 18 | nan |
- | 4237.7134 | 0.0194 | 20 | nan |
- | 5055018.5 | 0.0214 | 22 | nan |
- | 935796.3125 | 0.0233 | 24 | nan |
+ | 1.0211 | 0.0002 | 1 | 0.8500 |
+ | 0.9332 | 0.0007 | 3 | 0.8475 |
+ | 0.7886 | 0.0015 | 6 | 0.8099 |
+ | 0.7123 | 0.0022 | 9 | 0.7216 |
 
 
  ### Framework versions
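
The hyperparameter list in this diff reports `total_train_batch_size: 8` alongside `micro_batch_size: 1` and `gradient_accumulation_steps: 8`; the reported value is simply their product (times the number of processes for multi-GPU runs, assumed 1 here). A minimal sketch of that relationship:

```python
# Effective batch size per optimizer step under gradient accumulation.
# Assumption (not from the commit itself): single-process training, so
# world_size defaults to 1.
def total_train_batch_size(micro_batch_size: int,
                           gradient_accumulation_steps: int,
                           world_size: int = 1) -> int:
    """Number of examples contributing to each weight update."""
    return micro_batch_size * gradient_accumulation_steps * world_size

# Values from this commit: the new config accumulates 8 micro-batches
# of size 1, the old config accumulated 32.
print(total_train_batch_size(1, 8))   # new: 8
print(total_train_batch_size(1, 32))  # old: 32
```

Lowering accumulation from 32 to 8 means four times more optimizer steps per epoch at a smaller effective batch, which is consistent with the shorter `max_steps: 10` run in this commit.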
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6ed553faa3353e096fcb1b1c3e1aee64171765e2eb728942b5520e8baec334d8
+ oid sha256:4979742ac8b7ba81aa58d1ee29229c72fc35370c140eb1689cc14f37a9e81a32
  size 48768810
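
The README diff above doubles `lora_alpha` (16 → 32) while keeping `lora_r: 8`. Under the usual LoRA convention (as in `peft`-style implementations), the adapter update added to a frozen weight is scaled by `alpha / r`, so this change doubles the adapter's effective contribution. A minimal sketch, assuming that convention rather than anything stated in the commit:

```python
# LoRA forward pass adds (alpha / r) * B @ A @ x on top of the frozen W @ x;
# the alpha/r ratio is the knob this commit turns.
def lora_scaling(lora_alpha: int, lora_r: int) -> float:
    """Scaling factor applied to the low-rank adapter update."""
    return lora_alpha / lora_r

print(lora_scaling(16, 8))  # old config: 2.0
print(lora_scaling(32, 8))  # new config: 4.0
```

Raising `alpha` with `r` fixed strengthens the adapter without adding parameters, which matches the unchanged `adapter_model.bin` size (48768810 bytes) despite the new weights.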