besimray committed (verified)
Commit c919cc7 · Parent(s): ecaf1a3

End of training

Files changed (2):
  1. README.md +22 -36
  2. adapter_model.bin +1 -1
README.md CHANGED

````diff
@@ -1,6 +1,5 @@
 ---
 library_name: peft
-license: llama3.2
 base_model: unsloth/Llama-3.2-1B-Instruct
 tags:
 - axolotl
@@ -24,23 +23,14 @@ bf16: auto
 chat_template: llama3
 dataset_prepared_path: null
 datasets:
-- data_files:
-  - MATH-Hard_train_data.json
-  ds_type: json
-  path: /workspace/input_data/MATH-Hard_train_data.json
-  type:
-    field_input: problem
-    field_instruction: solution
-    field_output: type
-    system_format: '{system}'
-    system_prompt: ''
+- path: mhenrichsen/alpaca_2k_test
+  type: alpaca
 debug: null
 deepspeed: null
-early_stopping_patience: 3
+early_stopping_patience: null
 eval_max_new_tokens: 128
-eval_sample_packing: false
-eval_steps: 20
 eval_table_size: null
+evals_per_epoch: 4
 flash_attention: true
 fp16: null
 fsdp: null
@@ -63,18 +53,18 @@ lora_model_dir: null
 lora_r: 16
 lora_target_linear: true
 lr_scheduler: cosine
-max_steps: 150
-micro_batch_size: 10
-mlflow_experiment_name: /tmp/MATH-Hard_train_data.json
+max_steps: 10
+micro_batch_size: 2
+mlflow_experiment_name: mhenrichsen/alpaca_2k_test
 model_type: LlamaForCausalLM
-num_epochs: 10
+num_epochs: 1
 optimizer: adamw_bnb_8bit
-output_dir: miner_id_besimray
-pad_to_sequence_len: false
+output_dir: miner_id_24
+pad_to_sequence_len: true
 resume_from_checkpoint: null
 s2_attention: null
 sample_packing: false
-save_steps: 20
+save_steps: 5
 save_strategy: steps
 sequence_len: 4096
 strict: false
@@ -86,9 +76,9 @@ wandb_entity: besimray24-rayon
 wandb_mode: online
 wandb_project: Public_TuningSN
 wandb_run: miner_id_24
-wandb_runid: 3882ca50-f5d8-4a62-83cf-33e7720e8c52
+wandb_runid: 383a850e-bb15-45a2-8f4b-fc96eb001a74
 warmup_steps: 10
-weight_decay: 0.01
+weight_decay: 0.0
 xformers_attention: null
 
 ```
@@ -99,7 +89,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [unsloth/Llama-3.2-1B-Instruct](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0766
+- Loss: 1.2201
 
 ## Model description
 
@@ -119,28 +109,24 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0002
-- train_batch_size: 10
-- eval_batch_size: 10
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 40
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- training_steps: 150
+- training_steps: 10
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 9.1 | 0.0129 | 1 | 8.9962 |
-| 0.3746 | 0.2572 | 20 | 0.3471 |
-| 0.2618 | 0.5145 | 40 | 0.1247 |
-| 0.106 | 0.7717 | 60 | 0.1141 |
-| 0.1457 | 1.0289 | 80 | 0.1035 |
-| 0.0493 | 1.2862 | 100 | 0.0947 |
-| 0.1237 | 1.5434 | 120 | 0.0765 |
-| 0.0294 | 1.8006 | 140 | 0.0766 |
+| 1.3218 | 0.0042 | 1 | 1.2625 |
+| 1.3092 | 0.0126 | 3 | 1.2572 |
+| 1.4991 | 0.0253 | 6 | 1.2118 |
+| 1.2957 | 0.0379 | 9 | 1.2201 |
 
 
 ### Framework versions
````
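As a sanity check on the updated hyperparameters: the `total_train_batch_size` reported in the card is the product of the per-device micro batch size and the gradient accumulation steps. A minimal sketch of that relationship (the function name is illustrative, not taken from axolotl's code):

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    # Effective (total) train batch size: examples consumed
    # per optimizer step, across all devices.
    return micro_batch_size * gradient_accumulation_steps * num_devices

# New config: micro_batch_size 2 with 4 accumulation steps -> 8
# Old config: micro_batch_size 10 with 4 accumulation steps -> 40
```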
adapter_model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a9e5432c0904d2ebf3acbe3d9dc2636391218bc21433147b98fca1a4e966c12f
+oid sha256:675843a59309216350febea0140e8a214bf010da20f9877d322a7fda208f3c3d
 size 45169354
```
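The adapter_model.bin entries above are Git LFS pointer files, not the weights themselves: the `oid` is the SHA-256 of the stored file and `size` is its byte length, so an unchanged size with a different oid means the contents changed. A minimal sketch of building such a pointer locally (the helper name is illustrative):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    # Git LFS spec v1 pointer: oid is the SHA-256 of the
    # file contents, size is the length in bytes.
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
```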