Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +164 -150
README.md CHANGED
@@ -1,151 +1,165 @@
- ---
- library_name: peft
- license: apache-2.0
- base_model: Qwen/Qwen2.5-1.5B
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46
- model-index:
- - name: d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- adapter: lora
- base_model: Qwen/Qwen2.5-1.5B
- bf16: auto
- chat_template: llama3
- dataset_prepared_path: /workspace/axolotl/data/prepared
- datasets:
- - ds_type: json
-   format: custom
-   path: Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46
-   type:
-     field_instruction: premise
-     field_output: hypothesis
-     system_format: '{system}'
-     system_prompt: ''
- debug: null
- deepspeed: null
- early_stopping_patience: null
- eval_max_new_tokens: 128
- eval_table_size: null
- evals_per_epoch: 4
- flash_attention: false
- fp16: null
- fsdp: null
- fsdp_config: null
- gradient_accumulation_steps: 4
- gradient_checkpointing: false
- group_by_length: false
- hub_model_id: Aivesa/d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
- hub_private_repo: true
- hub_repo: null
- hub_strategy: checkpoint
- hub_token: null
- learning_rate: 0.0002
- load_in_4bit: false
- load_in_8bit: false
- local_rank: null
- logging_steps: 1
- lora_alpha: 16
- lora_dropout: 0.05
- lora_fan_in_fan_out: null
- lora_model_dir: null
- lora_r: 8
- lora_target_linear: true
- lr_scheduler: cosine
- max_steps: 10
- micro_batch_size: 2
- model_type: AutoModelForCausalLM
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- output_dir: /workspace/axolotl/outputs
- pad_to_sequence_len: true
- push_to_hub: true
- resume_from_checkpoint: null
- s2_attention: null
- sample_packing: false
- save_safetensors: true
- saves_per_epoch: 4
- sequence_len: 512
- strict: false
- tf32: false
- tokenizer_type: AutoTokenizer
- train_on_inputs: false
- trust_remote_code: true
- use_accelerate: true
- val_set_size: 0.05
- wandb_entity: null
- wandb_mode: online
- wandb_name: fe02391f-dc57-4aa5-9b78-45187ff5fb46
- wandb_project: Gradients-On-Demand
- wandb_run: your_name
- wandb_runid: fe02391f-dc57-4aa5-9b78-45187ff5fb46
- warmup_steps: 10
- weight_decay: 0.0
- xformers_attention: null
-
- ```
-
- </details><br>
-
- # d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) on the Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.1429
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - training_steps: 10
-
- ### Training results
-
- | Training Loss | Epoch  | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.3236        | 0.0035 | 3    | 1.2901          |
- | 1.2011        | 0.0069 | 6    | 1.2545          |
- | 1.1983        | 0.0104 | 9    | 1.1429          |
-
-
- ### Framework versions
-
- - PEFT 0.14.0
- - Transformers 4.47.1
- - Pytorch 2.5.0a0+e000cf0ad9.nv24.10
- - Datasets 3.1.0
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-1.5B
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ adapter: lora
+ base_model: Qwen/Qwen2.5-1.5B
+ bf16: auto
+ chat_template: llama3
+ dataset_prepared_path: /workspace/axolotl/data/prepared
+ datasets:
+ - ds_type: json
+   format: custom
+   path: Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46
+   type:
+     field_instruction: premise
+     field_output: hypothesis
+     system_format: '{system}'
+     system_prompt: ''
+ debug: null
+ deepspeed: null
+ early_stopping_patience: null
+ eval_max_new_tokens: 128
+ eval_table_size: null
+ evals_per_epoch: 4
+ flash_attention: false
+ fp16: null
+ fsdp: null
+ fsdp_config: null
+ gradient_accumulation_steps: 4
+ gradient_checkpointing: false
+ group_by_length: false
+ hub_model_id: Aivesa/d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
+ hub_private_repo: true
+ hub_repo: null
+ hub_strategy: checkpoint
+ hub_token: null
+ learning_rate: 0.0002
+ load_in_4bit: false
+ load_in_8bit: false
+ local_rank: null
+ logging_steps: 1
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_fan_in_fan_out: null
+ lora_model_dir: null
+ lora_r: 8
+ lora_target_linear: true
+ lr_scheduler: cosine
+ max_steps: 10
+ micro_batch_size: 2
+ model_type: AutoModelForCausalLM
+ num_epochs: 1
+ optimizer: adamw_bnb_8bit
+ output_dir: /workspace/axolotl/outputs
+ pad_to_sequence_len: true
+ push_to_hub: true
+ resume_from_checkpoint: null
+ s2_attention: null
+ sample_packing: false
+ save_safetensors: true
+ saves_per_epoch: 4
+ sequence_len: 512
+ strict: false
+ tf32: false
+ tokenizer_type: AutoTokenizer
+ train_on_inputs: false
+ trust_remote_code: true
+ use_accelerate: true
+ val_set_size: 0.05
+ wandb_entity: null
+ wandb_mode: online
+ wandb_name: fe02391f-dc57-4aa5-9b78-45187ff5fb46
+ wandb_project: Gradients-On-Demand
+ wandb_run: your_name
+ wandb_runid: fe02391f-dc57-4aa5-9b78-45187ff5fb46
+ warmup_steps: 10
+ weight_decay: 0.0
+ xformers_attention: null
+
+ ```
+
+ </details><br>
+
+ # d037a536-e2d0-4e2b-9ac6-efaaacbaa2df
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B) on the Aivesa/dataset_fe02391f-dc57-4aa5-9b78-45187ff5fb46 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 1.1429
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0002
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 8
+ - optimizer: Use adamw_bnb_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - training_steps: 10
+
+ ### Training results
+
+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.3236        | 0.0035 | 3    | 1.2901          |
+ | 1.2011        | 0.0069 | 6    | 1.2545          |
+ | 1.1983        | 0.0104 | 9    | 1.1429          |
+
+
+ ### Framework versions
+
+ - PEFT 0.14.0
+ - Transformers 4.47.1
+ - Pytorch 2.5.0a0+e000cf0ad9.nv24.10
+ - Datasets 3.1.0
  - Tokenizers 0.21.0
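
The batch-size and epoch figures in the card's hyperparameter section are internally consistent, which can be checked with a small sketch (assuming single-process training, since no distributed settings appear in the config; with N GPUs the effective batch size would additionally be multiplied by N):

```python
# Cross-check of the card's reported figures (single-process training assumed).
micro_batch_size = 2               # per-device batch size from the axolotl config
gradient_accumulation_steps = 4    # from the axolotl config

# Effective batch size per optimizer step.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 8, matching total_train_batch_size in the card

# The epoch column: at step 9 the card reports epoch 0.0104, which implies
# roughly 9 / 0.0104 ≈ 865 optimizer steps per full epoch.
steps_per_epoch = 9 / 0.0104
print(round(steps_per_epoch))  # 865
```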