Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +179 -165
README.md CHANGED
@@ -1,165 +1,179 @@
- ---
- library_name: transformers
- license: other
- base_model: Qwen/Qwen2.5-3B-Instruct
- tags:
- - generated_from_trainer
- datasets:
- - train.jsonl
- model-index:
- - name: outputs/out
-   results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- base_model: Qwen/Qwen2.5-3B-Instruct
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
- trust_remote_code: false
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- output_dir: ./outputs/out
- chat_template: qwen_25
- datasets:
-   - path: train.jsonl
-     type: chat_template
-     field_messages: messages
-     message_field_role: role
-     message_field_content: content
-     roles:
-       system:
-         - system
-       user:
-         - user
-       assistant:
-         - assistant
-
- dataset_prepared_path: last_run_prepared
- val_set_size: 0.005
- output_dir: ./outputs/out
- eval_sample_packing: False
-
- sequence_len: 8192
- sample_packing: False
- pad_to_sequence_len: False
-
- wandb_project: mergedbench
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
- # hub_model_id: amphora/merged-bench-qwen-full
-
- plugins:
-   - axolotl.integrations.liger.LigerPlugin
- liger_rope: true
- liger_rms_norm: true
- liger_swiglu: true
- liger_fused_linear_cross_entropy: true
-
- gradient_accumulation_steps: 4
- micro_batch_size: 8
- eval_batch_size: 4
- num_epochs: 3
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- learning_rate: 2e-5
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: false
-
- gradient_checkpointing: true
- gradient_checkpointing_kwargs:
-   use_reentrant: false
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 30
- evals_per_epoch: 3
- eval_max_new_tokens: 128
- eval_table_size:
- saves_per_epoch: 1
- debug:
- deepspeed: deepspeed_configs/zero1.json
- weight_decay: 0.01
- fsdp:
- fsdp_config:
- special_tokens:
- ```
-
- </details><br>
-
- # outputs/out
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the train.jsonl dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2783
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 8
- - eval_batch_size: 4
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 2
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 30
- - num_epochs: 3.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 1.3989 | 0.0041 | 1 | 1.7111 |
- | 0.2969 | 0.3350 | 82 | 0.3192 |
- | 0.3027 | 0.6701 | 164 | 0.2914 |
- | 0.177 | 1.0082 | 246 | 0.2854 |
- | 0.1735 | 1.3432 | 328 | 0.2857 |
- | 0.1684 | 1.6782 | 410 | 0.2805 |
- | 0.1109 | 2.0163 | 492 | 0.2741 |
- | 0.0946 | 2.3514 | 574 | 0.2828 |
- | 0.0968 | 2.6864 | 656 | 0.2783 |
-
-
- ### Framework versions
-
- - Transformers 4.48.1
- - Pytorch 2.5.1+cu121
- - Datasets 3.2.0
- - Tokenizers 0.21.0
+ ---
+ library_name: transformers
+ license: other
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ tags:
+ - generated_from_trainer
+ datasets:
+ - train.jsonl
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: outputs/out
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+ trust_remote_code: false
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+
+ output_dir: ./outputs/out
+ chat_template: qwen_25
+ datasets:
+   - path: train.jsonl
+     type: chat_template
+     field_messages: messages
+     message_field_role: role
+     message_field_content: content
+     roles:
+       system:
+         - system
+       user:
+         - user
+       assistant:
+         - assistant
+
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.005
+ output_dir: ./outputs/out
+ eval_sample_packing: False
+
+ sequence_len: 8192
+ sample_packing: False
+ pad_to_sequence_len: False
+
+ wandb_project: mergedbench
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+ # hub_model_id: amphora/merged-bench-qwen-full
+
+ plugins:
+   - axolotl.integrations.liger.LigerPlugin
+ liger_rope: true
+ liger_rms_norm: true
+ liger_swiglu: true
+ liger_fused_linear_cross_entropy: true
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 8
+ eval_batch_size: 4
+ num_epochs: 3
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 2e-5
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false
+
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 30
+ evals_per_epoch: 3
+ eval_max_new_tokens: 128
+ eval_table_size:
+ saves_per_epoch: 1
+ debug:
+ deepspeed: deepspeed_configs/zero1.json
+ weight_decay: 0.01
+ fsdp:
+ fsdp_config:
+ special_tokens:
+ ```
+
+ </details><br>
+
+ # outputs/out
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the train.jsonl dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.2783
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 4
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 8
+ - optimizer: Use OptimizerNames.PAGED_ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 30
+ - num_epochs: 3.0
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 1.3989 | 0.0041 | 1 | 1.7111 |
+ | 0.2969 | 0.3350 | 82 | 0.3192 |
+ | 0.3027 | 0.6701 | 164 | 0.2914 |
+ | 0.177 | 1.0082 | 246 | 0.2854 |
+ | 0.1735 | 1.3432 | 328 | 0.2857 |
+ | 0.1684 | 1.6782 | 410 | 0.2805 |
+ | 0.1109 | 2.0163 | 492 | 0.2741 |
+ | 0.0946 | 2.3514 | 574 | 0.2828 |
+ | 0.0968 | 2.6864 | 656 | 0.2783 |
+
+
+ ### Framework versions
+
+ - Transformers 4.48.1
+ - Pytorch 2.5.1+cu121
+ - Datasets 3.2.0
+ - Tokenizers 0.21.0
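
For reviewers, the change above can be sanity-checked mechanically. A minimal sketch in plain Python: the language list and batch-size figures are copied from the diff, and reading the tags as three-letter ISO 639-3 codes is an assumption of this sketch (the Hub also accepts two-letter 639-1 codes such as `zh` or `en`):

```python
# Language tags added by this PR, copied from the new front matter.
languages = ["zho", "eng", "fra", "spa", "por", "deu", "ita", "rus",
             "jpn", "kor", "vie", "tha", "ara"]

# 13 tags, each a three-letter alphabetic code (ISO 639-3 style).
assert len(languages) == 13
assert all(len(code) == 3 and code.isalpha() for code in languages)

# The card's total_train_batch_size is consistent with the axolotl config:
# micro_batch_size (8) * gradient_accumulation_steps (4) * num_devices (2).
total_train_batch_size = 8 * 4 * 2
assert total_train_batch_size == 64
```

This only checks internal consistency of the metadata and hyperparameters; it does not validate the codes against an ISO 639 registry.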