lbourdois committed on
Commit 29e0805 · verified · 1 parent: 9d5b280

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag, to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
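For reference, the `language:` front-matter block added by this PR can be generated mechanically from the 13 ISO 639-3 codes listed in the README. A minimal sketch (the `language_block` helper is hypothetical, just plain string formatting, no YAML library):

```python
# The 13 ISO 639-3 codes explicitly listed in the README, as added by this PR.
codes = ["zho", "eng", "fra", "spa", "por", "deu", "ita",
         "rus", "jpn", "kor", "vie", "tha", "ara"]

def language_block(codes):
    """Render a Hugging Face model-card `language:` front-matter block."""
    return "language:\n" + "\n".join(f"- {code}" for code in codes)

print(language_block(codes))
```

The output is exactly the block inserted between `datasets:` and `model-index:` in the diff below.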

Files changed (1)
  1. README.md +165 -151
README.md CHANGED
@@ -1,152 +1,166 @@
- ---
- library_name: peft
- license: apache-2.0
- base_model: Qwen/Qwen2.5-7B-Instruct
- tags:
- - generated_from_trainer
- datasets:
- - aaditya/mimicraw_clinicaltrial_train
- model-index:
- - name: out
- results: []
- ---
-
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.6.0`
- ```yaml
- base_model: Qwen/Qwen2.5-7B-Instruct
- model_type: AutoModelForCausalLM
- tokenizer_type: AutoTokenizer
- trust_remote_code: true
-
- load_in_8bit: false
- load_in_4bit: true
- strict: false
-
- datasets:
- - path: aaditya/mimicraw_clinicaltrial_train
- type: alpaca
- val_set_size: 0.05
- output_dir: ./out
-
- sequence_len: 4096
- sample_packing: true
- pad_to_sequence_len: true
-
- adapter: qlora
- lora_r: 256
- lora_alpha: 512
- lora_dropout: 0.05
- lora_target_linear: true
- lora_target_modules:
- - q_proj
- - k_proj
- - v_proj
- - o_proj
- - gate_proj
- - down_proj
- - up_proj
-
- wandb_project: qwen_mimicrawclinicaltrail
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 4
- micro_batch_size: 6
- num_epochs: 3
- optimizer: adamw_torch
- lr_scheduler: cosine
- learning_rate: 2e-6
-
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 100
- evals_per_epoch: 3
- eval_table_size:
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- save_total_limit: 4
-
- ```
-
- </details><br>
-
- # out
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the aaditya/mimicraw_clinicaltrial_train dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.6060
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-06
- - train_batch_size: 6
- - eval_batch_size: 6
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 24
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 100
- - num_epochs: 3
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.8273 | 0.0008 | 1 | 0.8615 |
- | 0.6312 | 0.3335 | 400 | 0.6677 |
- | 0.6221 | 0.6671 | 800 | 0.6416 |
- | 0.1335 | 1.0 | 1200 | 0.6267 |
- | 0.6062 | 1.3327 | 1600 | 0.6176 |
- | 0.5861 | 1.6662 | 2000 | 0.6119 |
- | 0.6194 | 1.9998 | 2400 | 0.6084 |
- | 0.5953 | 2.3319 | 2800 | 0.6068 |
- | 0.6394 | 2.6654 | 3200 | 0.6060 |
-
-
- ### Framework versions
-
- - PEFT 0.14.0
- - Transformers 4.48.1
- - Pytorch 2.5.1+cu124
- - Datasets 3.2.0
  - Tokenizers 0.21.0
 
+ ---
+ library_name: peft
+ license: apache-2.0
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ tags:
+ - generated_from_trainer
+ datasets:
+ - aaditya/mimicraw_clinicaltrial_train
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: out
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ model_type: AutoModelForCausalLM
+ tokenizer_type: AutoTokenizer
+ trust_remote_code: true
+
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+
+ datasets:
+ - path: aaditya/mimicraw_clinicaltrial_train
+ type: alpaca
+ val_set_size: 0.05
+ output_dir: ./out
+
+ sequence_len: 4096
+ sample_packing: true
+ pad_to_sequence_len: true
+
+ adapter: qlora
+ lora_r: 256
+ lora_alpha: 512
+ lora_dropout: 0.05
+ lora_target_linear: true
+ lora_target_modules:
+ - q_proj
+ - k_proj
+ - v_proj
+ - o_proj
+ - gate_proj
+ - down_proj
+ - up_proj
+
+ wandb_project: qwen_mimicrawclinicaltrail
+ wandb_entity:
+ wandb_watch:
+ wandb_name:
+ wandb_log_model:
+
+ gradient_accumulation_steps: 4
+ micro_batch_size: 6
+ num_epochs: 3
+ optimizer: adamw_torch
+ lr_scheduler: cosine
+ learning_rate: 2e-6
+
+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16: false
+ tf32: false
+
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true
+
+ warmup_steps: 100
+ evals_per_epoch: 3
+ eval_table_size:
+ saves_per_epoch: 1
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ save_total_limit: 4
+
+ ```
+
+ </details><br>
+
+ # out
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the aaditya/mimicraw_clinicaltrial_train dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6060
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-06
+ - train_batch_size: 6
+ - eval_batch_size: 6
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 24
+ - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 3
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.8273 | 0.0008 | 1 | 0.8615 |
+ | 0.6312 | 0.3335 | 400 | 0.6677 |
+ | 0.6221 | 0.6671 | 800 | 0.6416 |
+ | 0.1335 | 1.0 | 1200 | 0.6267 |
+ | 0.6062 | 1.3327 | 1600 | 0.6176 |
+ | 0.5861 | 1.6662 | 2000 | 0.6119 |
+ | 0.6194 | 1.9998 | 2400 | 0.6084 |
+ | 0.5953 | 2.3319 | 2800 | 0.6068 |
+ | 0.6394 | 2.6654 | 3200 | 0.6060 |
+
+
+ ### Framework versions
+
+ - PEFT 0.14.0
+ - Transformers 4.48.1
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.2.0
  - Tokenizers 0.21.0
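As a sanity check on the hyperparameters in the card above, a short sketch verifying the derived values, assuming the reported `total_train_batch_size` is `micro_batch_size × gradient_accumulation_steps × world_size` with a world size of 1, and using the standard LoRA convention that the adapter update is scaled by `lora_alpha / lora_r`:

```python
# Values taken from the axolotl config / training-hyperparameters section above.
micro_batch_size = 6
gradient_accumulation_steps = 4
world_size = 1  # assumption: total_train_batch_size of 24 implies one GPU process

effective_batch = micro_batch_size * gradient_accumulation_steps * world_size
# Standard LoRA scaling applied to the low-rank update delta_W.
lora_r, lora_alpha = 256, 512
scaling = lora_alpha / lora_r

print(effective_batch, scaling)  # prints: 24 2.0
```

The effective batch size matches the `total_train_batch_size: 24` reported in the card, and the alpha/r ratio of 2.0 is a common choice for QLoRA runs.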