Improve language tag

#1
by lbourdois
Files changed (1)
  1. README.md +155 -141
README.md CHANGED
@@ -1,142 +1,156 @@
- ---
- library_name: peft
- license: other
- base_model: Qwen/Qwen2.5-3B-Instruct
- tags:
- - axolotl
- - generated_from_trainer
- datasets:
- - VinitT/Cricket-Commentary-Sample
- model-index:
- - name: Commentary-qwen-3B
-   results: []
- ---
- 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
- 
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
- 
- axolotl version: `0.8.0.dev0`
- ```yaml
- 
- base_model: Qwen/Qwen2.5-3B-Instruct
- load_in_8bit: false
- load_in_4bit: true
- strict: false
- 
- datasets:
-   - path: VinitT/Cricket-Commentary-Sample
-     type: alpaca
- dataset_prepared_path:
- val_set_size: 0
- output_dir: ./outputs/qlora-out
- 
- adapter: qlora
- lora_model_dir:
- 
- sequence_len: 1024
- sample_packing: true
- eval_sample_packing: false
- pad_to_sequence_len: true
- 
- lora_r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- lora_target_modules:
- lora_target_linear: true
- lora_fan_in_fan_out:
- 
- hub_model_id: Commentary-qwen-3B
- 
- wandb_project: Cricket-Commentary-1
- wandb_entity:
- wandb_watch: all
- wandb_name: Cricket-Commentary-1
- wandb_log_model:
- 
- gradient_accumulation_steps: 2
- micro_batch_size: 2
- num_epochs: 1
- optimizer: paged_adamw_8bit
- lr_scheduler: cosine
- cosine_min_lr_ratio: 0.2
- learning_rate: 2e-5
- 
- train_on_inputs: false
- group_by_length: false
- bf16: false
- fp16:
- tf32: false
- 
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: false
- 
- #gpu_memory_limit: 20GiB
- #lora_on_cpu: true
- 
- warmup_steps: 10
- evals_per_epoch: 4
- saves_per_epoch: 1
- debug:
- deepspeed: deepspeed_configs/zero1.json
- weight_decay: 0.0
- special_tokens:
-   pad_token: <|end_of_text|>
- 
- ```
- 
- </details><br>
- 
- # Commentary-qwen-3B
- 
- This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the VinitT/Cricket-Commentary-Sample dataset.
- 
- ## Model description
- 
- More information needed
- 
- ## Intended uses & limitations
- 
- More information needed
- 
- ## Training and evaluation data
- 
- More information needed
- 
- ## Training procedure
- 
- ### Training hyperparameters
- 
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 2
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 8
- - total_eval_batch_size: 4
- - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 1.0
- 
- ### Training results
- 
- 
- 
- ### Framework versions
- 
- - PEFT 0.14.0
- - Transformers 4.49.0
- - Pytorch 2.5.1+cu121
- - Datasets 3.2.0
+ ---
+ library_name: peft
+ license: other
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - VinitT/Cricket-Commentary-Sample
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ model-index:
+ - name: Commentary-qwen-3B
+   results: []
+ ---
+ 
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+ 
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+ 
+ axolotl version: `0.8.0.dev0`
+ ```yaml
+ 
+ base_model: Qwen/Qwen2.5-3B-Instruct
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+ 
+ datasets:
+   - path: VinitT/Cricket-Commentary-Sample
+     type: alpaca
+ dataset_prepared_path:
+ val_set_size: 0
+ output_dir: ./outputs/qlora-out
+ 
+ adapter: qlora
+ lora_model_dir:
+ 
+ sequence_len: 1024
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true
+ 
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ 
+ hub_model_id: Commentary-qwen-3B
+ 
+ wandb_project: Cricket-Commentary-1
+ wandb_entity:
+ wandb_watch: all
+ wandb_name: Cricket-Commentary-1
+ wandb_log_model:
+ 
+ gradient_accumulation_steps: 2
+ micro_batch_size: 2
+ num_epochs: 1
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ cosine_min_lr_ratio: 0.2
+ learning_rate: 2e-5
+ 
+ train_on_inputs: false
+ group_by_length: false
+ bf16: false
+ fp16:
+ tf32: false
+ 
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: false
+ 
+ #gpu_memory_limit: 20GiB
+ #lora_on_cpu: true
+ 
+ warmup_steps: 10
+ evals_per_epoch: 4
+ saves_per_epoch: 1
+ debug:
+ deepspeed: deepspeed_configs/zero1.json
+ weight_decay: 0.0
+ special_tokens:
+   pad_token: <|end_of_text|>
+ 
+ ```
+ 
+ </details><br>
+ 
+ # Commentary-qwen-3B
+ 
+ This model is a fine-tuned version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) on the VinitT/Cricket-Commentary-Sample dataset.
+ 
+ ## Model description
+ 
+ More information needed
+ 
+ ## Intended uses & limitations
+ 
+ More information needed
+ 
+ ## Training and evaluation data
+ 
+ More information needed
+ 
+ ## Training procedure
+ 
+ ### Training hyperparameters
+ 
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 8
+ - total_eval_batch_size: 4
+ - optimizer: Use paged_adamw_8bit with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 10
+ - num_epochs: 1.0
+ 
+ ### Training results
+ 
+ 
+ 
+ ### Framework versions
+ 
+ - PEFT 0.14.0
+ - Transformers 4.49.0
+ - Pytorch 2.5.1+cu121
+ - Datasets 3.2.0
  - Tokenizers 0.21.0
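The substance of this PR is the new `language:` block: the card gains ISO 639-3 codes for the thirteen languages Qwen2.5 supports. As a rough sketch of the normalization the PR performs, the two-letter (ISO 639-1) tags map to the three-letter codes above like this (the mapping dict is hand-written here for illustration; a library such as `pycountry` would normally provide the lookup):

```python
# ISO 639-1 -> ISO 639-3 for the thirteen languages added in this PR.
# Hand-written for illustration only.
ISO_639_1_TO_3 = {
    "zh": "zho", "en": "eng", "fr": "fra", "es": "spa", "pt": "por",
    "de": "deu", "it": "ita", "ru": "rus", "ja": "jpn", "ko": "kor",
    "vi": "vie", "th": "tha", "ar": "ara",
}

def to_iso639_3(tags):
    """Normalize two-letter language tags to their three-letter form,
    leaving unknown tags unchanged."""
    return [ISO_639_1_TO_3.get(t, t) for t in tags]

print(to_iso639_3(["zh", "en", "fr"]))  # ['zho', 'eng', 'fra']
```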
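Unrelated to the language tags but worth noting when reading the card: the reported `total_train_batch_size: 8` is not set directly anywhere in the axolotl config. It is the product of the per-device batch, gradient accumulation, and device count listed in the hyperparameters, which a quick check confirms:

```python
# Effective (total) train batch size implied by the card's hyperparameters.
micro_batch_size = 2             # per-device batch (axolotl config)
gradient_accumulation_steps = 2  # from the axolotl config
num_devices = 2                  # multi-GPU run per the hyperparameter list

total_train_batch_size = (
    micro_batch_size * gradient_accumulation_steps * num_devices
)
print(total_train_batch_size)  # 8, matching the card
```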