Text Generation
Transformers
Safetensors
qwen3
conversational
text-generation-inference
rin2401 committed
Commit fd81cb8 · verified · 1 parent: e99638b

Create README.md

Files changed (1): README.md (+190 −0)
---
library_name: transformers
base_model:
- Qwen/Qwen3-8B
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.13.0.dev0`
```yaml
base_model: /mnt/qwen3-8b

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin

liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true

chat_template: qwen3
datasets:
  # Dolci
  - path: /home/aithucchien/Unicorn/data/dolci_vi_5k_single.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true

  # 12 Task
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task2_sua_loi_sai_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task3_goi_mo_y_tuong_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task10_hoc_tap_tuong_tac_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task1_tra_loi_cau_hoi_extended_2.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task8_tao_tai_lieu_giang_day_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task9_tao_noi_dung_ca_nhan_hoa_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task5_ho_tro_tam_ly_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task7_cham_diem_tu_dong_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task6_tao_bo_cau_hoi_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task1_tra_loi_cau_hoi_extended.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task4_hoc_tap_ca_nhan_hoa_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task1_tra_loi_cau_hoi_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true
  - path: /home/aithucchien/luannd/Unicorn/data/synthetic/output_12_task_split/task11_tu_van_an_toan_answer.jsonl
    type: chat_template
    chat_template: qwen3
    split_thinking: true

output_dir: ./outputs/qwen3_8b_dolci_vi_5k_single_12task/

sequence_len: 32768
sample_packing: true
flex_attention: true

flex_attn_compile_kwargs:
  dynamic: false
  mode: max-autotune-no-cudagraphs

wandb_project: aithucchien
wandb_entity:
wandb_watch:
wandb_name: qwen3_8b_dolci_vi_5k_single_12task
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 2e-5

bf16: true
tf32: true

resume_from_checkpoint:
logging_steps: 1

saves_per_epoch: 1

warmup_ratio: 0.1
weight_decay: 0.0
fsdp:
  - full_shard
  - auto_wrap

fsdp_config:
  fsdp_version: 2
  fsdp_offload_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Qwen3DecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_reshard_after_forward: true
  fsdp_activation_checkpointing: true

special_tokens:

```

</details><br>
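Several dataset entries in the config above set `split_thinking: true`, which separates an assistant turn's `<think>…</think>` reasoning block from the visible answer so the two segments can be handled differently during training. A minimal, hypothetical illustration of the idea (this is not axolotl's actual implementation):

```python
import re


def split_thinking(assistant_text: str):
    """Split an assistant message into (reasoning, answer).

    Returns (None, text) when no <think>...</think> block is present.
    Illustrative only; the real splitting logic lives inside axolotl.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", assistant_text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return None, assistant_text.strip()


# A Qwen3-style assistant turn with an embedded reasoning block:
thinking, answer = split_thinking("<think>2+2=4</think>The answer is 4.")
# thinking == "2+2=4", answer == "The answer is 4."
```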
# Unicorn-R3

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- total_eval_batch_size: 2
- optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 18
- training_steps: 186
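The derived values above follow from the axolotl config: the total train batch size is micro_batch_size × gradient_accumulation_steps × num_devices, and the warmup length comes from warmup_ratio × training_steps. A quick sanity check (the exact rounding rule for warmup is the trainer's choice; truncation reproduces the reported value here):

```python
# Effective train batch size across both GPUs.
micro_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 2
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
# total_train_batch_size == 8

# Warmup steps from warmup_ratio: 0.1 * 186 = 18.6, truncated to 18.
training_steps = 186
warmup_ratio = 0.1
warmup_steps = int(training_steps * warmup_ratio)
# warmup_steps == 18
```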
### Training results



### Framework versions

- Transformers 4.57.3
- PyTorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1