---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- axolotl
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
- lora
- transformers
datasets:
- custom
pipeline_tag: text-generation
model-index:
- name: checkpoints
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.12.2`
```yaml
# Base model configuration
base_model: Qwen/Qwen3-4B-Instruct-2507
load_in_4bit: true
bnb_4bit_compute_dtype: bfloat16
bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: true

# LoRA configuration
adapter: lora
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
lora_target_linear: true
lora_fan_in_fan_out: false

# Dataset
chat_template: qwen3
datasets:
  - path: /workspace/tool_data_1012_89086.json
    type: chat_template
    roles_to_train: ["assistant"]
    field_messages: messages
    message_property_mappings:
      role: role
      content: content

val_set_size: 0.05
output_dir: checkpoints

# Sequence length
sequence_len: 8192
pad_to_sequence_len: true
sample_packing: false
eval_sample_packing: false
group_by_length: true

# Training parameters
num_epochs: 3
micro_batch_size: 6
gradient_accumulation_steps: 4
eval_batch_size: 4

# Optimizer
optimizer: adamw_bnb_8bit
lr_scheduler: cosine_with_restarts
cosine_restarts: 2
learning_rate: 1e-4
warmup_ratio: 0.05
weight_decay: 0.01

# Precision
bf16: auto
tf32: true
gradient_checkpointing: true
flash_attention: true

# ========== Key: checkpoint saving strategy ==========
save_strategy: steps
eval_strategy: steps
eval_steps: 500   # evaluate every 500 steps (~1/6 of an epoch; adjust to dataset size)
save_steps: 500   # keep in sync with eval_steps

save_total_limit: 1               # keep only the single best checkpoint
load_best_model_at_end: true      # load the best checkpoint when training ends
metric_for_best_model: eval_loss  # select by validation-set loss
greater_is_better: false          # lower loss is better

logging_steps: 30

# DeepSpeed
deepspeed: zero2.json

# Misc
ddp_timeout: 3600
ddp_find_unused_parameters: false
```

</details><br>
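
For readers who use PEFT directly rather than through Axolotl, the `lora_*` settings above map roughly onto a `peft.LoraConfig` as sketched below. This is an illustrative equivalent, not the exact object Axolotl constructs internally; `bias` and `task_type` are assumptions based on common causal-LM LoRA setups.

```python
from peft import LoraConfig

# Approximate PEFT equivalent of the lora_* settings in the Axolotl config above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,          # effective scaling alpha/r = 2.0
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",             # assumption: LoRA layers only, no bias terms trained
    task_type="CAUSAL_LM",   # assumption: standard causal-LM fine-tuning
)
```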

# checkpoints

This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on the /workspace/tool_data_1012_89086.json dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0672
- Memory/max mem active (GiB): 95.9
- Memory/max mem allocated (GiB): 95.9
- Memory/device mem reserved (GiB): 124.48

## Model description

More information needed

## Intended uses & limitations

More information needed
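
Pending fuller documentation, here is a minimal inference sketch: it loads the base model, attaches this LoRA adapter with PEFT, and generates a reply. The adapter id is a placeholder to replace with this repository's id or a local path; quantized loading (as used in training) is omitted for brevity.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_ID = "path/to/this-adapter"  # placeholder: this repo's id or a local checkpoint dir

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the LoRA weights
model.eval()

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```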

## Training and evaluation data

More information needed
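
The dataset itself is not published, but the Axolotl config above pins down its shape: a JSON file of `messages` lists with `role`/`content` fields, where only `assistant` turns contribute to the loss. An illustrative record (the content is invented):

```python
# One training record in the shape the config expects
# (field_messages: messages; role/content mappings; roles_to_train: ["assistant"]).
example_record = {
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}
```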

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 6
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 192 (derivation checked below)
- total_eval_batch_size: 32
- optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 65
- training_steps: 1310
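
The aggregate figures follow from the per-device settings; a quick sanity check (the warmup truncation rule is an assumption that happens to reproduce the reported value):

```python
# Reconstructing the aggregate batch sizes from the per-device settings.
micro_batch_size = 6               # per-device train batch size
gradient_accumulation_steps = 4
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 192

eval_batch_size = 4                # per-device eval batch size
total_eval_batch_size = eval_batch_size * num_devices
assert total_eval_batch_size == 32

# warmup_ratio 0.05 over 1310 optimizer steps; assuming truncation toward zero,
# this matches the reported 65 warmup steps.
warmup_steps = int(0.05 * 1310)
assert warmup_steps == 65
```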

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:----------------:|:-------------------:|:------------------:|
| No log        | 0      | 0    | 1.1993          | 50.61            | 50.61               | 51.0               |
| 0.0695        | 1.1442 | 500  | 0.0686          | 95.9             | 95.9                | 124.48             |
| 0.0681        | 2.2885 | 1000 | 0.0672          | 95.9             | 95.9                | 124.48             |

### Framework versions

- PEFT 0.17.0
- Transformers 4.55.2
- PyTorch 2.6.0+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4