Commit 9be5e8f (verified) by cjkasbdkjnlakb · Parent: f943756

Upload LoRA adapter - README.md

Files changed: README.md (+163 lines)
---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- axolotl
- base_model:adapter:Qwen/Qwen3-4B-Instruct-2507
- lora
- transformers
datasets:
- custom
pipeline_tag: text-generation
model-index:
- name: workspace/checkpoints
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.12.2`
```yaml
26
+ # 基础模型配置
27
+ base_model: Qwen/Qwen3-4B-Instruct-2507
28
+ load_in_8bit: false
29
+ load_in_4bit: false # QLoRA才需要4bit
30
+
31
+ # LoRA 适配器配置 - 这是关键部分
32
+ adapter: lora # 明确指定使用LoRA
33
+ lora_model_dir: # 如果有预训练的LoRA权重可以在这里指定
34
+
35
+ # LoRA 具体参数
36
+ lora_r: 64
37
+ lora_alpha: 128
38
+ lora_dropout: 0.05
39
+ lora_target_modules: # Qwen3模型的关键模块
40
+ - q_proj
41
+ - k_proj
42
+ - v_proj
43
+ - o_proj
44
+ - gate_proj
45
+ - up_proj
46
+ - down_proj
47
+ lora_target_linear: true # 自动找到所有线性层
48
+ lora_fan_in_fan_out: false
49
+
50
+ # 数据集设置
51
+ chat_template: qwen3
52
+ datasets:
53
+ - path: /workspace/workspace/tool_data.json
54
+ type: chat_template
55
+ roles_to_train: ["assistant"]
56
+ field_messages: messages
57
+ message_property_mappings:
58
+ role: role
59
+ content: content
60
+
61
+ dataset_prepared_path:
62
+ val_set_size: 0.05
63
+ output_dir: /workspace/checkpoints
64
+
65
+ # 序列长度设置
66
+ sequence_len: 10000
67
+ pad_to_sequence_len: true
68
+ sample_packing: false
69
+ eval_sample_packing: false
70
+ group_by_length: true # 启用长度分组,提高效率
71
+
72
+ # 训练超参数
73
+ num_epochs: 3
74
+ micro_batch_size: 4 # H100显存大
75
+ gradient_accumulation_steps: 4 # 8卡LoRA不需要太大的累积
76
+ eval_batch_size: 8
77
+
78
+ # 优化器设置
79
+ optimizer: adamw_torch_fused
80
+ lr_scheduler: cosine_with_restarts
81
+ cosine_restarts: 2 # 每个epoch重启一次
82
+ learning_rate: 4e-5
83
+ warmup_ratio: 0.03
84
+ weight_decay: 0.05
85
+
86
+ # 精度设置
87
+ bf16: auto # H100支持bf16
88
+ tf32: true
89
+ gradient_checkpointing: true # 节省显存
90
+ flash_attention: true
91
+
92
+ # 日志和保存
93
+ logging_steps: 30
94
+ evals_per_epoch: 1
95
+ saves_per_epoch: 1
96
+ save_total_limit: 3 # 只保留最新的3个checkpoint
97
+
98
+ # 多卡训练配置 - 使用DeepSpeed而不是FSDP
99
+ deepspeed: /workspace/workspace/zero2.json # 或者直接内联配置
100
+
101
+ # 其他优化
102
+ ddp_timeout: 3600 # DDP超时设置
103
+ ddp_find_unused_parameters: false # LoRA通常不需要
104
+ ```

</details><br>
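The DeepSpeed file referenced by `deepspeed: /workspace/workspace/zero2.json` is not included in this upload. A minimal ZeRO-2 configuration along these lines would be typical; the `"auto"` values defer to the trainer's own settings. Treat this as an illustrative sketch, not the exact file used:

```json
{
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_clipping": "auto"
}
```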

# workspace/checkpoints

This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) on the /workspace/workspace/tool_data.json dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0482
- Memory/max Mem Active (GiB): 123.28
- Memory/max Mem Allocated (GiB): 123.28
- Memory/device Mem Reserved (GiB): 124.72

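Since this repository contains only a LoRA adapter, inference requires loading the base model first and then attaching the adapter with `peft`. A minimal sketch (the `adapter_id` below is a placeholder; substitute this repository's id or a local checkpoint path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "your-username/your-adapter-repo"  # placeholder: replace with this adapter's repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights

# Build a chat prompt with the model's chat template and generate a reply.
messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Alternatively, the adapter can be merged into the base weights with `model.merge_and_unload()` for adapter-free deployment.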

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 39
- training_steps: 1316

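The derived values above follow directly from the config: the total train batch size is the per-device micro batch times gradient accumulation times device count, and the warmup step count is the warmup ratio applied to the total number of training steps, rounded down. A quick check:

```python
# Derived training quantities, computed from the config values above.
micro_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 8
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # → 128

eval_batch_size = 8
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # → 64

warmup_ratio = 0.03
training_steps = 1316
warmup_steps = int(warmup_ratio * training_steps)  # 39.48 rounds down
print(warmup_steps)  # → 39
```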
### Training results

| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|:-------------:|:-----:|:----:|:---------------:|:----------------:|:-------------------:|:------------------:|
| No log        | 0     | 0    | 1.1193          | 123.25           | 123.25              | 124.05             |
| 0.0509        | 1.0   | 439  | 0.0503          | 123.28           | 123.28              | 124.72             |
| 0.0461        | 2.0   | 878  | 0.0482          | 123.28           | 123.28              | 124.72             |

### Framework versions

- PEFT 0.17.0
- Transformers 4.55.2
- PyTorch 2.6.0+cu126
- Datasets 4.0.0
- Tokenizers 0.21.4