darwinkernelpanic committed · Commit c1e47de · verified · Parent: e5ccec1

Model save

Files changed (1): README.md +169 -3

README.md CHANGED
@@ -1,3 +1,169 @@
- ---
- license: apache-2.0
- ---

---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- axolotl
- base_model:adapter:Qwen/Qwen2.5-Coder-7B-Instruct
- lora
- transformers
datasets:
- darwinkernelpanic/luau_corpus_axolotl
pipeline_tag: text-generation
model-index:
- name: Qwen2.5-Coder-7B-Instruct-Luau
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.13.0.dev0`
```yaml
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
# Auto-upload to HuggingFace when done
hub_model_id: darwinkernelpanic/Qwen2.5-Coder-7B-Instruct-Luau # Change this to your HF username
hub_strategy: every_save # Uploads checkpoints as you train
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true

datasets:
  - path: darwinkernelpanic/luau_corpus_axolotl
    type: completion
    field_instruction: text # Check the actual column names on HF
    field_output: completion # Might be "text" or "code" — verify first

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/qwen-luau-finetune

sequence_len: 2048
sample_packing: true
eval_sample_packing: true

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true

# Weights & Biases tracking (optional but clutch)
wandb_project: qwen-luau-finetune
wandb_entity:
wandb_watch:
wandb_name: qwen2.5-coder-7b-luau
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0003
bf16: auto
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false

resume_from_checkpoint:
logging_steps: 10
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
saves_per_epoch: 1
weight_decay: 0.01

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: false
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT

special_tokens:
  pad_token: "<|endoftext|>"
```

</details><br>

# Qwen2.5-Coder-7B-Instruct-Luau

This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) on the darwinkernelpanic/luau_corpus_axolotl dataset.
It achieves the following results on the evaluation set:
- Loss: nan
- Ppl: nan
- Memory/max Active (GiB): 14.12
- Memory/max Allocated (GiB): 14.01
- Memory/device Reserved (GiB): 14.69

## Model description

More information needed

## Intended uses & limitations

More information needed
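
This repository holds a PEFT (QLoRA) adapter rather than full model weights, so it is loaded on top of the base model. A minimal inference sketch, assuming `transformers`, `peft`, and `accelerate` are installed; the prompt is purely illustrative:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "darwinkernelpanic/Qwen2.5-Coder-7B-Instruct-Luau"

# Load the base model, then attach the LoRA adapter from this repo.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# The base model is instruction-tuned, so use its chat template.
messages = [{"role": "user", "content": "Write a Luau function that deep-copies a table."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, the adapter can be folded into the base weights with `model.merge_and_unload()` before saving.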

## Training and evaluation data

More information needed
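
The training corpus can be inspected directly from the Hub. A minimal sketch using the `datasets` library; column names and splits are not documented here, so this assumes a `train` split and prints the schema rather than relying on specific fields:

```python
from datasets import load_dataset

# Pull the fine-tuning corpus and inspect its schema before relying on column names.
ds = load_dataset("darwinkernelpanic/luau_corpus_axolotl", split="train")

print(ds)           # row count and column names
print(ds.features)  # column types
print(ds[0])        # a single example record
```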

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- total_eval_batch_size: 4
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 105
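
The reported batch sizes follow from the per-device settings; a quick sanity check of the arithmetic, with values copied from the list above:

```python
# Values from the hyperparameter list above.
micro_batch_size = 2             # per-device train batch size
gradient_accumulation_steps = 2
num_devices = 2                  # trained on 2 GPUs (FSDP)

# Effective train batch size = per-device batch * accumulation steps * devices.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 8

# Evaluation does not accumulate gradients, so it is just per-device batch * devices.
total_eval_batch_size = 2 * num_devices
print(total_eval_batch_size)     # 4
```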

### Training results

| Training Loss | Epoch | Step | Validation Loss | Ppl | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:-------------:|:------:|:----:|:---------------:|:------:|:------------:|:---------------:|:--------------:|
| No log | 0 | 0 | 3.9969 | 54.428 | 11.21 | 11.1 | 12.26 |
| No log | 0.2535 | 9 | nan | nan | 14.12 | 14.01 | 15.56 |
| 12.4054 | 0.5070 | 18 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 0.7606 | 27 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 1.0 | 36 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 1.2535 | 45 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 1.5070 | 54 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 1.7606 | 63 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 2.0 | 72 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 2.2535 | 81 | nan | nan | 14.12 | 14.01 | 14.69 |
| 0.0 | 2.5070 | 90 | nan | nan | 11.83 | 11.72 | 14.69 |
| 0.0 | 2.7606 | 99 | nan | nan | 14.12 | 14.01 | 14.69 |


### Framework versions

- PEFT 0.18.0
- Transformers 4.57.1
- Pytorch 2.8.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1