timarni committed
Commit f19b775 · verified · 1 Parent(s): f9b57e2

Upload final fine-tuned Qwen3-0.6B model

Files changed (4)
  1. README.md +150 -0
  2. config.json +0 -1
  3. generation_config.json +7 -0
  4. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,150 @@
+ ---
+ library_name: transformers
+ base_model: timarni/qwen3_dpo
+ tags:
+ - generated_from_trainer
+ datasets:
+ - timarni/MNLP_intstruction_tuning
+ model-index:
+ - name: outputs/dpo_full_alpaca
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.9.2`
+ ```yaml
+ base_model: timarni/qwen3_dpo
+ # Automatically upload checkpoint and final model to HF
+ # hub_model_id: username/custom_model_name
+
+ plugins:
+   - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+ strict: false
+
+ chat_template: qwen3
+ datasets:
+   - path: timarni/MNLP_intstruction_tuning
+     type: alpaca
+     split: train
+
+ shuffle_merged_datasets: true
+
+ val_set_size: 0.1
+ output_dir: ./outputs/dpo_full_alpaca
+ dataset_prepared_path: last_run_prepared
+
+ sequence_len: 4096  # was 2048
+ sample_packing: true  # check whether the model actually learns on packed samples (better understand the hyperparameter; eventually install axolotl locally to debug)
+ eval_sample_packing: true
+ pad_to_sequence_len: true
+ # train_on_inputs: true  # NEW
+ # group_by_length: false  # NEW?
+
+ # Make sure no LoRA adapter is used
+ adapter: null
+ lora: false
+ merge_lora: false
+
+ wandb_project: mnlp_project
+ wandb_entity: tim-arni
+ wandb_watch:
+ wandb_name: dpo_full_alpaca_resume_from_ckpt
+ wandb_log_model:
+
+ gradient_accumulation_steps: 16  # was 2
+ micro_batch_size: 2  # was 1
+ num_epochs: 3
+ optimizer: adamw_torch
+ lr_scheduler: cosine
+ learning_rate: 0.00005
+ # cosine_min_lr_ratio: 0.1
+
+ warmup_ratio: 0.05
+ weight_decay: 0.01
+
+ bf16: auto
+ tf32: true
+
+ gradient_checkpointing: offload
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ resume_from_checkpoint: /mloscratch/users/arni/Workspace/mnlp_sft/outputs/dpo_full_alpaca/checkpoint-186
+ logging_steps: 1
+ gradient_clipping: 1.0  # or max_grad_norm?
+ flash_attention: true
+
+ evals_per_epoch: 2
+ saves_per_epoch: 1
+ save_total_limit: 20
+ special_tokens:
+
+ ```
+
+ </details><br>
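The config above uses axolotl's `type: alpaca` dataset loader. As a rough illustration of what that format looks like, here is the standard Stanford-Alpaca prompt template (a generic sketch; the exact template is not shown in this repo):

```python
# Standard Alpaca prompt templates (generic; assumed, not copied from this repo).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction, input_text=""):
    # Examples without an `input` field fall back to the shorter template.
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)

print(build_prompt("Summarize the text.", "Qwen3 is a family of language models."))
```

During training the model learns to continue such a prompt with the gold response after `### Response:`.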
+
+ # outputs/dpo_full_alpaca
+
+ This model is a fine-tuned version of [timarni/qwen3_dpo](https://huggingface.co/timarni/qwen3_dpo) on the timarni/MNLP_intstruction_tuning dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1520
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 5e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 8
+ - optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 13
+ - num_epochs: 3.0
+
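As a sanity check, the reported total train batch size follows directly from the other values in the list (a quick sketch, not part of the original card):

```python
# Effective (total) train batch size = per-device micro batch size
# * gradient accumulation steps * number of devices.
micro_batch_size = 2
gradient_accumulation_steps = 16
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128, matching the value reported above
```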
+ ### Training results
+
+ | Training Loss | Epoch  | Step | Validation Loss |
+ |:-------------:|:------:|:----:|:---------------:|
+ | 0.7154        | 0.0107 | 1    | 1.1239          |
+ | 0.1282        | 0.2567 | 24   | 0.2029          |
+ | 0.1105        | 0.5134 | 48   | 0.1860          |
+ | 0.1056        | 0.7701 | 72   | 0.1779          |
+ | 0.1004        | 1.0214 | 96   | 0.1736          |
+ | 0.0912        | 1.2781 | 120  | 0.1643          |
+ | 0.0861        | 1.5348 | 144  | 0.1576          |
+ | 0.0791        | 1.7914 | 168  | 0.1530          |
+ | 0.0751        | 2.0642 | 192  | 0.1510          |
+ | 0.0625        | 2.3209 | 216  | 0.1509          |
+ | 0.0453        | 2.5775 | 240  | 0.1513          |
+ | 0.0426        | 2.8342 | 264  | 0.1520          |
+
+
+ ### Framework versions
+
+ - Transformers 4.51.3
+ - Pytorch 2.5.1+cu121
+ - Datasets 3.5.1
+ - Tokenizers 0.21.1
config.json CHANGED
@@ -1,5 +1,4 @@
  {
- "_attn_implementation_autoset": true,
  "architectures": [
    "Qwen3ForCausalLM"
  ],
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": 151643,
+   "max_new_tokens": 2048,
+   "transformers_version": "4.51.3"
+ }
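The generation config enables sampling, shares one token id for BOS and EOS (the Qwen convention), and caps output length at 2048 tokens. A stdlib-only sketch that parses the file contents as shown in the diff above:

```python
import json

# JSON copied from the generation_config.json diff above.
raw = """{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.51.3"
}"""
cfg = json.loads(raw)

# Sampling is on by default, and BOS/EOS share a single token id.
print(cfg["do_sample"], cfg["max_new_tokens"])
```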
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75f18bea40acac2a7ebd5f777510fde188f27ab48415502697db4590480c5a6e
+ size 1192135096
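The weights file is stored as a git-lfs pointer: a small text file with `key value` lines naming the hash and size of the real blob. A stdlib-only sketch of reading one (pointer text copied from the diff above):

```python
# git-lfs pointer contents, copied from the model.safetensors diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:75f18bea40acac2a7ebd5f777510fde188f27ab48415502697db4590480c5a6e
size 1192135096
"""

def parse_lfs_pointer(text):
    # Each non-empty line is "key value"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }

info = parse_lfs_pointer(pointer_text)
print(info["size_bytes"] / 1e9)  # ~1.19 GB, consistent with a 0.6B model in bf16
```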