Linksome committed
Commit 09b5e73 · verified
1 Parent(s): 30231de

Add files using upload-large-folder tool

Files changed (38)
  1. LlamaFactory/wandb/run-20260205_023738-7rn01zb3/files/output.log +0 -0
  2. LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/config.yaml +723 -0
  3. LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/requirements.txt +257 -0
  4. LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/wandb-metadata.json +41 -0
  5. LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/wandb-summary.json +1 -0
  6. LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug-internal.log +11 -0
  7. LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug.log +25 -0
  8. LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/config.yaml +723 -0
  9. LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/requirements.txt +257 -0
  10. LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/wandb-metadata.json +41 -0
  11. LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/wandb-summary.json +1 -0
  12. LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug-internal.log +12 -0
  13. LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug.log +25 -0
  14. LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/config.yaml +723 -0
  15. LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/requirements.txt +257 -0
  16. LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/wandb-metadata.json +41 -0
  17. LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/wandb-summary.json +1 -0
  18. LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug-internal.log +12 -0
  19. LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug.log +25 -0
  20. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/config.yaml +723 -0
  21. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/output.log +370 -0
  22. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/requirements.txt +257 -0
  23. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/wandb-metadata.json +41 -0
  24. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/wandb-summary.json +1 -0
  25. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug-internal.log +11 -0
  26. LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug.log +25 -0
  27. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/config.yaml +723 -0
  28. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/output.log +156 -0
  29. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/requirements.txt +257 -0
  30. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/wandb-metadata.json +41 -0
  31. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/wandb-summary.json +1 -0
  32. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug-internal.log +11 -0
  33. LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug.log +25 -0
  34. LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/output.log +0 -0
  35. LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/requirements.txt +257 -0
  36. LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/wandb-metadata.json +41 -0
  37. LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug-internal.log +11 -0
  38. LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug.log +25 -0
LlamaFactory/wandb/run-20260205_023738-7rn01zb3/files/output.log ADDED
The diff for this file is too large to render. See raw diff
 
LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/config.yaml ADDED
@@ -0,0 +1,723 @@
+ _name_or_path:
+     value: /workspace/Qwen/Qwen3-8B-Base
+ _wandb:
+     value:
+         cli_version: 0.24.2
+         e:
+             qv5hxlnjyijt45h1hirthpvj46z6i895:
+                 args:
+                     - /workspace/v127rc_exp1/B_mul.yaml
+                 cpu_count: 24
+                 cpu_count_logical: 48
+                 cudaVersion: "12.4"
+                 disk:
+                     /:
+                         total: "21474836480"
+                         used: "2532245504"
+                 email: markmochi200@gmail.com
+                 executable: /usr/bin/python
+                 git:
+                     commit: 1a02717fa84c270d1c156c4c4a391c2f95525a63
+                     remote: https://github.com/hiyouga/LlamaFactory.git
+                 gpu: NVIDIA GeForce RTX 4090
+                 gpu_count: 1
+                 gpu_nvidia:
+                     - architecture: Ada
+                       cudaCores: 16384
+                       memoryTotal: "25757220864"
+                       name: NVIDIA GeForce RTX 4090
+                       uuid: GPU-e976cc69-65b4-6c21-059f-e293b12f21b0
+                 host: aac78e9f96d6
+                 memory:
+                     total: "270095220736"
+                 os: Linux-6.5.0-45-generic-x86_64-with-glibc2.35
+                 program: /usr/local/bin/llamafactory-cli
+                 python: CPython 3.11.10
+                 root: /workspace/LlamaFactory
+                 startedAt: "2026-02-05T02:43:15.345218Z"
+                 writerId: qv5hxlnjyijt45h1hirthpvj46z6i895
+         m:
+             - "1": train/global_step
+               "6":
+                 - 3
+               "7": []
+             - "2": '*'
+               "5": 1
+               "6":
+                 - 1
+               "7": []
+         python_version: 3.11.10
+         t:
+             "1":
+                 - 1
+                 - 11
+                 - 41
+                 - 49
+                 - 51
+                 - 71
+                 - 84
+                 - 98
+                 - 105
+             "2":
+                 - 1
+                 - 11
+                 - 41
+                 - 49
+                 - 51
+                 - 71
+                 - 84
+                 - 98
+                 - 105
+             "3":
+                 - 7
+                 - 19
+                 - 62
+                 - 66
+             "4": 3.11.10
+             "5": 0.24.2
+             "6": 5.0.0
+             "9":
+                 "1": transformers_trainer
+             "12": 0.24.2
+             "13": linux-x86_64
+ accelerator_config:
+     value:
+         dispatch_batches: null
+         even_batches: true
+         gradient_accumulation_kwargs: null
+         non_blocking: false
+         split_batches: false
+         use_seedable_sampler: true
+ adam_beta1:
+     value: 0.9
+ adam_beta2:
+     value: 0.95
+ adam_epsilon:
+     value: 1e-08
+ architectures:
+     value:
+         - Qwen3ForCausalLM
+ attention_bias:
+     value: false
+ attention_dropout:
+     value: 0
+ auto_find_batch_size:
+     value: false
+ average_tokens_across_devices:
+     value: true
+ batch_eval_metrics:
+     value: false
+ bf16:
+     value: true
+ bf16_full_eval:
+     value: false
+ bos_token_id:
+     value: null
+ chunk_size_feed_forward:
+     value: 0
+ data_args:
+     value:
+         buffer_size: 16384
+         cutoff_len: 2047
+         data_shared_file_system: false
+         dataset:
+             - Markie_Voss_t35_d0_r286
+         dataset_dir: /workspace/LlamaFactory/data
+         default_system: null
+         enable_thinking: false
+         eval_dataset: null
+         eval_num_beams: null
+         eval_on_each_dataset: false
+         ignore_pad_token_for_loss: true
+         interleave_probs: null
+         mask_history: false
+         max_samples: 100000000
+         media_dir: /workspace/LlamaFactory/data
+         mix_strategy: concat
+         neat_packing: false
+         overwrite_cache: false
+         packing: true
+         preprocessing_batch_size: 1000
+         preprocessing_num_workers: 16
+         streaming: false
+         template: qwen3_nothink
+         tokenized_path: null
+         tool_format: null
+         train_on_prompt: false
+         val_size: 0
+ data_seed:
+     value: null
+ dataloader_drop_last:
+     value: false
+ dataloader_num_workers:
+     value: 0
+ dataloader_persistent_workers:
+     value: false
+ dataloader_pin_memory:
+     value: true
+ dataloader_prefetch_factor:
+     value: null
+ ddp_backend:
+     value: null
+ ddp_broadcast_buffers:
+     value: null
+ ddp_bucket_cap_mb:
+     value: null
+ ddp_find_unused_parameters:
+     value: null
+ ddp_timeout:
+     value: 180000000
+ debug:
+     value: []
+ deepspeed:
+     value: null
+ disable_tqdm:
+     value: false
+ do_eval:
+     value: false
+ do_predict:
+     value: false
+ do_train:
+     value: true
+ dtype:
+     value: bfloat16
+ enable_jit_checkpoint:
+     value: false
+ eos_token_id:
+     value: 151645
+ eval_accumulation_steps:
+     value: null
+ eval_delay:
+     value: 0
+ eval_do_concat_batches:
+     value: true
+ eval_on_start:
+     value: false
+ eval_steps:
+     value: null
+ eval_strategy:
+     value: "no"
+ eval_use_gather_object:
+     value: false
+ finetuning_args:
+     value:
+         additional_target: null
+         apollo_layerwise: false
+         apollo_proj: random
+         apollo_proj_type: std
+         apollo_rank: 16
+         apollo_scale: 32
+         apollo_scale_front: false
+         apollo_scale_type: channel
+         apollo_target:
+             - all
+         apollo_update_interval: 200
+         badam_mask_mode: adjacent
+         badam_mode: layer
+         badam_start_block: null
+         badam_switch_interval: 50
+         badam_switch_mode: ascending
+         badam_update_ratio: 0.05
+         badam_verbose: 0
+         compute_accuracy: false
+         create_new_adapter: false
+         disable_shuffling: false
+         dpo_label_smoothing: 0
+         eaft_alpha: 1
+         early_stopping_steps: null
+         finetuning_type: lora
+         freeze_extra_modules: null
+         freeze_language_model: false
+         freeze_multi_modal_projector: true
+         freeze_trainable_layers: 2
+         freeze_trainable_modules:
+             - all
+         freeze_vision_tower: true
+         galore_layerwise: false
+         galore_proj_type: std
+         galore_rank: 16
+         galore_scale: 2
+         galore_target:
+             - all
+         galore_update_interval: 200
+         include_effective_tokens_per_second: false
+         kto_chosen_weight: 1
+         kto_rejected_weight: 1
+         ld_alpha: null
+         lora_alpha: 32
+         lora_dropout: 0.03
+         lora_rank: 16
+         lora_target:
+             - all
+         loraplus_lr_embedding: 1e-06
+         loraplus_lr_ratio: null
+         module_dropout: 0
+         oft_block_size: 32
+         oft_rank: 0
+         oft_target:
+             - all
+         pissa_convert: false
+         pissa_init: false
+         pissa_iter: 16
+         plot_loss: true
+         ppo_buffer_size: 1
+         ppo_epochs: 4
+         ppo_score_norm: false
+         ppo_target: 6
+         ppo_whiten_rewards: false
+         pref_bco_weight: 0
+         pref_beta: 0.1
+         pref_ftx: 0
+         pref_loss: sigmoid
+         pure_bf16: false
+         ref_model: null
+         ref_model_adapters: null
+         ref_model_quantization_bit: null
+         reward_model: null
+         reward_model_adapters: null
+         reward_model_quantization_bit: null
+         reward_model_type: lora
+         simpo_gamma: 0.5
+         stage: pt
+         swanlab_api_key: <SWANLAB_API_KEY>
+         swanlab_lark_secret: null
+         swanlab_lark_webhook_url: null
+         swanlab_logdir: null
+         swanlab_mode: cloud
+         swanlab_project: llamafactory
+         swanlab_run_name: null
+         swanlab_workspace: null
+         use_adam_mini: false
+         use_apollo: false
+         use_badam: false
+         use_dft_loss: false
+         use_dora: false
+         use_eaft_loss: false
+         use_galore: false
+         use_llama_pro: false
+         use_mca: false
+         use_muon: false
+         use_rslora: false
+         use_swanlab: false
+ fp8:
+     value: false
+ fp8_backend:
+     value: auto
+ fp8_enable_fsdp_float8_all_gather:
+     value: false
+ fp16:
+     value: false
+ fp16_full_eval:
+     value: false
+ fsdp:
+     value: []
+ fsdp_config:
+     value:
+         min_num_params: 0
+         xla: false
+         xla_fsdp_grad_ckpt: false
+         xla_fsdp_v2: false
+ full_determinism:
+     value: false
+ generating_args:
+     value:
+         do_sample: true
+         length_penalty: 1
+         max_new_tokens: 1024
+         num_beams: 1
+         repetition_penalty: 1
+         skip_special_tokens: true
+         temperature: 0.95
+         top_k: 50
+         top_p: 0.7
+ generation_config:
+     value: null
+ generation_max_length:
+     value: 2047
+ generation_num_beams:
+     value: null
+ gradient_accumulation_steps:
+     value: 1
+ gradient_checkpointing:
+     value: false
+ gradient_checkpointing_kwargs:
+     value: null
+ greater_is_better:
+     value: null
+ group_by_length:
+     value: false
+ head_dim:
+     value: 128
+ hidden_act:
+     value: silu
+ hidden_size:
+     value: 4096
+ hub_always_push:
+     value: false
+ hub_model_id:
+     value: null
+ hub_private_repo:
+     value: null
+ hub_revision:
+     value: null
+ hub_strategy:
+     value: every_save
+ hub_token:
+     value: <HUB_TOKEN>
+ id2label:
+     value:
+         "0": LABEL_0
+         "1": LABEL_1
+ ignore_data_skip:
+     value: false
+ include_for_metrics:
+     value: []
+ include_num_input_tokens_seen:
+     value: all
+ initializer_range:
+     value: 0.02
+ intermediate_size:
+     value: 12288
+ is_encoder_decoder:
+     value: false
+ label_names:
+     value:
+         - labels
+ label_smoothing_factor:
+     value: 0
+ label2id:
+     value:
+         LABEL_0: 0
+         LABEL_1: 1
+ layer_types:
+     value:
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+         - full_attention
+ learning_rate:
+     value: 5e-05
+ length_column_name:
+     value: length
+ liger_kernel_config:
+     value: null
+ load_best_model_at_end:
+     value: false
+ local_rank:
+     value: -1
+ log_level:
+     value: passive
+ log_level_replica:
+     value: warning
+ log_on_each_node:
+     value: true
+ logging_dir:
+     value: null
+ logging_first_step:
+     value: false
+ logging_nan_inf_filter:
+     value: true
+ logging_steps:
+     value: 1
+ logging_strategy:
+     value: steps
+ lr_scheduler_kwargs:
+     value: null
+ lr_scheduler_type:
+     value: cosine
+ master_addr:
+     value: null
+ master_port:
+     value: null
+ max_grad_norm:
+     value: 1
+ max_position_embeddings:
+     value: 32768
+ max_steps:
+     value: -1
+ max_window_layers:
+     value: 36
+ metric_for_best_model:
+     value: null
+ model/num_parameters:
+     value: 8234382336
+ model_args:
+     value:
+         adapter_folder: null
+         adapter_name_or_path: null
+         add_special_tokens: null
+         add_tokens: null
+         audio_sampling_rate: 16000
+         block_diag_attn: false
+         cache_dir: null
+         chunk_size: 8192
+         compute_dtype: torch.bfloat16
+         cpu_infer: 32
+         crop_to_patches: false
+         device_map:
+             "": cuda:0
+         disable_gradient_checkpointing: false
+         double_quantization: true
+         enable_liger_kernel: false
+         export_device: cpu
+         export_dir: null
+         export_hub_model_id: null
+         export_legacy_format: false
+         export_quantization_bit: null
+         export_quantization_dataset: null
+         export_quantization_maxlen: 1024
+         export_quantization_nsamples: 128
+         export_size: 5
+         flash_attn: auto
+         hf_hub_token: <HF_HUB_TOKEN>
+         image_do_pan_and_scan: false
+         image_max_pixels: 589824
+         image_min_pixels: 1024
+         infer_backend: HF
+         infer_dtype: auto
+         init_special_tokens: noise_init
+         kt_force_think: false
+         kt_maxlen: 4096
+         kt_mode: normal
+         kt_optimize_rule: null
+         kt_use_cuda_graph: true
+         low_cpu_mem_usage: true
+         mixture_of_depths: null
+         mode: normal
+         model_max_length: 2047
+         model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
+         model_revision: main
+         moe_aux_loss_coef: null
+         ms_hub_token: <MS_HUB_TOKEN>
+         new_special_tokens_config: null
+         offload_folder: offload
+         om_hub_token: <OM_HUB_TOKEN>
+         print_param_status: false
+         quantization_bit: null
+         quantization_device_map: null
+         quantization_method: BNB
+         quantization_type: nf4
+         resize_vocab: false
+         rope_scaling: null
+         sglang_config: null
+         sglang_lora_backend: triton
+         sglang_maxlen: 4096
+         sglang_mem_fraction: 0.7
+         sglang_tp_size: -1
+         shift_attn: false
+         split_special_tokens: false
+         train_from_scratch: false
+         trust_remote_code: true
+         upcast_layernorm: false
+         upcast_lmhead_output: false
+         use_audio_in_video: false
+         use_fast_tokenizer: true
+         use_kt: false
+         use_kv_cache: true
+         use_reentrant_gc: true
+         use_unsloth: false
+         use_unsloth_gc: false
+         use_v1_kernels: false
+         video_fps: 2
+         video_max_pixels: 65536
+         video_maxlen: 128
+         video_min_pixels: 256
+         vllm_config: null
+         vllm_enforce_eager: false
+         vllm_gpu_util: 0.7
+         vllm_max_lora_rank: 32
+         vllm_maxlen: 4096
+ model_type:
+     value: qwen3
+ neftune_noise_alpha:
+     value: null
+ num_attention_heads:
+     value: 32
+ num_hidden_layers:
+     value: 36
+ num_key_value_heads:
+     value: 8
+ num_train_epochs:
+     value: 5
+ optim:
+     value: adamw_torch
+ optim_args:
+     value: null
+ optim_target_modules:
+     value: null
+ output_attentions:
+     value: false
+ output_dir:
+     value: /workspace/v127rc_exp1/B_mul
+ output_hidden_states:
+     value: false
+ overwrite_output_dir:
+     value: false
+ pad_token_id:
+     value: 151643
+ parallelism_config:
+     value: null
+ peft_config:
+     value:
+         default:
+             alora_invocation_tokens: null
+             arrow_config: null
+             auto_mapping: null
+             base_model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
+             bias: none
+             corda_config: null
+             ensure_weight_tying: false
+             eva_config: null
+             exclude_modules: null
+             fan_in_fan_out: false
+             inference_mode: false
+             init_lora_weights: true
+             layer_replication: null
+             layers_pattern: null
+             layers_to_transform: null
+             lora_alpha: 32
+             lora_bias: false
+             lora_dropout: 0.03
+             megatron_config: null
+             megatron_core: megatron.core
+             modules_to_save: null
+             peft_type: LORA
+             peft_version: 0.18.1
+             qalora_group_size: 16
+             r: 16
+             revision: null
+             runtime_config:
+                 ephemeral_gpu_offload: false
+             target_modules:
+                 - down_proj
+                 - gate_proj
+                 - v_proj
+                 - o_proj
+                 - up_proj
+                 - k_proj
+                 - q_proj
+             target_parameters: null
+             task_type: CAUSAL_LM
+             trainable_token_indices: null
+             use_dora: false
+             use_qalora: false
+             use_rslora: false
+ per_device_eval_batch_size:
+     value: 8
+ per_device_train_batch_size:
+     value: 1
+ predict_with_generate:
+     value: false
+ prediction_loss_only:
+     value: false
+ problem_type:
+     value: null
+ project:
+     value: huggingface
+ push_to_hub:
+     value: false
+ ray_init_kwargs:
+     value: null
+ ray_num_workers:
+     value: 1
+ remove_unused_columns:
+     value: false
+ report_to:
+     value:
+         - wandb
+ restore_callback_states_from_checkpoint:
+     value: false
+ resume_from_checkpoint:
+     value: null
+ return_dict:
+     value: true
+ rms_norm_eps:
+     value: 1e-06
+ rope_parameters:
+     value:
+         rope_theta: 1000000
+         rope_type: default
+ run_name:
+     value: null
+ save_on_each_node:
+     value: false
+ save_only_model:
+     value: true
+ save_steps:
+     value: 1000
+ save_strategy:
+     value: steps
+ save_total_limit:
+     value: null
+ seed:
+     value: 42
+ skip_memory_metrics:
+     value: true
+ sliding_window:
+     value: null
+ sortish_sampler:
+     value: false
+ tf32:
+     value: null
+ tie_word_embeddings:
+     value: false
+ torch_compile:
+     value: false
+ torch_compile_backend:
+     value: null
+ torch_compile_mode:
+     value: null
+ torch_empty_cache_steps:
+     value: null
+ trackio_space_id:
+     value: trackio
+ transformers_version:
+     value: 5.0.0
+ use_cache:
+     value: false
+ use_cpu:
+     value: false
+ use_liger_kernel:
+     value: false
+ use_sliding_window:
+     value: false
+ vocab_size:
+     value: 151936
+ warmup_ratio:
+     value: 0.02
+ warmup_steps:
+     value: 0.02
+ weight_decay:
+     value: 0
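As a rough cross-check of the LoRA settings logged above, the following sketch counts the adapter parameters implied by `r: 16` over the seven `target_modules`. It assumes plain LoRA (one `r x in` and one `out x r` matrix per adapted linear layer, no bias, consistent with `use_dora: false` and `lora_bias: false`) and takes the layer shapes from the model fields in this config. The constant and function names are illustrative only and are not part of LlamaFactory or PEFT.

```python
# Shapes copied from the logged config.yaml; names here are illustrative.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 12288   # intermediate_size
HEAD_DIM = 128         # head_dim
N_HEADS = 32           # num_attention_heads
N_KV_HEADS = 8         # num_key_value_heads
N_LAYERS = 36          # num_hidden_layers
RANK = 16              # peft_config.default.r
ALPHA = 32             # peft_config.default.lora_alpha

# (in_features, out_features) for each adapted projection in one decoder layer.
PROJ_SHAPES = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJ_SHAPES.values())
total_adapter = per_layer * N_LAYERS
scaling = ALPHA / RANK  # standard LoRA scaling, since use_rslora is false

# model/num_parameters as logged; the remainder is what is left for the base model
# if that logged figure includes the adapters (an assumption).
LOGGED_NUM_PARAMETERS = 8234382336
remainder = LOGGED_NUM_PARAMETERS - total_adapter

print(per_layer, total_adapter, scaling, remainder)
```

Under these assumptions the adapters amount to roughly 0.5% of the logged parameter count, and the effective LoRA scaling factor is `lora_alpha / r = 2`.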
LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
+ pytz==2025.2
+ pydub==0.25.1
+ brotli==1.2.0
+ antlr4-python3-runtime==4.9.3
+ xxhash==3.6.0
+ websockets==15.0.1
+ tzdata==2025.3
+ typing_extensions==4.15.0
+ tqdm==4.67.3
+ tomlkit==0.13.3
+ termcolor==3.3.0
+ shtab==1.8.0
+ shellingham==1.5.4
+ sentencepiece==0.2.1
+ semantic-version==2.10.0
+ safetensors==0.7.0
+ ruff==0.15.0
+ regex==2026.1.15
+ python-multipart==0.0.22
+ pyparsing==3.3.2
+ pyarrow==23.0.0
+ protobuf==6.33.5
+ propcache==0.4.1
+ orjson==3.11.7
+ omegaconf==2.3.0
+ numpy==2.4.2
+ multidict==6.7.1
+ mdurl==0.1.2
+ kiwisolver==1.4.9
+ hf-xet==1.2.0
+ hf_transfer==0.1.9
+ groovy==0.1.2
+ frozenlist==1.8.0
+ fonttools==4.61.1
+ ffmpy==1.0.0
+ einops==0.8.2
+ docstring_parser==0.17.0
+ dill==0.3.8
+ cycler==0.12.1
+ click==8.3.1
+ av==16.0.0
+ annotated-types==0.7.0
+ annotated-doc==0.0.4
+ aiohappyeyeballs==2.6.1
+ aiofiles==24.1.0
+ yarl==1.22.0
+ uvicorn==0.40.0
+ typing-inspection==0.4.2
+ typer-slim==0.21.1
+ tiktoken==0.12.0
+ scipy==1.17.0
+ pydantic_core==2.41.4
+ pandas==2.3.3
+ multiprocess==0.70.16
+ modelscope==1.34.0
+ markdown-it-py==4.0.0
+ fire==0.7.1
+ contourpy==1.3.3
+ anyio==4.12.1
+ aiosignal==1.4.0
+ starlette==0.50.0
+ rich==14.3.2
+ pydantic==2.12.3
+ matplotlib==3.10.8
+ aiohttp==3.13.3
+ tyro==0.8.14
+ typer==0.21.1
+ torchdata==0.11.0
+ sse-starlette==3.2.0
+ safehttpx==0.1.7
+ huggingface_hub==1.4.0
+ fastapi==0.128.1
+ tokenizers==0.22.2
+ gradio_client==1.14.0
+ datasets==4.0.0
+ accelerate==1.11.0
+ transformers==5.0.0
+ gradio==5.50.0
+ trl==0.24.0
+ peft==0.18.1
+ llamafactory==0.9.5.dev0
+ jieba==0.42.1
+ rouge-chinese==1.0.3
+ joblib==1.5.3
+ nltk==3.9.2
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.590.48
+ hjson==3.1.0
+ ninja==1.13.0
+ msgpack==1.1.2
+ deepspeed==0.16.9
+ smmap==5.0.2
+ sentry-sdk==2.52.0
+ gitdb==4.0.12
+ GitPython==3.1.46
+ wandb==0.24.2
+ entrypoints==0.4
+ jupyter_client==7.4.9
+ nbclassic==1.1.0
+ notebook==6.5.5
+ pyzmq==24.0.1
+ PyYAML==6.0.2
+ Send2Trash==1.8.3
+ argon2-cffi==23.1.0
+ argon2-cffi-bindings==21.2.0
+ arrow==1.3.0
+ asttokens==2.4.1
+ async-lru==2.0.4
+ attrs==24.2.0
+ babel==2.16.0
+ beautifulsoup4==4.12.3
+ bleach==6.1.0
+ certifi==2024.8.30
+ cffi==1.17.1
+ charset-normalizer==3.3.2
+ comm==0.2.2
+ debugpy==1.8.5
+ decorator==5.1.1
+ defusedxml==0.7.1
+ executing==2.1.0
+ fastjsonschema==2.20.0
+ fqdn==1.5.1
+ h11==0.14.0
+ httpcore==1.0.5
+ httpx==0.27.2
+ idna==3.10
+ ipykernel==6.29.5
+ ipython==8.27.0
+ ipython-genutils==0.2.0
+ ipywidgets==8.1.5
+ isoduration==20.11.0
+ jedi==0.19.1
+ json5==0.9.25
+ jsonpointer==3.0.0
+ jsonschema==4.23.0
+ jsonschema-specifications==2023.12.1
+ jupyter-archive==3.4.0
+ jupyter_contrib_core==0.4.2
+ jupyter_contrib_nbextensions==0.7.0
+ jupyter_core==5.7.2
+ jupyter-events==0.10.0
+ jupyter-highlight-selected-word==0.2.0
+ jupyter-lsp==2.2.5
+ jupyter_nbextensions_configurator==0.6.4
+ jupyter_server==2.14.2
+ jupyter_server_terminals==0.5.3
+ jupyterlab==4.2.5
+ jupyterlab_pygments==0.3.0
+ jupyterlab_server==2.27.3
+ jupyterlab_widgets==3.0.13
+ lxml==5.3.0
+ matplotlib-inline==0.1.7
+ mistune==3.0.2
+ nbclient==0.10.0
+ nbconvert==7.16.4
+ nbformat==5.10.4
+ nest-asyncio==1.6.0
+ notebook_shim==0.2.4
+ overrides==7.7.0
+ packaging==24.1
+ pandocfilters==1.5.1
+ parso==0.8.4
+ pexpect==4.9.0
+ platformdirs==4.3.6
+ prometheus_client==0.21.0
+ prompt_toolkit==3.0.47
+ psutil==6.0.0
+ ptyprocess==0.7.0
+ pure_eval==0.2.3
+ pycparser==2.22
+ Pygments==2.18.0
+ python-dateutil==2.9.0.post0
+ python-json-logger==2.0.7
+ referencing==0.35.1
+ requests==2.32.3
+ rfc3339-validator==0.1.4
+ rfc3986-validator==0.1.1
+ rpds-py==0.20.0
+ sniffio==1.3.1
+ soupsieve==2.6
+ stack-data==0.6.3
+ terminado==0.18.1
+ tinycss2==1.3.0
+ tornado==6.4.1
+ traitlets==5.14.3
+ types-python-dateutil==2.9.0.20240906
+ uri-template==1.3.0
+ urllib3==2.2.3
+ wcwidth==0.2.13
+ webcolors==24.8.0
+ webencodings==0.5.1
+ websocket-client==1.8.0
+ widgetsnbextension==4.0.13
+ Jinja2==3.1.3
+ MarkupSafe==2.1.5
+ filelock==3.13.1
+ fsspec==2024.2.0
+ mpmath==1.3.0
+ networkx==3.2.1
+ nvidia-cublas-cu12==12.4.2.65
+ nvidia-cuda-cupti-cu12==12.4.99
+ nvidia-cuda-nvrtc-cu12==12.4.99
+ nvidia-cuda-runtime-cu12==12.4.99
+ nvidia-cudnn-cu12==9.1.0.70
+ nvidia-cufft-cu12==11.2.0.44
+ nvidia-curand-cu12==10.3.5.119
+ nvidia-cusolver-cu12==11.6.0.99
+ nvidia-cusparse-cu12==12.3.0.142
+ nvidia-nccl-cu12==2.20.5
+ nvidia-nvjitlink-cu12==12.4.99
+ nvidia-nvtx-cu12==12.4.99
+ pillow==10.2.0
+ sympy==1.12
+ torch==2.4.1+cu124
+ torchaudio==2.4.1+cu124
+ torchvision==0.19.1+cu124
+ triton==3.0.0
+ pip==24.2
+ setuptools==75.1.0
+ wheel==0.44.0
+ PyGObject==3.42.1
+ PyJWT==2.3.0
+ SecretStorage==3.3.1
+ blinker==1.4
+ cryptography==3.4.8
+ dbus-python==1.2.18
+ distro==1.7.0
+ httplib2==0.20.2
+ importlib-metadata==4.6.4
+ jeepney==0.7.1
+ keyring==23.5.0
+ launchpadlib==1.10.16
+ lazr.restfulclient==0.14.4
+ lazr.uri==1.0.6
+ more-itertools==8.10.0
+ oauthlib==3.2.0
+ python-apt==2.4.0+ubuntu4
+ six==1.16.0
+ wadllib==1.3.6
+ zipp==1.0.0
+ autocommand==2.2.2
+ backports.tarfile==1.2.0
+ importlib_metadata==8.0.0
+ importlib_resources==6.4.0
+ inflect==7.3.1
+ jaraco.collections==5.1.0
+ jaraco.context==5.3.0
+ jaraco.functools==4.0.1
+ jaraco.text==3.12.1
+ more-itertools==10.3.0
+ packaging==24.1
+ platformdirs==4.2.2
+ tomli==2.0.1
+ typeguard==4.3.0
+ typing_extensions==4.12.2
+ wheel==0.43.0
+ zipp==3.19.2
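The list above pins several packages more than once with different versions (for example `zipp`, `typing_extensions`, and `importlib-metadata` / `importlib_metadata`), which probably means distributions from more than one install location were captured. A small sketch for spotting such conflicting pins follows. The helper names are hypothetical, and the name normalisation follows the PEP 503 rule (lowercase, runs of `-`, `_`, `.` collapsed to `-`).

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 normalisation: lowercase, runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def conflicting_pins(lines):
    """Return {normalized_name: sorted versions} for names pinned to more than one version."""
    versions = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and anything that is not an exact pin
        name, version = line.split("==", 1)
        versions[normalize(name)].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# A few pins copied from the requirements.txt above.
sample = [
    "typing_extensions==4.15.0",
    "typing_extensions==4.12.2",
    "importlib-metadata==4.6.4",
    "importlib_metadata==8.0.0",
    "packaging==24.1",
    "packaging==24.1",
    "zipp==1.0.0",
    "zipp==3.19.2",
    "torch==2.4.1+cu124",
]

conflicts = conflicting_pins(sample)
for name, found in sorted(conflicts.items()):
    print(name, found)
```

Identical repeats such as `packaging==24.1` are not reported, because only names with more than one distinct version count as conflicts. Which copy Python actually imports depends on `sys.path` order, which this file does not record.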
LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "os": "Linux-6.5.0-45-generic-x86_64-with-glibc2.35",
+   "python": "CPython 3.11.10",
+   "startedAt": "2026-02-05T02:43:15.345218Z",
+   "args": [
+     "/workspace/v127rc_exp1/B_mul.yaml"
+   ],
+   "program": "/usr/local/bin/llamafactory-cli",
+   "git": {
+     "remote": "https://github.com/hiyouga/LlamaFactory.git",
+     "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
+   },
+   "email": "markmochi200@gmail.com",
+   "root": "/workspace/LlamaFactory",
+   "host": "aac78e9f96d6",
+   "executable": "/usr/bin/python",
+   "cpu_count": 24,
+   "cpu_count_logical": 48,
+   "gpu": "NVIDIA GeForce RTX 4090",
+   "gpu_count": 1,
+   "disk": {
+     "/": {
+       "total": "21474836480",
+       "used": "2532245504"
+     }
+   },
+   "memory": {
+     "total": "270095220736"
+   },
+   "gpu_nvidia": [
+     {
+       "name": "NVIDIA GeForce RTX 4090",
+       "memoryTotal": "25757220864",
+       "cudaCores": 16384,
+       "architecture": "Ada",
+       "uuid": "GPU-e976cc69-65b4-6c21-059f-e293b12f21b0"
+     }
+   ],
+   "cudaVersion": "12.4",
+   "writerId": "qv5hxlnjyijt45h1hirthpvj46z6i895"
+ }
LlamaFactory/wandb/run-20260205_024315-zvjq5754/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"train/learning_rate":4.116093577088975e-15,"train/train_tokens_per_second":1972.905,"_runtime":183302,"train/grad_norm":1.8340671062469482,"_step":176660,"_timestamp":1.7704426938110843e+09,"train_steps_per_second":0.964,"train_loss":0.33225888585165475,"train/global_step":176660,"train/loss":0.3242933750152588,"total_flos":1.6516160437296538e+19,"train_samples_per_second":0.964,"_wandb":{"runtime":183302},"train/num_input_tokens_seen":361623020,"train/epoch":5,"train_runtime":183298.9789}
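The summary values can be sanity-checked against the training config. The sketch below derives tokens per optimizer step, steps per epoch, and throughput from a subset of the fields above. It assumes the summary line has been saved as JSON text; the variable names are illustrative.

```python
import json

# Subset of the wandb-summary.json fields shown above.
summary = json.loads(
    '{"train/global_step":176660,'
    '"train/num_input_tokens_seen":361623020,'
    '"train/epoch":5,'
    '"train_runtime":183298.9789,'
    '"train_steps_per_second":0.964}'
)

steps = summary["train/global_step"]
tokens_per_step = summary["train/num_input_tokens_seen"] / steps
steps_per_epoch = steps / summary["train/epoch"]
steps_per_second = steps / summary["train_runtime"]
runtime_hours = summary["train_runtime"] / 3600

print(f"tokens per step:  {tokens_per_step:.0f}")
print(f"steps per epoch:  {steps_per_epoch:.0f}")
print(f"steps per second: {steps_per_second:.3f}")
print(f"runtime (hours):  {runtime_hours:.1f}")
```

The tokens-per-step figure equals `cutoff_len: 2047`, which is what one would expect with `packing: true`, `per_device_train_batch_size: 1`, and `gradient_accumulation_steps: 1` on a single GPU.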
LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
+ {"time":"2026-02-05T02:43:15.615961908Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
+ {"time":"2026-02-05T02:43:15.950222194Z","level":"INFO","msg":"stream: created new stream","id":"zvjq5754"}
+ {"time":"2026-02-05T02:43:15.95079635Z","level":"INFO","msg":"handler: started","stream_id":"zvjq5754"}
+ {"time":"2026-02-05T02:43:15.953439954Z","level":"INFO","msg":"stream: started","id":"zvjq5754"}
+ {"time":"2026-02-05T02:43:15.953486855Z","level":"INFO","msg":"writer: started","stream_id":"zvjq5754"}
+ {"time":"2026-02-05T02:43:15.953538277Z","level":"INFO","msg":"sender: started","stream_id":"zvjq5754"}
+ {"time":"2026-02-07T05:38:19.134362036Z","level":"INFO","msg":"stream: closing","id":"zvjq5754"}
+ {"time":"2026-02-07T05:38:21.814636279Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2026-02-07T05:38:22.180939303Z","level":"INFO","msg":"handler: closed","stream_id":"zvjq5754"}
+ {"time":"2026-02-07T05:38:22.184153269Z","level":"INFO","msg":"sender: closed","stream_id":"zvjq5754"}
+ {"time":"2026-02-07T05:38:22.184551619Z","level":"INFO","msg":"stream: closed","id":"zvjq5754"}
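Each line of this internal log is a JSON object, so the run's wall-clock duration can be estimated from the timestamps of the "stream: started" and "stream: closing" events. The sketch below is one way to do that. It truncates timestamps to whole seconds because the fractional part has up to nine digits; `parse_time` is a hypothetical helper, not a wandb function.

```python
import json
from datetime import datetime


def parse_time(stamp: str) -> datetime:
    """Parse the first 19 characters (YYYY-MM-DDTHH:MM:SS), dropping fractional seconds."""
    return datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")


# Two entries copied from debug-internal.log above.
log_lines = [
    '{"time":"2026-02-05T02:43:15.953439954Z","level":"INFO","msg":"stream: started","id":"zvjq5754"}',
    '{"time":"2026-02-07T05:38:19.134362036Z","level":"INFO","msg":"stream: closing","id":"zvjq5754"}',
]

# Map each message to its (truncated) timestamp.
events = {record["msg"]: parse_time(record["time"]) for record in map(json.loads, log_lines)}

elapsed = events["stream: closing"] - events["stream: started"]
elapsed_seconds = int(elapsed.total_seconds())

print(elapsed, elapsed_seconds)
```

The result is within a few seconds of the `_runtime` value of 183302 reported in this run's wandb-summary.json, so the two files agree on a run length of a little over two days.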
LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug.log ADDED
@@ -0,0 +1,25 @@
+ 2026-02-05 02:43:15,370 INFO MainThread:2144 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
+ 2026-02-05 02:43:15,370 INFO MainThread:2144 [wandb_setup.py:_flush():81] Configure stats pid to 2144
+ 2026-02-05 02:43:15,370 INFO MainThread:2144 [wandb_setup.py:_flush():81] Loading settings from environment variables
+ 2026-02-05 02:43:15,371 INFO MainThread:2144 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug.log
+ 2026-02-05 02:43:15,372 INFO MainThread:2144 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260205_024315-zvjq5754/logs/debug-internal.log
+ 2026-02-05 02:43:15,373 INFO MainThread:2144 [wandb_init.py:init():844] calling init triggers
+ 2026-02-05 02:43:15,373 INFO MainThread:2144 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
+ config: {'_wandb': {}}
+ 2026-02-05 02:43:15,374 INFO MainThread:2144 [wandb_init.py:init():892] starting backend
+ 2026-02-05 02:43:15,598 INFO MainThread:2144 [wandb_init.py:init():895] sending inform_init request
+ 2026-02-05 02:43:15,612 INFO MainThread:2144 [wandb_init.py:init():903] backend started and connected
+ 2026-02-05 02:43:15,615 INFO MainThread:2144 [wandb_init.py:init():973] updated telemetry
+ 2026-02-05 02:43:15,665 INFO MainThread:2144 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
+ 2026-02-05 02:43:16,280 INFO MainThread:2144 [wandb_init.py:init():1042] starting run threads in backend
+ 2026-02-05 02:43:16,427 INFO MainThread:2144 [wandb_run.py:_console_start():2529] atexit reg
+ 2026-02-05 02:43:16,428 INFO MainThread:2144 [wandb_run.py:_redirect():2377] redirect: wrap_raw
+ 2026-02-05 02:43:16,428 INFO MainThread:2144 [wandb_run.py:_redirect():2446] Wrapping output streams.
+ 2026-02-05 02:43:16,429 INFO MainThread:2144 [wandb_run.py:_redirect():2469] Redirects installed.
+ 2026-02-05 02:43:16,431 INFO MainThread:2144 [wandb_init.py:init():1082] run started, returning control to user process
+ 2026-02-05 02:43:16,432 INFO MainThread:2144 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['down_proj', 'gate_proj', 'v_proj', 'o_proj', 'up_proj', 'k_proj', 'q_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp1/B_mul', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 5, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.02, 'warmup_steps': 0.02, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 
-1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 'generation_max_length': 
2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
+ 2026-02-05 02:43:16,441 INFO MainThread:2144 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x75449dcbca90>>
+ 2026-02-05 02:43:16,441 INFO MainThread:2144 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
+ 2026-02-05 02:43:16,444 INFO MainThread:2144 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t35_d0_r286'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
+ 2026-02-07 05:38:19,134 INFO wandb-AsyncioManager-main:2144 [service_client.py:_forward_responses():94] Reached EOF.
+ 2026-02-07 05:38:19,135 INFO wandb-AsyncioManager-main:2144 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
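The training arguments dumped in this log (`learning_rate: 5e-05`, `lr_scheduler_type: cosine`, `warmup_ratio: 0.02`) can be related to the tiny final `train/learning_rate` in this run's wandb-summary.json. The sketch below re-implements the usual linear-warmup plus half-cosine-decay formula in plain Python; it does not call the trainer. Two things are assumptions rather than facts taken from the log: that the warmup length is the ratio times the total step count rounded up, and that the last logged rate corresponds to scheduler step `total_steps - 1`. The function name `cosine_lr` is hypothetical.

```python
import math


def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 176660  # train/global_step from this run's wandb-summary.json
# Assumption: the 0.02 warmup ratio is converted to steps by rounding up.
warmup_steps = math.ceil(0.02 * total_steps)
# Assumption: the last logged rate belongs to scheduler step total_steps - 1.
final_lr = cosine_lr(total_steps - 1, 5e-05, warmup_steps, total_steps)

print(warmup_steps, final_lr)
```

Under these assumptions the final value comes out near 4.1e-15, close to the `train/learning_rate` of 4.116093577088975e-15 recorded in the run summary. The last digits are not expected to match, because `1 + cos(x)` loses precision when `x` is this close to pi.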
LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/config.yaml ADDED
@@ -0,0 +1,723 @@
1
+ _name_or_path:
2
+ value: /workspace/Qwen/Qwen3-8B-Base
3
+ _wandb:
4
+ value:
5
+ cli_version: 0.24.2
6
+ e:
7
+ qggzf6e4qf9vgtbvttkcenmdoze5458m:
8
+ args:
9
+ - /workspace/v127rc_exp1/E_mup.yaml
10
+ cpu_count: 16
11
+ cpu_count_logical: 32
12
+ cudaVersion: "12.7"
13
+ disk:
14
+ /:
15
+ total: "21474836480"
16
+ used: "2479427584"
17
+ email: markmochi200@gmail.com
18
+ executable: /usr/bin/python
19
+ git:
20
+ commit: 1a02717fa84c270d1c156c4c4a391c2f95525a63
21
+ remote: https://github.com/hiyouga/LlamaFactory.git
22
+ gpu: NVIDIA GeForce RTX 4090
23
+ gpu_count: 1
24
+ gpu_nvidia:
25
+ - architecture: Ada
26
+ cudaCores: 16384
27
+ memoryTotal: "25757220864"
28
+ name: NVIDIA GeForce RTX 4090
29
+ uuid: GPU-ceaf5bb7-0aeb-5000-346c-08e2d1143c3c
30
+ host: d74c94898fdd
31
+ memory:
32
+ total: "201668009984"
33
+ os: Linux-6.8.0-52-generic-x86_64-with-glibc2.35
34
+ program: /usr/local/bin/llamafactory-cli
35
+ python: CPython 3.11.10
36
+ root: /workspace/LlamaFactory
37
+ startedAt: "2026-02-05T07:40:20.662178Z"
38
+ writerId: qggzf6e4qf9vgtbvttkcenmdoze5458m
39
+ m:
40
+ - "1": train/global_step
41
+ "6":
42
+ - 3
43
+ "7": []
44
+ - "2": '*'
45
+ "5": 1
46
+ "6":
47
+ - 1
48
+ "7": []
49
+ python_version: 3.11.10
50
+ t:
51
+ "1":
52
+ - 1
53
+ - 11
54
+ - 41
55
+ - 49
56
+ - 51
57
+ - 71
58
+ - 84
59
+ - 98
60
+ - 105
61
+ "2":
62
+ - 1
63
+ - 11
64
+ - 41
65
+ - 49
66
+ - 51
67
+ - 71
68
+ - 84
69
+ - 98
70
+ - 105
71
+ "3":
72
+ - 7
73
+ - 19
74
+ - 62
75
+ - 66
76
+ "4": 3.11.10
77
+ "5": 0.24.2
78
+ "6": 5.0.0
79
+ "9":
80
+ "1": transformers_trainer
81
+ "12": 0.24.2
82
+ "13": linux-x86_64
83
+ accelerator_config:
84
+ value:
85
+ dispatch_batches: null
86
+ even_batches: true
87
+ gradient_accumulation_kwargs: null
88
+ non_blocking: false
89
+ split_batches: false
90
+ use_seedable_sampler: true
91
+ adam_beta1:
92
+ value: 0.9
93
+ adam_beta2:
94
+ value: 0.95
95
+ adam_epsilon:
96
+ value: 1e-08
97
+ architectures:
98
+ value:
99
+ - Qwen3ForCausalLM
100
+ attention_bias:
101
+ value: false
102
+ attention_dropout:
103
+ value: 0
104
+ auto_find_batch_size:
105
+ value: false
106
+ average_tokens_across_devices:
107
+ value: true
108
+ batch_eval_metrics:
109
+ value: false
110
+ bf16:
111
+ value: true
112
+ bf16_full_eval:
113
+ value: false
114
+ bos_token_id:
115
+ value: null
116
+ chunk_size_feed_forward:
117
+ value: 0
118
+ data_args:
119
+ value:
120
+ buffer_size: 16384
121
+ cutoff_len: 2047
122
+ data_shared_file_system: false
123
+ dataset:
124
+ - Markie_Voss_t119_d85_r1
125
+ dataset_dir: /workspace/LlamaFactory/data
126
+ default_system: null
127
+ enable_thinking: false
128
+ eval_dataset: null
129
+ eval_num_beams: null
130
+ eval_on_each_dataset: false
131
+ ignore_pad_token_for_loss: true
132
+ interleave_probs: null
133
+ mask_history: false
134
+ max_samples: 100000000
135
+ media_dir: /workspace/LlamaFactory/data
136
+ mix_strategy: concat
137
+ neat_packing: false
138
+ overwrite_cache: false
139
+ packing: true
140
+ preprocessing_batch_size: 1000
141
+ preprocessing_num_workers: 16
142
+ streaming: false
143
+ template: qwen3_nothink
144
+ tokenized_path: null
145
+ tool_format: null
146
+ train_on_prompt: false
147
+ val_size: 0
148
+ data_seed:
149
+ value: null
150
+ dataloader_drop_last:
151
+ value: false
152
+ dataloader_num_workers:
153
+ value: 0
154
+ dataloader_persistent_workers:
155
+ value: false
156
+ dataloader_pin_memory:
157
+ value: true
158
+ dataloader_prefetch_factor:
159
+ value: null
160
+ ddp_backend:
161
+ value: null
162
+ ddp_broadcast_buffers:
163
+ value: null
164
+ ddp_bucket_cap_mb:
165
+ value: null
166
+ ddp_find_unused_parameters:
167
+ value: null
168
+ ddp_timeout:
169
+ value: 180000000
170
+ debug:
171
+ value: []
172
+ deepspeed:
173
+ value: null
174
+ disable_tqdm:
175
+ value: false
176
+ do_eval:
177
+ value: false
178
+ do_predict:
179
+ value: false
180
+ do_train:
181
+ value: true
182
+ dtype:
183
+ value: bfloat16
184
+ enable_jit_checkpoint:
185
+ value: false
186
+ eos_token_id:
187
+ value: 151645
188
+ eval_accumulation_steps:
189
+ value: null
190
+ eval_delay:
191
+ value: 0
192
+ eval_do_concat_batches:
193
+ value: true
194
+ eval_on_start:
195
+ value: false
196
+ eval_steps:
197
+ value: null
198
+ eval_strategy:
199
+ value: "no"
200
+ eval_use_gather_object:
201
+ value: false
202
+ finetuning_args:
203
+ value:
204
+ additional_target: null
205
+ apollo_layerwise: false
206
+ apollo_proj: random
207
+ apollo_proj_type: std
208
+ apollo_rank: 16
209
+ apollo_scale: 32
210
+ apollo_scale_front: false
211
+ apollo_scale_type: channel
212
+ apollo_target:
213
+ - all
214
+ apollo_update_interval: 200
215
+ badam_mask_mode: adjacent
216
+ badam_mode: layer
217
+ badam_start_block: null
218
+ badam_switch_interval: 50
219
+ badam_switch_mode: ascending
220
+ badam_update_ratio: 0.05
221
+ badam_verbose: 0
222
+ compute_accuracy: false
223
+ create_new_adapter: false
224
+ disable_shuffling: false
225
+ dpo_label_smoothing: 0
226
+ eaft_alpha: 1
227
+ early_stopping_steps: null
228
+ finetuning_type: lora
229
+ freeze_extra_modules: null
230
+ freeze_language_model: false
231
+ freeze_multi_modal_projector: true
232
+ freeze_trainable_layers: 2
233
+ freeze_trainable_modules:
234
+ - all
235
+ freeze_vision_tower: true
236
+ galore_layerwise: false
237
+ galore_proj_type: std
238
+ galore_rank: 16
239
+ galore_scale: 2
240
+ galore_target:
241
+ - all
242
+ galore_update_interval: 200
243
+ include_effective_tokens_per_second: false
244
+ kto_chosen_weight: 1
245
+ kto_rejected_weight: 1
246
+ ld_alpha: null
247
+ lora_alpha: 32
248
+ lora_dropout: 0.03
249
+ lora_rank: 16
250
+ lora_target:
251
+ - all
252
+ loraplus_lr_embedding: 1e-06
253
+ loraplus_lr_ratio: null
254
+ module_dropout: 0
255
+ oft_block_size: 32
256
+ oft_rank: 0
257
+ oft_target:
258
+ - all
259
+ pissa_convert: false
260
+ pissa_init: false
261
+ pissa_iter: 16
262
+ plot_loss: true
263
+ ppo_buffer_size: 1
264
+ ppo_epochs: 4
265
+ ppo_score_norm: false
266
+ ppo_target: 6
267
+ ppo_whiten_rewards: false
268
+ pref_bco_weight: 0
269
+ pref_beta: 0.1
270
+ pref_ftx: 0
271
+ pref_loss: sigmoid
272
+ pure_bf16: false
273
+ ref_model: null
274
+ ref_model_adapters: null
275
+ ref_model_quantization_bit: null
276
+ reward_model: null
277
+ reward_model_adapters: null
278
+ reward_model_quantization_bit: null
279
+ reward_model_type: lora
280
+ simpo_gamma: 0.5
281
+ stage: pt
282
+ swanlab_api_key: <SWANLAB_API_KEY>
283
+ swanlab_lark_secret: null
284
+ swanlab_lark_webhook_url: null
285
+ swanlab_logdir: null
286
+ swanlab_mode: cloud
287
+ swanlab_project: llamafactory
288
+ swanlab_run_name: null
289
+ swanlab_workspace: null
290
+ use_adam_mini: false
291
+ use_apollo: false
292
+ use_badam: false
293
+ use_dft_loss: false
294
+ use_dora: false
295
+ use_eaft_loss: false
296
+ use_galore: false
297
+ use_llama_pro: false
298
+ use_mca: false
299
+ use_muon: false
300
+ use_rslora: false
301
+ use_swanlab: false
302
+ fp8:
303
+ value: false
304
+ fp8_backend:
305
+ value: auto
306
+ fp8_enable_fsdp_float8_all_gather:
307
+ value: false
308
+ fp16:
309
+ value: false
310
+ fp16_full_eval:
311
+ value: false
312
+ fsdp:
313
+ value: []
314
+ fsdp_config:
315
+ value:
316
+ min_num_params: 0
317
+ xla: false
318
+ xla_fsdp_grad_ckpt: false
319
+ xla_fsdp_v2: false
320
+ full_determinism:
321
+ value: false
322
+ generating_args:
323
+ value:
324
+ do_sample: true
325
+ length_penalty: 1
326
+ max_new_tokens: 1024
327
+ num_beams: 1
328
+ repetition_penalty: 1
329
+ skip_special_tokens: true
330
+ temperature: 0.95
331
+ top_k: 50
332
+ top_p: 0.7
333
+ generation_config:
334
+ value: null
335
+ generation_max_length:
336
+ value: 2047
337
+ generation_num_beams:
338
+ value: null
339
+ gradient_accumulation_steps:
340
+ value: 1
341
+ gradient_checkpointing:
342
+ value: false
343
+ gradient_checkpointing_kwargs:
344
+ value: null
345
+ greater_is_better:
346
+ value: null
347
+ group_by_length:
348
+ value: false
349
+ head_dim:
350
+ value: 128
351
+ hidden_act:
352
+ value: silu
353
+ hidden_size:
354
+ value: 4096
355
+ hub_always_push:
356
+ value: false
357
+ hub_model_id:
358
+ value: null
359
+ hub_private_repo:
360
+ value: null
361
+ hub_revision:
362
+ value: null
363
+ hub_strategy:
364
+ value: every_save
365
+ hub_token:
366
+ value: <HUB_TOKEN>
367
+ id2label:
368
+ value:
369
+ "0": LABEL_0
370
+ "1": LABEL_1
371
+ ignore_data_skip:
372
+ value: false
373
+ include_for_metrics:
374
+ value: []
375
+ include_num_input_tokens_seen:
376
+ value: all
377
+ initializer_range:
378
+ value: 0.02
379
+ intermediate_size:
380
+ value: 12288
381
+ is_encoder_decoder:
382
+ value: false
383
+ label_names:
384
+ value:
385
+ - labels
386
+ label_smoothing_factor:
387
+ value: 0
388
+ label2id:
389
+ value:
390
+ LABEL_0: 0
391
+ LABEL_1: 1
392
+ layer_types:
393
+ value:
394
+ - full_attention
395
+ - full_attention
396
+ - full_attention
397
+ - full_attention
398
+ - full_attention
399
+ - full_attention
400
+ - full_attention
401
+ - full_attention
402
+ - full_attention
403
+ - full_attention
404
+ - full_attention
405
+ - full_attention
406
+ - full_attention
407
+ - full_attention
408
+ - full_attention
409
+ - full_attention
410
+ - full_attention
411
+ - full_attention
412
+ - full_attention
413
+ - full_attention
414
+ - full_attention
415
+ - full_attention
416
+ - full_attention
417
+ - full_attention
418
+ - full_attention
419
+ - full_attention
420
+ - full_attention
421
+ - full_attention
422
+ - full_attention
423
+ - full_attention
424
+ - full_attention
425
+ - full_attention
426
+ - full_attention
427
+ - full_attention
428
+ - full_attention
429
+ - full_attention
430
+ learning_rate:
431
+ value: 5e-05
432
+ length_column_name:
433
+ value: length
434
+ liger_kernel_config:
435
+ value: null
436
+ load_best_model_at_end:
437
+ value: false
438
+ local_rank:
439
+ value: -1
440
+ log_level:
441
+ value: passive
442
+ log_level_replica:
443
+ value: warning
444
+ log_on_each_node:
445
+ value: true
446
+ logging_dir:
447
+ value: null
448
+ logging_first_step:
449
+ value: false
450
+ logging_nan_inf_filter:
451
+ value: true
452
+ logging_steps:
453
+ value: 1
454
+ logging_strategy:
455
+ value: steps
456
+ lr_scheduler_kwargs:
457
+ value: null
458
+ lr_scheduler_type:
459
+ value: cosine
460
+ master_addr:
461
+ value: null
462
+ master_port:
463
+ value: null
464
+ max_grad_norm:
465
+ value: 1
466
+ max_position_embeddings:
467
+ value: 32768
468
+ max_steps:
469
+ value: -1
470
+ max_window_layers:
471
+ value: 36
472
+ metric_for_best_model:
473
+ value: null
474
+ model/num_parameters:
475
+ value: 8234382336
476
+ model_args:
477
+ value:
478
+ adapter_folder: null
479
+ adapter_name_or_path: null
480
+ add_special_tokens: null
481
+ add_tokens: null
482
+ audio_sampling_rate: 16000
483
+ block_diag_attn: false
484
+ cache_dir: null
485
+ chunk_size: 8192
486
+ compute_dtype: torch.bfloat16
487
+ cpu_infer: 32
488
+ crop_to_patches: false
489
+ device_map:
490
+ "": cuda:0
491
+ disable_gradient_checkpointing: false
492
+ double_quantization: true
493
+ enable_liger_kernel: false
494
+ export_device: cpu
495
+ export_dir: null
496
+ export_hub_model_id: null
497
+ export_legacy_format: false
498
+ export_quantization_bit: null
499
+ export_quantization_dataset: null
500
+ export_quantization_maxlen: 1024
501
+ export_quantization_nsamples: 128
502
+ export_size: 5
503
+ flash_attn: auto
504
+ hf_hub_token: <HF_HUB_TOKEN>
505
+ image_do_pan_and_scan: false
506
+ image_max_pixels: 589824
507
+ image_min_pixels: 1024
508
+ infer_backend: HF
509
+ infer_dtype: auto
510
+ init_special_tokens: noise_init
511
+ kt_force_think: false
512
+ kt_maxlen: 4096
513
+ kt_mode: normal
514
+ kt_optimize_rule: null
515
+ kt_use_cuda_graph: true
516
+ low_cpu_mem_usage: true
517
+ mixture_of_depths: null
518
+ mode: normal
519
+ model_max_length: 2047
520
+ model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
521
+ model_revision: main
522
+ moe_aux_loss_coef: null
523
+ ms_hub_token: <MS_HUB_TOKEN>
524
+ new_special_tokens_config: null
525
+ offload_folder: offload
526
+ om_hub_token: <OM_HUB_TOKEN>
527
+ print_param_status: false
528
+ quantization_bit: null
529
+ quantization_device_map: null
530
+ quantization_method: BNB
531
+ quantization_type: nf4
532
+ resize_vocab: false
533
+ rope_scaling: null
534
+ sglang_config: null
535
+ sglang_lora_backend: triton
536
+ sglang_maxlen: 4096
537
+ sglang_mem_fraction: 0.7
538
+ sglang_tp_size: -1
539
+ shift_attn: false
540
+ split_special_tokens: false
541
+ train_from_scratch: false
542
+ trust_remote_code: true
543
+ upcast_layernorm: false
544
+ upcast_lmhead_output: false
545
+ use_audio_in_video: false
546
+ use_fast_tokenizer: true
547
+ use_kt: false
548
+ use_kv_cache: true
549
+ use_reentrant_gc: true
550
+ use_unsloth: false
551
+ use_unsloth_gc: false
552
+ use_v1_kernels: false
553
+ video_fps: 2
554
+ video_max_pixels: 65536
555
+ video_maxlen: 128
556
+ video_min_pixels: 256
557
+ vllm_config: null
558
+ vllm_enforce_eager: false
559
+ vllm_gpu_util: 0.7
560
+ vllm_max_lora_rank: 32
561
+ vllm_maxlen: 4096
562
+ model_type:
563
+ value: qwen3
564
+ neftune_noise_alpha:
565
+ value: null
566
+ num_attention_heads:
567
+ value: 32
568
+ num_hidden_layers:
569
+ value: 36
570
+ num_key_value_heads:
571
+ value: 8
572
+ num_train_epochs:
573
+ value: 5
574
+ optim:
575
+ value: adamw_torch
576
+ optim_args:
577
+ value: null
578
+ optim_target_modules:
579
+ value: null
580
+ output_attentions:
581
+ value: false
582
+ output_dir:
583
+ value: /workspace/v127rc_exp1/E_mup
584
+ output_hidden_states:
585
+ value: false
586
+ overwrite_output_dir:
587
+ value: false
588
+ pad_token_id:
589
+ value: 151643
590
+ parallelism_config:
591
+ value: null
592
+ peft_config:
593
+ value:
594
+ default:
595
+ alora_invocation_tokens: null
596
+ arrow_config: null
597
+ auto_mapping: null
598
+ base_model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
599
+ bias: none
600
+ corda_config: null
601
+ ensure_weight_tying: false
602
+ eva_config: null
603
+ exclude_modules: null
604
+ fan_in_fan_out: false
605
+ inference_mode: false
606
+ init_lora_weights: true
607
+ layer_replication: null
608
+ layers_pattern: null
609
+ layers_to_transform: null
610
+ lora_alpha: 32
611
+ lora_bias: false
612
+ lora_dropout: 0.03
613
+ megatron_config: null
614
+ megatron_core: megatron.core
615
+ modules_to_save: null
616
+ peft_type: LORA
617
+ peft_version: 0.18.1
618
+ qalora_group_size: 16
619
+ r: 16
620
+ revision: null
621
+ runtime_config:
622
+ ephemeral_gpu_offload: false
623
+ target_modules:
624
+ - k_proj
625
+ - down_proj
626
+ - q_proj
627
+ - up_proj
628
+ - gate_proj
629
+ - v_proj
630
+ - o_proj
631
+ target_parameters: null
632
+ task_type: CAUSAL_LM
633
+ trainable_token_indices: null
634
+ use_dora: false
635
+ use_qalora: false
636
+ use_rslora: false
637
+ per_device_eval_batch_size:
638
+ value: 8
639
+ per_device_train_batch_size:
640
+ value: 1
641
+ predict_with_generate:
642
+ value: false
643
+ prediction_loss_only:
644
+ value: false
645
+ problem_type:
646
+ value: null
647
+ project:
648
+ value: huggingface
649
+ push_to_hub:
650
+ value: false
651
+ ray_init_kwargs:
652
+ value: null
653
+ ray_num_workers:
654
+ value: 1
655
+ remove_unused_columns:
656
+ value: false
657
+ report_to:
658
+ value:
659
+ - wandb
660
+ restore_callback_states_from_checkpoint:
661
+ value: false
662
+ resume_from_checkpoint:
663
+ value: null
664
+ return_dict:
665
+ value: true
666
+ rms_norm_eps:
667
+ value: 1e-06
668
+ rope_parameters:
669
+ value:
670
+ rope_theta: 1000000
671
+ rope_type: default
672
+ run_name:
673
+ value: null
674
+ save_on_each_node:
675
+ value: false
676
+ save_only_model:
677
+ value: true
678
+ save_steps:
679
+ value: 1000
680
+ save_strategy:
681
+ value: steps
682
+ save_total_limit:
683
+ value: null
684
+ seed:
685
+ value: 42
686
+ skip_memory_metrics:
687
+ value: true
688
+ sliding_window:
689
+ value: null
690
+ sortish_sampler:
691
+ value: false
692
+ tf32:
693
+ value: null
694
+ tie_word_embeddings:
695
+ value: false
696
+ torch_compile:
697
+ value: false
698
+ torch_compile_backend:
699
+ value: null
700
+ torch_compile_mode:
701
+ value: null
702
+ torch_empty_cache_steps:
703
+ value: null
704
+ trackio_space_id:
705
+ value: trackio
706
+ transformers_version:
707
+ value: 5.0.0
708
+ use_cache:
709
+ value: false
710
+ use_cpu:
711
+ value: false
712
+ use_liger_kernel:
713
+ value: false
714
+ use_sliding_window:
715
+ value: false
716
+ vocab_size:
717
+ value: 151936
718
+ warmup_ratio:
719
+ value: 0.02
720
+ warmup_steps:
721
+ value: 0.02
722
+ weight_decay:
723
+ value: 0
LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
1
+ pytz==2025.2
2
+ pydub==0.25.1
3
+ brotli==1.2.0
4
+ antlr4-python3-runtime==4.9.3
5
+ xxhash==3.6.0
6
+ websockets==15.0.1
7
+ tzdata==2025.3
8
+ typing_extensions==4.15.0
9
+ tqdm==4.67.3
10
+ tomlkit==0.13.3
11
+ termcolor==3.3.0
12
+ shtab==1.8.0
13
+ shellingham==1.5.4
14
+ sentencepiece==0.2.1
15
+ semantic-version==2.10.0
16
+ safetensors==0.7.0
17
+ ruff==0.15.0
18
+ regex==2026.1.15
19
+ python-multipart==0.0.22
20
+ pyparsing==3.3.2
21
+ pyarrow==23.0.0
22
+ protobuf==6.33.5
23
+ propcache==0.4.1
24
+ orjson==3.11.7
25
+ omegaconf==2.3.0
26
+ numpy==2.4.2
27
+ multidict==6.7.1
28
+ mdurl==0.1.2
29
+ kiwisolver==1.4.9
30
+ hf-xet==1.2.0
31
+ hf_transfer==0.1.9
32
+ groovy==0.1.2
33
+ frozenlist==1.8.0
34
+ fonttools==4.61.1
35
+ ffmpy==1.0.0
36
+ einops==0.8.2
37
+ docstring_parser==0.17.0
38
+ dill==0.3.8
39
+ cycler==0.12.1
40
+ click==8.3.1
41
+ av==16.0.0
42
+ annotated-types==0.7.0
43
+ annotated-doc==0.0.4
44
+ aiohappyeyeballs==2.6.1
45
+ aiofiles==24.1.0
46
+ yarl==1.22.0
47
+ uvicorn==0.40.0
48
+ typing-inspection==0.4.2
49
+ typer-slim==0.21.1
50
+ tiktoken==0.12.0
51
+ scipy==1.17.0
52
+ pydantic_core==2.41.4
53
+ pandas==2.3.3
54
+ multiprocess==0.70.16
55
+ modelscope==1.34.0
56
+ markdown-it-py==4.0.0
57
+ fire==0.7.1
58
+ contourpy==1.3.3
59
+ anyio==4.12.1
60
+ aiosignal==1.4.0
61
+ starlette==0.50.0
62
+ rich==14.3.2
63
+ pydantic==2.12.3
64
+ matplotlib==3.10.8
65
+ aiohttp==3.13.3
66
+ tyro==0.8.14
67
+ typer==0.21.1
68
+ torchdata==0.11.0
69
+ sse-starlette==3.2.0
70
+ safehttpx==0.1.7
71
+ huggingface_hub==1.4.0
72
+ fastapi==0.128.1
73
+ tokenizers==0.22.2
74
+ gradio_client==1.14.0
75
+ datasets==4.0.0
76
+ accelerate==1.11.0
77
+ transformers==5.0.0
78
+ gradio==5.50.0
79
+ trl==0.24.0
80
+ peft==0.18.1
81
+ llamafactory==0.9.5.dev0
82
+ jieba==0.42.1
83
+ rouge-chinese==1.0.3
84
+ joblib==1.5.3
85
+ nltk==3.9.2
86
+ py-cpuinfo==9.0.0
87
+ nvidia-ml-py==13.590.48
88
+ hjson==3.1.0
89
+ ninja==1.13.0
90
+ msgpack==1.1.2
91
+ deepspeed==0.16.9
92
+ smmap==5.0.2
93
+ sentry-sdk==2.52.0
94
+ gitdb==4.0.12
95
+ GitPython==3.1.46
96
+ wandb==0.24.2
97
+ entrypoints==0.4
98
+ jupyter_client==7.4.9
99
+ nbclassic==1.1.0
100
+ notebook==6.5.5
101
+ pyzmq==24.0.1
102
+ PyYAML==6.0.2
103
+ Send2Trash==1.8.3
104
+ argon2-cffi==23.1.0
105
+ argon2-cffi-bindings==21.2.0
106
+ arrow==1.3.0
107
+ asttokens==2.4.1
108
+ async-lru==2.0.4
109
+ attrs==24.2.0
110
+ babel==2.16.0
111
+ beautifulsoup4==4.12.3
112
+ bleach==6.1.0
113
+ certifi==2024.8.30
114
+ cffi==1.17.1
115
+ charset-normalizer==3.3.2
116
+ comm==0.2.2
117
+ debugpy==1.8.5
118
+ decorator==5.1.1
119
+ defusedxml==0.7.1
120
+ executing==2.1.0
121
+ fastjsonschema==2.20.0
122
+ fqdn==1.5.1
123
+ h11==0.14.0
124
+ httpcore==1.0.5
125
+ httpx==0.27.2
126
+ idna==3.10
127
+ ipykernel==6.29.5
128
+ ipython==8.27.0
129
+ ipython-genutils==0.2.0
130
+ ipywidgets==8.1.5
131
+ isoduration==20.11.0
132
+ jedi==0.19.1
133
+ json5==0.9.25
134
+ jsonpointer==3.0.0
135
+ jsonschema==4.23.0
136
+ jsonschema-specifications==2023.12.1
137
+ jupyter-archive==3.4.0
138
+ jupyter_contrib_core==0.4.2
139
+ jupyter_contrib_nbextensions==0.7.0
140
+ jupyter_core==5.7.2
141
+ jupyter-events==0.10.0
142
+ jupyter-highlight-selected-word==0.2.0
143
+ jupyter-lsp==2.2.5
144
+ jupyter_nbextensions_configurator==0.6.4
145
+ jupyter_server==2.14.2
146
+ jupyter_server_terminals==0.5.3
147
+ jupyterlab==4.2.5
148
+ jupyterlab_pygments==0.3.0
149
+ jupyterlab_server==2.27.3
150
+ jupyterlab_widgets==3.0.13
151
+ lxml==5.3.0
152
+ matplotlib-inline==0.1.7
153
+ mistune==3.0.2
154
+ nbclient==0.10.0
155
+ nbconvert==7.16.4
156
+ nbformat==5.10.4
157
+ nest-asyncio==1.6.0
158
+ notebook_shim==0.2.4
159
+ overrides==7.7.0
160
+ packaging==24.1
161
+ pandocfilters==1.5.1
162
+ parso==0.8.4
163
+ pexpect==4.9.0
164
+ platformdirs==4.3.6
165
+ prometheus_client==0.21.0
166
+ prompt_toolkit==3.0.47
167
+ psutil==6.0.0
168
+ ptyprocess==0.7.0
169
+ pure_eval==0.2.3
170
+ pycparser==2.22
171
+ Pygments==2.18.0
172
+ python-dateutil==2.9.0.post0
173
+ python-json-logger==2.0.7
174
+ referencing==0.35.1
175
+ requests==2.32.3
176
+ rfc3339-validator==0.1.4
177
+ rfc3986-validator==0.1.1
178
+ rpds-py==0.20.0
179
+ sniffio==1.3.1
180
+ soupsieve==2.6
181
+ stack-data==0.6.3
182
+ terminado==0.18.1
183
+ tinycss2==1.3.0
184
+ tornado==6.4.1
185
+ traitlets==5.14.3
186
+ types-python-dateutil==2.9.0.20240906
187
+ uri-template==1.3.0
188
+ urllib3==2.2.3
189
+ wcwidth==0.2.13
190
+ webcolors==24.8.0
191
+ webencodings==0.5.1
192
+ websocket-client==1.8.0
193
+ widgetsnbextension==4.0.13
194
+ Jinja2==3.1.3
195
+ MarkupSafe==2.1.5
196
+ filelock==3.13.1
197
+ fsspec==2024.2.0
198
+ mpmath==1.3.0
199
+ networkx==3.2.1
200
+ nvidia-cublas-cu12==12.4.2.65
201
+ nvidia-cuda-cupti-cu12==12.4.99
202
+ nvidia-cuda-nvrtc-cu12==12.4.99
203
+ nvidia-cuda-runtime-cu12==12.4.99
204
+ nvidia-cudnn-cu12==9.1.0.70
205
+ nvidia-cufft-cu12==11.2.0.44
206
+ nvidia-curand-cu12==10.3.5.119
207
+ nvidia-cusolver-cu12==11.6.0.99
208
+ nvidia-cusparse-cu12==12.3.0.142
209
+ nvidia-nccl-cu12==2.20.5
210
+ nvidia-nvjitlink-cu12==12.4.99
211
+ nvidia-nvtx-cu12==12.4.99
212
+ pillow==10.2.0
213
+ sympy==1.12
214
+ torch==2.4.1+cu124
215
+ torchaudio==2.4.1+cu124
216
+ torchvision==0.19.1+cu124
217
+ triton==3.0.0
218
+ pip==24.2
219
+ setuptools==75.1.0
220
+ wheel==0.44.0
221
+ PyGObject==3.42.1
222
+ PyJWT==2.3.0
223
+ SecretStorage==3.3.1
224
+ blinker==1.4
225
+ cryptography==3.4.8
226
+ dbus-python==1.2.18
227
+ distro==1.7.0
228
+ httplib2==0.20.2
229
+ importlib-metadata==4.6.4
230
+ jeepney==0.7.1
231
+ keyring==23.5.0
232
+ launchpadlib==1.10.16
233
+ lazr.restfulclient==0.14.4
234
+ lazr.uri==1.0.6
235
+ more-itertools==8.10.0
236
+ oauthlib==3.2.0
237
+ python-apt==2.4.0+ubuntu4
238
+ six==1.16.0
239
+ wadllib==1.3.6
240
+ zipp==1.0.0
241
+ autocommand==2.2.2
242
+ backports.tarfile==1.2.0
243
+ importlib_metadata==8.0.0
244
+ importlib_resources==6.4.0
245
+ inflect==7.3.1
246
+ jaraco.collections==5.1.0
247
+ jaraco.context==5.3.0
248
+ jaraco.functools==4.0.1
249
+ jaraco.text==3.12.1
250
+ more-itertools==10.3.0
251
+ packaging==24.1
252
+ platformdirs==4.2.2
253
+ tomli==2.0.1
254
+ typeguard==4.3.0
255
+ typing_extensions==4.12.2
256
+ wheel==0.43.0
257
+ zipp==3.19.2
LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "os": "Linux-6.8.0-52-generic-x86_64-with-glibc2.35",
3
+ "python": "CPython 3.11.10",
4
+ "startedAt": "2026-02-05T07:40:20.662178Z",
5
+ "args": [
6
+ "/workspace/v127rc_exp1/E_mup.yaml"
7
+ ],
8
+ "program": "/usr/local/bin/llamafactory-cli",
9
+ "git": {
10
+ "remote": "https://github.com/hiyouga/LlamaFactory.git",
11
+ "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
12
+ },
13
+ "email": "markmochi200@gmail.com",
14
+ "root": "/workspace/LlamaFactory",
15
+ "host": "d74c94898fdd",
16
+ "executable": "/usr/bin/python",
17
+ "cpu_count": 16,
18
+ "cpu_count_logical": 32,
19
+ "gpu": "NVIDIA GeForce RTX 4090",
20
+ "gpu_count": 1,
21
+ "disk": {
22
+ "/": {
23
+ "total": "21474836480",
24
+ "used": "2479427584"
25
+ }
26
+ },
27
+ "memory": {
28
+ "total": "201668009984"
29
+ },
30
+ "gpu_nvidia": [
31
+ {
32
+ "name": "NVIDIA GeForce RTX 4090",
33
+ "memoryTotal": "25757220864",
34
+ "cudaCores": 16384,
35
+ "architecture": "Ada",
36
+ "uuid": "GPU-ceaf5bb7-0aeb-5000-346c-08e2d1143c3c"
37
+ }
38
+ ],
39
+ "cudaVersion": "12.7",
40
+ "writerId": "qggzf6e4qf9vgtbvttkcenmdoze5458m"
41
+ }
LlamaFactory/wandb/run-20260205_074020-s91zw8js/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"train/learning_rate":4.431618960687444e-15,"_timestamp":1.7704523558026636e+09,"train/train_tokens_per_second":1989.985,"train/epoch":5,"train/grad_norm":0.052400071173906326,"train/global_step":170255,"_step":170255,"train_samples_per_second":0.972,"train_loss":0.04851693114831647,"total_flos":1.5917349118373837e+19,"_runtime":175137,"train_runtime":175135.8338,"train/loss":0.007810490671545267,"train_steps_per_second":0.972,"train/num_input_tokens_seen":348511985,"_wandb":{"runtime":175137}}
LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
1
+ {"time":"2026-02-05T07:40:20.921871833Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
2
+ {"time":"2026-02-05T07:40:21.276359489Z","level":"INFO","msg":"stream: created new stream","id":"s91zw8js"}
3
+ {"time":"2026-02-05T07:40:21.276953137Z","level":"INFO","msg":"handler: started","stream_id":"s91zw8js"}
4
+ {"time":"2026-02-05T07:40:21.279091459Z","level":"INFO","msg":"stream: started","id":"s91zw8js"}
5
+ {"time":"2026-02-05T07:40:21.279105706Z","level":"INFO","msg":"sender: started","stream_id":"s91zw8js"}
6
+ {"time":"2026-02-05T07:40:21.279100957Z","level":"INFO","msg":"writer: started","stream_id":"s91zw8js"}
7
+ {"time":"2026-02-06T18:48:06.891412554Z","level":"INFO","msg":"api: retrying HTTP error","status":502,"url":"https://api.wandb.ai/files/markmochi200-linksome-ai/llamafactory/s91zw8js/file_stream","body":"\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}
8
+ {"time":"2026-02-07T08:19:19.612096657Z","level":"INFO","msg":"stream: closing","id":"s91zw8js"}
9
+ {"time":"2026-02-07T08:19:22.254231428Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
10
+ {"time":"2026-02-07T08:19:22.492124193Z","level":"INFO","msg":"handler: closed","stream_id":"s91zw8js"}
11
+ {"time":"2026-02-07T08:19:22.495271717Z","level":"INFO","msg":"sender: closed","stream_id":"s91zw8js"}
12
+ {"time":"2026-02-07T08:19:22.495563792Z","level":"INFO","msg":"stream: closed","id":"s91zw8js"}
LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug.log ADDED
@@ -0,0 +1,25 @@
1
+ 2026-02-05 07:40:20,685 INFO MainThread:1062 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
2
+ 2026-02-05 07:40:20,685 INFO MainThread:1062 [wandb_setup.py:_flush():81] Configure stats pid to 1062
3
+ 2026-02-05 07:40:20,685 INFO MainThread:1062 [wandb_setup.py:_flush():81] Loading settings from environment variables
4
+ 2026-02-05 07:40:20,686 INFO MainThread:1062 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug.log
5
+ 2026-02-05 07:40:20,687 INFO MainThread:1062 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260205_074020-s91zw8js/logs/debug-internal.log
6
+ 2026-02-05 07:40:20,688 INFO MainThread:1062 [wandb_init.py:init():844] calling init triggers
7
+ 2026-02-05 07:40:20,688 INFO MainThread:1062 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
8
+ config: {'_wandb': {}}
9
+ 2026-02-05 07:40:20,689 INFO MainThread:1062 [wandb_init.py:init():892] starting backend
10
+ 2026-02-05 07:40:20,910 INFO MainThread:1062 [wandb_init.py:init():895] sending inform_init request
11
+ 2026-02-05 07:40:20,920 INFO MainThread:1062 [wandb_init.py:init():903] backend started and connected
12
+ 2026-02-05 07:40:20,922 INFO MainThread:1062 [wandb_init.py:init():973] updated telemetry
13
+ 2026-02-05 07:40:20,967 INFO MainThread:1062 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
14
+ 2026-02-05 07:40:21,655 INFO MainThread:1062 [wandb_init.py:init():1042] starting run threads in backend
15
+ 2026-02-05 07:40:21,731 INFO MainThread:1062 [wandb_run.py:_console_start():2529] atexit reg
16
+ 2026-02-05 07:40:21,732 INFO MainThread:1062 [wandb_run.py:_redirect():2377] redirect: wrap_raw
17
+ 2026-02-05 07:40:21,732 INFO MainThread:1062 [wandb_run.py:_redirect():2446] Wrapping output streams.
18
+ 2026-02-05 07:40:21,732 INFO MainThread:1062 [wandb_run.py:_redirect():2469] Redirects installed.
19
+ 2026-02-05 07:40:21,734 INFO MainThread:1062 [wandb_init.py:init():1082] run started, returning control to user process
20
+ 2026-02-05 07:40:21,735 INFO MainThread:1062 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['k_proj', 'down_proj', 'q_proj', 'up_proj', 'gate_proj', 'v_proj', 'o_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp1/E_mup', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 5, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.02, 'warmup_steps': 0.02, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 
-1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 'generation_max_length': 
2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
21
+ 2026-02-05 07:40:21,742 INFO MainThread:1062 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7e4008ed12d0>>
22
+ 2026-02-05 07:40:21,742 INFO MainThread:1062 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
23
+ 2026-02-05 07:40:21,744 INFO MainThread:1062 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t119_d85_r1'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
24
+ 2026-02-07 08:19:19,611 INFO wandb-AsyncioManager-main:1062 [service_client.py:_forward_responses():94] Reached EOF.
25
+ 2026-02-07 08:19:19,612 INFO wandb-AsyncioManager-main:1062 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/config.yaml ADDED
@@ -0,0 +1,723 @@
1
+ _name_or_path:
2
+ value: /workspace/Qwen/Qwen3-8B-Base
3
+ _wandb:
4
+ value:
5
+ cli_version: 0.24.2
6
+ e:
7
+ m15pzb5bw8fbrm1kn15f02cv2o8axrnb:
8
+ args:
9
+ - /workspace/v127rc_exp1/B_mup.yaml
10
+ cpu_count: 24
11
+ cpu_count_logical: 48
12
+ cudaVersion: "12.8"
13
+ disk:
14
+ /:
15
+ total: "21474836480"
16
+ used: "2414051328"
17
+ email: markmochi200@gmail.com
18
+ executable: /usr/bin/python
19
+ git:
20
+ commit: 1a02717fa84c270d1c156c4c4a391c2f95525a63
21
+ remote: https://github.com/hiyouga/LlamaFactory.git
22
+ gpu: NVIDIA GeForce RTX 4090
23
+ gpu_count: 1
24
+ gpu_nvidia:
25
+ - architecture: Ada
26
+ cudaCores: 16384
27
+ memoryTotal: "25757220864"
28
+ name: NVIDIA GeForce RTX 4090
29
+ uuid: GPU-64f7ee9c-3f46-4f01-74c0-f57a6e56968a
30
+ host: 540de1d49753
31
+ memory:
32
+ total: "270100414464"
33
+ os: Linux-6.8.0-78-generic-x86_64-with-glibc2.35
34
+ program: /usr/local/bin/llamafactory-cli
35
+ python: CPython 3.11.10
36
+ root: /workspace/LlamaFactory
37
+ startedAt: "2026-02-05T07:40:50.642512Z"
38
+ writerId: m15pzb5bw8fbrm1kn15f02cv2o8axrnb
39
+ m:
40
+ - "1": train/global_step
41
+ "6":
42
+ - 3
43
+ "7": []
44
+ - "2": '*'
45
+ "5": 1
46
+ "6":
47
+ - 1
48
+ "7": []
49
+ python_version: 3.11.10
50
+ t:
51
+ "1":
52
+ - 1
53
+ - 11
54
+ - 41
55
+ - 49
56
+ - 51
57
+ - 71
58
+ - 84
59
+ - 98
60
+ - 105
61
+ "2":
62
+ - 1
63
+ - 11
64
+ - 41
65
+ - 49
66
+ - 51
67
+ - 71
68
+ - 84
69
+ - 98
70
+ - 105
71
+ "3":
72
+ - 7
73
+ - 19
74
+ - 62
75
+ - 66
76
+ "4": 3.11.10
77
+ "5": 0.24.2
78
+ "6": 5.0.0
79
+ "9":
80
+ "1": transformers_trainer
81
+ "12": 0.24.2
82
+ "13": linux-x86_64
83
+ accelerator_config:
84
+ value:
85
+ dispatch_batches: null
86
+ even_batches: true
87
+ gradient_accumulation_kwargs: null
88
+ non_blocking: false
89
+ split_batches: false
90
+ use_seedable_sampler: true
91
+ adam_beta1:
92
+ value: 0.9
93
+ adam_beta2:
94
+ value: 0.95
95
+ adam_epsilon:
96
+ value: 1e-08
97
+ architectures:
98
+ value:
99
+ - Qwen3ForCausalLM
100
+ attention_bias:
101
+ value: false
102
+ attention_dropout:
103
+ value: 0
104
+ auto_find_batch_size:
105
+ value: false
106
+ average_tokens_across_devices:
107
+ value: true
108
+ batch_eval_metrics:
109
+ value: false
110
+ bf16:
111
+ value: true
112
+ bf16_full_eval:
113
+ value: false
114
+ bos_token_id:
115
+ value: null
116
+ chunk_size_feed_forward:
117
+ value: 0
118
+ data_args:
119
+ value:
120
+ buffer_size: 16384
121
+ cutoff_len: 2047
122
+ data_shared_file_system: false
123
+ dataset:
124
+ - Markie_Voss_t35_d286_r1
125
+ dataset_dir: /workspace/LlamaFactory/data
126
+ default_system: null
127
+ enable_thinking: false
128
+ eval_dataset: null
129
+ eval_num_beams: null
130
+ eval_on_each_dataset: false
131
+ ignore_pad_token_for_loss: true
132
+ interleave_probs: null
133
+ mask_history: false
134
+ max_samples: 100000000
135
+ media_dir: /workspace/LlamaFactory/data
136
+ mix_strategy: concat
137
+ neat_packing: false
138
+ overwrite_cache: false
139
+ packing: true
140
+ preprocessing_batch_size: 1000
141
+ preprocessing_num_workers: 16
142
+ streaming: false
143
+ template: qwen3_nothink
144
+ tokenized_path: null
145
+ tool_format: null
146
+ train_on_prompt: false
147
+ val_size: 0
148
+ data_seed:
149
+ value: null
150
+ dataloader_drop_last:
151
+ value: false
152
+ dataloader_num_workers:
153
+ value: 0
154
+ dataloader_persistent_workers:
155
+ value: false
156
+ dataloader_pin_memory:
157
+ value: true
158
+ dataloader_prefetch_factor:
159
+ value: null
160
+ ddp_backend:
161
+ value: null
162
+ ddp_broadcast_buffers:
163
+ value: null
164
+ ddp_bucket_cap_mb:
165
+ value: null
166
+ ddp_find_unused_parameters:
167
+ value: null
168
+ ddp_timeout:
169
+ value: 180000000
170
+ debug:
171
+ value: []
172
+ deepspeed:
173
+ value: null
174
+ disable_tqdm:
175
+ value: false
176
+ do_eval:
177
+ value: false
178
+ do_predict:
179
+ value: false
180
+ do_train:
181
+ value: true
182
+ dtype:
183
+ value: bfloat16
184
+ enable_jit_checkpoint:
185
+ value: false
186
+ eos_token_id:
187
+ value: 151645
188
+ eval_accumulation_steps:
189
+ value: null
190
+ eval_delay:
191
+ value: 0
192
+ eval_do_concat_batches:
193
+ value: true
194
+ eval_on_start:
195
+ value: false
196
+ eval_steps:
197
+ value: null
198
+ eval_strategy:
199
+ value: "no"
200
+ eval_use_gather_object:
201
+ value: false
202
+ finetuning_args:
203
+ value:
204
+ additional_target: null
205
+ apollo_layerwise: false
206
+ apollo_proj: random
207
+ apollo_proj_type: std
208
+ apollo_rank: 16
209
+ apollo_scale: 32
210
+ apollo_scale_front: false
211
+ apollo_scale_type: channel
212
+ apollo_target:
213
+ - all
214
+ apollo_update_interval: 200
215
+ badam_mask_mode: adjacent
216
+ badam_mode: layer
217
+ badam_start_block: null
218
+ badam_switch_interval: 50
219
+ badam_switch_mode: ascending
220
+ badam_update_ratio: 0.05
221
+ badam_verbose: 0
222
+ compute_accuracy: false
223
+ create_new_adapter: false
224
+ disable_shuffling: false
225
+ dpo_label_smoothing: 0
226
+ eaft_alpha: 1
227
+ early_stopping_steps: null
228
+ finetuning_type: lora
229
+ freeze_extra_modules: null
230
+ freeze_language_model: false
231
+ freeze_multi_modal_projector: true
232
+ freeze_trainable_layers: 2
233
+ freeze_trainable_modules:
234
+ - all
235
+ freeze_vision_tower: true
236
+ galore_layerwise: false
237
+ galore_proj_type: std
238
+ galore_rank: 16
239
+ galore_scale: 2
240
+ galore_target:
241
+ - all
242
+ galore_update_interval: 200
243
+ include_effective_tokens_per_second: false
244
+ kto_chosen_weight: 1
245
+ kto_rejected_weight: 1
246
+ ld_alpha: null
247
+ lora_alpha: 32
248
+ lora_dropout: 0.03
249
+ lora_rank: 16
250
+ lora_target:
251
+ - all
252
+ loraplus_lr_embedding: 1e-06
253
+ loraplus_lr_ratio: null
254
+ module_dropout: 0
255
+ oft_block_size: 32
256
+ oft_rank: 0
257
+ oft_target:
258
+ - all
259
+ pissa_convert: false
260
+ pissa_init: false
261
+ pissa_iter: 16
262
+ plot_loss: true
263
+ ppo_buffer_size: 1
264
+ ppo_epochs: 4
265
+ ppo_score_norm: false
266
+ ppo_target: 6
267
+ ppo_whiten_rewards: false
268
+ pref_bco_weight: 0
269
+ pref_beta: 0.1
270
+ pref_ftx: 0
271
+ pref_loss: sigmoid
272
+ pure_bf16: false
273
+ ref_model: null
274
+ ref_model_adapters: null
275
+ ref_model_quantization_bit: null
276
+ reward_model: null
277
+ reward_model_adapters: null
278
+ reward_model_quantization_bit: null
279
+ reward_model_type: lora
280
+ simpo_gamma: 0.5
281
+ stage: pt
282
+ swanlab_api_key: <SWANLAB_API_KEY>
283
+ swanlab_lark_secret: null
284
+ swanlab_lark_webhook_url: null
285
+ swanlab_logdir: null
286
+ swanlab_mode: cloud
287
+ swanlab_project: llamafactory
288
+ swanlab_run_name: null
289
+ swanlab_workspace: null
290
+ use_adam_mini: false
291
+ use_apollo: false
292
+ use_badam: false
293
+ use_dft_loss: false
294
+ use_dora: false
295
+ use_eaft_loss: false
296
+ use_galore: false
297
+ use_llama_pro: false
298
+ use_mca: false
299
+ use_muon: false
300
+ use_rslora: false
301
+ use_swanlab: false
302
+ fp8:
303
+ value: false
304
+ fp8_backend:
305
+ value: auto
306
+ fp8_enable_fsdp_float8_all_gather:
307
+ value: false
308
+ fp16:
309
+ value: false
310
+ fp16_full_eval:
311
+ value: false
312
+ fsdp:
313
+ value: []
314
+ fsdp_config:
315
+ value:
316
+ min_num_params: 0
317
+ xla: false
318
+ xla_fsdp_grad_ckpt: false
319
+ xla_fsdp_v2: false
320
+ full_determinism:
321
+ value: false
322
+ generating_args:
323
+ value:
324
+ do_sample: true
325
+ length_penalty: 1
326
+ max_new_tokens: 1024
327
+ num_beams: 1
328
+ repetition_penalty: 1
329
+ skip_special_tokens: true
330
+ temperature: 0.95
331
+ top_k: 50
332
+ top_p: 0.7
333
+ generation_config:
334
+ value: null
335
+ generation_max_length:
336
+ value: 2047
337
+ generation_num_beams:
338
+ value: null
339
+ gradient_accumulation_steps:
340
+ value: 1
341
+ gradient_checkpointing:
342
+ value: false
343
+ gradient_checkpointing_kwargs:
344
+ value: null
345
+ greater_is_better:
346
+ value: null
347
+ group_by_length:
348
+ value: false
349
+ head_dim:
350
+ value: 128
351
+ hidden_act:
352
+ value: silu
353
+ hidden_size:
354
+ value: 4096
355
+ hub_always_push:
356
+ value: false
357
+ hub_model_id:
358
+ value: null
359
+ hub_private_repo:
360
+ value: null
361
+ hub_revision:
362
+ value: null
363
+ hub_strategy:
364
+ value: every_save
365
+ hub_token:
366
+ value: <HUB_TOKEN>
367
+ id2label:
368
+ value:
369
+ "0": LABEL_0
370
+ "1": LABEL_1
371
+ ignore_data_skip:
372
+ value: false
373
+ include_for_metrics:
374
+ value: []
375
+ include_num_input_tokens_seen:
376
+ value: all
377
+ initializer_range:
378
+ value: 0.02
379
+ intermediate_size:
380
+ value: 12288
381
+ is_encoder_decoder:
382
+ value: false
383
+ label_names:
384
+ value:
385
+ - labels
386
+ label_smoothing_factor:
387
+ value: 0
388
+ label2id:
389
+ value:
390
+ LABEL_0: 0
391
+ LABEL_1: 1
392
+ layer_types:
393
+ value:
394
+ - full_attention
395
+ - full_attention
396
+ - full_attention
397
+ - full_attention
398
+ - full_attention
399
+ - full_attention
400
+ - full_attention
401
+ - full_attention
402
+ - full_attention
403
+ - full_attention
404
+ - full_attention
405
+ - full_attention
406
+ - full_attention
407
+ - full_attention
408
+ - full_attention
409
+ - full_attention
410
+ - full_attention
411
+ - full_attention
412
+ - full_attention
413
+ - full_attention
414
+ - full_attention
415
+ - full_attention
416
+ - full_attention
417
+ - full_attention
418
+ - full_attention
419
+ - full_attention
420
+ - full_attention
421
+ - full_attention
422
+ - full_attention
423
+ - full_attention
424
+ - full_attention
425
+ - full_attention
426
+ - full_attention
427
+ - full_attention
428
+ - full_attention
429
+ - full_attention
430
+ learning_rate:
431
+ value: 5e-05
432
+ length_column_name:
433
+ value: length
434
+ liger_kernel_config:
435
+ value: null
436
+ load_best_model_at_end:
437
+ value: false
438
+ local_rank:
439
+ value: -1
440
+ log_level:
441
+ value: passive
442
+ log_level_replica:
443
+ value: warning
444
+ log_on_each_node:
445
+ value: true
446
+ logging_dir:
447
+ value: null
448
+ logging_first_step:
449
+ value: false
450
+ logging_nan_inf_filter:
451
+ value: true
452
+ logging_steps:
453
+ value: 1
454
+ logging_strategy:
455
+ value: steps
456
+ lr_scheduler_kwargs:
457
+ value: null
458
+ lr_scheduler_type:
459
+ value: cosine
460
+ master_addr:
461
+ value: null
462
+ master_port:
463
+ value: null
464
+ max_grad_norm:
465
+ value: 1
466
+ max_position_embeddings:
467
+ value: 32768
468
+ max_steps:
469
+ value: -1
470
+ max_window_layers:
471
+ value: 36
472
+ metric_for_best_model:
473
+ value: null
474
+ model/num_parameters:
475
+ value: 8234382336
476
+ model_args:
477
+ value:
478
+ adapter_folder: null
479
+ adapter_name_or_path: null
480
+ add_special_tokens: null
481
+ add_tokens: null
482
+ audio_sampling_rate: 16000
483
+ block_diag_attn: false
484
+ cache_dir: null
485
+ chunk_size: 8192
486
+ compute_dtype: torch.bfloat16
487
+ cpu_infer: 32
488
+ crop_to_patches: false
489
+ device_map:
490
+ "": cuda:0
491
+ disable_gradient_checkpointing: false
492
+ double_quantization: true
493
+ enable_liger_kernel: false
494
+ export_device: cpu
495
+ export_dir: null
496
+ export_hub_model_id: null
497
+ export_legacy_format: false
498
+ export_quantization_bit: null
499
+ export_quantization_dataset: null
500
+ export_quantization_maxlen: 1024
501
+ export_quantization_nsamples: 128
502
+ export_size: 5
503
+ flash_attn: auto
504
+ hf_hub_token: <HF_HUB_TOKEN>
505
+ image_do_pan_and_scan: false
506
+ image_max_pixels: 589824
507
+ image_min_pixels: 1024
508
+ infer_backend: HF
509
+ infer_dtype: auto
510
+ init_special_tokens: noise_init
511
+ kt_force_think: false
512
+ kt_maxlen: 4096
513
+ kt_mode: normal
514
+ kt_optimize_rule: null
515
+ kt_use_cuda_graph: true
516
+ low_cpu_mem_usage: true
517
+ mixture_of_depths: null
518
+ mode: normal
519
+ model_max_length: 2047
520
+ model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
521
+ model_revision: main
522
+ moe_aux_loss_coef: null
523
+ ms_hub_token: <MS_HUB_TOKEN>
524
+ new_special_tokens_config: null
525
+ offload_folder: offload
526
+ om_hub_token: <OM_HUB_TOKEN>
527
+ print_param_status: false
528
+ quantization_bit: null
529
+ quantization_device_map: null
530
+ quantization_method: BNB
531
+ quantization_type: nf4
532
+ resize_vocab: false
533
+ rope_scaling: null
534
+ sglang_config: null
535
+ sglang_lora_backend: triton
536
+ sglang_maxlen: 4096
537
+ sglang_mem_fraction: 0.7
538
+ sglang_tp_size: -1
539
+ shift_attn: false
540
+ split_special_tokens: false
541
+ train_from_scratch: false
542
+ trust_remote_code: true
543
+ upcast_layernorm: false
544
+ upcast_lmhead_output: false
545
+ use_audio_in_video: false
546
+ use_fast_tokenizer: true
547
+ use_kt: false
548
+ use_kv_cache: true
549
+ use_reentrant_gc: true
550
+ use_unsloth: false
551
+ use_unsloth_gc: false
552
+ use_v1_kernels: false
553
+ video_fps: 2
554
+ video_max_pixels: 65536
555
+ video_maxlen: 128
556
+ video_min_pixels: 256
557
+ vllm_config: null
558
+ vllm_enforce_eager: false
559
+ vllm_gpu_util: 0.7
560
+ vllm_max_lora_rank: 32
561
+ vllm_maxlen: 4096
562
+ model_type:
563
+ value: qwen3
564
+ neftune_noise_alpha:
565
+ value: null
566
+ num_attention_heads:
567
+ value: 32
568
+ num_hidden_layers:
569
+ value: 36
570
+ num_key_value_heads:
571
+ value: 8
572
+ num_train_epochs:
573
+ value: 5
574
+ optim:
575
+ value: adamw_torch
576
+ optim_args:
577
+ value: null
578
+ optim_target_modules:
579
+ value: null
580
+ output_attentions:
581
+ value: false
582
+ output_dir:
583
+ value: /workspace/v127rc_exp1/B_mup
584
+ output_hidden_states:
585
+ value: false
586
+ overwrite_output_dir:
587
+ value: false
588
+ pad_token_id:
589
+ value: 151643
590
+ parallelism_config:
591
+ value: null
592
+ peft_config:
593
+ value:
594
+ default:
595
+ alora_invocation_tokens: null
596
+ arrow_config: null
597
+ auto_mapping: null
598
+ base_model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
599
+ bias: none
600
+ corda_config: null
601
+ ensure_weight_tying: false
602
+ eva_config: null
603
+ exclude_modules: null
604
+ fan_in_fan_out: false
605
+ inference_mode: false
606
+ init_lora_weights: true
607
+ layer_replication: null
608
+ layers_pattern: null
609
+ layers_to_transform: null
610
+ lora_alpha: 32
611
+ lora_bias: false
612
+ lora_dropout: 0.03
613
+ megatron_config: null
614
+ megatron_core: megatron.core
615
+ modules_to_save: null
616
+ peft_type: LORA
617
+ peft_version: 0.18.1
618
+ qalora_group_size: 16
619
+ r: 16
620
+ revision: null
621
+ runtime_config:
622
+ ephemeral_gpu_offload: false
623
+ target_modules:
624
+ - q_proj
625
+ - up_proj
626
+ - gate_proj
627
+ - down_proj
628
+ - k_proj
629
+ - v_proj
630
+ - o_proj
631
+ target_parameters: null
632
+ task_type: CAUSAL_LM
633
+ trainable_token_indices: null
634
+ use_dora: false
635
+ use_qalora: false
636
+ use_rslora: false
637
+ per_device_eval_batch_size:
638
+ value: 8
639
+ per_device_train_batch_size:
640
+ value: 1
641
+ predict_with_generate:
642
+ value: false
643
+ prediction_loss_only:
644
+ value: false
645
+ problem_type:
646
+ value: null
647
+ project:
648
+ value: huggingface
649
+ push_to_hub:
650
+ value: false
651
+ ray_init_kwargs:
652
+ value: null
653
+ ray_num_workers:
654
+ value: 1
655
+ remove_unused_columns:
656
+ value: false
657
+ report_to:
658
+ value:
659
+ - wandb
660
+ restore_callback_states_from_checkpoint:
661
+ value: false
662
+ resume_from_checkpoint:
663
+ value: null
664
+ return_dict:
665
+ value: true
666
+ rms_norm_eps:
667
+ value: 1e-06
668
+ rope_parameters:
669
+ value:
670
+ rope_theta: 1000000
671
+ rope_type: default
672
+ run_name:
673
+ value: null
674
+ save_on_each_node:
675
+ value: false
676
+ save_only_model:
677
+ value: true
678
+ save_steps:
679
+ value: 1000
680
+ save_strategy:
681
+ value: steps
682
+ save_total_limit:
683
+ value: null
684
+ seed:
685
+ value: 42
686
+ skip_memory_metrics:
687
+ value: true
688
+ sliding_window:
689
+ value: null
690
+ sortish_sampler:
691
+ value: false
692
+ tf32:
693
+ value: null
694
+ tie_word_embeddings:
695
+ value: false
696
+ torch_compile:
697
+ value: false
698
+ torch_compile_backend:
699
+ value: null
700
+ torch_compile_mode:
701
+ value: null
702
+ torch_empty_cache_steps:
703
+ value: null
704
+ trackio_space_id:
705
+ value: trackio
706
+ transformers_version:
707
+ value: 5.0.0
708
+ use_cache:
709
+ value: false
710
+ use_cpu:
711
+ value: false
712
+ use_liger_kernel:
713
+ value: false
714
+ use_sliding_window:
715
+ value: false
716
+ vocab_size:
717
+ value: 151936
718
+ warmup_ratio:
719
+ value: 0.02
720
+ warmup_steps:
721
+ value: 0.02
722
+ weight_decay:
723
+ value: 0
LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
1
+ pytz==2025.2
2
+ pydub==0.25.1
3
+ brotli==1.2.0
4
+ antlr4-python3-runtime==4.9.3
5
+ xxhash==3.6.0
6
+ websockets==15.0.1
7
+ tzdata==2025.3
8
+ typing_extensions==4.15.0
9
+ tqdm==4.67.3
10
+ tomlkit==0.13.3
11
+ termcolor==3.3.0
12
+ shtab==1.8.0
13
+ shellingham==1.5.4
14
+ sentencepiece==0.2.1
15
+ semantic-version==2.10.0
16
+ safetensors==0.7.0
17
+ ruff==0.15.0
18
+ regex==2026.1.15
19
+ python-multipart==0.0.22
20
+ pyparsing==3.3.2
21
+ pyarrow==23.0.0
22
+ protobuf==6.33.5
23
+ propcache==0.4.1
24
+ orjson==3.11.7
25
+ omegaconf==2.3.0
26
+ numpy==2.4.2
27
+ multidict==6.7.1
28
+ mdurl==0.1.2
29
+ kiwisolver==1.4.9
30
+ hf-xet==1.2.0
31
+ hf_transfer==0.1.9
32
+ groovy==0.1.2
33
+ frozenlist==1.8.0
34
+ fonttools==4.61.1
35
+ ffmpy==1.0.0
36
+ einops==0.8.2
37
+ docstring_parser==0.17.0
38
+ dill==0.3.8
39
+ cycler==0.12.1
40
+ click==8.3.1
41
+ av==16.0.0
42
+ annotated-types==0.7.0
43
+ annotated-doc==0.0.4
44
+ aiohappyeyeballs==2.6.1
45
+ aiofiles==24.1.0
46
+ yarl==1.22.0
47
+ uvicorn==0.40.0
48
+ typing-inspection==0.4.2
49
+ typer-slim==0.21.1
50
+ tiktoken==0.12.0
51
+ scipy==1.17.0
52
+ pydantic_core==2.41.4
53
+ pandas==2.3.3
54
+ multiprocess==0.70.16
55
+ modelscope==1.34.0
56
+ markdown-it-py==4.0.0
57
+ fire==0.7.1
58
+ contourpy==1.3.3
59
+ anyio==4.12.1
60
+ aiosignal==1.4.0
61
+ starlette==0.50.0
62
+ rich==14.3.2
63
+ pydantic==2.12.3
64
+ matplotlib==3.10.8
65
+ aiohttp==3.13.3
66
+ tyro==0.8.14
67
+ typer==0.21.1
68
+ torchdata==0.11.0
69
+ sse-starlette==3.2.0
70
+ safehttpx==0.1.7
71
+ huggingface_hub==1.4.0
72
+ fastapi==0.128.1
73
+ tokenizers==0.22.2
74
+ gradio_client==1.14.0
75
+ datasets==4.0.0
76
+ accelerate==1.11.0
77
+ transformers==5.0.0
78
+ gradio==5.50.0
79
+ trl==0.24.0
80
+ peft==0.18.1
81
+ llamafactory==0.9.5.dev0
82
+ jieba==0.42.1
83
+ rouge-chinese==1.0.3
84
+ joblib==1.5.3
85
+ nltk==3.9.2
86
+ py-cpuinfo==9.0.0
87
+ nvidia-ml-py==13.590.48
88
+ hjson==3.1.0
89
+ ninja==1.13.0
90
+ msgpack==1.1.2
91
+ deepspeed==0.16.9
92
+ smmap==5.0.2
93
+ sentry-sdk==2.52.0
94
+ gitdb==4.0.12
95
+ GitPython==3.1.46
96
+ wandb==0.24.2
97
+ entrypoints==0.4
98
+ jupyter_client==7.4.9
99
+ nbclassic==1.1.0
100
+ notebook==6.5.5
101
+ pyzmq==24.0.1
102
+ PyYAML==6.0.2
103
+ Send2Trash==1.8.3
104
+ argon2-cffi==23.1.0
105
+ argon2-cffi-bindings==21.2.0
106
+ arrow==1.3.0
107
+ asttokens==2.4.1
108
+ async-lru==2.0.4
109
+ attrs==24.2.0
110
+ babel==2.16.0
111
+ beautifulsoup4==4.12.3
112
+ bleach==6.1.0
113
+ certifi==2024.8.30
114
+ cffi==1.17.1
115
+ charset-normalizer==3.3.2
116
+ comm==0.2.2
117
+ debugpy==1.8.5
118
+ decorator==5.1.1
119
+ defusedxml==0.7.1
120
+ executing==2.1.0
121
+ fastjsonschema==2.20.0
122
+ fqdn==1.5.1
123
+ h11==0.14.0
124
+ httpcore==1.0.5
125
+ httpx==0.27.2
126
+ idna==3.10
127
+ ipykernel==6.29.5
128
+ ipython==8.27.0
129
+ ipython-genutils==0.2.0
130
+ ipywidgets==8.1.5
131
+ isoduration==20.11.0
132
+ jedi==0.19.1
133
+ json5==0.9.25
134
+ jsonpointer==3.0.0
135
+ jsonschema==4.23.0
136
+ jsonschema-specifications==2023.12.1
137
+ jupyter-archive==3.4.0
138
+ jupyter_contrib_core==0.4.2
139
+ jupyter_contrib_nbextensions==0.7.0
140
+ jupyter_core==5.7.2
141
+ jupyter-events==0.10.0
142
+ jupyter-highlight-selected-word==0.2.0
143
+ jupyter-lsp==2.2.5
144
+ jupyter_nbextensions_configurator==0.6.4
145
+ jupyter_server==2.14.2
146
+ jupyter_server_terminals==0.5.3
147
+ jupyterlab==4.2.5
148
+ jupyterlab_pygments==0.3.0
149
+ jupyterlab_server==2.27.3
150
+ jupyterlab_widgets==3.0.13
151
+ lxml==5.3.0
152
+ matplotlib-inline==0.1.7
153
+ mistune==3.0.2
154
+ nbclient==0.10.0
155
+ nbconvert==7.16.4
156
+ nbformat==5.10.4
157
+ nest-asyncio==1.6.0
158
+ notebook_shim==0.2.4
159
+ overrides==7.7.0
160
+ packaging==24.1
161
+ pandocfilters==1.5.1
162
+ parso==0.8.4
163
+ pexpect==4.9.0
164
+ platformdirs==4.3.6
165
+ prometheus_client==0.21.0
166
+ prompt_toolkit==3.0.47
167
+ psutil==6.0.0
168
+ ptyprocess==0.7.0
169
+ pure_eval==0.2.3
170
+ pycparser==2.22
171
+ Pygments==2.18.0
172
+ python-dateutil==2.9.0.post0
173
+ python-json-logger==2.0.7
174
+ referencing==0.35.1
175
+ requests==2.32.3
176
+ rfc3339-validator==0.1.4
177
+ rfc3986-validator==0.1.1
178
+ rpds-py==0.20.0
179
+ sniffio==1.3.1
180
+ soupsieve==2.6
181
+ stack-data==0.6.3
182
+ terminado==0.18.1
183
+ tinycss2==1.3.0
184
+ tornado==6.4.1
185
+ traitlets==5.14.3
186
+ types-python-dateutil==2.9.0.20240906
187
+ uri-template==1.3.0
188
+ urllib3==2.2.3
189
+ wcwidth==0.2.13
190
+ webcolors==24.8.0
191
+ webencodings==0.5.1
192
+ websocket-client==1.8.0
193
+ widgetsnbextension==4.0.13
194
+ Jinja2==3.1.3
195
+ MarkupSafe==2.1.5
196
+ filelock==3.13.1
197
+ fsspec==2024.2.0
198
+ mpmath==1.3.0
199
+ networkx==3.2.1
200
+ nvidia-cublas-cu12==12.4.2.65
201
+ nvidia-cuda-cupti-cu12==12.4.99
202
+ nvidia-cuda-nvrtc-cu12==12.4.99
203
+ nvidia-cuda-runtime-cu12==12.4.99
204
+ nvidia-cudnn-cu12==9.1.0.70
205
+ nvidia-cufft-cu12==11.2.0.44
206
+ nvidia-curand-cu12==10.3.5.119
207
+ nvidia-cusolver-cu12==11.6.0.99
208
+ nvidia-cusparse-cu12==12.3.0.142
209
+ nvidia-nccl-cu12==2.20.5
210
+ nvidia-nvjitlink-cu12==12.4.99
211
+ nvidia-nvtx-cu12==12.4.99
212
+ pillow==10.2.0
213
+ sympy==1.12
214
+ torch==2.4.1+cu124
215
+ torchaudio==2.4.1+cu124
216
+ torchvision==0.19.1+cu124
217
+ triton==3.0.0
218
+ pip==24.2
219
+ setuptools==75.1.0
220
+ wheel==0.44.0
221
+ PyGObject==3.42.1
222
+ PyJWT==2.3.0
223
+ SecretStorage==3.3.1
224
+ blinker==1.4
225
+ cryptography==3.4.8
226
+ dbus-python==1.2.18
227
+ distro==1.7.0
228
+ httplib2==0.20.2
229
+ importlib-metadata==4.6.4
230
+ jeepney==0.7.1
231
+ keyring==23.5.0
232
+ launchpadlib==1.10.16
233
+ lazr.restfulclient==0.14.4
234
+ lazr.uri==1.0.6
235
+ more-itertools==8.10.0
236
+ oauthlib==3.2.0
237
+ python-apt==2.4.0+ubuntu4
238
+ six==1.16.0
239
+ wadllib==1.3.6
240
+ zipp==1.0.0
241
+ autocommand==2.2.2
242
+ backports.tarfile==1.2.0
243
+ importlib_metadata==8.0.0
244
+ importlib_resources==6.4.0
245
+ inflect==7.3.1
246
+ jaraco.collections==5.1.0
247
+ jaraco.context==5.3.0
248
+ jaraco.functools==4.0.1
249
+ jaraco.text==3.12.1
250
+ more-itertools==10.3.0
251
+ packaging==24.1
252
+ platformdirs==4.2.2
253
+ tomli==2.0.1
254
+ typeguard==4.3.0
255
+ typing_extensions==4.12.2
256
+ wheel==0.43.0
257
+ zipp==3.19.2
LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "os": "Linux-6.8.0-78-generic-x86_64-with-glibc2.35",
3
+ "python": "CPython 3.11.10",
4
+ "startedAt": "2026-02-05T07:40:50.642512Z",
5
+ "args": [
6
+ "/workspace/v127rc_exp1/B_mup.yaml"
7
+ ],
8
+ "program": "/usr/local/bin/llamafactory-cli",
9
+ "git": {
10
+ "remote": "https://github.com/hiyouga/LlamaFactory.git",
11
+ "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
12
+ },
13
+ "email": "markmochi200@gmail.com",
14
+ "root": "/workspace/LlamaFactory",
15
+ "host": "540de1d49753",
16
+ "executable": "/usr/bin/python",
17
+ "cpu_count": 24,
18
+ "cpu_count_logical": 48,
19
+ "gpu": "NVIDIA GeForce RTX 4090",
20
+ "gpu_count": 1,
21
+ "disk": {
22
+ "/": {
23
+ "total": "21474836480",
24
+ "used": "2414051328"
25
+ }
26
+ },
27
+ "memory": {
28
+ "total": "270100414464"
29
+ },
30
+ "gpu_nvidia": [
31
+ {
32
+ "name": "NVIDIA GeForce RTX 4090",
33
+ "memoryTotal": "25757220864",
34
+ "cudaCores": 16384,
35
+ "architecture": "Ada",
36
+ "uuid": "GPU-64f7ee9c-3f46-4f01-74c0-f57a6e56968a"
37
+ }
38
+ ],
39
+ "cudaVersion": "12.8",
40
+ "writerId": "m15pzb5bw8fbrm1kn15f02cv2o8axrnb"
41
+ }
LlamaFactory/wandb/run-20260205_074050-63c40qxy/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"train_steps_per_second":0.942,"train/epoch":5,"train_loss":0.025983464172169894,"train/grad_norm":0.08169613033533096,"_runtime":161229,"train/num_input_tokens_seen":310857420,"train/learning_rate":5.570260919185444e-15,"train/train_tokens_per_second":1928.177,"train_runtime":161223.8419,"train_samples_per_second":0.942,"_timestamp":1.7704384740298996e+09,"total_flos":1.419757796902441e+19,"train/global_step":151860,"_wandb":{"runtime":161229},"train/loss":0.013144368305802345,"_step":151860}
LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug-internal.log ADDED
@@ -0,0 +1,12 @@
1
+ {"time":"2026-02-05T07:40:50.941923145Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
2
+ {"time":"2026-02-05T07:40:51.29954609Z","level":"INFO","msg":"stream: created new stream","id":"63c40qxy"}
3
+ {"time":"2026-02-05T07:40:51.300347384Z","level":"INFO","msg":"handler: started","stream_id":"63c40qxy"}
4
+ {"time":"2026-02-05T07:40:51.3025576Z","level":"INFO","msg":"stream: started","id":"63c40qxy"}
5
+ {"time":"2026-02-05T07:40:51.302595101Z","level":"INFO","msg":"writer: started","stream_id":"63c40qxy"}
6
+ {"time":"2026-02-05T07:40:51.302604391Z","level":"INFO","msg":"sender: started","stream_id":"63c40qxy"}
7
+ {"time":"2026-02-06T20:26:51.942845343Z","level":"INFO","msg":"api: retrying HTTP error","status":502,"url":"https://api.wandb.ai/files/markmochi200-linksome-ai/llamafactory/63c40qxy/file_stream","body":"\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"}
8
+ {"time":"2026-02-07T04:28:00.95034651Z","level":"INFO","msg":"stream: closing","id":"63c40qxy"}
9
+ {"time":"2026-02-07T04:28:04.143895875Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
10
+ {"time":"2026-02-07T04:28:04.395109247Z","level":"INFO","msg":"handler: closed","stream_id":"63c40qxy"}
11
+ {"time":"2026-02-07T04:28:04.404712936Z","level":"INFO","msg":"sender: closed","stream_id":"63c40qxy"}
12
+ {"time":"2026-02-07T04:28:04.405386166Z","level":"INFO","msg":"stream: closed","id":"63c40qxy"}
LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug.log ADDED
@@ -0,0 +1,25 @@
1
+ 2026-02-05 07:40:50,676 INFO MainThread:476 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
2
+ 2026-02-05 07:40:50,677 INFO MainThread:476 [wandb_setup.py:_flush():81] Configure stats pid to 476
3
+ 2026-02-05 07:40:50,677 INFO MainThread:476 [wandb_setup.py:_flush():81] Loading settings from environment variables
4
+ 2026-02-05 07:40:50,678 INFO MainThread:476 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug.log
5
+ 2026-02-05 07:40:50,679 INFO MainThread:476 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260205_074050-63c40qxy/logs/debug-internal.log
6
+ 2026-02-05 07:40:50,680 INFO MainThread:476 [wandb_init.py:init():844] calling init triggers
7
+ 2026-02-05 07:40:50,680 INFO MainThread:476 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
8
+ config: {'_wandb': {}}
9
+ 2026-02-05 07:40:50,681 INFO MainThread:476 [wandb_init.py:init():892] starting backend
10
+ 2026-02-05 07:40:50,925 INFO MainThread:476 [wandb_init.py:init():895] sending inform_init request
11
+ 2026-02-05 07:40:50,936 INFO MainThread:476 [wandb_init.py:init():903] backend started and connected
12
+ 2026-02-05 07:40:50,941 INFO MainThread:476 [wandb_init.py:init():973] updated telemetry
13
+ 2026-02-05 07:40:51,025 INFO MainThread:476 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
14
+ 2026-02-05 07:40:51,583 INFO MainThread:476 [wandb_init.py:init():1042] starting run threads in backend
15
+ 2026-02-05 07:40:51,760 INFO MainThread:476 [wandb_run.py:_console_start():2529] atexit reg
16
+ 2026-02-05 07:40:51,772 INFO MainThread:476 [wandb_run.py:_redirect():2377] redirect: wrap_raw
17
+ 2026-02-05 07:40:51,775 INFO MainThread:476 [wandb_run.py:_redirect():2446] Wrapping output streams.
18
+ 2026-02-05 07:40:51,777 INFO MainThread:476 [wandb_run.py:_redirect():2469] Redirects installed.
19
+ 2026-02-05 07:40:51,781 INFO MainThread:476 [wandb_init.py:init():1082] run started, returning control to user process
20
+ 2026-02-05 07:40:51,783 INFO MainThread:476 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['q_proj', 'up_proj', 'gate_proj', 'down_proj', 'k_proj', 'v_proj', 'o_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp1/B_mup', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 5, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.02, 'warmup_steps': 0.02, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 
-1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 'generation_max_length': 
2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
21
+ 2026-02-05 07:40:51,797 INFO MainThread:476 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x77d54498b790>>
22
+ 2026-02-05 07:40:51,798 INFO MainThread:476 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
23
+ 2026-02-05 07:40:51,802 INFO MainThread:476 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t35_d286_r1'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
24
+ 2026-02-07 04:28:00,950 INFO wandb-AsyncioManager-main:476 [service_client.py:_forward_responses():94] Reached EOF.
25
+ 2026-02-07 04:28:00,951 INFO wandb-AsyncioManager-main:476 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/config.yaml ADDED
@@ -0,0 +1,723 @@
1
+ _name_or_path:
2
+ value: /workspace/Qwen/Qwen3-8B-Base
3
+ _wandb:
4
+ value:
5
+ cli_version: 0.24.2
6
+ e:
7
+ n349ejidifao2bsbyv1vqxhikiyr57gh:
8
+ args:
9
+ - /workspace/v127rc_exp2/B_dup.yaml
10
+ cpu_count: 16
11
+ cpu_count_logical: 32
12
+ cudaVersion: "12.8"
13
+ disk:
14
+ /:
15
+ total: "21474836480"
16
+ used: "2060386304"
17
+ email: markmochi200@gmail.com
18
+ executable: /usr/bin/python
19
+ git:
20
+ commit: 1a02717fa84c270d1c156c4c4a391c2f95525a63
21
+ remote: https://github.com/hiyouga/LlamaFactory.git
22
+ gpu: NVIDIA GeForce RTX 4090
23
+ gpu_count: 1
24
+ gpu_nvidia:
25
+ - architecture: Ada
26
+ cudaCores: 16384
27
+ memoryTotal: "25757220864"
28
+ name: NVIDIA GeForce RTX 4090
29
+ uuid: GPU-5c963f5e-1505-b56d-adf2-0ab9c4a32166
30
+ host: 7bc9cc925966
31
+ memory:
32
+ total: "134123229184"
33
+ os: Linux-6.8.0-79-generic-x86_64-with-glibc2.35
34
+ program: /usr/local/bin/llamafactory-cli
35
+ python: CPython 3.11.10
36
+ root: /workspace/LlamaFactory
37
+ startedAt: "2026-02-09T07:31:58.346206Z"
38
+ writerId: n349ejidifao2bsbyv1vqxhikiyr57gh
39
+ m:
40
+ - "1": train/global_step
41
+ "6":
42
+ - 3
43
+ "7": []
44
+ - "2": '*'
45
+ "5": 1
46
+ "6":
47
+ - 1
48
+ "7": []
49
+ python_version: 3.11.10
50
+ t:
51
+ "1":
52
+ - 1
53
+ - 11
54
+ - 41
55
+ - 49
56
+ - 51
57
+ - 71
58
+ - 84
59
+ - 98
60
+ - 105
61
+ "2":
62
+ - 1
63
+ - 11
64
+ - 41
65
+ - 49
66
+ - 51
67
+ - 71
68
+ - 84
69
+ - 98
70
+ - 105
71
+ "3":
72
+ - 7
73
+ - 19
74
+ - 62
75
+ - 66
76
+ "4": 3.11.10
77
+ "5": 0.24.2
78
+ "6": 5.0.0
79
+ "9":
80
+ "1": transformers_trainer
81
+ "12": 0.24.2
82
+ "13": linux-x86_64
83
+ accelerator_config:
84
+ value:
85
+ dispatch_batches: null
86
+ even_batches: true
87
+ gradient_accumulation_kwargs: null
88
+ non_blocking: false
89
+ split_batches: false
90
+ use_seedable_sampler: true
91
+ adam_beta1:
92
+ value: 0.9
93
+ adam_beta2:
94
+ value: 0.95
95
+ adam_epsilon:
96
+ value: 1e-08
97
+ architectures:
98
+ value:
99
+ - Qwen3ForCausalLM
100
+ attention_bias:
101
+ value: false
102
+ attention_dropout:
103
+ value: 0
104
+ auto_find_batch_size:
105
+ value: false
106
+ average_tokens_across_devices:
107
+ value: true
108
+ batch_eval_metrics:
109
+ value: false
110
+ bf16:
111
+ value: true
112
+ bf16_full_eval:
113
+ value: false
114
+ bos_token_id:
115
+ value: null
116
+ chunk_size_feed_forward:
117
+ value: 0
118
+ data_args:
119
+ value:
120
+ buffer_size: 16384
121
+ cutoff_len: 2047
122
+ data_shared_file_system: false
123
+ dataset:
124
+ - Markie_Voss_t0_d34_r300
125
+ dataset_dir: /workspace/LlamaFactory/data
126
+ default_system: null
127
+ enable_thinking: false
128
+ eval_dataset: null
129
+ eval_num_beams: null
130
+ eval_on_each_dataset: false
131
+ ignore_pad_token_for_loss: true
132
+ interleave_probs: null
133
+ mask_history: false
134
+ max_samples: 100000000
135
+ media_dir: /workspace/LlamaFactory/data
136
+ mix_strategy: concat
137
+ neat_packing: false
138
+ overwrite_cache: false
139
+ packing: true
140
+ preprocessing_batch_size: 1000
141
+ preprocessing_num_workers: 16
142
+ streaming: false
143
+ template: qwen3_nothink
144
+ tokenized_path: null
145
+ tool_format: null
146
+ train_on_prompt: false
147
+ val_size: 0
148
+ data_seed:
149
+ value: null
150
+ dataloader_drop_last:
151
+ value: false
152
+ dataloader_num_workers:
153
+ value: 0
154
+ dataloader_persistent_workers:
155
+ value: false
156
+ dataloader_pin_memory:
157
+ value: true
158
+ dataloader_prefetch_factor:
159
+ value: null
160
+ ddp_backend:
161
+ value: null
162
+ ddp_broadcast_buffers:
163
+ value: null
164
+ ddp_bucket_cap_mb:
165
+ value: null
166
+ ddp_find_unused_parameters:
167
+ value: null
168
+ ddp_timeout:
169
+ value: 180000000
170
+ debug:
171
+ value: []
172
+ deepspeed:
173
+ value: null
174
+ disable_tqdm:
175
+ value: false
176
+ do_eval:
177
+ value: false
178
+ do_predict:
179
+ value: false
180
+ do_train:
181
+ value: true
182
+ dtype:
183
+ value: bfloat16
184
+ enable_jit_checkpoint:
185
+ value: false
186
+ eos_token_id:
187
+ value: 151645
188
+ eval_accumulation_steps:
189
+ value: null
190
+ eval_delay:
191
+ value: 0
192
+ eval_do_concat_batches:
193
+ value: true
194
+ eval_on_start:
195
+ value: false
196
+ eval_steps:
197
+ value: null
198
+ eval_strategy:
199
+ value: "no"
200
+ eval_use_gather_object:
201
+ value: false
202
+ finetuning_args:
203
+ value:
204
+ additional_target: null
205
+ apollo_layerwise: false
206
+ apollo_proj: random
207
+ apollo_proj_type: std
208
+ apollo_rank: 16
209
+ apollo_scale: 32
210
+ apollo_scale_front: false
211
+ apollo_scale_type: channel
212
+ apollo_target:
213
+ - all
214
+ apollo_update_interval: 200
215
+ badam_mask_mode: adjacent
216
+ badam_mode: layer
217
+ badam_start_block: null
218
+ badam_switch_interval: 50
219
+ badam_switch_mode: ascending
220
+ badam_update_ratio: 0.05
221
+ badam_verbose: 0
222
+ compute_accuracy: false
223
+ create_new_adapter: false
224
+ disable_shuffling: false
225
+ dpo_label_smoothing: 0
226
+ eaft_alpha: 1
227
+ early_stopping_steps: null
228
+ finetuning_type: lora
229
+ freeze_extra_modules: null
230
+ freeze_language_model: false
231
+ freeze_multi_modal_projector: true
232
+ freeze_trainable_layers: 2
233
+ freeze_trainable_modules:
234
+ - all
235
+ freeze_vision_tower: true
236
+ galore_layerwise: false
237
+ galore_proj_type: std
238
+ galore_rank: 16
239
+ galore_scale: 2
240
+ galore_target:
241
+ - all
242
+ galore_update_interval: 200
243
+ include_effective_tokens_per_second: false
244
+ kto_chosen_weight: 1
245
+ kto_rejected_weight: 1
246
+ ld_alpha: null
247
+ lora_alpha: 32
248
+ lora_dropout: 0.03
249
+ lora_rank: 16
250
+ lora_target:
251
+ - all
252
+ loraplus_lr_embedding: 1e-06
253
+ loraplus_lr_ratio: null
254
+ module_dropout: 0
255
+ oft_block_size: 32
256
+ oft_rank: 0
257
+ oft_target:
258
+ - all
259
+ pissa_convert: false
260
+ pissa_init: false
261
+ pissa_iter: 16
262
+ plot_loss: true
263
+ ppo_buffer_size: 1
264
+ ppo_epochs: 4
265
+ ppo_score_norm: false
266
+ ppo_target: 6
267
+ ppo_whiten_rewards: false
268
+ pref_bco_weight: 0
269
+ pref_beta: 0.1
270
+ pref_ftx: 0
271
+ pref_loss: sigmoid
272
+ pure_bf16: false
273
+ ref_model: null
274
+ ref_model_adapters: null
275
+ ref_model_quantization_bit: null
276
+ reward_model: null
277
+ reward_model_adapters: null
278
+ reward_model_quantization_bit: null
279
+ reward_model_type: lora
280
+ simpo_gamma: 0.5
281
+ stage: pt
282
+ swanlab_api_key: <SWANLAB_API_KEY>
283
+ swanlab_lark_secret: null
284
+ swanlab_lark_webhook_url: null
285
+ swanlab_logdir: null
286
+ swanlab_mode: cloud
287
+ swanlab_project: llamafactory
288
+ swanlab_run_name: null
289
+ swanlab_workspace: null
290
+ use_adam_mini: false
291
+ use_apollo: false
292
+ use_badam: false
293
+ use_dft_loss: false
294
+ use_dora: false
295
+ use_eaft_loss: false
296
+ use_galore: false
297
+ use_llama_pro: false
298
+ use_mca: false
299
+ use_muon: false
300
+ use_rslora: false
301
+ use_swanlab: false
302
+ fp8:
303
+ value: false
304
+ fp8_backend:
305
+ value: auto
306
+ fp8_enable_fsdp_float8_all_gather:
307
+ value: false
308
+ fp16:
309
+ value: false
310
+ fp16_full_eval:
311
+ value: false
312
+ fsdp:
313
+ value: []
314
+ fsdp_config:
315
+ value:
316
+ min_num_params: 0
317
+ xla: false
318
+ xla_fsdp_grad_ckpt: false
319
+ xla_fsdp_v2: false
320
+ full_determinism:
321
+ value: false
322
+ generating_args:
323
+ value:
324
+ do_sample: true
325
+ length_penalty: 1
326
+ max_new_tokens: 1024
327
+ num_beams: 1
328
+ repetition_penalty: 1
329
+ skip_special_tokens: true
330
+ temperature: 0.95
331
+ top_k: 50
332
+ top_p: 0.7
333
+ generation_config:
334
+ value: null
335
+ generation_max_length:
336
+ value: 2047
337
+ generation_num_beams:
338
+ value: null
339
+ gradient_accumulation_steps:
340
+ value: 1
341
+ gradient_checkpointing:
342
+ value: false
343
+ gradient_checkpointing_kwargs:
344
+ value: null
345
+ greater_is_better:
346
+ value: null
347
+ group_by_length:
348
+ value: false
349
+ head_dim:
350
+ value: 128
351
+ hidden_act:
352
+ value: silu
353
+ hidden_size:
354
+ value: 4096
355
+ hub_always_push:
356
+ value: false
357
+ hub_model_id:
358
+ value: null
359
+ hub_private_repo:
360
+ value: null
361
+ hub_revision:
362
+ value: null
363
+ hub_strategy:
364
+ value: every_save
365
+ hub_token:
366
+ value: <HUB_TOKEN>
367
+ id2label:
368
+ value:
369
+ "0": LABEL_0
370
+ "1": LABEL_1
371
+ ignore_data_skip:
372
+ value: false
373
+ include_for_metrics:
374
+ value: []
375
+ include_num_input_tokens_seen:
376
+ value: all
377
+ initializer_range:
378
+ value: 0.02
379
+ intermediate_size:
380
+ value: 12288
381
+ is_encoder_decoder:
382
+ value: false
383
+ label_names:
384
+ value:
385
+ - labels
386
+ label_smoothing_factor:
387
+ value: 0
388
+ label2id:
389
+ value:
390
+ LABEL_0: 0
391
+ LABEL_1: 1
392
+ layer_types:
393
+ value:
394
+ - full_attention
395
+ - full_attention
396
+ - full_attention
397
+ - full_attention
398
+ - full_attention
399
+ - full_attention
400
+ - full_attention
401
+ - full_attention
402
+ - full_attention
403
+ - full_attention
404
+ - full_attention
405
+ - full_attention
406
+ - full_attention
407
+ - full_attention
408
+ - full_attention
409
+ - full_attention
410
+ - full_attention
411
+ - full_attention
412
+ - full_attention
413
+ - full_attention
414
+ - full_attention
415
+ - full_attention
416
+ - full_attention
417
+ - full_attention
418
+ - full_attention
419
+ - full_attention
420
+ - full_attention
421
+ - full_attention
422
+ - full_attention
423
+ - full_attention
424
+ - full_attention
425
+ - full_attention
426
+ - full_attention
427
+ - full_attention
428
+ - full_attention
429
+ - full_attention
430
+ learning_rate:
431
+ value: 5e-05
432
+ length_column_name:
433
+ value: length
434
+ liger_kernel_config:
435
+ value: null
436
+ load_best_model_at_end:
437
+ value: false
438
+ local_rank:
439
+ value: -1
440
+ log_level:
441
+ value: passive
442
+ log_level_replica:
443
+ value: warning
444
+ log_on_each_node:
445
+ value: true
446
+ logging_dir:
447
+ value: null
448
+ logging_first_step:
449
+ value: false
450
+ logging_nan_inf_filter:
451
+ value: true
452
+ logging_steps:
453
+ value: 1
454
+ logging_strategy:
455
+ value: steps
456
+ lr_scheduler_kwargs:
457
+ value: null
458
+ lr_scheduler_type:
459
+ value: cosine
460
+ master_addr:
461
+ value: null
462
+ master_port:
463
+ value: null
464
+ max_grad_norm:
465
+ value: 1
466
+ max_position_embeddings:
467
+ value: 32768
468
+ max_steps:
469
+ value: -1
470
+ max_window_layers:
471
+ value: 36
472
+ metric_for_best_model:
473
+ value: null
474
+ model/num_parameters:
475
+ value: 8234382336
476
+ model_args:
477
+ value:
478
+ adapter_folder: null
479
+ adapter_name_or_path: null
480
+ add_special_tokens: null
481
+ add_tokens: null
482
+ audio_sampling_rate: 16000
483
+ block_diag_attn: false
484
+ cache_dir: null
485
+ chunk_size: 8192
486
+ compute_dtype: torch.bfloat16
487
+ cpu_infer: 32
488
+ crop_to_patches: false
489
+ device_map:
490
+ "": cuda:0
491
+ disable_gradient_checkpointing: false
492
+ double_quantization: true
493
+ enable_liger_kernel: false
494
+ export_device: cpu
495
+ export_dir: null
496
+ export_hub_model_id: null
497
+ export_legacy_format: false
498
+ export_quantization_bit: null
499
+ export_quantization_dataset: null
500
+ export_quantization_maxlen: 1024
501
+ export_quantization_nsamples: 128
502
+ export_size: 5
503
+ flash_attn: auto
504
+ hf_hub_token: <HF_HUB_TOKEN>
505
+ image_do_pan_and_scan: false
506
+ image_max_pixels: 589824
507
+ image_min_pixels: 1024
508
+ infer_backend: HF
509
+ infer_dtype: auto
510
+ init_special_tokens: noise_init
511
+ kt_force_think: false
512
+ kt_maxlen: 4096
513
+ kt_mode: normal
514
+ kt_optimize_rule: null
515
+ kt_use_cuda_graph: true
516
+ low_cpu_mem_usage: true
517
+ mixture_of_depths: null
518
+ mode: normal
519
+ model_max_length: 2047
520
+ model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
521
+ model_revision: main
522
+ moe_aux_loss_coef: null
523
+ ms_hub_token: <MS_HUB_TOKEN>
524
+ new_special_tokens_config: null
525
+ offload_folder: offload
526
+ om_hub_token: <OM_HUB_TOKEN>
527
+ print_param_status: false
528
+ quantization_bit: null
529
+ quantization_device_map: null
530
+ quantization_method: BNB
531
+ quantization_type: nf4
532
+ resize_vocab: false
533
+ rope_scaling: null
534
+ sglang_config: null
535
+ sglang_lora_backend: triton
536
+ sglang_maxlen: 4096
537
+ sglang_mem_fraction: 0.7
538
+ sglang_tp_size: -1
539
+ shift_attn: false
540
+ split_special_tokens: false
541
+ train_from_scratch: false
542
+ trust_remote_code: true
543
+ upcast_layernorm: false
544
+ upcast_lmhead_output: false
545
+ use_audio_in_video: false
546
+ use_fast_tokenizer: true
547
+ use_kt: false
548
+ use_kv_cache: true
549
+ use_reentrant_gc: true
550
+ use_unsloth: false
551
+ use_unsloth_gc: false
552
+ use_v1_kernels: false
553
+ video_fps: 2
554
+ video_max_pixels: 65536
555
+ video_maxlen: 128
556
+ video_min_pixels: 256
557
+ vllm_config: null
558
+ vllm_enforce_eager: false
559
+ vllm_gpu_util: 0.7
560
+ vllm_max_lora_rank: 32
561
+ vllm_maxlen: 4096
562
+ model_type:
563
+ value: qwen3
564
+ neftune_noise_alpha:
565
+ value: null
566
+ num_attention_heads:
567
+ value: 32
568
+ num_hidden_layers:
569
+ value: 36
570
+ num_key_value_heads:
571
+ value: 8
572
+ num_train_epochs:
573
+ value: 10000
574
+ optim:
575
+ value: adamw_torch
576
+ optim_args:
577
+ value: null
578
+ optim_target_modules:
579
+ value: null
580
+ output_attentions:
581
+ value: false
582
+ output_dir:
583
+ value: /workspace/v127rc_exp2/B_dup
584
+ output_hidden_states:
585
+ value: false
586
+ overwrite_output_dir:
587
+ value: false
588
+ pad_token_id:
589
+ value: 151643
590
+ parallelism_config:
591
+ value: null
592
+ peft_config:
593
+ value:
594
+ default:
595
+ alora_invocation_tokens: null
596
+ arrow_config: null
597
+ auto_mapping: null
598
+ base_model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
599
+ bias: none
600
+ corda_config: null
601
+ ensure_weight_tying: false
602
+ eva_config: null
603
+ exclude_modules: null
604
+ fan_in_fan_out: false
605
+ inference_mode: false
606
+ init_lora_weights: true
607
+ layer_replication: null
608
+ layers_pattern: null
609
+ layers_to_transform: null
610
+ lora_alpha: 32
611
+ lora_bias: false
612
+ lora_dropout: 0.03
613
+ megatron_config: null
614
+ megatron_core: megatron.core
615
+ modules_to_save: null
616
+ peft_type: LORA
617
+ peft_version: 0.18.1
618
+ qalora_group_size: 16
619
+ r: 16
620
+ revision: null
621
+ runtime_config:
622
+ ephemeral_gpu_offload: false
623
+ target_modules:
624
+ - v_proj
625
+ - up_proj
626
+ - o_proj
627
+ - k_proj
628
+ - down_proj
629
+ - q_proj
630
+ - gate_proj
631
+ target_parameters: null
632
+ task_type: CAUSAL_LM
633
+ trainable_token_indices: null
634
+ use_dora: false
635
+ use_qalora: false
636
+ use_rslora: false
637
+ per_device_eval_batch_size:
638
+ value: 8
639
+ per_device_train_batch_size:
640
+ value: 1
641
+ predict_with_generate:
642
+ value: false
643
+ prediction_loss_only:
644
+ value: false
645
+ problem_type:
646
+ value: null
647
+ project:
648
+ value: huggingface
649
+ push_to_hub:
650
+ value: false
651
+ ray_init_kwargs:
652
+ value: null
653
+ ray_num_workers:
654
+ value: 1
655
+ remove_unused_columns:
656
+ value: false
657
+ report_to:
658
+ value:
659
+ - wandb
660
+ restore_callback_states_from_checkpoint:
661
+ value: false
662
+ resume_from_checkpoint:
663
+ value: null
664
+ return_dict:
665
+ value: true
666
+ rms_norm_eps:
667
+ value: 1e-06
668
+ rope_parameters:
669
+ value:
670
+ rope_theta: 1000000
671
+ rope_type: default
672
+ run_name:
673
+ value: null
674
+ save_on_each_node:
675
+ value: false
676
+ save_only_model:
677
+ value: true
678
+ save_steps:
679
+ value: 1000
680
+ save_strategy:
681
+ value: steps
682
+ save_total_limit:
683
+ value: null
684
+ seed:
685
+ value: 42
686
+ skip_memory_metrics:
687
+ value: true
688
+ sliding_window:
689
+ value: null
690
+ sortish_sampler:
691
+ value: false
692
+ tf32:
693
+ value: null
694
+ tie_word_embeddings:
695
+ value: false
696
+ torch_compile:
697
+ value: false
698
+ torch_compile_backend:
699
+ value: null
700
+ torch_compile_mode:
701
+ value: null
702
+ torch_empty_cache_steps:
703
+ value: null
704
+ trackio_space_id:
705
+ value: trackio
706
+ transformers_version:
707
+ value: 5.0.0
708
+ use_cache:
709
+ value: false
710
+ use_cpu:
711
+ value: false
712
+ use_liger_kernel:
713
+ value: false
714
+ use_sliding_window:
715
+ value: false
716
+ vocab_size:
717
+ value: 151936
718
+ warmup_ratio:
719
+ value: 0.02
720
+ warmup_steps:
721
+ value: 0.02
722
+ weight_decay:
723
+ value: 0
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/output.log ADDED
@@ -0,0 +1,370 @@
1
+ 0%| | 0/111360000 [00:00<?, ?it/s]/usr/local/lib/python3.11/dist-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
2
+ with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
3
+
4
+ {'loss': '1.197', 'grad_norm': '0.2202', 'learning_rate': '0', 'epoch': '8.98e-05', 'num_input_tokens_seen': 2047, 'train_runtime': '2.785', 'train_tokens_per_second': '735.1'}
5
+ {'loss': '0.8941', 'grad_norm': '0.2464', 'learning_rate': '2.245e-11', 'epoch': '0.0001796', 'num_input_tokens_seen': 4094, 'train_runtime': '3.804', 'train_tokens_per_second': '1076'}
6
+ {'loss': '1.578', 'grad_norm': '0.276', 'learning_rate': '4.49e-11', 'epoch': '0.0002694', 'num_input_tokens_seen': 6141, 'train_runtime': '4.825', 'train_tokens_per_second': '1273'}
7
+ {'loss': '1.147', 'grad_norm': '0.2449', 'learning_rate': '6.735e-11', 'epoch': '0.0003592', 'num_input_tokens_seen': 8188, 'train_runtime': '5.847', 'train_tokens_per_second': '1400'}
8
+ {'loss': '1.177', 'grad_norm': '0.2389', 'learning_rate': '8.98e-11', 'epoch': '0.000449', 'num_input_tokens_seen': 10235, 'train_runtime': '6.868', 'train_tokens_per_second': '1490'}
9
+ {'loss': '0.8562', 'grad_norm': '0.2245', 'learning_rate': '1.122e-10', 'epoch': '0.0005388', 'num_input_tokens_seen': 12282, 'train_runtime': '7.888', 'train_tokens_per_second': '1557'}
10
+ {'loss': '0.8777', 'grad_norm': '0.2269', 'learning_rate': '1.347e-10', 'epoch': '0.0006286', 'num_input_tokens_seen': 14329, 'train_runtime': '8.908', 'train_tokens_per_second': '1609'}
11
+ {'loss': '1.979', 'grad_norm': '0.3048', 'learning_rate': '1.571e-10', 'epoch': '0.0007184', 'num_input_tokens_seen': 16376, 'train_runtime': '9.929', 'train_tokens_per_second': '1649'}
12
+ {'loss': '2.057', 'grad_norm': '0.4388', 'learning_rate': '1.796e-10', 'epoch': '0.0008082', 'num_input_tokens_seen': 18423, 'train_runtime': '10.95', 'train_tokens_per_second': '1683'}
13
+ {'loss': '1.456', 'grad_norm': '0.2451', 'learning_rate': '2.02e-10', 'epoch': '0.000898', 'num_input_tokens_seen': 20470, 'train_runtime': '11.97', 'train_tokens_per_second': '1710'}
14
+ {'loss': '1.288', 'grad_norm': '0.2915', 'learning_rate': '2.245e-10', 'epoch': '0.0009878', 'num_input_tokens_seen': 22517, 'train_runtime': '12.99', 'train_tokens_per_second': '1734'}
15
+ {'loss': '1.279', 'grad_norm': '0.2925', 'learning_rate': '2.469e-10', 'epoch': '0.001078', 'num_input_tokens_seen': 24564, 'train_runtime': '14.01', 'train_tokens_per_second': '1753'}
16
+ {'loss': '0.8454', 'grad_norm': '0.2334', 'learning_rate': '2.694e-10', 'epoch': '0.001167', 'num_input_tokens_seen': 26611, 'train_runtime': '15.03', 'train_tokens_per_second': '1770'}
17
+ {'loss': '2.022', 'grad_norm': '0.3623', 'learning_rate': '2.918e-10', 'epoch': '0.001257', 'num_input_tokens_seen': 28658, 'train_runtime': '16.05', 'train_tokens_per_second': '1785'}
18
+ {'loss': '0.9637', 'grad_norm': '0.2062', 'learning_rate': '3.143e-10', 'epoch': '0.001347', 'num_input_tokens_seen': 30705, 'train_runtime': '17.08', 'train_tokens_per_second': '1798'}
19
+ {'loss': '1.163', 'grad_norm': '0.2303', 'learning_rate': '3.367e-10', 'epoch': '0.001437', 'num_input_tokens_seen': 32752, 'train_runtime': '18.1', 'train_tokens_per_second': '1809'}
20
+ {'loss': '1.264', 'grad_norm': '0.2524', 'learning_rate': '3.592e-10', 'epoch': '0.001527', 'num_input_tokens_seen': 34799, 'train_runtime': '19.12', 'train_tokens_per_second': '1820'}
21
+ {'loss': '0.9085', 'grad_norm': '0.2419', 'learning_rate': '3.816e-10', 'epoch': '0.001616', 'num_input_tokens_seen': 36846, 'train_runtime': '20.14', 'train_tokens_per_second': '1829'}
22
+ {'loss': '1.177', 'grad_norm': '0.2252', 'learning_rate': '4.041e-10', 'epoch': '0.001706', 'num_input_tokens_seen': 38893, 'train_runtime': '21.17', 'train_tokens_per_second': '1838'}
23
+ {'loss': '1.28', 'grad_norm': '0.2416', 'learning_rate': '4.265e-10', 'epoch': '0.001796', 'num_input_tokens_seen': 40940, 'train_runtime': '22.19', 'train_tokens_per_second': '1845'}
24
+ {'loss': '1.876', 'grad_norm': '0.2939', 'learning_rate': '4.49e-10', 'epoch': '0.001886', 'num_input_tokens_seen': 42987, 'train_runtime': '23.21', 'train_tokens_per_second': '1852'}
25
+ {'loss': '0.9678', 'grad_norm': '0.2171', 'learning_rate': '4.714e-10', 'epoch': '0.001976', 'num_input_tokens_seen': 45034, 'train_runtime': '24.23', 'train_tokens_per_second': '1858'}
26
+ {'loss': '1.523', 'grad_norm': '0.27', 'learning_rate': '4.939e-10', 'epoch': '0.002065', 'num_input_tokens_seen': 47081, 'train_runtime': '25.25', 'train_tokens_per_second': '1864'}
27
+ {'loss': '0.8785', 'grad_norm': '0.2627', 'learning_rate': '5.163e-10', 'epoch': '0.002155', 'num_input_tokens_seen': 49128, 'train_runtime': '26.28', 'train_tokens_per_second': '1870'}
28
+ {'loss': '1.577', 'grad_norm': '0.2754', 'learning_rate': '5.388e-10', 'epoch': '0.002245', 'num_input_tokens_seen': 51175, 'train_runtime': '27.3', 'train_tokens_per_second': '1874'}
29
+ {'loss': '0.8777', 'grad_norm': '0.2231', 'learning_rate': '5.612e-10', 'epoch': '0.002335', 'num_input_tokens_seen': 53222, 'train_runtime': '28.33', 'train_tokens_per_second': '1879'}
30
+ {'loss': '1.505', 'grad_norm': '0.2939', 'learning_rate': '5.837e-10', 'epoch': '0.002425', 'num_input_tokens_seen': 55269, 'train_runtime': '29.35', 'train_tokens_per_second': '1883'}
31
+ {'loss': '1.254', 'grad_norm': '0.2646', 'learning_rate': '6.061e-10', 'epoch': '0.002514', 'num_input_tokens_seen': 57316, 'train_runtime': '30.37', 'train_tokens_per_second': '1887'}
32
+ {'loss': '1.6', 'grad_norm': '0.2925', 'learning_rate': '6.286e-10', 'epoch': '0.002604', 'num_input_tokens_seen': 59363, 'train_runtime': '31.4', 'train_tokens_per_second': '1890'}
33
+ {'loss': '1.529', 'grad_norm': '0.2785', 'learning_rate': '6.51e-10', 'epoch': '0.002694', 'num_input_tokens_seen': 61410, 'train_runtime': '32.43', 'train_tokens_per_second': '1894'}
34
+ {'loss': '1.289', 'grad_norm': '0.2849', 'learning_rate': '6.735e-10', 'epoch': '0.002784', 'num_input_tokens_seen': 63457, 'train_runtime': '33.45', 'train_tokens_per_second': '1897'}
35
+ {'loss': '1.434', 'grad_norm': '0.239', 'learning_rate': '6.959e-10', 'epoch': '0.002874', 'num_input_tokens_seen': 65504, 'train_runtime': '34.47', 'train_tokens_per_second': '1900'}
36
+ {'loss': '1.93', 'grad_norm': '0.3212', 'learning_rate': '7.184e-10', 'epoch': '0.002963', 'num_input_tokens_seen': 67551, 'train_runtime': '35.51', 'train_tokens_per_second': '1903'}
37
+ {'loss': '1.003', 'grad_norm': '0.2553', 'learning_rate': '7.408e-10', 'epoch': '0.003053', 'num_input_tokens_seen': 69598, 'train_runtime': '36.53', 'train_tokens_per_second': '1905'}
38
+ {'loss': '1.746', 'grad_norm': '0.2927', 'learning_rate': '7.633e-10', 'epoch': '0.003143', 'num_input_tokens_seen': 71645, 'train_runtime': '37.55', 'train_tokens_per_second': '1908'}
39
+ {'loss': '1.205', 'grad_norm': '0.265', 'learning_rate': '7.857e-10', 'epoch': '0.003233', 'num_input_tokens_seen': 73692, 'train_runtime': '38.57', 'train_tokens_per_second': '1910'}
40
+ {'loss': '1.239', 'grad_norm': '0.2803', 'learning_rate': '8.082e-10', 'epoch': '0.003323', 'num_input_tokens_seen': 75739, 'train_runtime': '39.6', 'train_tokens_per_second': '1912'}
41
+ {'loss': '1.28', 'grad_norm': '0.2506', 'learning_rate': '8.306e-10', 'epoch': '0.003412', 'num_input_tokens_seen': 77786, 'train_runtime': '40.63', 'train_tokens_per_second': '1915'}
42
+ {'loss': '1.88', 'grad_norm': '0.2964', 'learning_rate': '8.531e-10', 'epoch': '0.003502', 'num_input_tokens_seen': 79833, 'train_runtime': '41.65', 'train_tokens_per_second': '1917'}
43
+ {'loss': '1.558', 'grad_norm': '0.2695', 'learning_rate': '8.755e-10', 'epoch': '0.003592', 'num_input_tokens_seen': 81880, 'train_runtime': '42.67', 'train_tokens_per_second': '1919'}
44
+ {'loss': '1.182', 'grad_norm': '0.2552', 'learning_rate': '8.98e-10', 'epoch': '0.003682', 'num_input_tokens_seen': 83927, 'train_runtime': '43.7', 'train_tokens_per_second': '1920'}
45
+ {'loss': '1.167', 'grad_norm': '0.2584', 'learning_rate': '9.204e-10', 'epoch': '0.003772', 'num_input_tokens_seen': 85974, 'train_runtime': '44.73', 'train_tokens_per_second': '1922'}
46
+ {'loss': '2.291', 'grad_norm': '0.3307', 'learning_rate': '9.429e-10', 'epoch': '0.003861', 'num_input_tokens_seen': 88021, 'train_runtime': '45.75', 'train_tokens_per_second': '1924'}
47
+ {'loss': '1.287', 'grad_norm': '0.2455', 'learning_rate': '9.653e-10', 'epoch': '0.003951', 'num_input_tokens_seen': 90068, 'train_runtime': '46.78', 'train_tokens_per_second': '1925'}
48
+ {'loss': '1.574', 'grad_norm': '0.2717', 'learning_rate': '9.878e-10', 'epoch': '0.004041', 'num_input_tokens_seen': 92115, 'train_runtime': '47.8', 'train_tokens_per_second': '1927'}
49
+ {'loss': '1.246', 'grad_norm': '0.8396', 'learning_rate': '1.01e-09', 'epoch': '0.004131', 'num_input_tokens_seen': 94162, 'train_runtime': '48.83', 'train_tokens_per_second': '1928'}
50
+ {'loss': '1.322', 'grad_norm': '0.2655', 'learning_rate': '1.033e-09', 'epoch': '0.004221', 'num_input_tokens_seen': 96209, 'train_runtime': '49.85', 'train_tokens_per_second': '1930'}
51
+ {'loss': '0.9961', 'grad_norm': '0.2129', 'learning_rate': '1.055e-09', 'epoch': '0.00431', 'num_input_tokens_seen': 98256, 'train_runtime': '50.88', 'train_tokens_per_second': '1931'}
52
+ {'loss': '2.288', 'grad_norm': '0.3014', 'learning_rate': '1.078e-09', 'epoch': '0.0044', 'num_input_tokens_seen': 100303, 'train_runtime': '51.9', 'train_tokens_per_second': '1933'}
53
+ {'loss': '2.249', 'grad_norm': '0.3105', 'learning_rate': '1.1e-09', 'epoch': '0.00449', 'num_input_tokens_seen': 102350, 'train_runtime': '52.93', 'train_tokens_per_second': '1934'}
54
+ {'loss': '1.755', 'grad_norm': '0.6081', 'learning_rate': '1.122e-09', 'epoch': '0.00458', 'num_input_tokens_seen': 104397, 'train_runtime': '53.95', 'train_tokens_per_second': '1935'}
55
+ {'loss': '1.622', 'grad_norm': '0.3713', 'learning_rate': '1.145e-09', 'epoch': '0.00467', 'num_input_tokens_seen': 106444, 'train_runtime': '54.97', 'train_tokens_per_second': '1936'}
56
+ {'loss': '1.553', 'grad_norm': '0.2788', 'learning_rate': '1.167e-09', 'epoch': '0.004759', 'num_input_tokens_seen': 108491, 'train_runtime': '56.22', 'train_tokens_per_second': '1930'}
57
+ {'loss': '0.9888', 'grad_norm': '0.2236', 'learning_rate': '1.19e-09', 'epoch': '0.004849', 'num_input_tokens_seen': 110538, 'train_runtime': '57.25', 'train_tokens_per_second': '1931'}
58
+ {'loss': '1.248', 'grad_norm': '0.2913', 'learning_rate': '1.212e-09', 'epoch': '0.004939', 'num_input_tokens_seen': 112585, 'train_runtime': '58.27', 'train_tokens_per_second': '1932'}
59
+ {'loss': '2.028', 'grad_norm': '0.3286', 'learning_rate': '1.235e-09', 'epoch': '0.005029', 'num_input_tokens_seen': 114632, 'train_runtime': '59.3', 'train_tokens_per_second': '1933'}
60
+ {'loss': '1.266', 'grad_norm': '0.2464', 'learning_rate': '1.257e-09', 'epoch': '0.005119', 'num_input_tokens_seen': 116679, 'train_runtime': '60.32', 'train_tokens_per_second': '1934'}
61
+ {'loss': '2.03', 'grad_norm': '0.3106', 'learning_rate': '1.28e-09', 'epoch': '0.005208', 'num_input_tokens_seen': 118726, 'train_runtime': '61.35', 'train_tokens_per_second': '1935'}
62
+ {'loss': '2.258', 'grad_norm': '0.2985', 'learning_rate': '1.302e-09', 'epoch': '0.005298', 'num_input_tokens_seen': 120773, 'train_runtime': '62.37', 'train_tokens_per_second': '1936'}
63
+ {'loss': '1.408', 'grad_norm': '0.2653', 'learning_rate': '1.325e-09', 'epoch': '0.005388', 'num_input_tokens_seen': 122820, 'train_runtime': '63.4', 'train_tokens_per_second': '1937'}
64
+ {'loss': '1.47', 'grad_norm': '0.2386', 'learning_rate': '1.347e-09', 'epoch': '0.005478', 'num_input_tokens_seen': 124867, 'train_runtime': '64.43', 'train_tokens_per_second': '1938'}
65
+ {'loss': '1.25', 'grad_norm': '0.2393', 'learning_rate': '1.369e-09', 'epoch': '0.005568', 'num_input_tokens_seen': 126914, 'train_runtime': '65.45', 'train_tokens_per_second': '1939'}
66
+ {'loss': '1.192', 'grad_norm': '0.22', 'learning_rate': '1.392e-09', 'epoch': '0.005657', 'num_input_tokens_seen': 128961, 'train_runtime': '66.48', 'train_tokens_per_second': '1940'}
67
+ {'loss': '2.005', 'grad_norm': '0.3749', 'learning_rate': '1.414e-09', 'epoch': '0.005747', 'num_input_tokens_seen': 131008, 'train_runtime': '67.5', 'train_tokens_per_second': '1941'}
68
+ {'loss': '1.187', 'grad_norm': '0.2649', 'learning_rate': '1.437e-09', 'epoch': '0.005837', 'num_input_tokens_seen': 133055, 'train_runtime': '68.53', 'train_tokens_per_second': '1942'}
69
+ {'loss': '1.942', 'grad_norm': '0.316', 'learning_rate': '1.459e-09', 'epoch': '0.005927', 'num_input_tokens_seen': 135102, 'train_runtime': '69.55', 'train_tokens_per_second': '1942'}
70
+ {'loss': '1.356', 'grad_norm': '0.2641', 'learning_rate': '1.482e-09', 'epoch': '0.006017', 'num_input_tokens_seen': 137149, 'train_runtime': '70.58', 'train_tokens_per_second': '1943'}
71
+ {'loss': '1.521', 'grad_norm': '0.2801', 'learning_rate': '1.504e-09', 'epoch': '0.006106', 'num_input_tokens_seen': 139196, 'train_runtime': '71.6', 'train_tokens_per_second': '1944'}
72
+ {'loss': '1.269', 'grad_norm': '0.2894', 'learning_rate': '1.527e-09', 'epoch': '0.006196', 'num_input_tokens_seen': 141243, 'train_runtime': '72.63', 'train_tokens_per_second': '1945'}
73
+ {'loss': '1.512', 'grad_norm': '0.3255', 'learning_rate': '1.549e-09', 'epoch': '0.006286', 'num_input_tokens_seen': 143290, 'train_runtime': '73.66', 'train_tokens_per_second': '1945'}
74
+ {'loss': '1.499', 'grad_norm': '0.3646', 'learning_rate': '1.571e-09', 'epoch': '0.006376', 'num_input_tokens_seen': 145337, 'train_runtime': '74.68', 'train_tokens_per_second': '1946'}
75
+ {'loss': '1.203', 'grad_norm': '0.2309', 'learning_rate': '1.594e-09', 'epoch': '0.006466', 'num_input_tokens_seen': 147384, 'train_runtime': '75.71', 'train_tokens_per_second': '1947'}
76
+ {'loss': '1.836', 'grad_norm': '0.3327', 'learning_rate': '1.616e-09', 'epoch': '0.006555', 'num_input_tokens_seen': 149431, 'train_runtime': '76.73', 'train_tokens_per_second': '1947'}
77
+ {'loss': '2.345', 'grad_norm': '0.3427', 'learning_rate': '1.639e-09', 'epoch': '0.006645', 'num_input_tokens_seen': 151478, 'train_runtime': '77.76', 'train_tokens_per_second': '1948'}
78
+ {'loss': '0.9391', 'grad_norm': '0.1971', 'learning_rate': '1.661e-09', 'epoch': '0.006735', 'num_input_tokens_seen': 153525, 'train_runtime': '78.79', 'train_tokens_per_second': '1949'}
79
+ {'loss': '1.194', 'grad_norm': '0.2691', 'learning_rate': '1.684e-09', 'epoch': '0.006825', 'num_input_tokens_seen': 155572, 'train_runtime': '79.81', 'train_tokens_per_second': '1949'}
80
+ {'loss': '0.861', 'grad_norm': '0.2215', 'learning_rate': '1.706e-09', 'epoch': '0.006915', 'num_input_tokens_seen': 157619, 'train_runtime': '80.84', 'train_tokens_per_second': '1950'}
81
+ {'loss': '1.283', 'grad_norm': '0.2361', 'learning_rate': '1.729e-09', 'epoch': '0.007004', 'num_input_tokens_seen': 159666, 'train_runtime': '81.86', 'train_tokens_per_second': '1950'}
82
+ {'loss': '1.156', 'grad_norm': '0.3232', 'learning_rate': '1.751e-09', 'epoch': '0.007094', 'num_input_tokens_seen': 161713, 'train_runtime': '82.89', 'train_tokens_per_second': '1951'}
83
+ {'loss': '1.224', 'grad_norm': '0.236', 'learning_rate': '1.774e-09', 'epoch': '0.007184', 'num_input_tokens_seen': 163760, 'train_runtime': '83.91', 'train_tokens_per_second': '1952'}
84
+ {'loss': '1.574', 'grad_norm': '0.2811', 'learning_rate': '1.796e-09', 'epoch': '0.007274', 'num_input_tokens_seen': 165807, 'train_runtime': '84.94', 'train_tokens_per_second': '1952'}
85
+ {'loss': '1.184', 'grad_norm': '0.2666', 'learning_rate': '1.818e-09', 'epoch': '0.007364', 'num_input_tokens_seen': 167854, 'train_runtime': '85.97', 'train_tokens_per_second': '1953'}
86
+ {'loss': '1.584', 'grad_norm': '0.2654', 'learning_rate': '1.841e-09', 'epoch': '0.007453', 'num_input_tokens_seen': 169901, 'train_runtime': '86.99', 'train_tokens_per_second': '1953'}
87
+ {'loss': '1.232', 'grad_norm': '0.2376', 'learning_rate': '1.863e-09', 'epoch': '0.007543', 'num_input_tokens_seen': 171948, 'train_runtime': '88.02', 'train_tokens_per_second': '1954'}
88
+ {'loss': '1.896', 'grad_norm': '0.2882', 'learning_rate': '1.886e-09', 'epoch': '0.007633', 'num_input_tokens_seen': 173995, 'train_runtime': '89.04', 'train_tokens_per_second': '1954'}
89
+ {'loss': '1.251', 'grad_norm': '0.2442', 'learning_rate': '1.908e-09', 'epoch': '0.007723', 'num_input_tokens_seen': 176042, 'train_runtime': '90.07', 'train_tokens_per_second': '1955'}
90
+ {'loss': '1.534', 'grad_norm': '0.2569', 'learning_rate': '1.931e-09', 'epoch': '0.007812', 'num_input_tokens_seen': 178089, 'train_runtime': '91.1', 'train_tokens_per_second': '1955'}
91
+ {'loss': '1.197', 'grad_norm': '0.2276', 'learning_rate': '1.953e-09', 'epoch': '0.007902', 'num_input_tokens_seen': 180136, 'train_runtime': '92.13', 'train_tokens_per_second': '1955'}
92
+ {'loss': '1.212', 'grad_norm': '0.2436', 'learning_rate': '1.976e-09', 'epoch': '0.007992', 'num_input_tokens_seen': 182183, 'train_runtime': '93.16', 'train_tokens_per_second': '1956'}
93
+ {'loss': '1.526', 'grad_norm': '0.2623', 'learning_rate': '1.998e-09', 'epoch': '0.008082', 'num_input_tokens_seen': 184230, 'train_runtime': '94.18', 'train_tokens_per_second': '1956'}
94
+ {'loss': '1.537', 'grad_norm': '0.2703', 'learning_rate': '2.02e-09', 'epoch': '0.008172', 'num_input_tokens_seen': 186277, 'train_runtime': '95.21', 'train_tokens_per_second': '1956'}
95
+ {'loss': '1.004', 'grad_norm': '0.2214', 'learning_rate': '2.043e-09', 'epoch': '0.008261', 'num_input_tokens_seen': 188324, 'train_runtime': '96.24', 'train_tokens_per_second': '1957'}
96
+ {'loss': '2.116', 'grad_norm': '0.352', 'learning_rate': '2.065e-09', 'epoch': '0.008351', 'num_input_tokens_seen': 190371, 'train_runtime': '97.26', 'train_tokens_per_second': '1957'}
97
+ {'loss': '1.496', 'grad_norm': '0.285', 'learning_rate': '2.088e-09', 'epoch': '0.008441', 'num_input_tokens_seen': 192418, 'train_runtime': '98.29', 'train_tokens_per_second': '1958'}
98
+ {'loss': '1.231', 'grad_norm': '0.2467', 'learning_rate': '2.11e-09', 'epoch': '0.008531', 'num_input_tokens_seen': 194465, 'train_runtime': '99.32', 'train_tokens_per_second': '1958'}
99
+ {'loss': '1.195', 'grad_norm': '0.2601', 'learning_rate': '2.133e-09', 'epoch': '0.008621', 'num_input_tokens_seen': 196512, 'train_runtime': '100.3', 'train_tokens_per_second': '1958'}
100
+ {'loss': '1.218', 'grad_norm': '0.2609', 'learning_rate': '2.155e-09', 'epoch': '0.00871', 'num_input_tokens_seen': 198559, 'train_runtime': '101.4', 'train_tokens_per_second': '1959'}
101
+ {'loss': '1.22', 'grad_norm': '0.2472', 'learning_rate': '2.178e-09', 'epoch': '0.0088', 'num_input_tokens_seen': 200606, 'train_runtime': '102.4', 'train_tokens_per_second': '1959'}
102
+ {'loss': '0.8798', 'grad_norm': '0.2429', 'learning_rate': '2.2e-09', 'epoch': '0.00889', 'num_input_tokens_seen': 202653, 'train_runtime': '103.4', 'train_tokens_per_second': '1959'}
103
+ {'loss': '0.9451', 'grad_norm': '0.2135', 'learning_rate': '2.223e-09', 'epoch': '0.00898', 'num_input_tokens_seen': 204700, 'train_runtime': '104.5', 'train_tokens_per_second': '1960'}
104
+ {'loss': '1.225', 'grad_norm': '0.2264', 'learning_rate': '2.245e-09', 'epoch': '0.00907', 'num_input_tokens_seen': 206747, 'train_runtime': '105.5', 'train_tokens_per_second': '1960'}
105
+ {'loss': '0.8673', 'grad_norm': '0.213', 'learning_rate': '2.267e-09', 'epoch': '0.009159', 'num_input_tokens_seen': 208794, 'train_runtime': '106.5', 'train_tokens_per_second': '1960'}
106
+ {'loss': '1.314', 'grad_norm': '0.2288', 'learning_rate': '2.29e-09', 'epoch': '0.009249', 'num_input_tokens_seen': 210841, 'train_runtime': '107.5', 'train_tokens_per_second': '1961'}
107
+ {'loss': '1.418', 'grad_norm': '0.2865', 'learning_rate': '2.312e-09', 'epoch': '0.009339', 'num_input_tokens_seen': 212888, 'train_runtime': '108.6', 'train_tokens_per_second': '1961'}
108
+ {'loss': '2.008', 'grad_norm': '0.2958', 'learning_rate': '2.335e-09', 'epoch': '0.009429', 'num_input_tokens_seen': 214935, 'train_runtime': '109.6', 'train_tokens_per_second': '1961'}
109
+ {'loss': '0.846', 'grad_norm': '0.2233', 'learning_rate': '2.357e-09', 'epoch': '0.009519', 'num_input_tokens_seen': 216982, 'train_runtime': '110.6', 'train_tokens_per_second': '1962'}
110
+ {'loss': '1.202', 'grad_norm': '0.2704', 'learning_rate': '2.38e-09', 'epoch': '0.009608', 'num_input_tokens_seen': 219029, 'train_runtime': '111.6', 'train_tokens_per_second': '1962'}
111
+ {'loss': '1.796', 'grad_norm': '0.2815', 'learning_rate': '2.402e-09', 'epoch': '0.009698', 'num_input_tokens_seen': 221076, 'train_runtime': '112.7', 'train_tokens_per_second': '1962'}
112
+ {'loss': '1.237', 'grad_norm': '0.4281', 'learning_rate': '2.425e-09', 'epoch': '0.009788', 'num_input_tokens_seen': 223123, 'train_runtime': '113.7', 'train_tokens_per_second': '1962'}
113
+ {'loss': '1.193', 'grad_norm': '0.2278', 'learning_rate': '2.447e-09', 'epoch': '0.009878', 'num_input_tokens_seen': 225170, 'train_runtime': '114.7', 'train_tokens_per_second': '1963'}
114
+ {'loss': '1.209', 'grad_norm': '0.2445', 'learning_rate': '2.469e-09', 'epoch': '0.009968', 'num_input_tokens_seen': 227217, 'train_runtime': '115.8', 'train_tokens_per_second': '1963'}
115
+ {'loss': '1.406', 'grad_norm': '0.2828', 'learning_rate': '2.492e-09', 'epoch': '0.01006', 'num_input_tokens_seen': 229264, 'train_runtime': '116.8', 'train_tokens_per_second': '1963'}
116
+ {'loss': '1.498', 'grad_norm': '0.2713', 'learning_rate': '2.514e-09', 'epoch': '0.01015', 'num_input_tokens_seen': 231311, 'train_runtime': '117.8', 'train_tokens_per_second': '1963'}
117
+ {'loss': '0.9127', 'grad_norm': '0.2485', 'learning_rate': '2.537e-09', 'epoch': '0.01024', 'num_input_tokens_seen': 233358, 'train_runtime': '118.8', 'train_tokens_per_second': '1964'}
118
+ {'loss': '1.277', 'grad_norm': '0.2386', 'learning_rate': '2.559e-09', 'epoch': '0.01033', 'num_input_tokens_seen': 235405, 'train_runtime': '119.9', 'train_tokens_per_second': '1964'}
119
+ {'loss': '1.474', 'grad_norm': '0.2714', 'learning_rate': '2.582e-09', 'epoch': '0.01042', 'num_input_tokens_seen': 237452, 'train_runtime': '120.9', 'train_tokens_per_second': '1964'}
120
+ {'loss': '1.279', 'grad_norm': '0.2655', 'learning_rate': '2.604e-09', 'epoch': '0.01051', 'num_input_tokens_seen': 239499, 'train_runtime': '121.9', 'train_tokens_per_second': '1964'}
121
+ {'loss': '1.242', 'grad_norm': '0.2135', 'learning_rate': '2.627e-09', 'epoch': '0.0106', 'num_input_tokens_seen': 241546, 'train_runtime': '123', 'train_tokens_per_second': '1964'}
122
+ {'loss': '0.9684', 'grad_norm': '0.2191', 'learning_rate': '2.649e-09', 'epoch': '0.01069', 'num_input_tokens_seen': 243593, 'train_runtime': '124', 'train_tokens_per_second': '1965'}
123
+ {'loss': '1.534', 'grad_norm': '0.2701', 'learning_rate': '2.672e-09', 'epoch': '0.01078', 'num_input_tokens_seen': 245640, 'train_runtime': '125', 'train_tokens_per_second': '1965'}
124
+ {'loss': '1.48', 'grad_norm': '0.2619', 'learning_rate': '2.694e-09', 'epoch': '0.01087', 'num_input_tokens_seen': 247687, 'train_runtime': '126', 'train_tokens_per_second': '1965'}
125
+ {'loss': '1.413', 'grad_norm': '0.2847', 'learning_rate': '2.716e-09', 'epoch': '0.01096', 'num_input_tokens_seen': 249734, 'train_runtime': '127.1', 'train_tokens_per_second': '1965'}
126
+ {'loss': '0.9694', 'grad_norm': '0.232', 'learning_rate': '2.739e-09', 'epoch': '0.01105', 'num_input_tokens_seen': 251781, 'train_runtime': '128.1', 'train_tokens_per_second': '1965'}
127
+ {'loss': '2.018', 'grad_norm': '0.3075', 'learning_rate': '2.761e-09', 'epoch': '0.01114', 'num_input_tokens_seen': 253828, 'train_runtime': '129.1', 'train_tokens_per_second': '1966'}
128
+ {'loss': '2.31', 'grad_norm': '0.3629', 'learning_rate': '2.784e-09', 'epoch': '0.01122', 'num_input_tokens_seen': 255875, 'train_runtime': '130.2', 'train_tokens_per_second': '1966'}
129
+ {'loss': '1.221', 'grad_norm': '0.2345', 'learning_rate': '2.806e-09', 'epoch': '0.01131', 'num_input_tokens_seen': 257922, 'train_runtime': '131.2', 'train_tokens_per_second': '1966'}
130
+ {'loss': '1.624', 'grad_norm': '0.3335', 'learning_rate': '2.829e-09', 'epoch': '0.0114', 'num_input_tokens_seen': 259969, 'train_runtime': '132.2', 'train_tokens_per_second': '1966'}
131
+ {'loss': '1.244', 'grad_norm': '0.2363', 'learning_rate': '2.851e-09', 'epoch': '0.01149', 'num_input_tokens_seen': 262016, 'train_runtime': '133.2', 'train_tokens_per_second': '1966'}
132
+ {'loss': '1.459', 'grad_norm': '0.2841', 'learning_rate': '2.874e-09', 'epoch': '0.01158', 'num_input_tokens_seen': 264063, 'train_runtime': '134.3', 'train_tokens_per_second': '1967'}
133
+ {'loss': '1.21', 'grad_norm': '0.3024', 'learning_rate': '2.896e-09', 'epoch': '0.01167', 'num_input_tokens_seen': 266110, 'train_runtime': '135.3', 'train_tokens_per_second': '1967'}
134
+ {'loss': '0.8682', 'grad_norm': '0.2301', 'learning_rate': '2.918e-09', 'epoch': '0.01176', 'num_input_tokens_seen': 268157, 'train_runtime': '136.3', 'train_tokens_per_second': '1967'}
135
+ {'loss': '1.269', 'grad_norm': '0.247', 'learning_rate': '2.941e-09', 'epoch': '0.01185', 'num_input_tokens_seen': 270204, 'train_runtime': '137.4', 'train_tokens_per_second': '1967'}
136
+ {'loss': '0.8011', 'grad_norm': '0.1994', 'learning_rate': '2.963e-09', 'epoch': '0.01194', 'num_input_tokens_seen': 272251, 'train_runtime': '138.4', 'train_tokens_per_second': '1967'}
137
+ {'loss': '0.8823', 'grad_norm': '0.2392', 'learning_rate': '2.986e-09', 'epoch': '0.01203', 'num_input_tokens_seen': 274298, 'train_runtime': '139.4', 'train_tokens_per_second': '1967'}
138
+ {'loss': '1.231', 'grad_norm': '0.2665', 'learning_rate': '3.008e-09', 'epoch': '0.01212', 'num_input_tokens_seen': 276345, 'train_runtime': '140.5', 'train_tokens_per_second': '1968'}
139
+ {'loss': '1.388', 'grad_norm': '0.2979', 'learning_rate': '3.031e-09', 'epoch': '0.01221', 'num_input_tokens_seen': 278392, 'train_runtime': '141.5', 'train_tokens_per_second': '1968'}
140
+ {'loss': '1.497', 'grad_norm': '0.2555', 'learning_rate': '3.053e-09', 'epoch': '0.0123', 'num_input_tokens_seen': 280439, 'train_runtime': '142.5', 'train_tokens_per_second': '1968'}
141
+ {'loss': '1.512', 'grad_norm': '0.2733', 'learning_rate': '3.076e-09', 'epoch': '0.01239', 'num_input_tokens_seen': 282486, 'train_runtime': '143.5', 'train_tokens_per_second': '1968'}
142
+ {'loss': '1.268', 'grad_norm': '0.229', 'learning_rate': '3.098e-09', 'epoch': '0.01248', 'num_input_tokens_seen': 284533, 'train_runtime': '144.6', 'train_tokens_per_second': '1968'}
143
+ {'loss': '0.9832', 'grad_norm': '0.2128', 'learning_rate': '3.121e-09', 'epoch': '0.01257', 'num_input_tokens_seen': 286580, 'train_runtime': '145.6', 'train_tokens_per_second': '1968'}
144
+ {'loss': '0.9476', 'grad_norm': '0.2454', 'learning_rate': '3.143e-09', 'epoch': '0.01266', 'num_input_tokens_seen': 288627, 'train_runtime': '146.6', 'train_tokens_per_second': '1968'}
145
+ {'loss': '1.232', 'grad_norm': '0.2289', 'learning_rate': '3.165e-09', 'epoch': '0.01275', 'num_input_tokens_seen': 290674, 'train_runtime': '147.7', 'train_tokens_per_second': '1969'}
146
+ {'loss': '1.206', 'grad_norm': '0.2379', 'learning_rate': '3.188e-09', 'epoch': '0.01284', 'num_input_tokens_seen': 292721, 'train_runtime': '148.7', 'train_tokens_per_second': '1969'}
147
+ {'loss': '1.253', 'grad_norm': '0.2237', 'learning_rate': '3.21e-09', 'epoch': '0.01293', 'num_input_tokens_seen': 294768, 'train_runtime': '149.7', 'train_tokens_per_second': '1969'}
148
+ {'loss': '0.9208', 'grad_norm': '0.3971', 'learning_rate': '3.233e-09', 'epoch': '0.01302', 'num_input_tokens_seen': 296815, 'train_runtime': '150.7', 'train_tokens_per_second': '1969'}
149
+ {'loss': '1.425', 'grad_norm': '0.3095', 'learning_rate': '3.255e-09', 'epoch': '0.01311', 'num_input_tokens_seen': 298862, 'train_runtime': '151.8', 'train_tokens_per_second': '1969'}
150
+ {'loss': '1.955', 'grad_norm': '0.2932', 'learning_rate': '3.278e-09', 'epoch': '0.0132', 'num_input_tokens_seen': 300909, 'train_runtime': '152.8', 'train_tokens_per_second': '1969'}
151
+ {'loss': '1.245', 'grad_norm': '0.2263', 'learning_rate': '3.3e-09', 'epoch': '0.01329', 'num_input_tokens_seen': 302956, 'train_runtime': '153.8', 'train_tokens_per_second': '1969'}
152
+ {'loss': '1.898', 'grad_norm': '0.2977', 'learning_rate': '3.323e-09', 'epoch': '0.01338', 'num_input_tokens_seen': 305003, 'train_runtime': '154.9', 'train_tokens_per_second': '1970'}
153
+ {'loss': '1.516', 'grad_norm': '0.2514', 'learning_rate': '3.345e-09', 'epoch': '0.01347', 'num_input_tokens_seen': 307050, 'train_runtime': '155.9', 'train_tokens_per_second': '1970'}
154
+ {'loss': '2.36', 'grad_norm': '0.3138', 'learning_rate': '3.367e-09', 'epoch': '0.01356', 'num_input_tokens_seen': 309097, 'train_runtime': '156.9', 'train_tokens_per_second': '1970'}
155
+ {'loss': '1.257', 'grad_norm': '0.2549', 'learning_rate': '3.39e-09', 'epoch': '0.01365', 'num_input_tokens_seen': 311144, 'train_runtime': '157.9', 'train_tokens_per_second': '1970'}
156
+ {'loss': '1.242', 'grad_norm': '0.2303', 'learning_rate': '3.412e-09', 'epoch': '0.01374', 'num_input_tokens_seen': 313191, 'train_runtime': '159', 'train_tokens_per_second': '1970'}
157
+ {'loss': '1.442', 'grad_norm': '0.2934', 'learning_rate': '3.435e-09', 'epoch': '0.01383', 'num_input_tokens_seen': 315238, 'train_runtime': '160', 'train_tokens_per_second': '1970'}
158
+ {'loss': '1.877', 'grad_norm': '0.3016', 'learning_rate': '3.457e-09', 'epoch': '0.01392', 'num_input_tokens_seen': 317285, 'train_runtime': '161', 'train_tokens_per_second': '1970'}
159
+ {'loss': '2.36', 'grad_norm': '0.3315', 'learning_rate': '3.48e-09', 'epoch': '0.01401', 'num_input_tokens_seen': 319332, 'train_runtime': '162.1', 'train_tokens_per_second': '1970'}
160
+ {'loss': '1.251', 'grad_norm': '0.251', 'learning_rate': '3.502e-09', 'epoch': '0.0141', 'num_input_tokens_seen': 321379, 'train_runtime': '163.1', 'train_tokens_per_second': '1971'}
161
+ {'loss': '0.9893', 'grad_norm': '0.2395', 'learning_rate': '3.525e-09', 'epoch': '0.01419', 'num_input_tokens_seen': 323426, 'train_runtime': '164.1', 'train_tokens_per_second': '1971'}
162
+ {'loss': '1.187', 'grad_norm': '0.2726', 'learning_rate': '3.547e-09', 'epoch': '0.01428', 'num_input_tokens_seen': 325473, 'train_runtime': '165.2', 'train_tokens_per_second': '1971'}
163
+ {'loss': '1.598', 'grad_norm': '0.2685', 'learning_rate': '3.57e-09', 'epoch': '0.01437', 'num_input_tokens_seen': 327520, 'train_runtime': '166.2', 'train_tokens_per_second': '1971'}
164
+ {'loss': '1.216', 'grad_norm': '0.2363', 'learning_rate': '3.592e-09', 'epoch': '0.01446', 'num_input_tokens_seen': 329567, 'train_runtime': '167.2', 'train_tokens_per_second': '1971'}
165
+ {'loss': '1.504', 'grad_norm': '0.2871', 'learning_rate': '3.614e-09', 'epoch': '0.01455', 'num_input_tokens_seen': 331614, 'train_runtime': '168.2', 'train_tokens_per_second': '1971'}
166
+ {'loss': '1.256', 'grad_norm': '0.2675', 'learning_rate': '3.637e-09', 'epoch': '0.01464', 'num_input_tokens_seen': 333661, 'train_runtime': '169.3', 'train_tokens_per_second': '1971'}
167
+ {'loss': '0.8719', 'grad_norm': '0.2252', 'learning_rate': '3.659e-09', 'epoch': '0.01473', 'num_input_tokens_seen': 335708, 'train_runtime': '170.3', 'train_tokens_per_second': '1971'}
168
+ {'loss': '1.524', 'grad_norm': '0.2718', 'learning_rate': '3.682e-09', 'epoch': '0.01482', 'num_input_tokens_seen': 337755, 'train_runtime': '171.3', 'train_tokens_per_second': '1971'}
169
+ {'loss': '1.866', 'grad_norm': '0.3113', 'learning_rate': '3.704e-09', 'epoch': '0.01491', 'num_input_tokens_seen': 339802, 'train_runtime': '172.4', 'train_tokens_per_second': '1971'}
170
+ {'loss': '1.45', 'grad_norm': '0.2719', 'learning_rate': '3.727e-09', 'epoch': '0.015', 'num_input_tokens_seen': 341849, 'train_runtime': '173.4', 'train_tokens_per_second': '1972'}
171
+ {'loss': '1.305', 'grad_norm': '0.2458', 'learning_rate': '3.749e-09', 'epoch': '0.01509', 'num_input_tokens_seen': 343896, 'train_runtime': '174.4', 'train_tokens_per_second': '1972'}
172
+ {'loss': '1.537', 'grad_norm': '0.3213', 'learning_rate': '3.772e-09', 'epoch': '0.01518', 'num_input_tokens_seen': 345943, 'train_runtime': '175.4', 'train_tokens_per_second': '1972'}
173
+ {'loss': '1.481', 'grad_norm': '0.2596', 'learning_rate': '3.794e-09', 'epoch': '0.01527', 'num_input_tokens_seen': 347990, 'train_runtime': '176.5', 'train_tokens_per_second': '1972'}
174
+ {'loss': '0.9901', 'grad_norm': '0.2333', 'learning_rate': '3.816e-09', 'epoch': '0.01536', 'num_input_tokens_seen': 350037, 'train_runtime': '177.5', 'train_tokens_per_second': '1972'}
175
+ {'loss': '1.945', 'grad_norm': '0.3357', 'learning_rate': '3.839e-09', 'epoch': '0.01545', 'num_input_tokens_seen': 352084, 'train_runtime': '178.5', 'train_tokens_per_second': '1972'}
176
+ {'loss': '1.322', 'grad_norm': '0.2917', 'learning_rate': '3.861e-09', 'epoch': '0.01554', 'num_input_tokens_seen': 354131, 'train_runtime': '179.6', 'train_tokens_per_second': '1972'}
177
+ {'loss': '1.149', 'grad_norm': '0.2248', 'learning_rate': '3.884e-09', 'epoch': '0.01562', 'num_input_tokens_seen': 356178, 'train_runtime': '180.6', 'train_tokens_per_second': '1972'}
178
+ {'loss': '1.945', 'grad_norm': '0.3084', 'learning_rate': '3.906e-09', 'epoch': '0.01571', 'num_input_tokens_seen': 358225, 'train_runtime': '181.6', 'train_tokens_per_second': '1972'}
179
+ {'loss': '1.442', 'grad_norm': '0.3129', 'learning_rate': '3.929e-09', 'epoch': '0.0158', 'num_input_tokens_seen': 360272, 'train_runtime': '182.7', 'train_tokens_per_second': '1972'}
180
+ {'loss': '0.9567', 'grad_norm': '0.2554', 'learning_rate': '3.951e-09', 'epoch': '0.01589', 'num_input_tokens_seen': 362319, 'train_runtime': '183.7', 'train_tokens_per_second': '1972'}
181
+ {'loss': '0.9676', 'grad_norm': '0.5356', 'learning_rate': '3.974e-09', 'epoch': '0.01598', 'num_input_tokens_seen': 364366, 'train_runtime': '184.7', 'train_tokens_per_second': '1973'}
182
+ {'loss': '1.373', 'grad_norm': '0.2589', 'learning_rate': '3.996e-09', 'epoch': '0.01607', 'num_input_tokens_seen': 366413, 'train_runtime': '185.8', 'train_tokens_per_second': '1973'}
183
+ {'loss': '1.662', 'grad_norm': '0.2725', 'learning_rate': '4.018e-09', 'epoch': '0.01616', 'num_input_tokens_seen': 368460, 'train_runtime': '186.8', 'train_tokens_per_second': '1973'}
184
+ {'loss': '1.858', 'grad_norm': '0.2993', 'learning_rate': '4.041e-09', 'epoch': '0.01625', 'num_input_tokens_seen': 370507, 'train_runtime': '187.8', 'train_tokens_per_second': '1973'}
185
+ {'loss': '2.123', 'grad_norm': '0.2936', 'learning_rate': '4.063e-09', 'epoch': '0.01634', 'num_input_tokens_seen': 372554, 'train_runtime': '188.8', 'train_tokens_per_second': '1973'}
186
+ {'loss': '1.248', 'grad_norm': '0.2247', 'learning_rate': '4.086e-09', 'epoch': '0.01643', 'num_input_tokens_seen': 374601, 'train_runtime': '189.9', 'train_tokens_per_second': '1973'}
187
+ {'loss': '1.222', 'grad_norm': '0.2639', 'learning_rate': '4.108e-09', 'epoch': '0.01652', 'num_input_tokens_seen': 376648, 'train_runtime': '190.9', 'train_tokens_per_second': '1973'}
188
+ {'loss': '2.085', 'grad_norm': '0.3095', 'learning_rate': '4.131e-09', 'epoch': '0.01661', 'num_input_tokens_seen': 378695, 'train_runtime': '191.9', 'train_tokens_per_second': '1973'}
189
+ {'loss': '1.467', 'grad_norm': '0.2568', 'learning_rate': '4.153e-09', 'epoch': '0.0167', 'num_input_tokens_seen': 380742, 'train_runtime': '193', 'train_tokens_per_second': '1973'}
190
+ {'loss': '0.8529', 'grad_norm': '0.2352', 'learning_rate': '4.176e-09', 'epoch': '0.01679', 'num_input_tokens_seen': 382789, 'train_runtime': '194', 'train_tokens_per_second': '1973'}
191
+ {'loss': '0.9057', 'grad_norm': '0.1991', 'learning_rate': '4.198e-09', 'epoch': '0.01688', 'num_input_tokens_seen': 384836, 'train_runtime': '195', 'train_tokens_per_second': '1973'}
192
+ {'loss': '0.8787', 'grad_norm': '0.2213', 'learning_rate': '4.221e-09', 'epoch': '0.01697', 'num_input_tokens_seen': 386883, 'train_runtime': '196.1', 'train_tokens_per_second': '1973'}
193
+ {'loss': '1.267', 'grad_norm': '0.266', 'learning_rate': '4.243e-09', 'epoch': '0.01706', 'num_input_tokens_seen': 388930, 'train_runtime': '197.1', 'train_tokens_per_second': '1973'}
194
+ {'loss': '2.175', 'grad_norm': '0.2836', 'learning_rate': '4.265e-09', 'epoch': '0.01715', 'num_input_tokens_seen': 390977, 'train_runtime': '198.1', 'train_tokens_per_second': '1974'}
195
+ {'loss': '1.494', 'grad_norm': '0.2626', 'learning_rate': '4.288e-09', 'epoch': '0.01724', 'num_input_tokens_seen': 393024, 'train_runtime': '199.1', 'train_tokens_per_second': '1974'}
196
+ {'loss': '0.9855', 'grad_norm': '0.2556', 'learning_rate': '4.31e-09', 'epoch': '0.01733', 'num_input_tokens_seen': 395071, 'train_runtime': '200.2', 'train_tokens_per_second': '1974'}
197
+ {'loss': '1.002', 'grad_norm': '0.2337', 'learning_rate': '4.333e-09', 'epoch': '0.01742', 'num_input_tokens_seen': 397118, 'train_runtime': '201.2', 'train_tokens_per_second': '1974'}
198
+ {'loss': '2.023', 'grad_norm': '0.3868', 'learning_rate': '4.355e-09', 'epoch': '0.01751', 'num_input_tokens_seen': 399165, 'train_runtime': '202.2', 'train_tokens_per_second': '1974'}
199
+ {'loss': '1.428', 'grad_norm': '0.2586', 'learning_rate': '4.378e-09', 'epoch': '0.0176', 'num_input_tokens_seen': 401212, 'train_runtime': '203.3', 'train_tokens_per_second': '1974'}
200
+ {'loss': '1.278', 'grad_norm': '0.2335', 'learning_rate': '4.4e-09', 'epoch': '0.01769', 'num_input_tokens_seen': 403259, 'train_runtime': '204.3', 'train_tokens_per_second': '1974'}
201
+ {'loss': '1.2', 'grad_norm': '0.2343', 'learning_rate': '4.423e-09', 'epoch': '0.01778', 'num_input_tokens_seen': 405306, 'train_runtime': '205.3', 'train_tokens_per_second': '1974'}
202
+ {'loss': '1.246', 'grad_norm': '0.2836', 'learning_rate': '4.445e-09', 'epoch': '0.01787', 'num_input_tokens_seen': 407353, 'train_runtime': '206.4', 'train_tokens_per_second': '1974'}
203
+ {'loss': '1.552', 'grad_norm': '0.2757', 'learning_rate': '4.467e-09', 'epoch': '0.01796', 'num_input_tokens_seen': 409400, 'train_runtime': '207.4', 'train_tokens_per_second': '1974'}
204
+ {'loss': '1.455', 'grad_norm': '0.3238', 'learning_rate': '4.49e-09', 'epoch': '0.01805', 'num_input_tokens_seen': 411447, 'train_runtime': '208.4', 'train_tokens_per_second': '1974'}
205
+ {'loss': '1.288', 'grad_norm': '0.2519', 'learning_rate': '4.512e-09', 'epoch': '0.01814', 'num_input_tokens_seen': 413494, 'train_runtime': '209.4', 'train_tokens_per_second': '1974'}
206
+ {'loss': '1.231', 'grad_norm': '0.2137', 'learning_rate': '4.535e-09', 'epoch': '0.01823', 'num_input_tokens_seen': 415541, 'train_runtime': '210.5', 'train_tokens_per_second': '1974'}
207
+ {'loss': '2.014', 'grad_norm': '0.3149', 'learning_rate': '4.557e-09', 'epoch': '0.01832', 'num_input_tokens_seen': 417588, 'train_runtime': '211.5', 'train_tokens_per_second': '1974'}
208
+ {'loss': '1.479', 'grad_norm': '0.2591', 'learning_rate': '4.58e-09', 'epoch': '0.01841', 'num_input_tokens_seen': 419635, 'train_runtime': '212.5', 'train_tokens_per_second': '1974'}
209
+ {'loss': '0.921', 'grad_norm': '0.2037', 'learning_rate': '4.602e-09', 'epoch': '0.0185', 'num_input_tokens_seen': 421682, 'train_runtime': '213.6', 'train_tokens_per_second': '1975'}
210
+ {'loss': '1.79', 'grad_norm': '0.2913', 'learning_rate': '4.625e-09', 'epoch': '0.01859', 'num_input_tokens_seen': 423729, 'train_runtime': '214.6', 'train_tokens_per_second': '1975'}
211
+ {'loss': '1.267', 'grad_norm': '0.256', 'learning_rate': '4.647e-09', 'epoch': '0.01868', 'num_input_tokens_seen': 425776, 'train_runtime': '215.6', 'train_tokens_per_second': '1975'}
212
+ {'loss': '1.873', 'grad_norm': '0.2851', 'learning_rate': '4.67e-09', 'epoch': '0.01877', 'num_input_tokens_seen': 427823, 'train_runtime': '216.7', 'train_tokens_per_second': '1975'}
213
+ {'loss': '0.9967', 'grad_norm': '0.256', 'learning_rate': '4.692e-09', 'epoch': '0.01886', 'num_input_tokens_seen': 429870, 'train_runtime': '217.7', 'train_tokens_per_second': '1975'}
214
+ {'loss': '1.227', 'grad_norm': '0.2439', 'learning_rate': '4.714e-09', 'epoch': '0.01895', 'num_input_tokens_seen': 431917, 'train_runtime': '218.7', 'train_tokens_per_second': '1975'}
215
+ {'loss': '1.904', 'grad_norm': '0.2904', 'learning_rate': '4.737e-09', 'epoch': '0.01904', 'num_input_tokens_seen': 433964, 'train_runtime': '219.7', 'train_tokens_per_second': '1975'}
216
+ {'loss': '1.58', 'grad_norm': '0.3392', 'learning_rate': '4.759e-09', 'epoch': '0.01913', 'num_input_tokens_seen': 436011, 'train_runtime': '220.8', 'train_tokens_per_second': '1975'}
217
+ {'loss': '0.8605', 'grad_norm': '0.211', 'learning_rate': '4.782e-09', 'epoch': '0.01922', 'num_input_tokens_seen': 438058, 'train_runtime': '221.8', 'train_tokens_per_second': '1975'}
218
+ {'loss': '0.8921', 'grad_norm': '0.2665', 'learning_rate': '4.804e-09', 'epoch': '0.01931', 'num_input_tokens_seen': 440105, 'train_runtime': '222.8', 'train_tokens_per_second': '1975'}
219
+ {'loss': '1.968', 'grad_norm': '0.3234', 'learning_rate': '4.827e-09', 'epoch': '0.0194', 'num_input_tokens_seen': 442152, 'train_runtime': '223.9', 'train_tokens_per_second': '1975'}
220
+ {'loss': '1.907', 'grad_norm': '0.3344', 'learning_rate': '4.849e-09', 'epoch': '0.01949', 'num_input_tokens_seen': 444199, 'train_runtime': '224.9', 'train_tokens_per_second': '1975'}
221
+ {'loss': '1.236', 'grad_norm': '0.2373', 'learning_rate': '4.872e-09', 'epoch': '0.01958', 'num_input_tokens_seen': 446246, 'train_runtime': '225.9', 'train_tokens_per_second': '1975'}
222
+ {'loss': '1.157', 'grad_norm': '0.2416', 'learning_rate': '4.894e-09', 'epoch': '0.01967', 'num_input_tokens_seen': 448293, 'train_runtime': '227', 'train_tokens_per_second': '1975'}
223
+ {'loss': '1.294', 'grad_norm': '0.2793', 'learning_rate': '4.916e-09', 'epoch': '0.01976', 'num_input_tokens_seen': 450340, 'train_runtime': '228', 'train_tokens_per_second': '1975'}
224
+ {'loss': '1.227', 'grad_norm': '0.3238', 'learning_rate': '4.939e-09', 'epoch': '0.01985', 'num_input_tokens_seen': 452387, 'train_runtime': '229', 'train_tokens_per_second': '1975'}
225
+ {'loss': '0.9845', 'grad_norm': '0.2103', 'learning_rate': '4.961e-09', 'epoch': '0.01994', 'num_input_tokens_seen': 454434, 'train_runtime': '230.1', 'train_tokens_per_second': '1975'}
226
+ {'loss': '1.277', 'grad_norm': '0.2558', 'learning_rate': '4.984e-09', 'epoch': '0.02003', 'num_input_tokens_seen': 456481, 'train_runtime': '231.1', 'train_tokens_per_second': '1975'}
227
+ {'loss': '1.224', 'grad_norm': '0.2459', 'learning_rate': '5.006e-09', 'epoch': '0.02011', 'num_input_tokens_seen': 458528, 'train_runtime': '232.1', 'train_tokens_per_second': '1975'}
228
+ {'loss': '1.507', 'grad_norm': '0.3769', 'learning_rate': '5.029e-09', 'epoch': '0.0202', 'num_input_tokens_seen': 460575, 'train_runtime': '233.1', 'train_tokens_per_second': '1976'}
229
+ {'loss': '0.9777', 'grad_norm': '0.2383', 'learning_rate': '5.051e-09', 'epoch': '0.02029', 'num_input_tokens_seen': 462622, 'train_runtime': '234.2', 'train_tokens_per_second': '1976'}
230
+ {'loss': '1.639', 'grad_norm': '7.145', 'learning_rate': '5.074e-09', 'epoch': '0.02038', 'num_input_tokens_seen': 464669, 'train_runtime': '235.2', 'train_tokens_per_second': '1976'}
231
+ {'loss': '1.54', 'grad_norm': '0.2886', 'learning_rate': '5.096e-09', 'epoch': '0.02047', 'num_input_tokens_seen': 466716, 'train_runtime': '236.2', 'train_tokens_per_second': '1976'}
232
+ {'loss': '1.495', 'grad_norm': '0.2755', 'learning_rate': '5.119e-09', 'epoch': '0.02056', 'num_input_tokens_seen': 468763, 'train_runtime': '237.3', 'train_tokens_per_second': '1976'}
233
+ {'loss': '1.568', 'grad_norm': '0.3269', 'learning_rate': '5.141e-09', 'epoch': '0.02065', 'num_input_tokens_seen': 470810, 'train_runtime': '238.3', 'train_tokens_per_second': '1976'}
234
+ {'loss': '1.03', 'grad_norm': '0.2274', 'learning_rate': '5.163e-09', 'epoch': '0.02074', 'num_input_tokens_seen': 472857, 'train_runtime': '239.3', 'train_tokens_per_second': '1976'}
235
+ {'loss': '1.547', 'grad_norm': '0.3207', 'learning_rate': '5.186e-09', 'epoch': '0.02083', 'num_input_tokens_seen': 474904, 'train_runtime': '240.4', 'train_tokens_per_second': '1976'}
236
+ {'loss': '0.9411', 'grad_norm': '0.2122', 'learning_rate': '5.208e-09', 'epoch': '0.02092', 'num_input_tokens_seen': 476951, 'train_runtime': '241.4', 'train_tokens_per_second': '1976'}
237
+ {'loss': '1.497', 'grad_norm': '0.2546', 'learning_rate': '5.231e-09', 'epoch': '0.02101', 'num_input_tokens_seen': 478998, 'train_runtime': '242.4', 'train_tokens_per_second': '1976'}
238
+ {'loss': '1.556', 'grad_norm': '0.3133', 'learning_rate': '5.253e-09', 'epoch': '0.0211', 'num_input_tokens_seen': 481045, 'train_runtime': '243.4', 'train_tokens_per_second': '1976'}
239
+ File "/usr/local/bin/llamafactory-cli", line 8, in <module>
240
+ sys.exit(main())
241
+ ^^^^^^
242
+ File "/workspace/LlamaFactory/src/llamafactory/cli.py", line 24, in main
243
+ launcher.launch()
244
+ File "/workspace/LlamaFactory/src/llamafactory/launcher.py", line 157, in launch
245
+ run_exp()
246
+ File "/workspace/LlamaFactory/src/llamafactory/train/tuner.py", line 125, in run_exp
247
+ _training_function(config={"args": args, "callbacks": callbacks})
248
+ File "/workspace/LlamaFactory/src/llamafactory/train/tuner.py", line 91, in _training_function
249
+ run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
250
+ File "/workspace/LlamaFactory/src/llamafactory/train/pt/workflow.py", line 63, in run_pt
251
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
252
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
253
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2174, in train
254
+ return inner_training_loop(
255
+ ^^^^^^^^^^^^^^^^^^^^
256
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2536, in _inner_training_loop
257
+ tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
258
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
259
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3809, in training_step
260
+ loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
261
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
262
+ File "/workspace/LlamaFactory/src/llamafactory/train/pt/trainer.py", line 93, in compute_loss
263
+ return super().compute_loss(model, inputs, *args, **kwargs)
264
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
265
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3880, in compute_loss
266
+ outputs = model(**inputs)
267
+ ^^^^^^^^^^^^^^^
268
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
269
+ return self._call_impl(*args, **kwargs)
270
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
271
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
272
+ return forward_call(*args, **kwargs)
273
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
274
+ File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/operations.py", line 819, in forward
275
+ return model_forward(*args, **kwargs)
276
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
277
+ File "/usr/local/lib/python3.11/dist-packages/accelerate/utils/operations.py", line 807, in __call__
278
+ return convert_to_fp32(self.model_forward(*args, **kwargs))
279
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
280
+ File "/usr/local/lib/python3.11/dist-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
281
+ return func(*args, **kwargs)
282
+ ^^^^^^^^^^^^^^^^^^^^^
283
+ File "/usr/local/lib/python3.11/dist-packages/peft/peft_model.py", line 1923, in forward
284
+ return self.base_model(
285
+ ^^^^^^^^^^^^^^^^
286
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
287
+ return self._call_impl(*args, **kwargs)
288
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
289
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
290
+ return forward_call(*args, **kwargs)
291
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
292
+ File "/usr/local/lib/python3.11/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
293
+ return self.model.forward(*args, **kwargs)
294
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
295
+ File "/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py", line 835, in wrapper
296
+ output = func(self, *args, **kwargs)
297
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
298
+ File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen3/modeling_qwen3.py", line 505, in forward
299
+ outputs: BaseModelOutputWithPast = self.model(
300
+ ^^^^^^^^^^^
301
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
302
+ return self._call_impl(*args, **kwargs)
303
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
304
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
305
+ return forward_call(*args, **kwargs)
306
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
307
+ File "/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py", line 1002, in wrapper
308
+ outputs = func(self, *args, **kwargs)
309
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
310
+ File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen3/modeling_qwen3.py", line 435, in forward
311
+ hidden_states = decoder_layer(
312
+ ^^^^^^^^^^^^^^
313
+ File "/usr/local/lib/python3.11/dist-packages/transformers/modeling_layers.py", line 92, in __call__
314
+ return self._gradient_checkpointing_func(partial(super().__call__, **kwargs), *args)
315
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
316
+ File "/workspace/LlamaFactory/src/llamafactory/model/model_utils/checkpointing.py", line 99, in custom_gradient_checkpointing_func
317
+ return gradient_checkpointing_func(func, *args, **kwargs)
318
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
319
+ File "/usr/local/lib/python3.11/dist-packages/torch/_compile.py", line 31, in inner
320
+ return disable_fn(*args, **kwargs)
321
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^
322
+ File "/usr/local/lib/python3.11/dist-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
323
+ return fn(*args, **kwargs)
324
+ ^^^^^^^^^^^^^^^^^^^
325
+ File "/usr/local/lib/python3.11/dist-packages/torch/utils/checkpoint.py", line 481, in checkpoint
326
+ return CheckpointFunction.apply(function, preserve, *args)
327
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
328
+ File "/usr/local/lib/python3.11/dist-packages/torch/autograd/function.py", line 574, in apply
329
+ return super().apply(*args, **kwargs) # type: ignore[misc]
330
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
331
+ File "/usr/local/lib/python3.11/dist-packages/torch/utils/checkpoint.py", line 255, in forward
332
+ outputs = run_function(*args)
333
+ ^^^^^^^^^^^^^^^^^^^
334
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
335
+ return self._call_impl(*args, **kwargs)
336
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
337
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
338
+ return forward_call(*args, **kwargs)
339
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
340
+ File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen3/modeling_qwen3.py", line 323, in forward
341
+ hidden_states, _ = self.self_attn(
342
+ ^^^^^^^^^^^^^^^
343
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
344
+ return self._call_impl(*args, **kwargs)
345
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
346
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
347
+ return forward_call(*args, **kwargs)
348
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
349
+ File "/usr/local/lib/python3.11/dist-packages/transformers/models/qwen3/modeling_qwen3.py", line 264, in forward
350
+ query_states = self.q_norm(self.q_proj(hidden_states).view(hidden_shape)).transpose(1, 2)
351
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
352
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
353
+ return self._call_impl(*args, **kwargs)
354
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
355
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
356
+ return forward_call(*args, **kwargs)
357
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
358
+ File "/usr/local/lib/python3.11/dist-packages/peft/tuners/lora/layer.py", line 807, in forward
359
+ result = result + lora_B(lora_A(dropout(x))) * scaling
360
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^
361
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
362
+ return self._call_impl(*args, **kwargs)
363
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
364
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
365
+ return forward_call(*args, **kwargs)
366
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
367
+ File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/linear.py", line 117, in forward
368
+ return F.linear(input, self.weight, self.bias)
369
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
370
+ KeyboardInterrupt
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
1
+ pytz==2025.2
2
+ pydub==0.25.1
3
+ brotli==1.2.0
4
+ antlr4-python3-runtime==4.9.3
5
+ xxhash==3.6.0
6
+ websockets==15.0.1
7
+ tzdata==2025.3
8
+ typing_extensions==4.15.0
9
+ tqdm==4.67.3
10
+ tomlkit==0.13.3
11
+ termcolor==3.3.0
12
+ shtab==1.8.0
13
+ shellingham==1.5.4
14
+ sentencepiece==0.2.1
15
+ semantic-version==2.10.0
16
+ safetensors==0.7.0
17
+ ruff==0.15.0
18
+ regex==2026.1.15
19
+ python-multipart==0.0.22
20
+ pyparsing==3.3.2
21
+ pyarrow==23.0.0
22
+ protobuf==6.33.5
23
+ propcache==0.4.1
24
+ orjson==3.11.7
25
+ omegaconf==2.3.0
26
+ numpy==2.4.2
27
+ multidict==6.7.1
28
+ mdurl==0.1.2
29
+ kiwisolver==1.4.9
30
+ hf-xet==1.2.0
31
+ hf_transfer==0.1.9
32
+ groovy==0.1.2
33
+ frozenlist==1.8.0
34
+ fonttools==4.61.1
35
+ ffmpy==1.0.0
36
+ einops==0.8.2
37
+ docstring_parser==0.17.0
38
+ dill==0.3.8
39
+ cycler==0.12.1
40
+ click==8.3.1
41
+ av==16.0.0
42
+ annotated-types==0.7.0
43
+ annotated-doc==0.0.4
44
+ aiohappyeyeballs==2.6.1
45
+ aiofiles==24.1.0
46
+ yarl==1.22.0
47
+ uvicorn==0.40.0
48
+ typing-inspection==0.4.2
49
+ typer-slim==0.21.1
50
+ tiktoken==0.12.0
51
+ scipy==1.17.0
52
+ pydantic_core==2.41.4
53
+ pandas==2.3.3
54
+ multiprocess==0.70.16
55
+ modelscope==1.34.0
56
+ markdown-it-py==4.0.0
57
+ fire==0.7.1
58
+ contourpy==1.3.3
59
+ anyio==4.12.1
60
+ aiosignal==1.4.0
61
+ starlette==0.52.1
62
+ rich==14.3.2
63
+ pydantic==2.12.3
64
+ matplotlib==3.10.8
65
+ aiohttp==3.13.3
66
+ tyro==0.8.14
67
+ typer==0.21.1
68
+ torchdata==0.11.0
69
+ sse-starlette==3.2.0
70
+ safehttpx==0.1.7
71
+ huggingface_hub==1.4.1
72
+ fastapi==0.128.5
73
+ tokenizers==0.22.2
74
+ gradio_client==1.14.0
75
+ datasets==4.0.0
76
+ accelerate==1.11.0
77
+ transformers==5.0.0
78
+ gradio==5.50.0
79
+ trl==0.24.0
80
+ peft==0.18.1
81
+ llamafactory==0.9.5.dev0
82
+ jieba==0.42.1
83
+ rouge-chinese==1.0.3
84
+ joblib==1.5.3
85
+ nltk==3.9.2
86
+ py-cpuinfo==9.0.0
87
+ nvidia-ml-py==13.590.48
88
+ hjson==3.1.0
89
+ ninja==1.13.0
90
+ msgpack==1.1.2
91
+ deepspeed==0.16.9
92
+ smmap==5.0.2
93
+ sentry-sdk==2.52.0
94
+ gitdb==4.0.12
95
+ GitPython==3.1.46
96
+ wandb==0.24.2
97
+ entrypoints==0.4
98
+ jupyter_client==7.4.9
99
+ nbclassic==1.1.0
100
+ notebook==6.5.5
101
+ pyzmq==24.0.1
102
+ PyYAML==6.0.2
103
+ Send2Trash==1.8.3
104
+ argon2-cffi==23.1.0
105
+ argon2-cffi-bindings==21.2.0
106
+ arrow==1.3.0
107
+ asttokens==2.4.1
108
+ async-lru==2.0.4
109
+ attrs==24.2.0
110
+ babel==2.16.0
111
+ beautifulsoup4==4.12.3
112
+ bleach==6.1.0
113
+ certifi==2024.8.30
114
+ cffi==1.17.1
115
+ charset-normalizer==3.3.2
116
+ comm==0.2.2
117
+ debugpy==1.8.5
118
+ decorator==5.1.1
119
+ defusedxml==0.7.1
120
+ executing==2.1.0
121
+ fastjsonschema==2.20.0
122
+ fqdn==1.5.1
123
+ h11==0.14.0
124
+ httpcore==1.0.5
125
+ httpx==0.27.2
126
+ idna==3.10
127
+ ipykernel==6.29.5
128
+ ipython==8.27.0
129
+ ipython-genutils==0.2.0
130
+ ipywidgets==8.1.5
131
+ isoduration==20.11.0
132
+ jedi==0.19.1
133
+ json5==0.9.25
134
+ jsonpointer==3.0.0
135
+ jsonschema==4.23.0
136
+ jsonschema-specifications==2023.12.1
137
+ jupyter-archive==3.4.0
138
+ jupyter_contrib_core==0.4.2
139
+ jupyter_contrib_nbextensions==0.7.0
140
+ jupyter_core==5.7.2
141
+ jupyter-events==0.10.0
142
+ jupyter-highlight-selected-word==0.2.0
143
+ jupyter-lsp==2.2.5
144
+ jupyter_nbextensions_configurator==0.6.4
145
+ jupyter_server==2.14.2
146
+ jupyter_server_terminals==0.5.3
147
+ jupyterlab==4.2.5
148
+ jupyterlab_pygments==0.3.0
149
+ jupyterlab_server==2.27.3
150
+ jupyterlab_widgets==3.0.13
151
+ lxml==5.3.0
152
+ matplotlib-inline==0.1.7
153
+ mistune==3.0.2
154
+ nbclient==0.10.0
155
+ nbconvert==7.16.4
156
+ nbformat==5.10.4
157
+ nest-asyncio==1.6.0
158
+ notebook_shim==0.2.4
159
+ overrides==7.7.0
160
+ packaging==24.1
161
+ pandocfilters==1.5.1
162
+ parso==0.8.4
163
+ pexpect==4.9.0
164
+ platformdirs==4.3.6
165
+ prometheus_client==0.21.0
166
+ prompt_toolkit==3.0.47
167
+ psutil==6.0.0
168
+ ptyprocess==0.7.0
169
+ pure_eval==0.2.3
170
+ pycparser==2.22
171
+ Pygments==2.18.0
172
+ python-dateutil==2.9.0.post0
173
+ python-json-logger==2.0.7
174
+ referencing==0.35.1
175
+ requests==2.32.3
176
+ rfc3339-validator==0.1.4
177
+ rfc3986-validator==0.1.1
178
+ rpds-py==0.20.0
179
+ sniffio==1.3.1
180
+ soupsieve==2.6
181
+ stack-data==0.6.3
182
+ terminado==0.18.1
183
+ tinycss2==1.3.0
184
+ tornado==6.4.1
185
+ traitlets==5.14.3
186
+ types-python-dateutil==2.9.0.20240906
187
+ uri-template==1.3.0
188
+ urllib3==2.2.3
189
+ wcwidth==0.2.13
190
+ webcolors==24.8.0
191
+ webencodings==0.5.1
192
+ websocket-client==1.8.0
193
+ widgetsnbextension==4.0.13
194
+ Jinja2==3.1.3
195
+ MarkupSafe==2.1.5
196
+ filelock==3.13.1
197
+ fsspec==2024.2.0
198
+ mpmath==1.3.0
199
+ networkx==3.2.1
200
+ nvidia-cublas-cu12==12.4.2.65
201
+ nvidia-cuda-cupti-cu12==12.4.99
202
+ nvidia-cuda-nvrtc-cu12==12.4.99
203
+ nvidia-cuda-runtime-cu12==12.4.99
204
+ nvidia-cudnn-cu12==9.1.0.70
205
+ nvidia-cufft-cu12==11.2.0.44
206
+ nvidia-curand-cu12==10.3.5.119
207
+ nvidia-cusolver-cu12==11.6.0.99
208
+ nvidia-cusparse-cu12==12.3.0.142
209
+ nvidia-nccl-cu12==2.20.5
210
+ nvidia-nvjitlink-cu12==12.4.99
211
+ nvidia-nvtx-cu12==12.4.99
212
+ pillow==10.2.0
213
+ sympy==1.12
214
+ torch==2.4.1+cu124
215
+ torchaudio==2.4.1+cu124
216
+ torchvision==0.19.1+cu124
217
+ triton==3.0.0
218
+ pip==24.2
219
+ setuptools==75.1.0
220
+ wheel==0.44.0
221
+ PyGObject==3.42.1
222
+ PyJWT==2.3.0
223
+ SecretStorage==3.3.1
224
+ blinker==1.4
225
+ cryptography==3.4.8
226
+ dbus-python==1.2.18
227
+ distro==1.7.0
228
+ httplib2==0.20.2
229
+ importlib-metadata==4.6.4
230
+ jeepney==0.7.1
231
+ keyring==23.5.0
232
+ launchpadlib==1.10.16
233
+ lazr.restfulclient==0.14.4
234
+ lazr.uri==1.0.6
235
+ more-itertools==8.10.0
236
+ oauthlib==3.2.0
237
+ python-apt==2.4.0+ubuntu4
238
+ six==1.16.0
239
+ wadllib==1.3.6
240
+ zipp==1.0.0
241
+ autocommand==2.2.2
242
+ backports.tarfile==1.2.0
243
+ importlib_metadata==8.0.0
244
+ importlib_resources==6.4.0
245
+ inflect==7.3.1
246
+ jaraco.collections==5.1.0
247
+ jaraco.context==5.3.0
248
+ jaraco.functools==4.0.1
249
+ jaraco.text==3.12.1
250
+ more-itertools==10.3.0
251
+ packaging==24.1
252
+ platformdirs==4.2.2
253
+ tomli==2.0.1
254
+ typeguard==4.3.0
255
+ typing_extensions==4.12.2
256
+ wheel==0.43.0
257
+ zipp==3.19.2
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "os": "Linux-6.8.0-79-generic-x86_64-with-glibc2.35",
+   "python": "CPython 3.11.10",
+   "startedAt": "2026-02-09T07:31:58.346206Z",
+   "args": [
+     "/workspace/v127rc_exp2/B_dup.yaml"
+   ],
+   "program": "/usr/local/bin/llamafactory-cli",
+   "git": {
+     "remote": "https://github.com/hiyouga/LlamaFactory.git",
+     "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
+   },
+   "email": "markmochi200@gmail.com",
+   "root": "/workspace/LlamaFactory",
+   "host": "7bc9cc925966",
+   "executable": "/usr/bin/python",
+   "cpu_count": 16,
+   "cpu_count_logical": 32,
+   "gpu": "NVIDIA GeForce RTX 4090",
+   "gpu_count": 1,
+   "disk": {
+     "/": {
+       "total": "21474836480",
+       "used": "2060386304"
+     }
+   },
+   "memory": {
+     "total": "134123229184"
+   },
+   "gpu_nvidia": [
+     {
+       "name": "NVIDIA GeForce RTX 4090",
+       "memoryTotal": "25757220864",
+       "cudaCores": 16384,
+       "architecture": "Ada",
+       "uuid": "GPU-5c963f5e-1505-b56d-adf2-0ab9c4a32166"
+     }
+   ],
+   "cudaVersion": "12.8",
+   "writerId": "n349ejidifao2bsbyv1vqxhikiyr57gh"
+ }
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
+ {"train/global_step":235,"train/num_input_tokens_seen":481045,"train_runtime":243.4472,"train/grad_norm":0.3132905960083008,"_step":234,"_timestamp":1.770622561360074e+09,"train/learning_rate":5.25323275862069e-09,"train/loss":1.5564426183700562,"_runtime":242,"train/train_tokens_per_second":1975.972,"train/epoch":0.021102729885057472,"_wandb":{"runtime":242}}
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
+ {"time":"2026-02-09T07:31:58.588577255Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
+ {"time":"2026-02-09T07:31:58.912408218Z","level":"INFO","msg":"stream: created new stream","id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:31:58.913196337Z","level":"INFO","msg":"handler: started","stream_id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:31:58.915052871Z","level":"INFO","msg":"stream: started","id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:31:58.915056006Z","level":"INFO","msg":"writer: started","stream_id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:31:58.915062909Z","level":"INFO","msg":"sender: started","stream_id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:36:01.595340425Z","level":"INFO","msg":"stream: closing","id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:36:02.247674687Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2026-02-09T07:36:02.523971041Z","level":"INFO","msg":"handler: closed","stream_id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:36:02.527325636Z","level":"INFO","msg":"sender: closed","stream_id":"8c1g8ddy"}
+ {"time":"2026-02-09T07:36:02.528189557Z","level":"INFO","msg":"stream: closed","id":"8c1g8ddy"}
LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug.log ADDED
@@ -0,0 +1,25 @@
1
+ 2026-02-09 07:31:58,366 INFO MainThread:685 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
2
+ 2026-02-09 07:31:58,366 INFO MainThread:685 [wandb_setup.py:_flush():81] Configure stats pid to 685
3
+ 2026-02-09 07:31:58,366 INFO MainThread:685 [wandb_setup.py:_flush():81] Loading settings from environment variables
4
+ 2026-02-09 07:31:58,367 INFO MainThread:685 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug.log
5
+ 2026-02-09 07:31:58,368 INFO MainThread:685 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260209_073158-8c1g8ddy/logs/debug-internal.log
6
+ 2026-02-09 07:31:58,369 INFO MainThread:685 [wandb_init.py:init():844] calling init triggers
7
+ 2026-02-09 07:31:58,369 INFO MainThread:685 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
8
+ config: {'_wandb': {}}
9
+ 2026-02-09 07:31:58,369 INFO MainThread:685 [wandb_init.py:init():892] starting backend
10
+ 2026-02-09 07:31:58,579 INFO MainThread:685 [wandb_init.py:init():895] sending inform_init request
11
+ 2026-02-09 07:31:58,586 INFO MainThread:685 [wandb_init.py:init():903] backend started and connected
12
+ 2026-02-09 07:31:58,588 INFO MainThread:685 [wandb_init.py:init():973] updated telemetry
13
+ 2026-02-09 07:31:58,640 INFO MainThread:685 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
14
+ 2026-02-09 07:31:59,228 INFO MainThread:685 [wandb_init.py:init():1042] starting run threads in backend
15
+ 2026-02-09 07:31:59,300 INFO MainThread:685 [wandb_run.py:_console_start():2529] atexit reg
16
+ 2026-02-09 07:31:59,300 INFO MainThread:685 [wandb_run.py:_redirect():2377] redirect: wrap_raw
17
+ 2026-02-09 07:31:59,301 INFO MainThread:685 [wandb_run.py:_redirect():2446] Wrapping output streams.
18
+ 2026-02-09 07:31:59,301 INFO MainThread:685 [wandb_run.py:_redirect():2469] Redirects installed.
19
+ 2026-02-09 07:31:59,310 INFO MainThread:685 [wandb_init.py:init():1082] run started, returning control to user process
20
+ 2026-02-09 07:31:59,312 INFO MainThread:685 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['v_proj', 'up_proj', 'o_proj', 'k_proj', 'down_proj', 'q_proj', 'gate_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp2/B_dup', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 10000, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.02, 'warmup_steps': 0.02, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 
'local_rank': -1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 
'generation_max_length': 2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
21
+ 2026-02-09 07:31:59,318 INFO MainThread:685 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7f80449ca5d0>>
22
+ 2026-02-09 07:31:59,319 INFO MainThread:685 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
23
+ 2026-02-09 07:31:59,320 INFO MainThread:685 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t0_d34_r300'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
24
+ 2026-02-09 07:36:01,595 INFO wandb-AsyncioManager-main:685 [service_client.py:_forward_responses():94] Reached EOF.
25
+ 2026-02-09 07:36:01,596 INFO wandb-AsyncioManager-main:685 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/config.yaml ADDED
@@ -0,0 +1,723 @@
1
+ _name_or_path:
2
+ value: /workspace/Qwen/Qwen3-8B-Base
3
+ _wandb:
4
+ value:
5
+ cli_version: 0.24.2
6
+ e:
7
+ 1eab48yhuu1cnh8oebani4cfaxehtql2:
8
+ args:
9
+ - /workspace/v127rc_exp2/B_dup.yaml
10
+ cpu_count: 16
11
+ cpu_count_logical: 32
12
+ cudaVersion: "12.9"
13
+ disk:
14
+ /:
15
+ total: "21474836480"
16
+ used: "2060378112"
17
+ email: markmochi200@gmail.com
18
+ executable: /usr/bin/python
19
+ git:
20
+ commit: 1a02717fa84c270d1c156c4c4a391c2f95525a63
21
+ remote: https://github.com/hiyouga/LlamaFactory.git
22
+ gpu: NVIDIA GeForce RTX 4090
23
+ gpu_count: 1
24
+ gpu_nvidia:
25
+ - architecture: Ada
26
+ cudaCores: 16384
27
+ memoryTotal: "25757220864"
28
+ name: NVIDIA GeForce RTX 4090
29
+ uuid: GPU-6c1e98c2-1b34-cfd8-5de5-319e272f1d1e
30
+ host: 3bebe963f251
31
+ memory:
32
+ total: "134156767232"
33
+ os: Linux-6.8.0-60-generic-x86_64-with-glibc2.35
34
+ program: /usr/local/bin/llamafactory-cli
35
+ python: CPython 3.11.10
36
+ root: /workspace/LlamaFactory
37
+ startedAt: "2026-02-09T08:03:05.241495Z"
38
+ writerId: 1eab48yhuu1cnh8oebani4cfaxehtql2
39
+ m:
40
+ - "1": train/global_step
41
+ "6":
42
+ - 3
43
+ "7": []
44
+ - "2": '*'
45
+ "5": 1
46
+ "6":
47
+ - 1
48
+ "7": []
49
+ python_version: 3.11.10
50
+ t:
51
+ "1":
52
+ - 1
53
+ - 11
54
+ - 41
55
+ - 49
56
+ - 51
57
+ - 71
58
+ - 84
59
+ - 98
60
+ - 105
61
+ "2":
62
+ - 1
63
+ - 11
64
+ - 41
65
+ - 49
66
+ - 51
67
+ - 71
68
+ - 84
69
+ - 98
70
+ - 105
71
+ "3":
72
+ - 7
73
+ - 19
74
+ - 62
75
+ - 66
76
+ "4": 3.11.10
77
+ "5": 0.24.2
78
+ "6": 5.0.0
79
+ "9":
80
+ "1": transformers_trainer
81
+ "12": 0.24.2
82
+ "13": linux-x86_64
83
+ accelerator_config:
84
+ value:
85
+ dispatch_batches: null
86
+ even_batches: true
87
+ gradient_accumulation_kwargs: null
88
+ non_blocking: false
89
+ split_batches: false
90
+ use_seedable_sampler: true
91
+ adam_beta1:
92
+ value: 0.9
93
+ adam_beta2:
94
+ value: 0.95
95
+ adam_epsilon:
96
+ value: 1e-08
97
+ architectures:
98
+ value:
99
+ - Qwen3ForCausalLM
100
+ attention_bias:
101
+ value: false
102
+ attention_dropout:
103
+ value: 0
104
+ auto_find_batch_size:
105
+ value: false
106
+ average_tokens_across_devices:
107
+ value: true
108
+ batch_eval_metrics:
109
+ value: false
110
+ bf16:
111
+ value: true
112
+ bf16_full_eval:
113
+ value: false
114
+ bos_token_id:
115
+ value: null
116
+ chunk_size_feed_forward:
117
+ value: 0
118
+ data_args:
119
+ value:
120
+ buffer_size: 16384
121
+ cutoff_len: 2047
122
+ data_shared_file_system: false
123
+ dataset:
124
+ - Markie_Voss_t0_d34_r300
125
+ dataset_dir: /workspace/LlamaFactory/data
126
+ default_system: null
127
+ enable_thinking: false
128
+ eval_dataset: null
129
+ eval_num_beams: null
130
+ eval_on_each_dataset: false
131
+ ignore_pad_token_for_loss: true
132
+ interleave_probs: null
133
+ mask_history: false
134
+ max_samples: 100000000
135
+ media_dir: /workspace/LlamaFactory/data
136
+ mix_strategy: concat
137
+ neat_packing: false
138
+ overwrite_cache: false
139
+ packing: true
140
+ preprocessing_batch_size: 1000
141
+ preprocessing_num_workers: 16
142
+ streaming: false
143
+ template: qwen3_nothink
144
+ tokenized_path: null
145
+ tool_format: null
146
+ train_on_prompt: false
147
+ val_size: 0
148
+ data_seed:
149
+ value: null
150
+ dataloader_drop_last:
151
+ value: false
152
+ dataloader_num_workers:
153
+ value: 0
154
+ dataloader_persistent_workers:
155
+ value: false
156
+ dataloader_pin_memory:
157
+ value: true
158
+ dataloader_prefetch_factor:
159
+ value: null
160
+ ddp_backend:
161
+ value: null
162
+ ddp_broadcast_buffers:
163
+ value: null
164
+ ddp_bucket_cap_mb:
165
+ value: null
166
+ ddp_find_unused_parameters:
167
+ value: null
168
+ ddp_timeout:
169
+ value: 180000000
170
+ debug:
171
+ value: []
172
+ deepspeed:
173
+ value: null
174
+ disable_tqdm:
175
+ value: false
176
+ do_eval:
177
+ value: false
178
+ do_predict:
179
+ value: false
180
+ do_train:
181
+ value: true
182
+ dtype:
183
+ value: bfloat16
184
+ enable_jit_checkpoint:
185
+ value: false
186
+ eos_token_id:
187
+ value: 151645
188
+ eval_accumulation_steps:
189
+ value: null
190
+ eval_delay:
191
+ value: 0
192
+ eval_do_concat_batches:
193
+ value: true
194
+ eval_on_start:
195
+ value: false
196
+ eval_steps:
197
+ value: null
198
+ eval_strategy:
199
+ value: "no"
200
+ eval_use_gather_object:
201
+ value: false
202
+ finetuning_args:
203
+ value:
204
+ additional_target: null
205
+ apollo_layerwise: false
206
+ apollo_proj: random
207
+ apollo_proj_type: std
208
+ apollo_rank: 16
209
+ apollo_scale: 32
210
+ apollo_scale_front: false
211
+ apollo_scale_type: channel
212
+ apollo_target:
213
+ - all
214
+ apollo_update_interval: 200
215
+ badam_mask_mode: adjacent
216
+ badam_mode: layer
217
+ badam_start_block: null
218
+ badam_switch_interval: 50
219
+ badam_switch_mode: ascending
220
+ badam_update_ratio: 0.05
221
+ badam_verbose: 0
222
+ compute_accuracy: false
223
+ create_new_adapter: false
224
+ disable_shuffling: false
225
+ dpo_label_smoothing: 0
226
+ eaft_alpha: 1
227
+ early_stopping_steps: null
228
+ finetuning_type: lora
229
+ freeze_extra_modules: null
230
+ freeze_language_model: false
231
+ freeze_multi_modal_projector: true
232
+ freeze_trainable_layers: 2
233
+ freeze_trainable_modules:
234
+ - all
235
+ freeze_vision_tower: true
236
+ galore_layerwise: false
237
+ galore_proj_type: std
238
+ galore_rank: 16
239
+ galore_scale: 2
240
+ galore_target:
241
+ - all
242
+ galore_update_interval: 200
243
+ include_effective_tokens_per_second: false
244
+ kto_chosen_weight: 1
245
+ kto_rejected_weight: 1
246
+ ld_alpha: null
247
+ lora_alpha: 32
248
+ lora_dropout: 0.03
249
+ lora_rank: 16
250
+ lora_target:
251
+ - all
252
+ loraplus_lr_embedding: 1e-06
253
+ loraplus_lr_ratio: null
254
+ module_dropout: 0
255
+ oft_block_size: 32
256
+ oft_rank: 0
257
+ oft_target:
258
+ - all
259
+ pissa_convert: false
260
+ pissa_init: false
261
+ pissa_iter: 16
262
+ plot_loss: true
263
+ ppo_buffer_size: 1
264
+ ppo_epochs: 4
265
+ ppo_score_norm: false
266
+ ppo_target: 6
267
+ ppo_whiten_rewards: false
268
+ pref_bco_weight: 0
269
+ pref_beta: 0.1
270
+ pref_ftx: 0
271
+ pref_loss: sigmoid
272
+ pure_bf16: false
273
+ ref_model: null
274
+ ref_model_adapters: null
275
+ ref_model_quantization_bit: null
276
+ reward_model: null
277
+ reward_model_adapters: null
278
+ reward_model_quantization_bit: null
279
+ reward_model_type: lora
280
+ simpo_gamma: 0.5
281
+ stage: pt
282
+ swanlab_api_key: <SWANLAB_API_KEY>
283
+ swanlab_lark_secret: null
284
+ swanlab_lark_webhook_url: null
285
+ swanlab_logdir: null
286
+ swanlab_mode: cloud
287
+ swanlab_project: llamafactory
288
+ swanlab_run_name: null
289
+ swanlab_workspace: null
290
+ use_adam_mini: false
291
+ use_apollo: false
292
+ use_badam: false
293
+ use_dft_loss: false
294
+ use_dora: false
295
+ use_eaft_loss: false
296
+ use_galore: false
297
+ use_llama_pro: false
298
+ use_mca: false
299
+ use_muon: false
300
+ use_rslora: false
301
+ use_swanlab: false
302
+ fp8:
303
+ value: false
304
+ fp8_backend:
305
+ value: auto
306
+ fp8_enable_fsdp_float8_all_gather:
307
+ value: false
308
+ fp16:
309
+ value: false
310
+ fp16_full_eval:
311
+ value: false
312
+ fsdp:
313
+ value: []
314
+ fsdp_config:
315
+ value:
316
+ min_num_params: 0
317
+ xla: false
318
+ xla_fsdp_grad_ckpt: false
319
+ xla_fsdp_v2: false
320
+ full_determinism:
321
+ value: false
322
+ generating_args:
323
+ value:
324
+ do_sample: true
325
+ length_penalty: 1
326
+ max_new_tokens: 1024
327
+ num_beams: 1
328
+ repetition_penalty: 1
329
+ skip_special_tokens: true
330
+ temperature: 0.95
331
+ top_k: 50
332
+ top_p: 0.7
333
+ generation_config:
334
+ value: null
335
+ generation_max_length:
336
+ value: 2047
337
+ generation_num_beams:
338
+ value: null
339
+ gradient_accumulation_steps:
340
+ value: 1
341
+ gradient_checkpointing:
342
+ value: false
343
+ gradient_checkpointing_kwargs:
344
+ value: null
345
+ greater_is_better:
346
+ value: null
347
+ group_by_length:
348
+ value: false
349
+ head_dim:
350
+ value: 128
351
+ hidden_act:
352
+ value: silu
353
+ hidden_size:
354
+ value: 4096
355
+ hub_always_push:
356
+ value: false
357
+ hub_model_id:
358
+ value: null
359
+ hub_private_repo:
360
+ value: null
361
+ hub_revision:
362
+ value: null
363
+ hub_strategy:
364
+ value: every_save
365
+ hub_token:
366
+ value: <HUB_TOKEN>
367
+ id2label:
368
+ value:
369
+ "0": LABEL_0
370
+ "1": LABEL_1
371
+ ignore_data_skip:
372
+ value: false
373
+ include_for_metrics:
374
+ value: []
375
+ include_num_input_tokens_seen:
376
+ value: all
377
+ initializer_range:
378
+ value: 0.02
379
+ intermediate_size:
380
+ value: 12288
381
+ is_encoder_decoder:
382
+ value: false
383
+ label_names:
384
+ value:
385
+ - labels
386
+ label_smoothing_factor:
387
+ value: 0
388
+ label2id:
389
+ value:
390
+ LABEL_0: 0
391
+ LABEL_1: 1
392
+ layer_types:
393
+ value:
394
+ - full_attention
395
+ - full_attention
396
+ - full_attention
397
+ - full_attention
398
+ - full_attention
399
+ - full_attention
400
+ - full_attention
401
+ - full_attention
402
+ - full_attention
403
+ - full_attention
404
+ - full_attention
405
+ - full_attention
406
+ - full_attention
407
+ - full_attention
408
+ - full_attention
409
+ - full_attention
410
+ - full_attention
411
+ - full_attention
412
+ - full_attention
413
+ - full_attention
414
+ - full_attention
415
+ - full_attention
416
+ - full_attention
417
+ - full_attention
418
+ - full_attention
419
+ - full_attention
420
+ - full_attention
421
+ - full_attention
422
+ - full_attention
423
+ - full_attention
424
+ - full_attention
425
+ - full_attention
426
+ - full_attention
427
+ - full_attention
428
+ - full_attention
429
+ - full_attention
430
+ learning_rate:
431
+ value: 5e-05
432
+ length_column_name:
433
+ value: length
434
+ liger_kernel_config:
435
+ value: null
436
+ load_best_model_at_end:
437
+ value: false
438
+ local_rank:
439
+ value: -1
440
+ log_level:
441
+ value: passive
442
+ log_level_replica:
443
+ value: warning
444
+ log_on_each_node:
445
+ value: true
446
+ logging_dir:
447
+ value: null
448
+ logging_first_step:
449
+ value: false
450
+ logging_nan_inf_filter:
451
+ value: true
452
+ logging_steps:
453
+ value: 1
454
+ logging_strategy:
455
+ value: steps
456
+ lr_scheduler_kwargs:
457
+ value: null
458
+ lr_scheduler_type:
459
+ value: cosine
460
+ master_addr:
461
+ value: null
462
+ master_port:
463
+ value: null
464
+ max_grad_norm:
465
+ value: 1
466
+ max_position_embeddings:
467
+ value: 32768
468
+ max_steps:
469
+ value: -1
470
+ max_window_layers:
471
+ value: 36
472
+ metric_for_best_model:
473
+ value: null
474
+ model/num_parameters:
475
+ value: 8234382336
476
+ model_args:
477
+ value:
478
+ adapter_folder: null
479
+ adapter_name_or_path: null
480
+ add_special_tokens: null
481
+ add_tokens: null
482
+ audio_sampling_rate: 16000
483
+ block_diag_attn: false
484
+ cache_dir: null
485
+ chunk_size: 8192
486
+ compute_dtype: torch.bfloat16
487
+ cpu_infer: 32
488
+ crop_to_patches: false
489
+ device_map:
490
+ "": cuda:0
491
+ disable_gradient_checkpointing: false
492
+ double_quantization: true
493
+ enable_liger_kernel: false
494
+ export_device: cpu
495
+ export_dir: null
496
+ export_hub_model_id: null
497
+ export_legacy_format: false
498
+ export_quantization_bit: null
499
+ export_quantization_dataset: null
500
+ export_quantization_maxlen: 1024
501
+ export_quantization_nsamples: 128
502
+ export_size: 5
503
+ flash_attn: auto
504
+ hf_hub_token: <HF_HUB_TOKEN>
505
+ image_do_pan_and_scan: false
506
+ image_max_pixels: 589824
507
+ image_min_pixels: 1024
508
+ infer_backend: HF
509
+ infer_dtype: auto
510
+ init_special_tokens: noise_init
511
+ kt_force_think: false
512
+ kt_maxlen: 4096
513
+ kt_mode: normal
514
+ kt_optimize_rule: null
515
+ kt_use_cuda_graph: true
516
+ low_cpu_mem_usage: true
517
+ mixture_of_depths: null
518
+ mode: normal
519
+ model_max_length: 2047
520
+ model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
521
+ model_revision: main
522
+ moe_aux_loss_coef: null
523
+ ms_hub_token: <MS_HUB_TOKEN>
524
+ new_special_tokens_config: null
525
+ offload_folder: offload
526
+ om_hub_token: <OM_HUB_TOKEN>
527
+ print_param_status: false
528
+ quantization_bit: null
529
+ quantization_device_map: null
530
+ quantization_method: BNB
531
+ quantization_type: nf4
532
+ resize_vocab: false
533
+ rope_scaling: null
534
+ sglang_config: null
535
+ sglang_lora_backend: triton
536
+ sglang_maxlen: 4096
537
+ sglang_mem_fraction: 0.7
538
+ sglang_tp_size: -1
539
+ shift_attn: false
540
+ split_special_tokens: false
541
+ train_from_scratch: false
542
+ trust_remote_code: true
543
+ upcast_layernorm: false
544
+ upcast_lmhead_output: false
545
+ use_audio_in_video: false
546
+ use_fast_tokenizer: true
547
+ use_kt: false
548
+ use_kv_cache: true
549
+ use_reentrant_gc: true
550
+ use_unsloth: false
551
+ use_unsloth_gc: false
552
+ use_v1_kernels: false
553
+ video_fps: 2
554
+ video_max_pixels: 65536
555
+ video_maxlen: 128
556
+ video_min_pixels: 256
557
+ vllm_config: null
558
+ vllm_enforce_eager: false
559
+ vllm_gpu_util: 0.7
560
+ vllm_max_lora_rank: 32
561
+ vllm_maxlen: 4096
562
+ model_type:
563
+ value: qwen3
564
+ neftune_noise_alpha:
565
+ value: null
566
+ num_attention_heads:
567
+ value: 32
568
+ num_hidden_layers:
569
+ value: 36
570
+ num_key_value_heads:
571
+ value: 8
572
+ num_train_epochs:
573
+ value: 10
574
+ optim:
575
+ value: adamw_torch
576
+ optim_args:
577
+ value: null
578
+ optim_target_modules:
579
+ value: null
580
+ output_attentions:
581
+ value: false
582
+ output_dir:
583
+ value: /workspace/v127rc_exp2/B_dup
584
+ output_hidden_states:
585
+ value: false
586
+ overwrite_output_dir:
587
+ value: false
588
+ pad_token_id:
589
+ value: 151643
590
+ parallelism_config:
591
+ value: null
592
+ peft_config:
593
+ value:
594
+ default:
595
+ alora_invocation_tokens: null
596
+ arrow_config: null
597
+ auto_mapping: null
598
+ base_model_name_or_path: /workspace/Qwen/Qwen3-8B-Base
599
+ bias: none
600
+ corda_config: null
601
+ ensure_weight_tying: false
602
+ eva_config: null
603
+ exclude_modules: null
604
+ fan_in_fan_out: false
605
+ inference_mode: false
606
+ init_lora_weights: true
607
+ layer_replication: null
608
+ layers_pattern: null
609
+ layers_to_transform: null
610
+ lora_alpha: 32
611
+ lora_bias: false
612
+ lora_dropout: 0.03
613
+ megatron_config: null
614
+ megatron_core: megatron.core
615
+ modules_to_save: null
616
+ peft_type: LORA
617
+ peft_version: 0.18.1
618
+ qalora_group_size: 16
619
+ r: 16
620
+ revision: null
621
+ runtime_config:
622
+ ephemeral_gpu_offload: false
623
+ target_modules:
624
+ - gate_proj
625
+ - q_proj
626
+ - k_proj
627
+ - v_proj
628
+ - down_proj
629
+ - o_proj
630
+ - up_proj
631
+ target_parameters: null
632
+ task_type: CAUSAL_LM
633
+ trainable_token_indices: null
634
+ use_dora: false
635
+ use_qalora: false
636
+ use_rslora: false
637
+ per_device_eval_batch_size:
638
+ value: 8
639
+ per_device_train_batch_size:
640
+ value: 1
641
+ predict_with_generate:
642
+ value: false
643
+ prediction_loss_only:
644
+ value: false
645
+ problem_type:
646
+ value: null
647
+ project:
648
+ value: huggingface
649
+ push_to_hub:
650
+ value: false
651
+ ray_init_kwargs:
652
+ value: null
653
+ ray_num_workers:
654
+ value: 1
655
+ remove_unused_columns:
656
+ value: false
657
+ report_to:
658
+ value:
659
+ - wandb
660
+ restore_callback_states_from_checkpoint:
661
+ value: false
662
+ resume_from_checkpoint:
663
+ value: null
664
+ return_dict:
665
+ value: true
666
+ rms_norm_eps:
667
+ value: 1e-06
668
+ rope_parameters:
669
+ value:
670
+ rope_theta: 1000000
671
+ rope_type: default
672
+ run_name:
673
+ value: null
674
+ save_on_each_node:
675
+ value: false
676
+ save_only_model:
677
+ value: true
678
+ save_steps:
679
+ value: 1000
680
+ save_strategy:
681
+ value: steps
682
+ save_total_limit:
683
+ value: null
684
+ seed:
685
+ value: 42
686
+ skip_memory_metrics:
687
+ value: true
688
+ sliding_window:
689
+ value: null
690
+ sortish_sampler:
691
+ value: false
692
+ tf32:
693
+ value: null
694
+ tie_word_embeddings:
695
+ value: false
696
+ torch_compile:
697
+ value: false
698
+ torch_compile_backend:
699
+ value: null
700
+ torch_compile_mode:
701
+ value: null
702
+ torch_empty_cache_steps:
703
+ value: null
704
+ trackio_space_id:
705
+ value: trackio
706
+ transformers_version:
707
+ value: 5.0.0
708
+ use_cache:
709
+ value: false
710
+ use_cpu:
711
+ value: false
712
+ use_liger_kernel:
713
+ value: false
714
+ use_sliding_window:
715
+ value: false
716
+ vocab_size:
717
+ value: 151936
718
+ warmup_ratio:
719
+ value: 0.01
720
+ warmup_steps:
721
+ value: 0.01
722
+ weight_decay:
723
+ value: 0
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/output.log ADDED
@@ -0,0 +1,156 @@
1
+ 0%| | 0/111360 [00:00<?, ?it/s]/usr/local/lib/python3.11/dist-packages/torch/utils/checkpoint.py:295: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
2
+ with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
3
+
4
+ {'loss': '1.197', 'grad_norm': '0.2201', 'learning_rate': '0', 'epoch': '8.98e-05', 'num_input_tokens_seen': 2047, 'train_runtime': '2.88', 'train_tokens_per_second': '710.7'}
5
+ {'loss': '0.8941', 'grad_norm': '0.2468', 'learning_rate': '4.488e-08', 'epoch': '0.0001796', 'num_input_tokens_seen': 4094, 'train_runtime': '3.913', 'train_tokens_per_second': '1046'}
6
+ {'loss': '1.579', 'grad_norm': '0.2684', 'learning_rate': '8.977e-08', 'epoch': '0.0002694', 'num_input_tokens_seen': 6141, 'train_runtime': '4.948', 'train_tokens_per_second': '1241'}
7
+ {'loss': '1.147', 'grad_norm': '0.2446', 'learning_rate': '1.346e-07', 'epoch': '0.0003592', 'num_input_tokens_seen': 8188, 'train_runtime': '5.982', 'train_tokens_per_second': '1369'}
8
+ {'loss': '1.177', 'grad_norm': '0.2678', 'learning_rate': '1.795e-07', 'epoch': '0.000449', 'num_input_tokens_seen': 10235, 'train_runtime': '7.017', 'train_tokens_per_second': '1459'}
9
+ {'loss': '0.8573', 'grad_norm': '0.2322', 'learning_rate': '2.244e-07', 'epoch': '0.0005388', 'num_input_tokens_seen': 12282, 'train_runtime': '8.052', 'train_tokens_per_second': '1525'}
10
+ {'loss': '0.8787', 'grad_norm': '0.2275', 'learning_rate': '2.693e-07', 'epoch': '0.0006286', 'num_input_tokens_seen': 14329, 'train_runtime': '9.087', 'train_tokens_per_second': '1577'}
11
+ {'loss': '1.98', 'grad_norm': '0.2846', 'learning_rate': '3.142e-07', 'epoch': '0.0007184', 'num_input_tokens_seen': 16376, 'train_runtime': '10.12', 'train_tokens_per_second': '1618'}
12
+ {'loss': '2.053', 'grad_norm': '0.4371', 'learning_rate': '3.591e-07', 'epoch': '0.0008082', 'num_input_tokens_seen': 18423, 'train_runtime': '11.16', 'train_tokens_per_second': '1651'}
13
+ {'loss': '1.455', 'grad_norm': '0.2537', 'learning_rate': '4.039e-07', 'epoch': '0.000898', 'num_input_tokens_seen': 20470, 'train_runtime': '12.2', 'train_tokens_per_second': '1678'}
14
+ {'loss': '1.286', 'grad_norm': '0.2939', 'learning_rate': '4.488e-07', 'epoch': '0.0009878', 'num_input_tokens_seen': 22517, 'train_runtime': '13.24', 'train_tokens_per_second': '1701'}
15
+ {'loss': '1.278', 'grad_norm': '0.2923', 'learning_rate': '4.937e-07', 'epoch': '0.001078', 'num_input_tokens_seen': 24564, 'train_runtime': '14.27', 'train_tokens_per_second': '1721'}
16
+ {'loss': '0.844', 'grad_norm': '0.2332', 'learning_rate': '5.386e-07', 'epoch': '0.001167', 'num_input_tokens_seen': 26611, 'train_runtime': '15.31', 'train_tokens_per_second': '1738'}
17
+ {'loss': '2.021', 'grad_norm': '0.3572', 'learning_rate': '5.835e-07', 'epoch': '0.001257', 'num_input_tokens_seen': 28658, 'train_runtime': '16.35', 'train_tokens_per_second': '1753'}
18
+ {'loss': '0.9653', 'grad_norm': '0.2073', 'learning_rate': '6.284e-07', 'epoch': '0.001347', 'num_input_tokens_seen': 30705, 'train_runtime': '17.38', 'train_tokens_per_second': '1766'}
19
+ {'loss': '1.164', 'grad_norm': '0.2285', 'learning_rate': '6.732e-07', 'epoch': '0.001437', 'num_input_tokens_seen': 32752, 'train_runtime': '18.42', 'train_tokens_per_second': '1778'}
20
+ {'loss': '1.265', 'grad_norm': '0.2519', 'learning_rate': '7.181e-07', 'epoch': '0.001527', 'num_input_tokens_seen': 34799, 'train_runtime': '19.46', 'train_tokens_per_second': '1788'}
21
+ {'loss': '0.907', 'grad_norm': '0.2419', 'learning_rate': '7.63e-07', 'epoch': '0.001616', 'num_input_tokens_seen': 36846, 'train_runtime': '20.5', 'train_tokens_per_second': '1797'}
22
+ {'loss': '1.177', 'grad_norm': '0.2575', 'learning_rate': '8.079e-07', 'epoch': '0.001706', 'num_input_tokens_seen': 38893, 'train_runtime': '21.54', 'train_tokens_per_second': '1806'}
23
+ {'loss': '1.278', 'grad_norm': '0.2442', 'learning_rate': '8.528e-07', 'epoch': '0.001796', 'num_input_tokens_seen': 40940, 'train_runtime': '22.57', 'train_tokens_per_second': '1814'}
24
+ {'loss': '1.878', 'grad_norm': '0.3', 'learning_rate': '8.977e-07', 'epoch': '0.001886', 'num_input_tokens_seen': 42987, 'train_runtime': '23.61', 'train_tokens_per_second': '1821'}
25
+ {'loss': '0.9665', 'grad_norm': '0.2153', 'learning_rate': '9.425e-07', 'epoch': '0.001976', 'num_input_tokens_seen': 45034, 'train_runtime': '24.65', 'train_tokens_per_second': '1827'}
26
+ {'loss': '1.523', 'grad_norm': '0.2703', 'learning_rate': '9.874e-07', 'epoch': '0.002065', 'num_input_tokens_seen': 47081, 'train_runtime': '25.68', 'train_tokens_per_second': '1833'}
27
+ {'loss': '0.8783', 'grad_norm': '0.3028', 'learning_rate': '1.032e-06', 'epoch': '0.002155', 'num_input_tokens_seen': 49128, 'train_runtime': '26.72', 'train_tokens_per_second': '1839'}
28
+ {'loss': '1.575', 'grad_norm': '0.2798', 'learning_rate': '1.077e-06', 'epoch': '0.002245', 'num_input_tokens_seen': 51175, 'train_runtime': '27.76', 'train_tokens_per_second': '1844'}
29
+ {'loss': '0.8776', 'grad_norm': '0.2335', 'learning_rate': '1.122e-06', 'epoch': '0.002335', 'num_input_tokens_seen': 53222, 'train_runtime': '28.8', 'train_tokens_per_second': '1848'}
30
+ {'loss': '1.506', 'grad_norm': '0.2822', 'learning_rate': '1.167e-06', 'epoch': '0.002425', 'num_input_tokens_seen': 55269, 'train_runtime': '29.83', 'train_tokens_per_second': '1853'}
31
+ {'loss': '1.254', 'grad_norm': '0.2654', 'learning_rate': '1.212e-06', 'epoch': '0.002514', 'num_input_tokens_seen': 57316, 'train_runtime': '30.87', 'train_tokens_per_second': '1857'}
32
+ {'loss': '1.6', 'grad_norm': '0.2917', 'learning_rate': '1.257e-06', 'epoch': '0.002604', 'num_input_tokens_seen': 59363, 'train_runtime': '31.91', 'train_tokens_per_second': '1860'}
33
+ {'loss': '1.528', 'grad_norm': '0.2771', 'learning_rate': '1.302e-06', 'epoch': '0.002694', 'num_input_tokens_seen': 61410, 'train_runtime': '32.95', 'train_tokens_per_second': '1864'}
34
+ {'loss': '1.289', 'grad_norm': '0.2888', 'learning_rate': '1.346e-06', 'epoch': '0.002784', 'num_input_tokens_seen': 63457, 'train_runtime': '34', 'train_tokens_per_second': '1867'}
35
+ {'loss': '1.432', 'grad_norm': '0.2445', 'learning_rate': '1.391e-06', 'epoch': '0.002874', 'num_input_tokens_seen': 65504, 'train_runtime': '35.03', 'train_tokens_per_second': '1870'}
36
+ {'loss': '1.927', 'grad_norm': '0.3254', 'learning_rate': '1.436e-06', 'epoch': '0.002963', 'num_input_tokens_seen': 67551, 'train_runtime': '36.07', 'train_tokens_per_second': '1873'}
37
+ {'loss': '1.006', 'grad_norm': '0.2496', 'learning_rate': '1.481e-06', 'epoch': '0.003053', 'num_input_tokens_seen': 69598, 'train_runtime': '37.11', 'train_tokens_per_second': '1876'}
38
+ {'loss': '1.744', 'grad_norm': '0.2966', 'learning_rate': '1.526e-06', 'epoch': '0.003143', 'num_input_tokens_seen': 71645, 'train_runtime': '38.14', 'train_tokens_per_second': '1878'}
39
+ {'loss': '1.206', 'grad_norm': '0.2677', 'learning_rate': '1.571e-06', 'epoch': '0.003233', 'num_input_tokens_seen': 73692, 'train_runtime': '39.18', 'train_tokens_per_second': '1881'}
40
+ {'loss': '1.238', 'grad_norm': '0.2755', 'learning_rate': '1.616e-06', 'epoch': '0.003323', 'num_input_tokens_seen': 75739, 'train_runtime': '40.22', 'train_tokens_per_second': '1883'}
41
+ {'loss': '1.28', 'grad_norm': '0.2542', 'learning_rate': '1.661e-06', 'epoch': '0.003412', 'num_input_tokens_seen': 77786, 'train_runtime': '41.26', 'train_tokens_per_second': '1885'}
42
+ {'loss': '1.879', 'grad_norm': '0.3915', 'learning_rate': '1.706e-06', 'epoch': '0.003502', 'num_input_tokens_seen': 79833, 'train_runtime': '42.3', 'train_tokens_per_second': '1887'}
43
+ {'loss': '1.556', 'grad_norm': '0.2721', 'learning_rate': '1.75e-06', 'epoch': '0.003592', 'num_input_tokens_seen': 81880, 'train_runtime': '43.34', 'train_tokens_per_second': '1889'}
44
+ {'loss': '1.182', 'grad_norm': '0.235', 'learning_rate': '1.795e-06', 'epoch': '0.003682', 'num_input_tokens_seen': 83927, 'train_runtime': '44.38', 'train_tokens_per_second': '1891'}
45
+ {'loss': '1.167', 'grad_norm': '0.2633', 'learning_rate': '1.84e-06', 'epoch': '0.003772', 'num_input_tokens_seen': 85974, 'train_runtime': '45.42', 'train_tokens_per_second': '1893'}
46
+ {'loss': '2.292', 'grad_norm': '0.3385', 'learning_rate': '1.885e-06', 'epoch': '0.003861', 'num_input_tokens_seen': 88021, 'train_runtime': '46.45', 'train_tokens_per_second': '1895'}
47
+ {'loss': '1.286', 'grad_norm': '0.2488', 'learning_rate': '1.93e-06', 'epoch': '0.003951', 'num_input_tokens_seen': 90068, 'train_runtime': '47.5', 'train_tokens_per_second': '1896'}
48
+ {'loss': '1.573', 'grad_norm': '0.279', 'learning_rate': '1.975e-06', 'epoch': '0.004041', 'num_input_tokens_seen': 92115, 'train_runtime': '48.54', 'train_tokens_per_second': '1898'}
49
+ {'loss': '1.246', 'grad_norm': '0.2553', 'learning_rate': '2.02e-06', 'epoch': '0.004131', 'num_input_tokens_seen': 94162, 'train_runtime': '49.58', 'train_tokens_per_second': '1899'}
50
+ {'loss': '1.32', 'grad_norm': '0.2738', 'learning_rate': '2.065e-06', 'epoch': '0.004221', 'num_input_tokens_seen': 96209, 'train_runtime': '50.61', 'train_tokens_per_second': '1901'}
51
+ {'loss': '0.9932', 'grad_norm': '0.4337', 'learning_rate': '2.11e-06', 'epoch': '0.00431', 'num_input_tokens_seen': 98256, 'train_runtime': '51.66', 'train_tokens_per_second': '1902'}
52
+ {'loss': '2.289', 'grad_norm': '0.312', 'learning_rate': '2.154e-06', 'epoch': '0.0044', 'num_input_tokens_seen': 100303, 'train_runtime': '52.7', 'train_tokens_per_second': '1903'}
53
+ {'loss': '2.25', 'grad_norm': '0.3376', 'learning_rate': '2.199e-06', 'epoch': '0.00449', 'num_input_tokens_seen': 102350, 'train_runtime': '53.74', 'train_tokens_per_second': '1905'}
54
+ {'loss': '1.754', 'grad_norm': '0.6014', 'learning_rate': '2.244e-06', 'epoch': '0.00458', 'num_input_tokens_seen': 104397, 'train_runtime': '54.78', 'train_tokens_per_second': '1906'}
55
+ {'loss': '1.62', 'grad_norm': '0.3797', 'learning_rate': '2.289e-06', 'epoch': '0.00467', 'num_input_tokens_seen': 106444, 'train_runtime': '55.81', 'train_tokens_per_second': '1907'}
56
+ {'loss': '1.552', 'grad_norm': '0.2929', 'learning_rate': '2.334e-06', 'epoch': '0.004759', 'num_input_tokens_seen': 108491, 'train_runtime': '56.86', 'train_tokens_per_second': '1908'}
57
+ {'loss': '0.989', 'grad_norm': '0.2345', 'learning_rate': '2.379e-06', 'epoch': '0.004849', 'num_input_tokens_seen': 110538, 'train_runtime': '57.9', 'train_tokens_per_second': '1909'}
58
+ {'loss': '1.245', 'grad_norm': '0.2963', 'learning_rate': '2.424e-06', 'epoch': '0.004939', 'num_input_tokens_seen': 112585, 'train_runtime': '58.94', 'train_tokens_per_second': '1910'}
59
+ {'loss': '2.028', 'grad_norm': '0.3705', 'learning_rate': '2.469e-06', 'epoch': '0.005029', 'num_input_tokens_seen': 114632, 'train_runtime': '59.98', 'train_tokens_per_second': '1911'}
60
+ {'loss': '1.265', 'grad_norm': '0.2613', 'learning_rate': '2.513e-06', 'epoch': '0.005119', 'num_input_tokens_seen': 116679, 'train_runtime': '61.02', 'train_tokens_per_second': '1912'}
61
+ {'loss': '2.028', 'grad_norm': '0.325', 'learning_rate': '2.558e-06', 'epoch': '0.005208', 'num_input_tokens_seen': 118726, 'train_runtime': '62.06', 'train_tokens_per_second': '1913'}
62
+ {'loss': '2.258', 'grad_norm': '0.316', 'learning_rate': '2.603e-06', 'epoch': '0.005298', 'num_input_tokens_seen': 120773, 'train_runtime': '63.1', 'train_tokens_per_second': '1914'}
63
+ {'loss': '1.406', 'grad_norm': '0.279', 'learning_rate': '2.648e-06', 'epoch': '0.005388', 'num_input_tokens_seen': 122820, 'train_runtime': '64.14', 'train_tokens_per_second': '1915'}
64
+ {'loss': '1.469', 'grad_norm': '0.2524', 'learning_rate': '2.693e-06', 'epoch': '0.005478', 'num_input_tokens_seen': 124867, 'train_runtime': '65.18', 'train_tokens_per_second': '1916'}
65
+ {'loss': '1.248', 'grad_norm': '0.2516', 'learning_rate': '2.738e-06', 'epoch': '0.005568', 'num_input_tokens_seen': 126914, 'train_runtime': '66.22', 'train_tokens_per_second': '1917'}
66
+ {'loss': '1.191', 'grad_norm': '0.2327', 'learning_rate': '2.783e-06', 'epoch': '0.005657', 'num_input_tokens_seen': 128961, 'train_runtime': '67.26', 'train_tokens_per_second': '1917'}
67
+ {'loss': '2.004', 'grad_norm': '0.3521', 'learning_rate': '2.828e-06', 'epoch': '0.005747', 'num_input_tokens_seen': 131008, 'train_runtime': '68.3', 'train_tokens_per_second': '1918'}
68
+ {'loss': '1.184', 'grad_norm': '0.2845', 'learning_rate': '2.873e-06', 'epoch': '0.005837', 'num_input_tokens_seen': 133055, 'train_runtime': '69.34', 'train_tokens_per_second': '1919'}
69
+ {'loss': '1.941', 'grad_norm': '0.3393', 'learning_rate': '2.917e-06', 'epoch': '0.005927', 'num_input_tokens_seen': 135102, 'train_runtime': '70.38', 'train_tokens_per_second': '1920'}
70
+ {'loss': '1.352', 'grad_norm': '0.3023', 'learning_rate': '2.962e-06', 'epoch': '0.006017', 'num_input_tokens_seen': 137149, 'train_runtime': '71.42', 'train_tokens_per_second': '1920'}
71
+ {'loss': '1.517', 'grad_norm': '0.3025', 'learning_rate': '3.007e-06', 'epoch': '0.006106', 'num_input_tokens_seen': 139196, 'train_runtime': '72.47', 'train_tokens_per_second': '1921'}
72
+ {'loss': '1.266', 'grad_norm': '0.3097', 'learning_rate': '3.052e-06', 'epoch': '0.006196', 'num_input_tokens_seen': 141243, 'train_runtime': '73.51', 'train_tokens_per_second': '1921'}
73
+ {'loss': '1.509', 'grad_norm': '0.3378', 'learning_rate': '3.097e-06', 'epoch': '0.006286', 'num_input_tokens_seen': 143290, 'train_runtime': '74.55', 'train_tokens_per_second': '1922'}
74
+ {'loss': '1.498', 'grad_norm': '0.3045', 'learning_rate': '3.142e-06', 'epoch': '0.006376', 'num_input_tokens_seen': 145337, 'train_runtime': '75.6', 'train_tokens_per_second': '1922'}
75
+ {'loss': '1.202', 'grad_norm': '0.2492', 'learning_rate': '3.187e-06', 'epoch': '0.006466', 'num_input_tokens_seen': 147384, 'train_runtime': '76.64', 'train_tokens_per_second': '1923'}
76
+ {'loss': '1.833', 'grad_norm': '0.3606', 'learning_rate': '3.232e-06', 'epoch': '0.006555', 'num_input_tokens_seen': 149431, 'train_runtime': '77.68', 'train_tokens_per_second': '1924'}
77
+ {'loss': '2.342', 'grad_norm': '0.3516', 'learning_rate': '3.276e-06', 'epoch': '0.006645', 'num_input_tokens_seen': 151478, 'train_runtime': '78.72', 'train_tokens_per_second': '1924'}
78
+ {'loss': '0.9371', 'grad_norm': '0.2143', 'learning_rate': '3.321e-06', 'epoch': '0.006735', 'num_input_tokens_seen': 153525, 'train_runtime': '79.77', 'train_tokens_per_second': '1925'}
79
+ {'loss': '1.192', 'grad_norm': '0.2637', 'learning_rate': '3.366e-06', 'epoch': '0.006825', 'num_input_tokens_seen': 155572, 'train_runtime': '80.81', 'train_tokens_per_second': '1925'}
80
+ {'loss': '0.8591', 'grad_norm': '0.2422', 'learning_rate': '3.411e-06', 'epoch': '0.006915', 'num_input_tokens_seen': 157619, 'train_runtime': '81.85', 'train_tokens_per_second': '1926'}
81
+ {'loss': '1.28', 'grad_norm': '0.3029', 'learning_rate': '3.456e-06', 'epoch': '0.007004', 'num_input_tokens_seen': 159666, 'train_runtime': '82.9', 'train_tokens_per_second': '1926'}
82
+ {'loss': '1.152', 'grad_norm': '0.3476', 'learning_rate': '3.501e-06', 'epoch': '0.007094', 'num_input_tokens_seen': 161713, 'train_runtime': '83.94', 'train_tokens_per_second': '1927'}
83
+ {'loss': '1.221', 'grad_norm': '0.255', 'learning_rate': '3.546e-06', 'epoch': '0.007184', 'num_input_tokens_seen': 163760, 'train_runtime': '84.98', 'train_tokens_per_second': '1927'}
84
+ {'loss': '1.569', 'grad_norm': '0.3112', 'learning_rate': '3.591e-06', 'epoch': '0.007274', 'num_input_tokens_seen': 165807, 'train_runtime': '86.02', 'train_tokens_per_second': '1928'}
85
+ {'loss': '1.185', 'grad_norm': '0.2869', 'learning_rate': '3.636e-06', 'epoch': '0.007364', 'num_input_tokens_seen': 167854, 'train_runtime': '87.06', 'train_tokens_per_second': '1928'}
86
+ {'loss': '1.581', 'grad_norm': '0.3221', 'learning_rate': '3.68e-06', 'epoch': '0.007453', 'num_input_tokens_seen': 169901, 'train_runtime': '88.1', 'train_tokens_per_second': '1928'}
87
+ {'loss': '1.226', 'grad_norm': '0.2599', 'learning_rate': '3.725e-06', 'epoch': '0.007543', 'num_input_tokens_seen': 171948, 'train_runtime': '89.15', 'train_tokens_per_second': '1929'}
88
+ {'loss': '1.89', 'grad_norm': '0.3248', 'learning_rate': '3.77e-06', 'epoch': '0.007633', 'num_input_tokens_seen': 173995, 'train_runtime': '90.19', 'train_tokens_per_second': '1929'}
89
+ {'loss': '1.249', 'grad_norm': '0.2722', 'learning_rate': '3.815e-06', 'epoch': '0.007723', 'num_input_tokens_seen': 176042, 'train_runtime': '91.23', 'train_tokens_per_second': '1930'}
90
+ {'loss': '1.531', 'grad_norm': '0.2899', 'learning_rate': '3.86e-06', 'epoch': '0.007812', 'num_input_tokens_seen': 178089, 'train_runtime': '92.27', 'train_tokens_per_second': '1930'}
91
+ {'loss': '1.192', 'grad_norm': '0.2562', 'learning_rate': '3.905e-06', 'epoch': '0.007902', 'num_input_tokens_seen': 180136, 'train_runtime': '93.31', 'train_tokens_per_second': '1931'}
92
+ {'loss': '1.206', 'grad_norm': '0.2712', 'learning_rate': '3.95e-06', 'epoch': '0.007992', 'num_input_tokens_seen': 182183, 'train_runtime': '94.35', 'train_tokens_per_second': '1931'}
93
+ {'loss': '1.52', 'grad_norm': '0.2991', 'learning_rate': '3.995e-06', 'epoch': '0.008082', 'num_input_tokens_seen': 184230, 'train_runtime': '95.4', 'train_tokens_per_second': '1931'}
94
+ {'loss': '1.533', 'grad_norm': '0.3098', 'learning_rate': '4.039e-06', 'epoch': '0.008172', 'num_input_tokens_seen': 186277, 'train_runtime': '96.44', 'train_tokens_per_second': '1932'}
95
+ {'loss': '1.001', 'grad_norm': '0.2536', 'learning_rate': '4.084e-06', 'epoch': '0.008261', 'num_input_tokens_seen': 188324, 'train_runtime': '97.48', 'train_tokens_per_second': '1932'}
96
+ {'loss': '2.11', 'grad_norm': '0.4001', 'learning_rate': '4.129e-06', 'epoch': '0.008351', 'num_input_tokens_seen': 190371, 'train_runtime': '98.52', 'train_tokens_per_second': '1932'}
97
+ {'loss': '1.491', 'grad_norm': '0.3261', 'learning_rate': '4.174e-06', 'epoch': '0.008441', 'num_input_tokens_seen': 192418, 'train_runtime': '99.56', 'train_tokens_per_second': '1933'}
98
+ {'loss': '1.224', 'grad_norm': '0.2763', 'learning_rate': '4.219e-06', 'epoch': '0.008531', 'num_input_tokens_seen': 194465, 'train_runtime': '100.6', 'train_tokens_per_second': '1933'}
99
+ {'loss': '1.193', 'grad_norm': '0.3007', 'learning_rate': '4.264e-06', 'epoch': '0.008621', 'num_input_tokens_seen': 196512, 'train_runtime': '101.6', 'train_tokens_per_second': '1933'}
100
+ {'loss': '1.214', 'grad_norm': '0.3006', 'learning_rate': '4.309e-06', 'epoch': '0.00871', 'num_input_tokens_seen': 198559, 'train_runtime': '102.7', 'train_tokens_per_second': '1933'}
101
+ {'loss': '1.213', 'grad_norm': '0.2925', 'learning_rate': '4.354e-06', 'epoch': '0.0088', 'num_input_tokens_seen': 200606, 'train_runtime': '103.7', 'train_tokens_per_second': '1934'}
102
+ {'loss': '0.8758', 'grad_norm': '0.259', 'learning_rate': '4.399e-06', 'epoch': '0.00889', 'num_input_tokens_seen': 202653, 'train_runtime': '104.8', 'train_tokens_per_second': '1934'}
103
+ {'loss': '0.943', 'grad_norm': '0.2397', 'learning_rate': '4.443e-06', 'epoch': '0.00898', 'num_input_tokens_seen': 204700, 'train_runtime': '105.8', 'train_tokens_per_second': '1934'}
104
+ {'loss': '1.219', 'grad_norm': '0.2605', 'learning_rate': '4.488e-06', 'epoch': '0.00907', 'num_input_tokens_seen': 206747, 'train_runtime': '106.9', 'train_tokens_per_second': '1935'}
105
+ {'loss': '0.863', 'grad_norm': '0.2432', 'learning_rate': '4.533e-06', 'epoch': '0.009159', 'num_input_tokens_seen': 208794, 'train_runtime': '107.9', 'train_tokens_per_second': '1935'}
106
+ {'loss': '1.31', 'grad_norm': '0.2713', 'learning_rate': '4.578e-06', 'epoch': '0.009249', 'num_input_tokens_seen': 210841, 'train_runtime': '108.9', 'train_tokens_per_second': '1935'}
107
+ {'loss': '1.412', 'grad_norm': '0.3386', 'learning_rate': '4.623e-06', 'epoch': '0.009339', 'num_input_tokens_seen': 212888, 'train_runtime': '110', 'train_tokens_per_second': '1935'}
108
+ {'loss': '2.002', 'grad_norm': '0.3573', 'learning_rate': '4.668e-06', 'epoch': '0.009429', 'num_input_tokens_seen': 214935, 'train_runtime': '111', 'train_tokens_per_second': '1936'}
109
+ {'loss': '0.8411', 'grad_norm': '0.2601', 'learning_rate': '4.713e-06', 'epoch': '0.009519', 'num_input_tokens_seen': 216982, 'train_runtime': '112.1', 'train_tokens_per_second': '1936'}
110
+ {'loss': '1.195', 'grad_norm': '0.3129', 'learning_rate': '4.758e-06', 'epoch': '0.009608', 'num_input_tokens_seen': 219029, 'train_runtime': '113.1', 'train_tokens_per_second': '1936'}
111
+ {'loss': '1.785', 'grad_norm': '0.3384', 'learning_rate': '4.803e-06', 'epoch': '0.009698', 'num_input_tokens_seen': 221076, 'train_runtime': '114.2', 'train_tokens_per_second': '1936'}
112
+ {'loss': '1.227', 'grad_norm': '0.2873', 'learning_rate': '4.847e-06', 'epoch': '0.009788', 'num_input_tokens_seen': 223123, 'train_runtime': '115.2', 'train_tokens_per_second': '1937'}
113
+ {'loss': '1.183', 'grad_norm': '0.2764', 'learning_rate': '4.892e-06', 'epoch': '0.009878', 'num_input_tokens_seen': 225170, 'train_runtime': '116.3', 'train_tokens_per_second': '1937'}
114
+ {'loss': '1.201', 'grad_norm': '0.289', 'learning_rate': '4.937e-06', 'epoch': '0.009968', 'num_input_tokens_seen': 227217, 'train_runtime': '117.3', 'train_tokens_per_second': '1937'}
115
+ {'loss': '1.396', 'grad_norm': '0.3248', 'learning_rate': '4.982e-06', 'epoch': '0.01006', 'num_input_tokens_seen': 229264, 'train_runtime': '118.3', 'train_tokens_per_second': '1937'}
116
+ {'loss': '1.483', 'grad_norm': '0.3318', 'learning_rate': '5.027e-06', 'epoch': '0.01015', 'num_input_tokens_seen': 231311, 'train_runtime': '119.4', 'train_tokens_per_second': '1938'}
117
+ {'loss': '0.9033', 'grad_norm': '0.2992', 'learning_rate': '5.072e-06', 'epoch': '0.01024', 'num_input_tokens_seen': 233358, 'train_runtime': '120.4', 'train_tokens_per_second': '1938'}
118
+ {'loss': '1.267', 'grad_norm': '0.2815', 'learning_rate': '5.117e-06', 'epoch': '0.01033', 'num_input_tokens_seen': 235405, 'train_runtime': '121.5', 'train_tokens_per_second': '1938'}
119
+ {'loss': '1.461', 'grad_norm': '0.3357', 'learning_rate': '5.162e-06', 'epoch': '0.01042', 'num_input_tokens_seen': 237452, 'train_runtime': '122.5', 'train_tokens_per_second': '1938'}
120
+ {'loss': '1.264', 'grad_norm': '0.3197', 'learning_rate': '5.206e-06', 'epoch': '0.01051', 'num_input_tokens_seen': 239499, 'train_runtime': '123.6', 'train_tokens_per_second': '1938'}
121
+ {'loss': '1.229', 'grad_norm': '0.2616', 'learning_rate': '5.251e-06', 'epoch': '0.0106', 'num_input_tokens_seen': 241546, 'train_runtime': '124.6', 'train_tokens_per_second': '1939'}
122
+ {'loss': '0.9597', 'grad_norm': '0.2802', 'learning_rate': '5.296e-06', 'epoch': '0.01069', 'num_input_tokens_seen': 243593, 'train_runtime': '125.6', 'train_tokens_per_second': '1939'}
123
+ {'loss': '1.521', 'grad_norm': '0.3153', 'learning_rate': '5.341e-06', 'epoch': '0.01078', 'num_input_tokens_seen': 245640, 'train_runtime': '126.7', 'train_tokens_per_second': '1939'}
124
+ {'loss': '1.466', 'grad_norm': '0.3256', 'learning_rate': '5.386e-06', 'epoch': '0.01087', 'num_input_tokens_seen': 247687, 'train_runtime': '127.7', 'train_tokens_per_second': '1939'}
125
+ File "/usr/local/bin/llamafactory-cli", line 8, in <module>
126
+ sys.exit(main())
127
+ ^^^^^^
128
+ File "/workspace/LlamaFactory/src/llamafactory/cli.py", line 24, in main
129
+ launcher.launch()
130
+ File "/workspace/LlamaFactory/src/llamafactory/launcher.py", line 157, in launch
131
+ run_exp()
132
+ File "/workspace/LlamaFactory/src/llamafactory/train/tuner.py", line 125, in run_exp
133
+ _training_function(config={"args": args, "callbacks": callbacks})
134
+ File "/workspace/LlamaFactory/src/llamafactory/train/tuner.py", line 91, in _training_function
135
+ run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
136
+ File "/workspace/LlamaFactory/src/llamafactory/train/pt/workflow.py", line 63, in run_pt
137
+ train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
138
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
139
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2174, in train
140
+ return inner_training_loop(
141
+ ^^^^^^^^^^^^^^^^^^^^
142
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2536, in _inner_training_loop
143
+ tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
144
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
145
+ File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3837, in training_step
146
+ self.accelerator.backward(loss, **kwargs)
147
+ File "/usr/local/lib/python3.11/dist-packages/accelerate/accelerator.py", line 2740, in backward
148
+ loss.backward(**kwargs)
149
+ File "/usr/local/lib/python3.11/dist-packages/torch/_tensor.py", line 521, in backward
150
+ torch.autograd.backward(
151
+ File "/usr/local/lib/python3.11/dist-packages/torch/autograd/__init__.py", line 289, in backward
152
+ _engine_run_backward(
153
+ File "/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py", line 769, in _engine_run_backward
154
+ return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
155
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
156
+ KeyboardInterrupt
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
1
+ pytz==2025.2
2
+ pydub==0.25.1
3
+ brotli==1.2.0
4
+ antlr4-python3-runtime==4.9.3
5
+ xxhash==3.6.0
6
+ websockets==15.0.1
7
+ tzdata==2025.3
8
+ typing_extensions==4.15.0
9
+ tqdm==4.67.3
10
+ tomlkit==0.13.3
11
+ termcolor==3.3.0
12
+ shtab==1.8.0
13
+ shellingham==1.5.4
14
+ sentencepiece==0.2.1
15
+ semantic-version==2.10.0
16
+ safetensors==0.7.0
17
+ ruff==0.15.0
18
+ regex==2026.1.15
19
+ python-multipart==0.0.22
20
+ pyparsing==3.3.2
21
+ pyarrow==23.0.0
22
+ protobuf==6.33.5
23
+ propcache==0.4.1
24
+ orjson==3.11.7
25
+ omegaconf==2.3.0
26
+ numpy==2.4.2
27
+ multidict==6.7.1
28
+ mdurl==0.1.2
29
+ kiwisolver==1.4.9
30
+ hf-xet==1.2.0
31
+ hf_transfer==0.1.9
32
+ groovy==0.1.2
33
+ frozenlist==1.8.0
34
+ fonttools==4.61.1
35
+ ffmpy==1.0.0
36
+ einops==0.8.2
37
+ docstring_parser==0.17.0
38
+ dill==0.3.8
39
+ cycler==0.12.1
40
+ click==8.3.1
41
+ av==16.0.0
42
+ annotated-types==0.7.0
43
+ annotated-doc==0.0.4
44
+ aiohappyeyeballs==2.6.1
45
+ aiofiles==24.1.0
46
+ yarl==1.22.0
47
+ uvicorn==0.40.0
48
+ typing-inspection==0.4.2
49
+ typer-slim==0.21.1
50
+ tiktoken==0.12.0
51
+ scipy==1.17.0
52
+ pydantic_core==2.41.4
53
+ pandas==2.3.3
54
+ multiprocess==0.70.16
55
+ modelscope==1.34.0
56
+ markdown-it-py==4.0.0
57
+ fire==0.7.1
58
+ contourpy==1.3.3
59
+ anyio==4.12.1
60
+ aiosignal==1.4.0
61
+ starlette==0.52.1
62
+ rich==14.3.2
63
+ pydantic==2.12.3
64
+ matplotlib==3.10.8
65
+ aiohttp==3.13.3
66
+ tyro==0.8.14
67
+ typer==0.21.1
68
+ torchdata==0.11.0
69
+ sse-starlette==3.2.0
70
+ safehttpx==0.1.7
71
+ huggingface_hub==1.4.1
72
+ fastapi==0.128.5
73
+ tokenizers==0.22.2
74
+ gradio_client==1.14.0
75
+ datasets==4.0.0
76
+ accelerate==1.11.0
77
+ transformers==5.0.0
78
+ gradio==5.50.0
79
+ trl==0.24.0
80
+ peft==0.18.1
81
+ llamafactory==0.9.5.dev0
82
+ jieba==0.42.1
83
+ rouge-chinese==1.0.3
84
+ joblib==1.5.3
85
+ nltk==3.9.2
86
+ py-cpuinfo==9.0.0
87
+ nvidia-ml-py==13.590.48
88
+ hjson==3.1.0
89
+ ninja==1.13.0
90
+ msgpack==1.1.2
91
+ deepspeed==0.16.9
92
+ smmap==5.0.2
93
+ sentry-sdk==2.52.0
94
+ gitdb==4.0.12
95
+ GitPython==3.1.46
96
+ wandb==0.24.2
97
+ entrypoints==0.4
98
+ jupyter_client==7.4.9
99
+ nbclassic==1.1.0
100
+ notebook==6.5.5
101
+ pyzmq==24.0.1
102
+ PyYAML==6.0.2
103
+ Send2Trash==1.8.3
104
+ argon2-cffi==23.1.0
105
+ argon2-cffi-bindings==21.2.0
106
+ arrow==1.3.0
107
+ asttokens==2.4.1
108
+ async-lru==2.0.4
109
+ attrs==24.2.0
110
+ babel==2.16.0
111
+ beautifulsoup4==4.12.3
112
+ bleach==6.1.0
113
+ certifi==2024.8.30
114
+ cffi==1.17.1
115
+ charset-normalizer==3.3.2
116
+ comm==0.2.2
117
+ debugpy==1.8.5
118
+ decorator==5.1.1
119
+ defusedxml==0.7.1
120
+ executing==2.1.0
121
+ fastjsonschema==2.20.0
122
+ fqdn==1.5.1
123
+ h11==0.14.0
124
+ httpcore==1.0.5
125
+ httpx==0.27.2
126
+ idna==3.10
127
+ ipykernel==6.29.5
128
+ ipython==8.27.0
129
+ ipython-genutils==0.2.0
130
+ ipywidgets==8.1.5
131
+ isoduration==20.11.0
132
+ jedi==0.19.1
133
+ json5==0.9.25
134
+ jsonpointer==3.0.0
135
+ jsonschema==4.23.0
136
+ jsonschema-specifications==2023.12.1
137
+ jupyter-archive==3.4.0
138
+ jupyter_contrib_core==0.4.2
139
+ jupyter_contrib_nbextensions==0.7.0
140
+ jupyter_core==5.7.2
141
+ jupyter-events==0.10.0
142
+ jupyter-highlight-selected-word==0.2.0
143
+ jupyter-lsp==2.2.5
144
+ jupyter_nbextensions_configurator==0.6.4
145
+ jupyter_server==2.14.2
146
+ jupyter_server_terminals==0.5.3
147
+ jupyterlab==4.2.5
148
+ jupyterlab_pygments==0.3.0
149
+ jupyterlab_server==2.27.3
150
+ jupyterlab_widgets==3.0.13
151
+ lxml==5.3.0
152
+ matplotlib-inline==0.1.7
153
+ mistune==3.0.2
154
+ nbclient==0.10.0
155
+ nbconvert==7.16.4
156
+ nbformat==5.10.4
157
+ nest-asyncio==1.6.0
158
+ notebook_shim==0.2.4
159
+ overrides==7.7.0
160
+ packaging==24.1
161
+ pandocfilters==1.5.1
162
+ parso==0.8.4
163
+ pexpect==4.9.0
164
+ platformdirs==4.3.6
165
+ prometheus_client==0.21.0
166
+ prompt_toolkit==3.0.47
167
+ psutil==6.0.0
168
+ ptyprocess==0.7.0
169
+ pure_eval==0.2.3
170
+ pycparser==2.22
171
+ Pygments==2.18.0
172
+ python-dateutil==2.9.0.post0
173
+ python-json-logger==2.0.7
174
+ referencing==0.35.1
175
+ requests==2.32.3
176
+ rfc3339-validator==0.1.4
177
+ rfc3986-validator==0.1.1
178
+ rpds-py==0.20.0
179
+ sniffio==1.3.1
180
+ soupsieve==2.6
181
+ stack-data==0.6.3
182
+ terminado==0.18.1
183
+ tinycss2==1.3.0
184
+ tornado==6.4.1
185
+ traitlets==5.14.3
186
+ types-python-dateutil==2.9.0.20240906
187
+ uri-template==1.3.0
188
+ urllib3==2.2.3
189
+ wcwidth==0.2.13
190
+ webcolors==24.8.0
191
+ webencodings==0.5.1
192
+ websocket-client==1.8.0
193
+ widgetsnbextension==4.0.13
194
+ Jinja2==3.1.3
195
+ MarkupSafe==2.1.5
196
+ filelock==3.13.1
197
+ fsspec==2024.2.0
198
+ mpmath==1.3.0
199
+ networkx==3.2.1
200
+ nvidia-cublas-cu12==12.4.2.65
201
+ nvidia-cuda-cupti-cu12==12.4.99
202
+ nvidia-cuda-nvrtc-cu12==12.4.99
203
+ nvidia-cuda-runtime-cu12==12.4.99
204
+ nvidia-cudnn-cu12==9.1.0.70
205
+ nvidia-cufft-cu12==11.2.0.44
206
+ nvidia-curand-cu12==10.3.5.119
207
+ nvidia-cusolver-cu12==11.6.0.99
208
+ nvidia-cusparse-cu12==12.3.0.142
209
+ nvidia-nccl-cu12==2.20.5
210
+ nvidia-nvjitlink-cu12==12.4.99
211
+ nvidia-nvtx-cu12==12.4.99
212
+ pillow==10.2.0
213
+ sympy==1.12
214
+ torch==2.4.1+cu124
215
+ torchaudio==2.4.1+cu124
216
+ torchvision==0.19.1+cu124
217
+ triton==3.0.0
218
+ pip==24.2
219
+ setuptools==75.1.0
220
+ wheel==0.44.0
221
+ PyGObject==3.42.1
222
+ PyJWT==2.3.0
223
+ SecretStorage==3.3.1
224
+ blinker==1.4
225
+ cryptography==3.4.8
226
+ dbus-python==1.2.18
227
+ distro==1.7.0
228
+ httplib2==0.20.2
229
+ importlib-metadata==4.6.4
230
+ jeepney==0.7.1
231
+ keyring==23.5.0
232
+ launchpadlib==1.10.16
233
+ lazr.restfulclient==0.14.4
234
+ lazr.uri==1.0.6
235
+ more-itertools==8.10.0
236
+ oauthlib==3.2.0
237
+ python-apt==2.4.0+ubuntu4
238
+ six==1.16.0
239
+ wadllib==1.3.6
240
+ zipp==1.0.0
241
+ autocommand==2.2.2
242
+ backports.tarfile==1.2.0
243
+ importlib_metadata==8.0.0
244
+ importlib_resources==6.4.0
245
+ inflect==7.3.1
246
+ jaraco.collections==5.1.0
247
+ jaraco.context==5.3.0
248
+ jaraco.functools==4.0.1
249
+ jaraco.text==3.12.1
250
+ more-itertools==10.3.0
251
+ packaging==24.1
252
+ platformdirs==4.2.2
253
+ tomli==2.0.1
254
+ typeguard==4.3.0
255
+ typing_extensions==4.12.2
256
+ wheel==0.43.0
257
+ zipp==3.19.2
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
1
+ {
2
+ "os": "Linux-6.8.0-60-generic-x86_64-with-glibc2.35",
3
+ "python": "CPython 3.11.10",
4
+ "startedAt": "2026-02-09T08:03:05.241495Z",
5
+ "args": [
6
+ "/workspace/v127rc_exp2/B_dup.yaml"
7
+ ],
8
+ "program": "/usr/local/bin/llamafactory-cli",
9
+ "git": {
10
+ "remote": "https://github.com/hiyouga/LlamaFactory.git",
11
+ "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
12
+ },
13
+ "email": "markmochi200@gmail.com",
14
+ "root": "/workspace/LlamaFactory",
15
+ "host": "3bebe963f251",
16
+ "executable": "/usr/bin/python",
17
+ "cpu_count": 16,
18
+ "cpu_count_logical": 32,
19
+ "gpu": "NVIDIA GeForce RTX 4090",
20
+ "gpu_count": 1,
21
+ "disk": {
22
+ "/": {
23
+ "total": "21474836480",
24
+ "used": "2060378112"
25
+ }
26
+ },
27
+ "memory": {
28
+ "total": "134156767232"
29
+ },
30
+ "gpu_nvidia": [
31
+ {
32
+ "name": "NVIDIA GeForce RTX 4090",
33
+ "memoryTotal": "25757220864",
34
+ "cudaCores": 16384,
35
+ "architecture": "Ada",
36
+ "uuid": "GPU-6c1e98c2-1b34-cfd8-5de5-319e272f1d1e"
37
+ }
38
+ ],
39
+ "cudaVersion": "12.9",
40
+ "writerId": "1eab48yhuu1cnh8oebani4cfaxehtql2"
41
+ }
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/files/wandb-summary.json ADDED
@@ -0,0 +1 @@
1
+ {"train/learning_rate":5.385996409335727e-06,"_step":120,"_runtime":127,"train/grad_norm":0.3255986273288727,"train_runtime":127.7247,"train/num_input_tokens_seen":247687,"_timestamp":1.7706243125413227e+09,"train/loss":1.4658641815185547,"train/global_step":121,"train/train_tokens_per_second":1939.226,"_wandb":{"runtime":127},"train/epoch":0.01086566091954023}
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
1
+ {"time":"2026-02-09T08:03:05.494912833Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
2
+ {"time":"2026-02-09T08:03:05.814243496Z","level":"INFO","msg":"stream: created new stream","id":"rc9olpt3"}
3
+ {"time":"2026-02-09T08:03:05.814810208Z","level":"INFO","msg":"handler: started","stream_id":"rc9olpt3"}
4
+ {"time":"2026-02-09T08:03:05.816182474Z","level":"INFO","msg":"stream: started","id":"rc9olpt3"}
5
+ {"time":"2026-02-09T08:03:05.816197572Z","level":"INFO","msg":"writer: started","stream_id":"rc9olpt3"}
6
+ {"time":"2026-02-09T08:03:05.81619655Z","level":"INFO","msg":"sender: started","stream_id":"rc9olpt3"}
7
+ {"time":"2026-02-09T08:05:13.521232852Z","level":"INFO","msg":"stream: closing","id":"rc9olpt3"}
8
+ {"time":"2026-02-09T08:05:14.079361011Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
9
+ {"time":"2026-02-09T08:05:14.315037075Z","level":"INFO","msg":"handler: closed","stream_id":"rc9olpt3"}
10
+ {"time":"2026-02-09T08:05:14.320046119Z","level":"INFO","msg":"sender: closed","stream_id":"rc9olpt3"}
11
+ {"time":"2026-02-09T08:05:14.32050484Z","level":"INFO","msg":"stream: closed","id":"rc9olpt3"}
LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug.log ADDED
@@ -0,0 +1,25 @@
1
+ 2026-02-09 08:03:05,270 INFO MainThread:1005 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
2
+ 2026-02-09 08:03:05,270 INFO MainThread:1005 [wandb_setup.py:_flush():81] Configure stats pid to 1005
3
+ 2026-02-09 08:03:05,271 INFO MainThread:1005 [wandb_setup.py:_flush():81] Loading settings from environment variables
4
+ 2026-02-09 08:03:05,271 INFO MainThread:1005 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug.log
5
+ 2026-02-09 08:03:05,272 INFO MainThread:1005 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260209_080305-rc9olpt3/logs/debug-internal.log
6
+ 2026-02-09 08:03:05,273 INFO MainThread:1005 [wandb_init.py:init():844] calling init triggers
7
+ 2026-02-09 08:03:05,273 INFO MainThread:1005 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
8
+ config: {'_wandb': {}}
9
+ 2026-02-09 08:03:05,274 INFO MainThread:1005 [wandb_init.py:init():892] starting backend
10
+ 2026-02-09 08:03:05,483 INFO MainThread:1005 [wandb_init.py:init():895] sending inform_init request
11
+ 2026-02-09 08:03:05,492 INFO MainThread:1005 [wandb_init.py:init():903] backend started and connected
12
+ 2026-02-09 08:03:05,494 INFO MainThread:1005 [wandb_init.py:init():973] updated telemetry
13
+ 2026-02-09 08:03:05,557 INFO MainThread:1005 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
14
+ 2026-02-09 08:03:06,212 INFO MainThread:1005 [wandb_init.py:init():1042] starting run threads in backend
15
+ 2026-02-09 08:03:06,281 INFO MainThread:1005 [wandb_run.py:_console_start():2529] atexit reg
16
+ 2026-02-09 08:03:06,282 INFO MainThread:1005 [wandb_run.py:_redirect():2377] redirect: wrap_raw
17
+ 2026-02-09 08:03:06,282 INFO MainThread:1005 [wandb_run.py:_redirect():2446] Wrapping output streams.
18
+ 2026-02-09 08:03:06,282 INFO MainThread:1005 [wandb_run.py:_redirect():2469] Redirects installed.
19
+ 2026-02-09 08:03:06,285 INFO MainThread:1005 [wandb_init.py:init():1082] run started, returning control to user process
20
+ 2026-02-09 08:03:06,286 INFO MainThread:1005 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['gate_proj', 'q_proj', 'k_proj', 'v_proj', 'down_proj', 'o_proj', 'up_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp2/B_dup', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 10, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.01, 'warmup_steps': 0.01, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 
'local_rank': -1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 
'generation_max_length': 2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
+ 2026-02-09 08:03:06,292 INFO MainThread:1005 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x7dac31ac4550>>
+ 2026-02-09 08:03:06,292 INFO MainThread:1005 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
+ 2026-02-09 08:03:06,295 INFO MainThread:1005 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t0_d34_r300'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
+ 2026-02-09 08:05:13,521 INFO wandb-AsyncioManager-main:1005 [service_client.py:_forward_responses():94] Reached EOF.
+ 2026-02-09 08:05:13,521 INFO wandb-AsyncioManager-main:1005 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/output.log ADDED
The diff for this file is too large to render. See raw diff
 
LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/requirements.txt ADDED
@@ -0,0 +1,257 @@
+ pytz==2025.2
+ pydub==0.25.1
+ brotli==1.2.0
+ antlr4-python3-runtime==4.9.3
+ xxhash==3.6.0
+ websockets==15.0.1
+ tzdata==2025.3
+ typing_extensions==4.15.0
+ tqdm==4.67.3
+ tomlkit==0.13.3
+ termcolor==3.3.0
+ shtab==1.8.0
+ shellingham==1.5.4
+ sentencepiece==0.2.1
+ semantic-version==2.10.0
+ safetensors==0.7.0
+ ruff==0.15.0
+ regex==2026.1.15
+ python-multipart==0.0.22
+ pyparsing==3.3.2
+ pyarrow==23.0.0
+ protobuf==6.33.5
+ propcache==0.4.1
+ orjson==3.11.7
+ omegaconf==2.3.0
+ numpy==2.4.2
+ multidict==6.7.1
+ mdurl==0.1.2
+ kiwisolver==1.4.9
+ hf-xet==1.2.0
+ hf_transfer==0.1.9
+ groovy==0.1.2
+ frozenlist==1.8.0
+ fonttools==4.61.1
+ ffmpy==1.0.0
+ einops==0.8.2
+ docstring_parser==0.17.0
+ dill==0.3.8
+ cycler==0.12.1
+ click==8.3.1
+ av==16.0.0
+ annotated-types==0.7.0
+ annotated-doc==0.0.4
+ aiohappyeyeballs==2.6.1
+ aiofiles==24.1.0
+ yarl==1.22.0
+ uvicorn==0.40.0
+ typing-inspection==0.4.2
+ typer-slim==0.21.1
+ tiktoken==0.12.0
+ scipy==1.17.0
+ pydantic_core==2.41.4
+ pandas==2.3.3
+ multiprocess==0.70.16
+ modelscope==1.34.0
+ markdown-it-py==4.0.0
+ fire==0.7.1
+ contourpy==1.3.3
+ anyio==4.12.1
+ aiosignal==1.4.0
+ starlette==0.52.1
+ rich==14.3.2
+ pydantic==2.12.3
+ matplotlib==3.10.8
+ aiohttp==3.13.3
+ tyro==0.8.14
+ typer==0.21.1
+ torchdata==0.11.0
+ sse-starlette==3.2.0
+ safehttpx==0.1.7
+ huggingface_hub==1.4.1
+ fastapi==0.128.5
+ tokenizers==0.22.2
+ gradio_client==1.14.0
+ datasets==4.0.0
+ accelerate==1.11.0
+ transformers==5.0.0
+ gradio==5.50.0
+ trl==0.24.0
+ peft==0.18.1
+ llamafactory==0.9.5.dev0
+ jieba==0.42.1
+ rouge-chinese==1.0.3
+ joblib==1.5.3
+ nltk==3.9.2
+ py-cpuinfo==9.0.0
+ nvidia-ml-py==13.590.48
+ hjson==3.1.0
+ ninja==1.13.0
+ msgpack==1.1.2
+ deepspeed==0.16.9
+ smmap==5.0.2
+ sentry-sdk==2.52.0
+ gitdb==4.0.12
+ GitPython==3.1.46
+ wandb==0.24.2
+ entrypoints==0.4
+ jupyter_client==7.4.9
+ nbclassic==1.1.0
+ notebook==6.5.5
+ pyzmq==24.0.1
+ PyYAML==6.0.2
+ Send2Trash==1.8.3
+ argon2-cffi==23.1.0
+ argon2-cffi-bindings==21.2.0
+ arrow==1.3.0
+ asttokens==2.4.1
+ async-lru==2.0.4
+ attrs==24.2.0
+ babel==2.16.0
+ beautifulsoup4==4.12.3
+ bleach==6.1.0
+ certifi==2024.8.30
+ cffi==1.17.1
+ charset-normalizer==3.3.2
+ comm==0.2.2
+ debugpy==1.8.5
+ decorator==5.1.1
+ defusedxml==0.7.1
+ executing==2.1.0
+ fastjsonschema==2.20.0
+ fqdn==1.5.1
+ h11==0.14.0
+ httpcore==1.0.5
+ httpx==0.27.2
+ idna==3.10
+ ipykernel==6.29.5
+ ipython==8.27.0
+ ipython-genutils==0.2.0
+ ipywidgets==8.1.5
+ isoduration==20.11.0
+ jedi==0.19.1
+ json5==0.9.25
+ jsonpointer==3.0.0
+ jsonschema==4.23.0
+ jsonschema-specifications==2023.12.1
+ jupyter-archive==3.4.0
+ jupyter_contrib_core==0.4.2
+ jupyter_contrib_nbextensions==0.7.0
+ jupyter_core==5.7.2
+ jupyter-events==0.10.0
+ jupyter-highlight-selected-word==0.2.0
+ jupyter-lsp==2.2.5
+ jupyter_nbextensions_configurator==0.6.4
+ jupyter_server==2.14.2
+ jupyter_server_terminals==0.5.3
+ jupyterlab==4.2.5
+ jupyterlab_pygments==0.3.0
+ jupyterlab_server==2.27.3
+ jupyterlab_widgets==3.0.13
+ lxml==5.3.0
+ matplotlib-inline==0.1.7
+ mistune==3.0.2
+ nbclient==0.10.0
+ nbconvert==7.16.4
+ nbformat==5.10.4
+ nest-asyncio==1.6.0
+ notebook_shim==0.2.4
+ overrides==7.7.0
+ packaging==24.1
+ pandocfilters==1.5.1
+ parso==0.8.4
+ pexpect==4.9.0
+ platformdirs==4.3.6
+ prometheus_client==0.21.0
+ prompt_toolkit==3.0.47
+ psutil==6.0.0
+ ptyprocess==0.7.0
+ pure_eval==0.2.3
+ pycparser==2.22
+ Pygments==2.18.0
+ python-dateutil==2.9.0.post0
+ python-json-logger==2.0.7
+ referencing==0.35.1
+ requests==2.32.3
+ rfc3339-validator==0.1.4
+ rfc3986-validator==0.1.1
+ rpds-py==0.20.0
+ sniffio==1.3.1
+ soupsieve==2.6
+ stack-data==0.6.3
+ terminado==0.18.1
+ tinycss2==1.3.0
+ tornado==6.4.1
+ traitlets==5.14.3
+ types-python-dateutil==2.9.0.20240906
+ uri-template==1.3.0
+ urllib3==2.2.3
+ wcwidth==0.2.13
+ webcolors==24.8.0
+ webencodings==0.5.1
+ websocket-client==1.8.0
+ widgetsnbextension==4.0.13
+ Jinja2==3.1.3
+ MarkupSafe==2.1.5
+ filelock==3.13.1
+ fsspec==2024.2.0
+ mpmath==1.3.0
+ networkx==3.2.1
+ nvidia-cublas-cu12==12.4.2.65
+ nvidia-cuda-cupti-cu12==12.4.99
+ nvidia-cuda-nvrtc-cu12==12.4.99
+ nvidia-cuda-runtime-cu12==12.4.99
+ nvidia-cudnn-cu12==9.1.0.70
+ nvidia-cufft-cu12==11.2.0.44
+ nvidia-curand-cu12==10.3.5.119
+ nvidia-cusolver-cu12==11.6.0.99
+ nvidia-cusparse-cu12==12.3.0.142
+ nvidia-nccl-cu12==2.20.5
+ nvidia-nvjitlink-cu12==12.4.99
+ nvidia-nvtx-cu12==12.4.99
+ pillow==10.2.0
+ sympy==1.12
+ torch==2.4.1+cu124
+ torchaudio==2.4.1+cu124
+ torchvision==0.19.1+cu124
+ triton==3.0.0
+ pip==24.2
+ setuptools==75.1.0
+ wheel==0.44.0
+ PyGObject==3.42.1
+ PyJWT==2.3.0
+ SecretStorage==3.3.1
+ blinker==1.4
+ cryptography==3.4.8
+ dbus-python==1.2.18
+ distro==1.7.0
+ httplib2==0.20.2
+ importlib-metadata==4.6.4
+ jeepney==0.7.1
+ keyring==23.5.0
+ launchpadlib==1.10.16
+ lazr.restfulclient==0.14.4
+ lazr.uri==1.0.6
+ more-itertools==8.10.0
+ oauthlib==3.2.0
+ python-apt==2.4.0+ubuntu4
+ six==1.16.0
+ wadllib==1.3.6
+ zipp==1.0.0
+ autocommand==2.2.2
+ backports.tarfile==1.2.0
+ importlib_metadata==8.0.0
+ importlib_resources==6.4.0
+ inflect==7.3.1
+ jaraco.collections==5.1.0
+ jaraco.context==5.3.0
+ jaraco.functools==4.0.1
+ jaraco.text==3.12.1
+ more-itertools==10.3.0
+ packaging==24.1
+ platformdirs==4.2.2
+ tomli==2.0.1
+ typeguard==4.3.0
+ typing_extensions==4.12.2
+ wheel==0.43.0
+ zipp==3.19.2
LlamaFactory/wandb/run-20260209_081117-18fi1m6s/files/wandb-metadata.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "os": "Linux-6.8.0-60-generic-x86_64-with-glibc2.35",
+   "python": "CPython 3.11.10",
+   "startedAt": "2026-02-09T08:11:17.633214Z",
+   "args": [
+     "/workspace/v127rc_exp2/B_dup.yaml"
+   ],
+   "program": "/usr/local/bin/llamafactory-cli",
+   "git": {
+     "remote": "https://github.com/hiyouga/LlamaFactory.git",
+     "commit": "1a02717fa84c270d1c156c4c4a391c2f95525a63"
+   },
+   "email": "markmochi200@gmail.com",
+   "root": "/workspace/LlamaFactory",
+   "host": "3bebe963f251",
+   "executable": "/usr/bin/python",
+   "cpu_count": 16,
+   "cpu_count_logical": 32,
+   "gpu": "NVIDIA GeForce RTX 4090",
+   "gpu_count": 1,
+   "disk": {
+     "/": {
+       "total": "21474836480",
+       "used": "2060402688"
+     }
+   },
+   "memory": {
+     "total": "134156767232"
+   },
+   "gpu_nvidia": [
+     {
+       "name": "NVIDIA GeForce RTX 4090",
+       "memoryTotal": "25757220864",
+       "cudaCores": 16384,
+       "architecture": "Ada",
+       "uuid": "GPU-6c1e98c2-1b34-cfd8-5de5-319e272f1d1e"
+     }
+   ],
+   "cudaVersion": "12.9",
+   "writerId": "cpeblcn9qsylpv3vdjtepn4bz3uf73py"
+ }
LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug-internal.log ADDED
@@ -0,0 +1,11 @@
+ {"time":"2026-02-09T08:11:17.885519267Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}
+ {"time":"2026-02-09T08:11:18.191759306Z","level":"INFO","msg":"stream: created new stream","id":"18fi1m6s"}
+ {"time":"2026-02-09T08:11:18.192322641Z","level":"INFO","msg":"handler: started","stream_id":"18fi1m6s"}
+ {"time":"2026-02-09T08:11:18.194262901Z","level":"INFO","msg":"stream: started","id":"18fi1m6s"}
+ {"time":"2026-02-09T08:11:18.194272128Z","level":"INFO","msg":"writer: started","stream_id":"18fi1m6s"}
+ {"time":"2026-02-09T08:11:18.194274833Z","level":"INFO","msg":"sender: started","stream_id":"18fi1m6s"}
+ {"time":"2026-02-09T08:34:15.201223367Z","level":"INFO","msg":"stream: closing","id":"18fi1m6s"}
+ {"time":"2026-02-09T08:34:16.252836477Z","level":"INFO","msg":"fileTransfer: Close: file transfer manager closed"}
+ {"time":"2026-02-09T08:34:16.482835888Z","level":"INFO","msg":"handler: closed","stream_id":"18fi1m6s"}
+ {"time":"2026-02-09T08:34:16.486903324Z","level":"INFO","msg":"sender: closed","stream_id":"18fi1m6s"}
+ {"time":"2026-02-09T08:34:16.487263633Z","level":"INFO","msg":"stream: closed","id":"18fi1m6s"}
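Each record in the `debug-internal.log` above is a JSON object with `time`, `level`, and `msg` fields, so the lifetime of the stream can be estimated from the `stream: starting` and `stream: closed` entries. The following is a minimal sketch, not part of the commit: the helper names `parse_ts` and `stream_duration_seconds` are hypothetical, and only two of the logged lines are copied in as sample input.

```python
import json
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a nanosecond-precision UTC timestamp, keeping microseconds only."""
    base, _, frac = ts.rstrip("Z").partition(".")
    micro = (frac + "000000")[:6]  # pad or truncate the fraction to 6 digits
    parsed = datetime.strptime(f"{base}.{micro}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def stream_duration_seconds(lines):
    """Return seconds between 'stream: starting' and 'stream: closed', or None."""
    start = end = None
    for line in lines:
        record = json.loads(line)
        if record.get("msg") == "stream: starting":
            start = parse_ts(record["time"])
        elif record.get("msg") == "stream: closed":
            end = parse_ts(record["time"])
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Two records copied from the log above (diff "+" markers removed).
sample = [
    '{"time":"2026-02-09T08:11:17.885519267Z","level":"INFO","msg":"stream: starting","core version":"0.24.2"}',
    '{"time":"2026-02-09T08:34:16.487263633Z","level":"INFO","msg":"stream: closed","id":"18fi1m6s"}',
]
duration = stream_duration_seconds(sample)
print(f"stream lifetime: {duration:.1f} s")
```

For this run the two timestamps are roughly 23 minutes apart. That is the lifetime of the wandb stream, which is only an upper bound on the training time itself.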
LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug.log ADDED
@@ -0,0 +1,25 @@
+ 2026-02-09 08:11:17,660 INFO MainThread:2068 [wandb_setup.py:_flush():81] Current SDK version is 0.24.2
+ 2026-02-09 08:11:17,661 INFO MainThread:2068 [wandb_setup.py:_flush():81] Configure stats pid to 2068
+ 2026-02-09 08:11:17,661 INFO MainThread:2068 [wandb_setup.py:_flush():81] Loading settings from environment variables
+ 2026-02-09 08:11:17,662 INFO MainThread:2068 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug.log
+ 2026-02-09 08:11:17,662 INFO MainThread:2068 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/LlamaFactory/wandb/run-20260209_081117-18fi1m6s/logs/debug-internal.log
+ 2026-02-09 08:11:17,663 INFO MainThread:2068 [wandb_init.py:init():844] calling init triggers
+ 2026-02-09 08:11:17,663 INFO MainThread:2068 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
+ config: {'_wandb': {}}
+ 2026-02-09 08:11:17,664 INFO MainThread:2068 [wandb_init.py:init():892] starting backend
+ 2026-02-09 08:11:17,875 INFO MainThread:2068 [wandb_init.py:init():895] sending inform_init request
+ 2026-02-09 08:11:17,883 INFO MainThread:2068 [wandb_init.py:init():903] backend started and connected
+ 2026-02-09 08:11:17,886 INFO MainThread:2068 [wandb_init.py:init():973] updated telemetry
+ 2026-02-09 08:11:17,958 INFO MainThread:2068 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
+ 2026-02-09 08:11:18,522 INFO MainThread:2068 [wandb_init.py:init():1042] starting run threads in backend
+ 2026-02-09 08:11:18,591 INFO MainThread:2068 [wandb_run.py:_console_start():2529] atexit reg
+ 2026-02-09 08:11:18,591 INFO MainThread:2068 [wandb_run.py:_redirect():2377] redirect: wrap_raw
+ 2026-02-09 08:11:18,592 INFO MainThread:2068 [wandb_run.py:_redirect():2446] Wrapping output streams.
+ 2026-02-09 08:11:18,592 INFO MainThread:2068 [wandb_run.py:_redirect():2469] Redirects installed.
+ 2026-02-09 08:11:18,594 INFO MainThread:2068 [wandb_init.py:init():1082] run started, returning control to user process
+ 2026-02-09 08:11:18,595 INFO MainThread:2068 [wandb_run.py:_config_callback():1404] config_cb None None {'peft_config': {'default': {'task_type': 'CAUSAL_LM', 'peft_type': 'LORA', 'auto_mapping': None, 'peft_version': '0.18.1', 'base_model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'revision': None, 'inference_mode': False, 'r': 16, 'target_modules': ['q_proj', 'v_proj', 'down_proj', 'up_proj', 'k_proj', 'gate_proj', 'o_proj'], 'exclude_modules': None, 'lora_alpha': 32, 'lora_dropout': 0.03, 'fan_in_fan_out': False, 'bias': 'none', 'use_rslora': False, 'modules_to_save': None, 'init_lora_weights': True, 'layers_to_transform': None, 'layers_pattern': None, 'rank_pattern': {}, 'alpha_pattern': {}, 'megatron_config': None, 'megatron_core': 'megatron.core', 'trainable_token_indices': None, 'loftq_config': {}, 'eva_config': None, 'corda_config': None, 'use_dora': False, 'alora_invocation_tokens': None, 'use_qalora': False, 'qalora_group_size': 16, 'layer_replication': None, 'runtime_config': {'ephemeral_gpu_offload': False}, 'lora_bias': False, 'target_parameters': None, 'arrow_config': None, 'ensure_weight_tying': False}}, 'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 4096, 'intermediate_size': 12288, 'num_hidden_layers': 36, 'num_attention_heads': 32, 'use_sliding_window': False, 'sliding_window': None, 'max_window_layers': 36, 'num_key_value_heads': 8, 'head_dim': 128, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': False, 'attention_bias': False, 'attention_dropout': 0.0, 'layer_types': ['full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 
'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention', 'full_attention'], 'pad_token_id': 151643, 'bos_token_id': None, 'eos_token_id': 151645, 'tie_word_embeddings': False, 'rope_parameters': {'rope_theta': 1000000, 'rope_type': 'default'}, 'return_dict': True, 'output_hidden_states': False, 'dtype': 'bfloat16', 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'architectures': ['Qwen3ForCausalLM'], 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'problem_type': None, '_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'transformers_version': '5.0.0', 'model_type': 'qwen3', 'output_attentions': False, 'output_dir': '/workspace/v127rc_exp2/B_dup', 'do_train': True, 'do_eval': False, 'do_predict': False, 'eval_strategy': 'no', 'prediction_loss_only': False, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 8, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0, 'adam_beta1': 0.9, 'adam_beta2': 0.95, 'adam_epsilon': 1e-08, 'max_grad_norm': 1, 'num_train_epochs': 10, 'max_steps': -1, 'lr_scheduler_type': 'cosine', 'lr_scheduler_kwargs': None, 'warmup_ratio': 0.01, 'warmup_steps': 0.01, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': None, 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 1, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 1000, 'save_total_limit': None, 'enable_jit_checkpoint': False, 'save_on_each_node': False, 'save_only_model': True, 'restore_callback_states_from_checkpoint': False, 'use_cpu': False, 'seed': 42, 'data_seed': None, 'bf16': True, 'fp16': False, 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 
'local_rank': -1, 'ddp_backend': None, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': None, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'run_name': None, 'disable_tqdm': False, 'remove_unused_columns': False, 'label_names': ['labels'], 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'parallelism_config': None, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['wandb'], 'project': 'huggingface', 'trackio_space_id': 'trackio', 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': None, 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'hub_revision': None, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'auto_find_batch_size': False, 'full_determinism': False, 'ddp_timeout': 180000000, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'include_num_input_tokens_seen': 'all', 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': False, 'liger_kernel_config': None, 'eval_use_gather_object': False, 'average_tokens_across_devices': True, 'sortish_sampler': False, 'predict_with_generate': False, 
'generation_max_length': 2047, 'generation_num_beams': None, 'generation_config': None, 'ray_num_workers': 1, 'ray_init_kwargs': None, 'master_addr': None, 'master_port': None, 'fp8': False, 'fp8_backend': 'auto', 'fp8_enable_fsdp_float8_all_gather': False, 'overwrite_output_dir': False}
+ 2026-02-09 08:11:18,601 INFO MainThread:2068 [wandb_config.py:__setitem__():154] [no run ID] config set model/num_parameters = 8234382336 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x78c6bc701e10>>
+ 2026-02-09 08:11:18,605 INFO MainThread:2068 [wandb_run.py:_config_callback():1404] config_cb model/num_parameters 8234382336 None
+ 2026-02-09 08:11:18,609 INFO MainThread:2068 [wandb_run.py:_config_callback():1404] config_cb None None {'model_args': {'model_name_or_path': '/workspace/Qwen/Qwen3-8B-Base', 'adapter_name_or_path': None, 'adapter_folder': None, 'cache_dir': None, 'use_fast_tokenizer': True, 'resize_vocab': False, 'split_special_tokens': False, 'add_tokens': None, 'add_special_tokens': None, 'new_special_tokens_config': None, 'init_special_tokens': 'noise_init', 'model_revision': 'main', 'low_cpu_mem_usage': True, 'rope_scaling': None, 'flash_attn': 'auto', 'shift_attn': False, 'mixture_of_depths': None, 'use_unsloth': False, 'use_unsloth_gc': False, 'enable_liger_kernel': False, 'moe_aux_loss_coef': None, 'disable_gradient_checkpointing': False, 'use_reentrant_gc': True, 'upcast_layernorm': False, 'upcast_lmhead_output': False, 'train_from_scratch': False, 'infer_backend': 'HF', 'offload_folder': 'offload', 'use_kv_cache': True, 'use_v1_kernels': False, 'infer_dtype': 'auto', 'hf_hub_token': '<HF_HUB_TOKEN>', 'ms_hub_token': '<MS_HUB_TOKEN>', 'om_hub_token': '<OM_HUB_TOKEN>', 'print_param_status': False, 'trust_remote_code': True, 'quantization_method': 'BNB', 'quantization_bit': None, 'quantization_type': 'nf4', 'double_quantization': True, 'quantization_device_map': None, 'image_max_pixels': 589824, 'image_min_pixels': 1024, 'image_do_pan_and_scan': False, 'crop_to_patches': False, 'video_max_pixels': 65536, 'video_min_pixels': 256, 'video_fps': 2.0, 'video_maxlen': 128, 'use_audio_in_video': False, 'audio_sampling_rate': 16000, 'export_dir': None, 'export_size': 5, 'export_device': 'cpu', 'export_quantization_bit': None, 'export_quantization_dataset': None, 'export_quantization_nsamples': 128, 'export_quantization_maxlen': 1024, 'export_legacy_format': False, 'export_hub_model_id': None, 'use_kt': False, 'kt_optimize_rule': None, 'cpu_infer': 32, 'chunk_size': 8192, 'mode': 'normal', 'kt_maxlen': 4096, 'kt_use_cuda_graph': True, 'kt_mode': 'normal', 'kt_force_think': False, 
'vllm_maxlen': 4096, 'vllm_gpu_util': 0.7, 'vllm_enforce_eager': False, 'vllm_max_lora_rank': 32, 'vllm_config': None, 'sglang_maxlen': 4096, 'sglang_mem_fraction': 0.7, 'sglang_tp_size': -1, 'sglang_config': None, 'sglang_lora_backend': 'triton', 'compute_dtype': 'torch.bfloat16', 'device_map': {'': 'cuda:0'}, 'model_max_length': 2047, 'block_diag_attn': False}, 'data_args': {'template': 'qwen3_nothink', 'dataset': ['Markie_Voss_t0_d34_r300'], 'eval_dataset': None, 'dataset_dir': '/workspace/LlamaFactory/data', 'media_dir': '/workspace/LlamaFactory/data', 'cutoff_len': 2047, 'train_on_prompt': False, 'mask_history': False, 'streaming': False, 'buffer_size': 16384, 'mix_strategy': 'concat', 'interleave_probs': None, 'overwrite_cache': False, 'preprocessing_batch_size': 1000, 'preprocessing_num_workers': 16, 'max_samples': 100000000, 'eval_num_beams': None, 'ignore_pad_token_for_loss': True, 'val_size': 0.0, 'eval_on_each_dataset': False, 'packing': True, 'neat_packing': False, 'tool_format': None, 'default_system': None, 'enable_thinking': False, 'tokenized_path': None, 'data_shared_file_system': False}, 'finetuning_args': {'freeze_trainable_layers': 2, 'freeze_trainable_modules': ['all'], 'freeze_extra_modules': None, 'additional_target': None, 'module_dropout': 0.0, 'oft_rank': 0, 'oft_block_size': 32, 'oft_target': ['all'], 'create_new_adapter': False, 'lora_alpha': 32, 'lora_dropout': 0.03, 'lora_rank': 16, 'lora_target': ['all'], 'loraplus_lr_ratio': None, 'loraplus_lr_embedding': 1e-06, 'use_rslora': False, 'use_dora': False, 'pissa_init': False, 'pissa_iter': 16, 'pissa_convert': False, 'pref_beta': 0.1, 'pref_ftx': 0.0, 'pref_bco_weight': 0.0, 'pref_loss': 'sigmoid', 'dpo_label_smoothing': 0.0, 'kto_chosen_weight': 1.0, 'kto_rejected_weight': 1.0, 'simpo_gamma': 0.5, 'ppo_buffer_size': 1, 'ppo_epochs': 4, 'ppo_score_norm': False, 'ppo_target': 6.0, 'ppo_whiten_rewards': False, 'ref_model': None, 'ref_model_adapters': None, 'ref_model_quantization_bit': 
None, 'reward_model': None, 'reward_model_adapters': None, 'reward_model_quantization_bit': None, 'reward_model_type': 'lora', 'ld_alpha': None, 'use_galore': False, 'galore_target': ['all'], 'galore_rank': 16, 'galore_update_interval': 200, 'galore_scale': 2.0, 'galore_proj_type': 'std', 'galore_layerwise': False, 'use_apollo': False, 'apollo_target': ['all'], 'apollo_rank': 16, 'apollo_update_interval': 200, 'apollo_scale': 32.0, 'apollo_proj': 'random', 'apollo_proj_type': 'std', 'apollo_scale_type': 'channel', 'apollo_layerwise': False, 'apollo_scale_front': False, 'use_badam': False, 'badam_mode': 'layer', 'badam_start_block': None, 'badam_switch_mode': 'ascending', 'badam_switch_interval': 50, 'badam_update_ratio': 0.05, 'badam_mask_mode': 'adjacent', 'badam_verbose': 0, 'use_swanlab': False, 'swanlab_project': 'llamafactory', 'swanlab_workspace': None, 'swanlab_run_name': None, 'swanlab_mode': 'cloud', 'swanlab_api_key': '<SWANLAB_API_KEY>', 'swanlab_logdir': None, 'swanlab_lark_webhook_url': None, 'swanlab_lark_secret': None, 'pure_bf16': False, 'stage': 'pt', 'finetuning_type': 'lora', 'use_llama_pro': False, 'use_adam_mini': False, 'use_mca': False, 'use_muon': False, 'use_dft_loss': False, 'use_eaft_loss': False, 'eaft_alpha': 1.0, 'freeze_vision_tower': True, 'freeze_multi_modal_projector': True, 'freeze_language_model': False, 'compute_accuracy': False, 'disable_shuffling': False, 'early_stopping_steps': None, 'plot_loss': True, 'include_effective_tokens_per_second': False}, 'generating_args': {'do_sample': True, 'temperature': 0.95, 'top_p': 0.7, 'top_k': 50, 'num_beams': 1, 'max_new_tokens': 1024, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'skip_special_tokens': True}}
+ 2026-02-09 08:34:15,201 INFO wandb-AsyncioManager-main:2068 [service_client.py:_forward_responses():94] Reached EOF.
+ 2026-02-09 08:34:15,206 INFO wandb-AsyncioManager-main:2068 [mailbox.py:close():154] Closing mailbox, abandoning 1 handles.
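The `peft_config` logged above (`r` = 16, `lora_alpha` = 32, `use_rslora` = False, `lora_bias` = False, seven `target_modules`) together with the model dimensions in the same record is enough to estimate the adapter size by hand. The sketch below is an illustration rather than anything taken from the commit. It assumes the usual grouped-query attention layout, where `q_proj`/`o_proj` span `num_attention_heads * head_dim` and `k_proj`/`v_proj` span `num_key_value_heads * head_dim`, and that every targeted linear layer in all 36 decoder layers receives one rank-16 pair of matrices.

```python
# Values copied from the logged config.
HIDDEN = 4096
INTERMEDIATE = 12288
NUM_LAYERS = 36
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
LORA_R = 16
LORA_ALPHA = 32
LOGGED_NUM_PARAMETERS = 8234382336  # "model/num_parameters" in debug.log

# (in_features, out_features) for each targeted projection.
# Assumption: standard grouped-query attention and gated-MLP shapes.
q_dim = NUM_HEADS * HEAD_DIM
kv_dim = NUM_KV_HEADS * HEAD_DIM
shapes = {
    "q_proj": (HIDDEN, q_dim),
    "k_proj": (HIDDEN, kv_dim),
    "v_proj": (HIDDEN, kv_dim),
    "o_proj": (q_dim, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A (r x in) plus B (out x r), with no bias terms."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in shapes.values())
adapter_params = per_layer * NUM_LAYERS

# With use_rslora=False, the LoRA update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / LORA_R

# What remains of the logged total if it includes the adapter weights
# (an assumption about how the count was taken).
remainder = LOGGED_NUM_PARAMETERS - adapter_params

print(f"adapter parameters: {adapter_params:,}")
print(f"LoRA scaling: {scaling}")
print(f"logged total minus adapters: {remainder:,}")
```

Under these assumptions the adapters add about 43.6 million parameters, roughly 0.5% of the logged total, and the update is scaled by 2.0.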