[2026-01-03 15:17:19,855] [DEBUG] [axolotl.utils.config.resolve_dtype:66] [PID:284] bf16 support detected, enabling for this configuration.
config.json: 0%| | 0.00/727
[2026-01-03 15:17:22,710] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:284] BOS: None / None
[2026-01-03 15:17:22,710] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:282] [PID:284] PAD: 151643 / <|endoftext|>
[2026-01-03 15:17:22,710] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:283] [PID:284] UNK: None / None
[2026-01-03 15:17:22,713] [INFO] [axolotl.utils.data.shared.load_preprocessed_dataset:481] [PID:284] Unable to find prepared dataset in last_run_prepared/90a4bd078072b9d1de83a8db5d6b8671
[2026-01-03 15:17:22,713] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:284] Loading raw datasets...
[2026-01-03 15:17:22,714] [WARNING] [axolotl.utils.data.sft._load_raw_datasets:322] [PID:284] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset using `axolotl preprocess path/to/config.yml`.
Generating train split: 503 examples [00:00, 22482.50 examples/s]
[2026-01-03 15:17:23,108] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:284] Loading dataset: data.jsonl with base_type: chat_template and prompt_style: None
[2026-01-03 15:17:23,136] [INFO] [axolotl.prompt_strategies.chat_template.__call__:996] [PID:284] Using chat template:
---
{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
---
Tokenizing Prompts (num_proc=24): 0%| | 0/503
Dropping Long Sequences (>8192) (num_proc=24): 100%|████████████| 503/503 [00:01<00:00, 414.08 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=24): 0%| | 0/503
[2026-01-03 15:17:42,821] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:281] [PID:284] BOS: None / None
[2026-01-03 15:17:42,821] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:282] [PID:284] PAD: 151643 / <|endoftext|>
[2026-01-03 15:17:42,821] [DEBUG] [axolotl.loaders.tokenizer.load_tokenizer:283] [PID:284] UNK: None / None
[2026-01-03 15:17:42,821] [DEBUG] [axolotl.train.setup_model_and_tokenizer:82] [PID:284] Loading model
[2026-01-03 15:17:42,956] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_evaluation_loop:87] [PID:284] Patched Trainer.evaluation_loop with nanmean loss calculation
[2026-01-03 15:17:42,961] [DEBUG] [axolotl.monkeypatch.transformers.trainer_loss_calc.patch_maybe_log_save_evaluate:138] [PID:284] Patched Trainer._maybe_log_save_evaluate with nanmean loss calculation
[2026-01-03 15:17:42,961] [INFO] [axolotl.loaders.patch_manager._apply_multipack_patches:301] [PID:284] Applying multipack dataloader patch for sample packing...
model.safetensors.index.json: 32.8kB [00:00, 47.6MB/s]
model-00001-of-00003.safetensors: 0%| | 0.00/3.96G
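The chat template logged above is the standard ChatML layout: each message is wrapped in `<|im_start|>{role}` / `<|im_end|>` markers, with an optional trailing assistant header when a generation prompt is requested. A minimal pure-Python sketch of what that template renders (the newline placement is an assumption based on the usual ChatML format, since the log collapsed whitespace):

```python
def apply_chat_template(messages, add_generation_prompt=False):
    """Pure-Python equivalent of the ChatML-style Jinja template in the log.

    Illustrative sketch only; the real rendering is done by the tokenizer's
    Jinja chat template.
    """
    out = ""
    for message in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        out += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Mirrors the template's trailing '<|im_start|>assistant\n'
        out += "<|im_start|>assistant\n"
    return out

prompt = apply_chat_template(
    [{"role": "user", "content": "Hello"},
     {"role": "assistant", "content": "Hi there"}],
)
print(prompt)
```

During training `add_generation_prompt` stays false (as the template's default branch sets it); at inference time it is enabled so the model continues from the open assistant header.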
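The two filtering passes in the log ("Dropping Long Sequences (>8192)" and "Drop Samples with Zero Trainable Tokens") can be sketched as a simple row filter. The `input_ids`/`labels` dict layout mirrors a tokenized Hugging Face dataset row; this is an illustrative sketch under those assumptions, not axolotl's implementation:

```python
IGNORE_INDEX = -100  # Hugging Face convention: labels of -100 are excluded from the loss

def keep_sample(row, max_len=8192):
    """Return True if a tokenized row survives both filtering passes."""
    if len(row["input_ids"]) > max_len:
        return False  # "Dropping Long Sequences (>8192)"
    if all(label == IGNORE_INDEX for label in row["labels"]):
        return False  # "Drop Samples with Zero Trainable Tokens"
    return True

rows = [
    {"input_ids": [1] * 100,  "labels": [IGNORE_INDEX] * 50 + [1] * 50},  # kept
    {"input_ids": [1] * 9000, "labels": [1] * 9000},                      # too long
    {"input_ids": [1] * 10,   "labels": [IGNORE_INDEX] * 10},             # fully masked
]
kept = [r for r in rows if keep_sample(r)]
print(len(kept))  # 1
```

A sample whose labels are all `-100` (e.g. a conversation where only masked prompt tokens fit in the window) contributes nothing to the gradient, which is why it is dropped rather than trained on.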
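The "multipack dataloader patch for sample packing" refers to packing several short sequences into one fixed-length batch slot so padding tokens are not wasted. A minimal first-fit-decreasing sketch of the idea; axolotl's actual multipack implementation differs in detail (it packs by token count across distributed ranks):

```python
def pack_samples(lengths, max_len=8192):
    """Greedily group sequence lengths into bins of capacity max_len.

    Returns a list of bins, each a list of sample indices whose lengths
    sum to at most max_len. Illustrative sketch of sample packing only.
    """
    bins = []  # each bin: [remaining_capacity, [sample indices]]
    # First-fit-decreasing: place longest sequences first
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        for b in bins:
            if lengths[i] <= b[0]:
                b[0] -= lengths[i]
                b[1].append(i)
                break
        else:
            bins.append([max_len - lengths[i], [i]])
    return [b[1] for b in bins]

print(pack_samples([5000, 4000, 3000, 1000], max_len=8192))  # [[0, 2], [1, 3]]
```

Here four sequences that would otherwise need four padded 8192-token slots fit into two, roughly doubling effective throughput for this toy example.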
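The two "nanmean loss calculation" patches guard the reported loss: averaging with a plain mean lets a single NaN batch loss poison the whole eval/train metric, while a NaN-ignoring mean keeps the remaining batches' signal. A stdlib sketch of that idea (not axolotl's patched Trainer code):

```python
import math

def nanmean(losses):
    """Mean over non-NaN loss values; NaN only if every value is NaN."""
    vals = [x for x in losses if not math.isnan(x)]
    return sum(vals) / len(vals) if vals else float("nan")

# A single NaN batch no longer drags the reported loss to NaN
print(nanmean([0.5, float("nan"), 1.5]))  # 1.0
```

A plain `sum(losses) / len(losses)` on the same input would return NaN, which is exactly the failure mode the patches in the log work around.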