[2025-09-29 13:54:57,552] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:386] Loading raw datasets...
[2025-09-29 13:54:57,781] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:386] Loading dataset: /workspace/outputs/training_data/ with base_type: concisechoice and prompt_style: None
Dropping Long Sequences (>2048) (num_proc=192): 100%|████████████████████████████████████████████| 1918/1918 [00:04<00:00, 384.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 0%| | 0/1918 [00:00