| [2026-03-10 09:21:08,376] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:2542054] Loading raw datasets... | |
| [2026-03-10 09:21:08,520] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:2542054] Loading dataset: data/1b/all.jsonl with base_type: completion and prompt_style: None | |
| Dropping Invalid Sequences (<None or >2048) (num_proc=96): 100%|██████████| 3150/3150 [00:02<00:00, 1210.02 examples/s] | |
| Drop Samples with Zero Trainable Tokens (num_proc=96): 100%|██████████| 3150/3150 [00:02<00:00, 1087.78 examples/s] | |
| Add position_id column (Sample Packing) (num_proc=96): 100%|██████████| 3150/3150 [00:02<00:00, 1156.24 examples/s] | |
| Saving the dataset (12/12 shards): 100%|██████████| 3150/3150 [00:00<00:00, 6414.30 examples/s] | |
| Loading weights: 100%|██████████| 355/355 [00:03<00:00, 93.64it/s] | |