[2025-09-29 16:16:56,308] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:23243] Loading raw datasets...
[2025-09-29 16:16:56,541] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:23243] Loading dataset: /workspace/outputs/training_data/ with base_type: chat_template and prompt_style: None

Dropping Long Sequences (>1024) (num_proc=192):   0%|                                 | 0/1918 [00:00<?, ? examples/s]
Dropping Long Sequences (>1024) (num_proc=192):   1%|▏                       | 10/1918 [00:02<08:04,  3.94 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):   4%|▉                       | 70/1918 [00:02<00:52, 35.45 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):   8%|█▊                     | 150/1918 [00:02<00:20, 85.95 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  10%|██▎                   | 200/1918 [00:02<00:14, 122.09 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  14%|██▉                   | 260/1918 [00:03<00:09, 174.73 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  17%|███▋                  | 320/1918 [00:03<00:06, 231.88 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  20%|████▎                 | 380/1918 [00:03<00:05, 290.99 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  23%|█████▏                | 450/1918 [00:03<00:04, 357.52 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  27%|█████▉                | 520/1918 [00:03<00:03, 417.39 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  30%|██████▋               | 580/1918 [00:03<00:02, 446.62 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  34%|███████▌              | 660/1918 [00:03<00:02, 515.42 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  38%|████████▎             | 730/1918 [00:03<00:02, 556.77 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  42%|█████████▏            | 800/1918 [00:03<00:01, 570.35 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  46%|██████████            | 880/1918 [00:04<00:01, 614.25 examples/s]
Dropping Long Sequences (>1024) (num_proc=192):  83%|████████████████▌   | 1590/1918 [00:04<00:00, 2349.82 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 100%|█████████████████████| 1918/1918 [00:04<00:00, 410.73 examples/s]

Drop Samples with Zero Trainable Tokens (num_proc=192):   0%|                         | 0/1918 [00:00<?, ? examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   1%|                | 10/1918 [00:02<07:59,  3.98 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   2%|▎               | 40/1918 [00:02<01:35, 19.64 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   4%|▋               | 80/1918 [00:02<00:40, 45.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   6%|▊              | 110/1918 [00:02<00:27, 66.20 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   7%|█              | 140/1918 [00:03<00:20, 88.47 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):   9%|█▏            | 170/1918 [00:03<00:15, 113.89 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  10%|█▍            | 200/1918 [00:03<00:12, 142.67 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  12%|█▋            | 230/1918 [00:03<00:15, 106.32 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  14%|█▉            | 260/1918 [00:03<00:12, 130.44 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  15%|██            | 290/1918 [00:03<00:10, 153.03 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  17%|██▎           | 320/1918 [00:04<00:09, 174.05 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  18%|██▌           | 350/1918 [00:04<00:09, 172.70 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  20%|██▊           | 380/1918 [00:04<00:07, 192.97 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  21%|██▉           | 410/1918 [00:04<00:07, 203.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  23%|███▎          | 450/1918 [00:04<00:06, 230.57 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  25%|███▌          | 480/1918 [00:04<00:05, 243.14 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  27%|███▋          | 510/1918 [00:04<00:05, 252.14 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  28%|███▉          | 540/1918 [00:04<00:05, 253.64 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  30%|████▏         | 570/1918 [00:05<00:05, 261.18 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  31%|████▍         | 600/1918 [00:05<00:05, 254.92 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  33%|████▌         | 630/1918 [00:05<00:05, 253.38 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  34%|████▊         | 660/1918 [00:05<00:04, 262.62 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  36%|█████         | 700/1918 [00:05<00:04, 273.67 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192):  44%|██████▏       | 850/1918 [00:05<00:01, 593.81 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 100%|█████████████| 1918/1918 [00:06<00:00, 307.84 examples/s]

Add position_id column (Sample Packing) (num_proc=192):   0%|                         | 0/1918 [00:00<?, ? examples/s]
Add position_id column (Sample Packing) (num_proc=192):   1%|                | 10/1918 [00:02<08:32,  3.72 examples/s]
Add position_id column (Sample Packing) (num_proc=192):   4%|▌               | 70/1918 [00:02<00:58, 31.60 examples/s]
Add position_id column (Sample Packing) (num_proc=192):   7%|█              | 130/1918 [00:03<00:26, 66.22 examples/s]
Add position_id column (Sample Packing) (num_proc=192):   8%|█▎             | 160/1918 [00:03<00:22, 79.58 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  10%|█▍            | 200/1918 [00:03<00:15, 108.67 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  13%|█▊            | 240/1918 [00:03<00:12, 139.08 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  15%|██            | 280/1918 [00:03<00:09, 174.94 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  17%|██▎           | 320/1918 [00:03<00:07, 210.84 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  19%|██▋           | 360/1918 [00:03<00:06, 245.13 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  21%|██▉           | 400/1918 [00:03<00:05, 265.91 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  23%|███▏          | 440/1918 [00:04<00:05, 287.55 examples/s]
Add position_id column (Sample Packing) (num_proc=192):  52%|██████▎     | 1000/1918 [00:04<00:00, 1554.43 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 100%|█████████████| 1918/1918 [00:04<00:00, 407.87 examples/s]

Saving the dataset (0/7 shards):   0%|                                                | 0/1918 [00:00<?, ? examples/s]
Saving the dataset (0/7 shards):  14%|█████▏                              | 274/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (1/7 shards):  14%|█████▏                              | 274/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (2/7 shards):  29%|██████████▎                         | 548/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (3/7 shards):  43%|███████████████▍                    | 822/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (4/7 shards):  57%|████████████████████               | 1096/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (5/7 shards):  71%|█████████████████████████          | 1370/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (6/7 shards):  86%|██████████████████████████████     | 1644/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (7/7 shards): 100%|███████████████████████████████████| 1918/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (7/7 shards): 100%|███████████████████████████████████| 1918/1918 [00:00<00:00, 6023.49 examples/s]

Loading checkpoint shards:   0%|                                                                | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards:  20%|███████████▏                                            | 1/5 [00:00<00:02,  1.35it/s]
Loading checkpoint shards:  40%|██████████████████████▍                                 | 2/5 [00:01<00:02,  1.40it/s]
Loading checkpoint shards:  60%|█████████████████████████████████▌                      | 3/5 [00:02<00:01,  1.43it/s]
Loading checkpoint shards:  80%|████████████████████████████████████████████▊           | 4/5 [00:02<00:00,  1.51it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  1.99it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 5/5 [00:02<00:00,  1.70it/s]