[2025-09-29 16:16:56,308] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:23243] Loading raw datasets...
[2025-09-29 16:16:56,541] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:23243] Loading dataset: /workspace/outputs/training_data/ with base_type: chat_template and prompt_style: None
Dropping Long Sequences (>1024) (num_proc=192): 0%| | 0/1918 [00:00<?, ? examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 1%|█                    | 10/1918 [00:02<08:04, 3.94 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 4%|█                    | 70/1918 [00:02<00:52, 35.45 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 8%|██                   | 150/1918 [00:02<00:20, 85.95 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 10%|███                  | 200/1918 [00:02<00:14, 122.09 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 14%|███                  | 260/1918 [00:03<00:09, 174.73 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 17%|████                 | 320/1918 [00:03<00:06, 231.88 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 20%|█████                | 380/1918 [00:03<00:05, 290.99 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 23%|██████               | 450/1918 [00:03<00:04, 357.52 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 27%|██████               | 520/1918 [00:03<00:03, 417.39 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 30%|███████              | 580/1918 [00:03<00:02, 446.62 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 34%|████████             | 660/1918 [00:03<00:02, 515.42 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 38%|█████████            | 730/1918 [00:03<00:02, 556.77 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 42%|██████████           | 800/1918 [00:03<00:01, 570.35 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 46%|██████████           | 880/1918 [00:04<00:01, 614.25 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 83%|█████████████████    | 1590/1918 [00:04<00:00, 2349.82 examples/s]
Dropping Long Sequences (>1024) (num_proc=192): 100%|█████████████████████| 1918/1918 [00:04<00:00, 410.73 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 0%| | 0/1918 [00:00<?, ? examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 1%| | 10/1918 [00:02<07:59, 3.98 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 2%|█            | 40/1918 [00:02<01:35, 19.64 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 4%|█            | 80/1918 [00:02<00:40, 45.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 6%|█            | 110/1918 [00:02<00:27, 66.20 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 7%|█            | 140/1918 [00:03<00:20, 88.47 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 9%|██           | 170/1918 [00:03<00:15, 113.89 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 10%|██           | 200/1918 [00:03<00:12, 142.67 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 12%|██           | 230/1918 [00:03<00:15, 106.32 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 14%|██           | 260/1918 [00:03<00:12, 130.44 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 15%|██           | 290/1918 [00:03<00:10, 153.03 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 17%|███          | 320/1918 [00:04<00:09, 174.05 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 18%|███          | 350/1918 [00:04<00:09, 172.70 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 20%|███          | 380/1918 [00:04<00:07, 192.97 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 21%|███          | 410/1918 [00:04<00:07, 203.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 23%|████         | 450/1918 [00:04<00:06, 230.57 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 25%|████         | 480/1918 [00:04<00:05, 243.14 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 27%|████         | 510/1918 [00:04<00:05, 252.14 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 28%|████         | 540/1918 [00:04<00:05, 253.64 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 30%|█████        | 570/1918 [00:05<00:05, 261.18 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 31%|█████        | 600/1918 [00:05<00:05, 254.92 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 33%|█████        | 630/1918 [00:05<00:05, 253.38 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 34%|█████        | 660/1918 [00:05<00:04, 262.62 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 36%|█████        | 700/1918 [00:05<00:04, 273.67 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 44%|███████      | 850/1918 [00:05<00:01, 593.81 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 100%|█████████████| 1918/1918 [00:06<00:00, 307.84 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 0%| | 0/1918 [00:00<?, ? examples/s]
Add position_id column (Sample Packing) (num_proc=192): 1%| | 10/1918 [00:02<08:32, 3.72 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 4%|█            | 70/1918 [00:02<00:58, 31.60 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 7%|█            | 130/1918 [00:03<00:26, 66.22 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 8%|██           | 160/1918 [00:03<00:22, 79.58 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 10%|██           | 200/1918 [00:03<00:15, 108.67 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 13%|██           | 240/1918 [00:03<00:12, 139.08 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 15%|██           | 280/1918 [00:03<00:09, 174.94 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 17%|███          | 320/1918 [00:03<00:07, 210.84 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 19%|███          | 360/1918 [00:03<00:06, 245.13 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 21%|███          | 400/1918 [00:03<00:05, 265.91 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 23%|████         | 440/1918 [00:04<00:05, 287.55 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 52%|███████      | 1000/1918 [00:04<00:00, 1554.43 examples/s]
Add position_id column (Sample Packing) (num_proc=192): 100%|█████████████| 1918/1918 [00:04<00:00, 407.87 examples/s]
Saving the dataset (0/7 shards): 0%| | 0/1918 [00:00<?, ? examples/s]
Saving the dataset (0/7 shards): 14%|██████                             | 274/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (1/7 shards): 14%|██████                             | 274/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (2/7 shards): 29%|███████████                        | 548/1918 [00:00<00:01, 1299.27 examples/s]
Saving the dataset (3/7 shards): 43%|████████████████                   | 822/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (4/7 shards): 57%|████████████████████               | 1096/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (5/7 shards): 71%|█████████████████████████          | 1370/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (6/7 shards): 86%|██████████████████████████████     | 1644/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (7/7 shards): 100%|███████████████████████████████████| 1918/1918 [00:00<00:00, 1299.27 examples/s]
Saving the dataset (7/7 shards): 100%|███████████████████████████████████| 1918/1918 [00:00<00:00, 6023.49 examples/s]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|████████████                                            | 1/5 [00:00<00:02, 1.35it/s]
Loading checkpoint shards: 40%|███████████████████████                                 | 2/5 [00:01<00:02, 1.40it/s]
Loading checkpoint shards: 60%|██████████████████████████████████                      | 3/5 [00:02<00:01, 1.43it/s]
Loading checkpoint shards: 80%|█████████████████████████████████████████████           | 4/5 [00:02<00:00, 1.51it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 1.99it/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 1.70it/s]