[2025-09-29 13:54:57,552] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:386] Loading raw datasets...
[2025-09-29 13:54:57,781] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:386] Loading dataset: /workspace/outputs/training_data/ with base_type: concisechoice and prompt_style: None
Dropping Long Sequences (>2048) (num_proc=192): 100%|████████████████████████████████████████████| 1918/1918 [00:04<00:00, 384.26 examples/s]
Drop Samples with Zero Trainable Tokens (num_proc=192): 100%|████████████████████████████████████| 1918/1918 [00:06<00:00, 294.44 examples/s]
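The "Drop Samples with Zero Trainable Tokens" pass discards examples whose labels are entirely masked out, since they contribute nothing to the loss. A pure-Python sketch of the idea (the `-100` ignore index is the standard PyTorch/Transformers convention for masked label positions):

```python
# Label positions set to -100 are ignored by the cross-entropy loss.
IGNORE_INDEX = -100

def has_trainable_tokens(example):
    """A sample is trainable only if at least one label is not masked."""
    return any(label != IGNORE_INDEX for label in example["labels"])

# Toy samples: the second has every label masked, so it is dropped.
samples = [
    {"labels": [-100, -100, 42, 7]},
    {"labels": [-100, -100]},
]
kept = [s for s in samples if has_trainable_tokens(s)]

print(len(kept))  # only the first sample survives
```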
Add position_id column (Sample Packing) (num_proc=192): 100%|████████████████████████████████████| 1918/1918 [00:04<00:00, 395.06 examples/s]
Saving the dataset (7/7 shards): 100%|██████████████████████████████████████████████████████████| 1918/1918 [00:00<00:00, 5712.48 examples/s]
model.safetensors.index.json: 32.9kB [00:00, 126MB/s]
model-00001-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████| 4.00G/4.00G [00:03<00:00, 1.02GB/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00, 2.76s/it]
generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 1.93MB/s]
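For reference, a run producing this log would be driven by an Axolotl YAML config roughly along these lines. Only the values marked "from the log" are grounded in the output above; the base model name is a placeholder (the log only shows a 5-shard safetensors download), and this is a hypothetical reconstruction, not the actual config:

```yaml
# Hypothetical reconstruction — placeholders, not the actual config.
base_model: some-org/some-model        # placeholder; not shown in the log
datasets:
  - path: /workspace/outputs/training_data/   # from the log
    type: concisechoice                       # from the log (base_type)
sequence_len: 2048       # from the log: "Dropping Long Sequences (>2048)"
sample_packing: true     # from the log: "Add position_id column (Sample Packing)"
dataset_processes: 192   # from the log: num_proc=192
```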