[2025-09-29 13:54:57,552] [INFO] [axolotl.utils.data.sft._load_raw_datasets:320] [PID:386] Loading raw datasets...
[2025-09-29 13:54:57,781] [INFO] [axolotl.utils.data.wrappers.get_dataset_wrapper:87] [PID:386] Loading dataset: /workspace/outputs/training_data/ with base_type: concisechoice and prompt_style: None

Dropping Long Sequences (>2048) (num_proc=192): 100%|████████████████████████████████████████████| 1918/1918 [00:04<00:00, 384.26 examples/s]

Drop Samples with Zero Trainable Tokens (num_proc=192): 100%|████████████████████████████████████| 1918/1918 [00:06<00:00, 294.44 examples/s]

Add position_id column (Sample Packing) (num_proc=192): 100%|████████████████████████████████████| 1918/1918 [00:04<00:00, 395.06 examples/s]

Saving the dataset (7/7 shards): 100%|██████████████████████████████████████████████████████████| 1918/1918 [00:00<00:00, 5712.48 examples/s]

model.safetensors.index.json: 32.9kB [00:00, 126MB/s]

model-00001-of-00005.safetensors: 100%|█████████████████████████████████████████████████████████████████| 4.00G/4.00G [00:03<00:00, 1.02GB/s]

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00,  2.76s/it]

generation_config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 239/239 [00:00<00:00, 1.93MB/s]