File size: 178,911 Bytes
[2026-02-15 03:59:23,355] [WARNING] [py.warnings._showwarnmsg:110] [PID:6181] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
  warnings.warn(  # warn only once


Loading weights:   0%|                                                                                                                                                                    | 0/291 [00:00<?, ?it/s]
Loading weights:   0%|▍                                                                                                                     | 1/291 [00:00<00:00, 6584.46it/s, Materializing param=lm_head.weight]
Loading weights:   0%|▍                                                                                                                     | 1/291 [00:00<00:00, 3013.15it/s, Materializing param=lm_head.weight]
Loading weights:   1%|▋                                                                                                          | 2/291 [00:00<00:00, 3184.74it/s, Materializing param=model.embed_tokens.weight]
Loading weights:   1%|▋                                                                                                          | 2/291 [00:00<00:00, 2559.84it/s, Materializing param=model.embed_tokens.weight]
Loading weights:   1%|▉                                                                                              | 3/291 [00:00<00:00, 2903.30it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights:   1%|▉                                                                                              | 3/291 [00:00<00:00, 2543.54it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights:   1%|█▎                                                                                               | 4/291 [00:00<00:00, 2859.59it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights:   1%|█▎                                                                                               | 4/291 [00:00<00:00, 2592.28it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights:   2%|█▋                                                                                               | 5/291 [00:00<00:00, 2857.93it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights:   2%|█▋                                                                                               | 5/291 [00:00<00:00, 2619.80it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights:   2%|██                                                                                                 | 6/291 [00:00<00:00, 2826.03it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights:   2%|██                                                                                                 | 6/291 [00:00<00:00, 2651.27it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights:   2%|██                                                                                    | 7/291 [00:00<00:00, 2811.73it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights:   2%|██                                                                                    | 7/291 [00:00<00:00, 2664.02it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights:   3%|██▌                                                                                           | 8/291 [00:00<00:00, 2640.21it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights:   3%|██▌                                                                                           | 8/291 [00:00<00:00, 2505.93it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights:   3%|██▉                                                                                           | 9/291 [00:00<00:00, 2643.10it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights:   3%|██▉                                                                                           | 9/291 [00:00<00:00, 2368.03it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights:   3%|███▏                                                                                         | 10/291 [00:00<00:00, 2455.83it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights:   3%|███▏                                                                                         | 10/291 [00:00<00:00, 2371.40it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights:   4%|███▌                                                                                         | 11/291 [00:00<00:00, 2475.18it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights:   4%|███▌                                                                                         | 11/291 [00:00<00:00, 2399.86it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights:   4%|███▉                                                                                          | 12/291 [00:00<00:00, 2497.97it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights:   4%|███▉                                                                                          | 12/291 [00:00<00:00, 2428.08it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights:   4%|████▎                                                                                           | 13/291 [00:00<00:00, 2519.80it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights:   4%|████▎                                                                                           | 13/291 [00:00<00:00, 2453.14it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights:   5%|████▌                                                                                           | 14/291 [00:00<00:00, 2320.23it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights:   5%|████▌                                                                                           | 14/291 [00:00<00:00, 2262.56it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights:   5%|█████                                                                                             | 15/291 [00:00<00:00, 2344.41it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights:   5%|█████                                                                                             | 15/291 [00:00<00:00, 2297.41it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights:   5%|████▋                                                                                | 16/291 [00:00<00:00, 2221.49it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights:   5%|████▋                                                                                | 16/291 [00:00<00:00, 2175.26it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights:   6%|█████▍                                                                                       | 17/291 [00:00<00:00, 2185.81it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights:   6%|█████▍                                                                                       | 17/291 [00:00<00:00, 2034.97it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights:   6%|█████▊                                                                                       | 18/291 [00:00<00:00, 2093.43it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights:   6%|█████▊                                                                                       | 18/291 [00:00<00:00, 2059.40it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights:   7%|██████                                                                                       | 19/291 [00:00<00:00, 2073.15it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights:   7%|██████                                                                                       | 19/291 [00:00<00:00, 2040.03it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights:   7%|██████▍                                                                                      | 20/291 [00:00<00:00, 2100.14it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights:   7%|██████▍                                                                                      | 20/291 [00:00<00:00, 2069.68it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights:   7%|██████▊                                                                                       | 21/291 [00:00<00:00, 2079.82it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights:   7%|██████▊                                                                                       | 21/291 [00:00<00:00, 2058.58it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights:   8%|███████▎                                                                                        | 22/291 [00:00<00:00, 2126.34it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights:   8%|███████▎                                                                                        | 22/291 [00:00<00:00, 2106.68it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights:   8%|███████▌                                                                                        | 23/291 [00:00<00:00, 2026.15it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights:   8%|███████▌                                                                                        | 23/291 [00:00<00:00, 2000.81it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights:   8%|████████                                                                                          | 24/291 [00:00<00:00, 2059.19it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights:   8%|████████                                                                                          | 24/291 [00:00<00:00, 2039.70it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights:   9%|███████▎                                                                             | 25/291 [00:00<00:00, 2089.46it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights:   9%|███████▎                                                                             | 25/291 [00:00<00:00, 2029.92it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights:   9%|████████▎                                                                                    | 26/291 [00:00<00:00, 2081.22it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights:   9%|████████▎                                                                                    | 26/291 [00:00<00:00, 2063.54it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights:   9%|████████▋                                                                                    | 27/291 [00:00<00:00, 2110.75it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights:   9%|████████▋                                                                                    | 27/291 [00:00<00:00, 2025.94it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights:  10%|████████▉                                                                                    | 28/291 [00:00<00:00, 2048.68it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights:  10%|████████▉                                                                                    | 28/291 [00:00<00:00, 2032.97it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights:  10%|█████████▎                                                                                   | 29/291 [00:00<00:00, 2080.69it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights:  10%|█████████▎                                                                                   | 29/291 [00:00<00:00, 2065.88it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights:  10%|█████████▋                                                                                    | 30/291 [00:00<00:00, 2112.64it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights:  10%|█████████▋                                                                                    | 30/291 [00:00<00:00, 2097.96it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights:  11%|██████████▏                                                                                     | 31/291 [00:00<00:00, 2145.42it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights:  11%|██████████▏                                                                                     | 31/291 [00:00<00:00, 2130.59it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights:  11%|██████████▌                                                                                     | 32/291 [00:00<00:00, 2038.70it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights:  11%|██████████▌                                                                                     | 32/291 [00:00<00:00, 2021.14it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights:  11%|███████████                                                                                       | 33/291 [00:00<00:00, 2059.67it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights:  11%|███████████                                                                                       | 33/291 [00:00<00:00, 2045.73it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights:  12%|█████████▉                                                                           | 34/291 [00:00<00:00, 2023.70it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights:  12%|█████████▉                                                                           | 34/291 [00:00<00:00, 2010.81it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights:  12%|███████████▏                                                                                 | 35/291 [00:00<00:00, 2050.95it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights:  12%|███████████▏                                                                                 | 35/291 [00:00<00:00, 2038.95it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights:  12%|███████████▌                                                                                 | 36/291 [00:00<00:00, 2079.02it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights:  12%|███████████▌                                                                                 | 36/291 [00:00<00:00, 2066.56it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights:  13%|███████████▊                                                                                 | 37/291 [00:00<00:00, 2105.72it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights:  13%|███████████▊                                                                                 | 37/291 [00:00<00:00, 2094.46it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights:  13%|████████████▏                                                                                | 38/291 [00:00<00:00, 2132.88it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights:  13%|████████████▏                                                                                | 38/291 [00:00<00:00, 1989.73it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights:  13%|████████████▌                                                                                 | 39/291 [00:00<00:00, 2022.73it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights:  13%|████████████▌                                                                                 | 39/291 [00:00<00:00, 2011.90it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights:  14%|█████████████▏                                                                                  | 40/291 [00:00<00:00, 2047.65it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights:  14%|█████████████▏                                                                                  | 40/291 [00:00<00:00, 2037.21it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights:  14%|█████████████▌                                                                                  | 41/291 [00:00<00:00, 2037.78it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights:  14%|█████████████▌                                                                                  | 41/291 [00:00<00:00, 2026.85it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights:  14%|██████████████▏                                                                                   | 42/291 [00:00<00:00, 2059.83it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights:  14%|██████████████▏                                                                                   | 42/291 [00:00<00:00, 2049.38it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights:  15%|████████████▌                                                                        | 43/291 [00:00<00:00, 2082.12it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights:  15%|████████████▌                                                                        | 43/291 [00:00<00:00, 2072.07it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights:  15%|██████████████                                                                               | 44/291 [00:00<00:00, 2103.17it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights:  15%|██████████████                                                                               | 44/291 [00:00<00:00, 2092.99it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights:  15%|██████████████▍                                                                              | 45/291 [00:00<00:00, 2122.31it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights:  15%|██████████████▍                                                                              | 45/291 [00:00<00:00, 2056.39it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights:  16%|██████████████▋                                                                              | 46/291 [00:00<00:00, 2083.29it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights:  16%|██████████████▋                                                                              | 46/291 [00:00<00:00, 2073.46it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights:  16%|███████████████                                                                              | 47/291 [00:00<00:00, 2103.69it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights:  16%|███████████████                                                                              | 47/291 [00:00<00:00, 2093.81it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights:  16%|███████████████▌                                                                              | 48/291 [00:00<00:00, 2102.67it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights:  16%|███████████████▌                                                                              | 48/291 [00:00<00:00, 2092.60it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights:  17%|████████████████▏                                                                               | 49/291 [00:00<00:00, 2121.37it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights:  17%|████████████████▏                                                                               | 49/291 [00:00<00:00, 2112.46it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights:  17%|████████████████▍                                                                               | 50/291 [00:00<00:00, 2140.43it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights:  17%|████████████████▍                                                                               | 50/291 [00:00<00:00, 2131.32it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights:  18%|█████████████████▏                                                                                | 51/291 [00:00<00:00, 2157.87it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights:  18%|█████████████████▏                                                                                | 51/291 [00:00<00:00, 2148.68it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights:  18%|███████████████▏                                                                     | 52/291 [00:00<00:00, 2174.34it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights:  18%|███████████████▏                                                                     | 52/291 [00:00<00:00, 2164.91it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights:  18%|████████████████▉                                                                            | 53/291 [00:00<00:00, 2191.66it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights:  18%|████████████████▉                                                                            | 53/291 [00:00<00:00, 2122.54it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights:  19%|█████████████████▎                                                                           | 54/291 [00:00<00:00, 2145.81it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights:  19%|█████████████████▎                                                                           | 54/291 [00:00<00:00, 2136.94it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights:  19%|█████████████████▌                                                                           | 55/291 [00:00<00:00, 2161.55it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights:  19%|█████████████████▌                                                                           | 55/291 [00:00<00:00, 2152.89it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights:  19%|█████████████████▉                                                                           | 56/291 [00:00<00:00, 2178.94it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights:  19%|█████████████████▉                                                                           | 56/291 [00:00<00:00, 2167.38it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights:  20%|██████████████████▍                                                                           | 57/291 [00:00<00:00, 2192.41it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights:  20%|██████████████████▍                                                                           | 57/291 [00:00<00:00, 2183.40it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights:  20%|███████████████████▏                                                                            | 58/291 [00:00<00:00, 2208.33it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights:  20%|███████████████████▏                                                                            | 58/291 [00:00<00:00, 2199.86it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights:  20%|███████████████████▍                                                                            | 59/291 [00:00<00:00, 2223.64it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights:  20%|███████████████████▍                                                                            | 59/291 [00:00<00:00, 2215.33it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights:  21%|████████████████████▏                                                                             | 60/291 [00:00<00:00, 2236.62it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights:  21%|████████████████████▏                                                                             | 60/291 [00:00<00:00, 2227.79it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights:  21%|█████████████████▊                                                                   | 61/291 [00:00<00:00, 2251.07it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights:  21%|█████████████████▊                                                                   | 61/291 [00:00<00:00, 2225.40it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights:  21%|███████████████████▊                                                                         | 62/291 [00:00<00:00, 2199.52it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights:  21%|███████████████████▊                                                                         | 62/291 [00:00<00:00, 2191.01it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights:  22%|████████████████████▏                                                                        | 63/291 [00:00<00:00, 2214.19it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights:  22%|████████████████████▏                                                                        | 63/291 [00:00<00:00, 2206.15it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights:  22%|████████████████████▍                                                                        | 64/291 [00:00<00:00, 2228.42it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights:  22%|████████████████████▍                                                                        | 64/291 [00:00<00:00, 2179.60it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights:  22%|████████████████████▊                                                                        | 65/291 [00:00<00:00, 2200.95it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights:  22%|████████████████████▊                                                                        | 65/291 [00:00<00:00, 2193.66it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights:  23%|█████████████████████▎                                                                        | 66/291 [00:00<00:00, 2179.55it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights:  23%|█████████████████████▎                                                                        | 66/291 [00:00<00:00, 2172.24it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights:  23%|██████████████████████                                                                          | 67/291 [00:00<00:00, 2194.46it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights:  23%|██████████████████████                                                                          | 67/291 [00:00<00:00, 2187.65it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights:  23%|██████████████████████▍                                                                         | 68/291 [00:00<00:00, 2207.63it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights:  23%|██████████████████████▍                                                                         | 68/291 [00:00<00:00, 2200.39it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights:  24%|███████████████████████▏                                                                          | 69/291 [00:00<00:00, 2220.32it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights:  24%|███████████████████████▏                                                                          | 69/291 [00:00<00:00, 2211.97it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights:  24%|████████████████████▍                                                                | 70/291 [00:00<00:00, 2230.37it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights:  24%|████████████████████▍                                                                | 70/291 [00:00<00:00, 2222.90it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights:  24%|██████████████████████▋                                                                      | 71/291 [00:00<00:00, 2242.72it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights:  24%|██████████████████████▋                                                                      | 71/291 [00:00<00:00, 2235.43it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights:  25%|███████████████████████                                                                      | 72/291 [00:00<00:00, 2255.44it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights:  25%|███████████████████████                                                                      | 72/291 [00:00<00:00, 2248.57it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights:  25%|███████████████████████▎                                                                     | 73/291 [00:00<00:00, 2268.50it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights:  25%|███████████████████████▎                                                                     | 73/291 [00:00<00:00, 2261.63it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights:  25%|███████████████████████▋                                                                     | 74/291 [00:00<00:00, 2282.04it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights:  25%|███████████████████████▋                                                                     | 74/291 [00:00<00:00, 2275.17it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights:  26%|████████████████████████▏                                                                     | 75/291 [00:00<00:00, 2294.64it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights:  26%|████████████████████████▏                                                                     | 75/291 [00:00<00:00, 2286.92it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights:  26%|█████████████████████████                                                                       | 76/291 [00:00<00:00, 2307.15it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights:  26%|█████████████████████████                                                                       | 76/291 [00:00<00:00, 2300.36it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights:  26%|█████████████████████████▍                                                                      | 77/291 [00:00<00:00, 2317.91it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights:  26%|█████████████████████████▍                                                                      | 77/291 [00:00<00:00, 2299.35it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights:  27%|██████████████████████████▎                                                                       | 78/291 [00:00<00:00, 2317.34it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights:  27%|██████████████████████████▎                                                                       | 78/291 [00:00<00:00, 2310.21it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights:  27%|███████████████████████                                                              | 79/291 [00:00<00:00, 2328.09it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights:  27%|███████████████████████                                                              | 79/291 [00:00<00:00, 2321.29it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights:  27%|█████████████████████████▌                                                                   | 80/291 [00:00<00:00, 2338.09it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights:  27%|█████████████████████████▌                                                                   | 80/291 [00:00<00:00, 2330.83it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights:  28%|█████████████████████████▉                                                                   | 81/291 [00:00<00:00, 2349.62it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights:  28%|█████████████████████████▉                                                                   | 81/291 [00:00<00:00, 2342.64it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights:  28%|██████████████████████████▏                                                                  | 82/291 [00:00<00:00, 2361.80it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights:  28%|██████████████████████████▏                                                                  | 82/291 [00:00<00:00, 2354.30it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights:  29%|██████████████████████████▌                                                                  | 83/291 [00:00<00:00, 2372.25it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights:  29%|██████████████████████████▌                                                                  | 83/291 [00:00<00:00, 2365.04it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights:  29%|███████████████████████████▏                                                                  | 84/291 [00:00<00:00, 2365.86it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights:  29%|███████████████████████████▏                                                                  | 84/291 [00:00<00:00, 2358.67it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights:  29%|████████████████████████████                                                                    | 85/291 [00:00<00:00, 2374.73it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights:  29%|████████████████████████████                                                                    | 85/291 [00:00<00:00, 2367.57it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights:  30%|████████████████████████████▎                                                                   | 86/291 [00:00<00:00, 2383.55it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights:  30%|████████████████████████████▎                                                                   | 86/291 [00:00<00:00, 2376.63it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights:  30%|█████████████████████████████▎                                                                    | 87/291 [00:00<00:00, 2327.67it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights:  30%|█████████████████████████████▎                                                                    | 87/291 [00:00<00:00, 2320.92it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights:  30%|█████████████████████████▋                                                           | 88/291 [00:00<00:00, 2336.60it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights:  30%|█████████████████████████▋                                                           | 88/291 [00:00<00:00, 2330.24it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights:  31%|████████████████████████████▍                                                                | 89/291 [00:00<00:00, 2346.29it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights:  31%|████████████████████████████▍                                                                | 89/291 [00:00<00:00, 2340.20it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights:  31%|████████████████████████████▊                                                                | 90/291 [00:00<00:00, 2356.51it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights:  31%|████████████████████████████▊                                                                | 90/291 [00:00<00:00, 2350.36it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights:  31%|█████████████████████████████                                                                | 91/291 [00:00<00:00, 2366.90it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights:  31%|█████████████████████████████                                                                | 91/291 [00:00<00:00, 2360.68it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights:  32%|█████████████████████████████▍                                                               | 92/291 [00:00<00:00, 2377.34it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights:  32%|█████████████████████████████▍                                                               | 92/291 [00:00<00:00, 2370.65it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights:  32%|█████████████████████████████▋                                                               | 93/291 [00:00<00:00, 2387.20it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights:  32%|█████████████████████████████▋                                                               | 93/291 [00:00<00:00, 2363.19it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights:  32%|██████████████████████████████▋                                                                | 94/291 [00:00<00:00, 2338.45it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights:  32%|██████████████████████████████▋                                                                | 94/291 [00:00<00:00, 2322.50it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights:  33%|███████████████████████████████                                                                | 95/291 [00:00<00:00, 2337.19it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights:  33%|███████████████████████████████                                                                | 95/291 [00:00<00:00, 2331.52it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights:  33%|████████████████████████████████                                                                 | 96/291 [00:00<00:00, 2346.81it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights:  33%|████████████████████████████████                                                                 | 96/291 [00:00<00:00, 2341.10it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights:  33%|████████████████████████████                                                        | 97/291 [00:00<00:00, 2355.16it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights:  33%|████████████████████████████                                                        | 97/291 [00:00<00:00, 2349.19it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights:  34%|██████████████████████████████▉                                                             | 98/291 [00:00<00:00, 2363.30it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights:  34%|██████████████████████████████▉                                                             | 98/291 [00:00<00:00, 2357.69it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights:  34%|███████████████████████████████▎                                                            | 99/291 [00:00<00:00, 2371.68it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights:  34%|███████████████████████████████▎                                                            | 99/291 [00:00<00:00, 2365.44it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights:  34%|███████████████████████████████▎                                                           | 100/291 [00:00<00:00, 2379.67it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights:  34%|███████████████████████████████▎                                                           | 100/291 [00:00<00:00, 2373.74it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights:  35%|███████████████████████████████▌                                                           | 101/291 [00:00<00:00, 2381.85it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights:  35%|███████████████████████████████▌                                                           | 101/291 [00:00<00:00, 2376.04it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights:  35%|████████████████████████████████▏                                                           | 102/291 [00:00<00:00, 2355.99it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights:  35%|████████████████████████████████▏                                                           | 102/291 [00:00<00:00, 2350.21it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights:  35%|█████████████████████████████████▎                                                            | 103/291 [00:00<00:00, 2365.05it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights:  35%|█████████████████████████████████▎                                                            | 103/291 [00:00<00:00, 2359.75it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights:  36%|█████████████████████████████████▌                                                            | 104/291 [00:00<00:00, 2373.17it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights:  36%|█████████████████████████████████▌                                                            | 104/291 [00:00<00:00, 2367.70it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights:  36%|██████████████████████████████████▋                                                             | 105/291 [00:00<00:00, 2380.50it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights:  36%|██████████████████████████████████▋                                                             | 105/291 [00:00<00:00, 2375.24it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights:  36%|██████████████████████████████▏                                                    | 106/291 [00:00<00:00, 2388.70it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights:  36%|██████████████████████████████▏                                                    | 106/291 [00:00<00:00, 2383.33it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights:  37%|█████████████████████████████████▍                                                         | 107/291 [00:00<00:00, 2396.48it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights:  37%|█████████████████████████████████▍                                                         | 107/291 [00:00<00:00, 2391.15it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights:  37%|█████████████████████████████████▊                                                         | 108/291 [00:00<00:00, 2404.74it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights:  37%|█████████████████████████████████▊                                                         | 108/291 [00:00<00:00, 2399.26it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights:  37%|██████████████████████████████████                                                         | 109/291 [00:00<00:00, 2404.05it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights:  37%|██████████████████████████████████                                                         | 109/291 [00:00<00:00, 2397.69it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights:  38%|██████████████████████████████████▍                                                        | 110/291 [00:00<00:00, 2411.63it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights:  38%|██████████████████████████████████▍                                                        | 110/291 [00:00<00:00, 2406.57it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights:  38%|███████████████████████████████████                                                         | 111/291 [00:00<00:00, 2395.28it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights:  38%|███████████████████████████████████                                                         | 111/291 [00:00<00:00, 2389.57it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights:  38%|████████████████████████████████████▏                                                         | 112/291 [00:00<00:00, 2336.23it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights:  38%|████████████████████████████████████▏                                                         | 112/291 [00:00<00:00, 2330.65it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights:  39%|████████████████████████████████████▌                                                         | 113/291 [00:00<00:00, 2343.66it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights:  39%|████████████████████████████████████▌                                                         | 113/291 [00:00<00:00, 2338.92it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights:  39%|█████████████████████████████████████▌                                                          | 114/291 [00:00<00:00, 2319.84it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights:  39%|█████████████████████████████████████▌                                                          | 114/291 [00:00<00:00, 2314.22it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights:  40%|████████████████████████████████▊                                                  | 115/291 [00:00<00:00, 2326.47it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights:  40%|████████████████████████████████▊                                                  | 115/291 [00:00<00:00, 2285.90it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights:  40%|████████████████████████████████████▎                                                      | 116/291 [00:00<00:00, 2296.25it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights:  40%|████████████████████████████████████▎                                                      | 116/291 [00:00<00:00, 2290.87it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights:  40%|████████████████████████████████████▌                                                      | 117/291 [00:00<00:00, 2302.46it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights:  40%|████████████████████████████████████▌                                                      | 117/291 [00:00<00:00, 2286.71it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights:  41%|████████████████████████████████████▉                                                      | 118/291 [00:00<00:00, 2298.17it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights:  41%|████████████████████████████████████▉                                                      | 118/291 [00:00<00:00, 2293.45it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights:  41%|█████████████████████████████████████▏                                                     | 119/291 [00:00<00:00, 2306.10it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights:  41%|█████████████████████████████████████▏                                                     | 119/291 [00:00<00:00, 2301.52it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights:  41%|█████████████████████████████████████▉                                                      | 120/291 [00:00<00:00, 2292.30it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights:  41%|█████████████████████████████████████▉                                                      | 120/291 [00:00<00:00, 2279.23it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights:  42%|███████████████████████████████████████                                                       | 121/291 [00:00<00:00, 2290.23it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights:  42%|███████████████████████████████████████                                                       | 121/291 [00:00<00:00, 2285.71it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights:  42%|███████████████████████████████████████▍                                                      | 122/291 [00:00<00:00, 2297.49it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights:  42%|███████████████████████████████████████▍                                                      | 122/291 [00:00<00:00, 2292.79it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights:  42%|████████████████████████████████████████▌                                                       | 123/291 [00:00<00:00, 2304.91it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights:  42%|████████████████████████████████████████▌                                                       | 123/291 [00:00<00:00, 2292.55it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights:  43%|███████████████████████████████████▎                                               | 124/291 [00:00<00:00, 2300.77it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights:  43%|███████████████████████████████████▎                                               | 124/291 [00:00<00:00, 2296.35it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights:  43%|███████████████████████████████████████                                                    | 125/291 [00:00<00:00, 2307.95it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights:  43%|███████████████████████████████████████                                                    | 125/291 [00:00<00:00, 2303.88it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights:  43%|███████████████████████████████████████▍                                                   | 126/291 [00:00<00:00, 2314.93it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights:  43%|███████████████████████████████████████▍                                                   | 126/291 [00:00<00:00, 2309.07it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights:  44%|███████████████████████████████████████▋                                                   | 127/291 [00:00<00:00, 2317.71it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights:  44%|███████████████████████████████████████▋                                                   | 127/291 [00:00<00:00, 2311.83it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights:  44%|████████████████████████████████████████                                                   | 128/291 [00:00<00:00, 2321.09it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights:  44%|████████████████████████████████████████                                                   | 128/291 [00:00<00:00, 2315.33it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights:  44%|████████████████████████████████████████▊                                                   | 129/291 [00:00<00:00, 2323.35it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights:  44%|████████████████████████████████████████▊                                                   | 129/291 [00:00<00:00, 2317.32it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights:  45%|█████████████████████████████████████████▉                                                    | 130/291 [00:00<00:00, 2318.62it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights:  45%|█████████████████████████████████████████▉                                                    | 130/291 [00:00<00:00, 2312.85it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights:  45%|██████████████████████████████████████████▎                                                   | 131/291 [00:00<00:00, 2322.11it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights:  45%|██████████████████████████████████████████▎                                                   | 131/291 [00:00<00:00, 2316.51it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights:  45%|███████████████████████████████████████████▌                                                    | 132/291 [00:00<00:00, 2325.81it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights:  45%|███████████████████████████████████████████▌                                                    | 132/291 [00:00<00:00, 2320.07it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights:  46%|█████████████████████████████████████▉                                             | 133/291 [00:00<00:00, 2331.43it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights:  46%|█████████████████████████████████████▉                                             | 133/291 [00:00<00:00, 2327.23it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights:  46%|█████████████████████████████████████████▉                                                 | 134/291 [00:00<00:00, 2337.66it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights:  46%|█████████████████████████████████████████▉                                                 | 134/291 [00:00<00:00, 2333.59it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights:  46%|██████████████████████████████████████████▏                                                | 135/291 [00:00<00:00, 2344.10it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights:  46%|██████████████████████████████████████████▏                                                | 135/291 [00:00<00:00, 2339.98it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights:  47%|██████████████████████████████████████████▌                                                | 136/291 [00:00<00:00, 2350.66it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights:  47%|██████████████████████████████████████████▌                                                | 136/291 [00:00<00:00, 2346.45it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights:  47%|██████████████████████████████████████████▊                                                | 137/291 [00:00<00:00, 2357.07it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights:  47%|██████████████████████████████████████████▊                                                | 137/291 [00:00<00:00, 2353.15it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights:  47%|███████████████████████████████████████████▋                                                | 138/291 [00:00<00:00, 2364.21it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights:  47%|███████████████████████████████████████████▋                                                | 138/291 [00:00<00:00, 2360.21it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights:  48%|████████████████████████████████████████████▉                                                 | 139/291 [00:00<00:00, 2371.16it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights:  48%|████████████████████████████████████████████▉                                                 | 139/291 [00:00<00:00, 2367.24it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights:  48%|█████████████████████████████████████████████▏                                                | 140/291 [00:00<00:00, 2377.68it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights:  48%|█████████████████████████████████████████████▏                                                | 140/291 [00:00<00:00, 2373.59it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights:  48%|██████████████████████████████████████████████▌                                                 | 141/291 [00:00<00:00, 2368.74it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights:  48%|██████████████████████████████████████████████▌                                                 | 141/291 [00:00<00:00, 2364.58it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights:  49%|████████████████████████████████████████▌                                          | 142/291 [00:00<00:00, 2357.54it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights:  49%|████████████████████████████████████████▌                                          | 142/291 [00:00<00:00, 2353.52it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights:  49%|████████████████████████████████████████████▋                                              | 143/291 [00:00<00:00, 2363.87it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights:  49%|████████████████████████████████████████████▋                                              | 143/291 [00:00<00:00, 2359.94it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights:  49%|█████████████████████████████████████████████                                              | 144/291 [00:00<00:00, 2370.47it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights:  49%|█████████████████████████████████████████████                                              | 144/291 [00:00<00:00, 2366.19it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights:  50%|█████████████████████████████████████████████▎                                             | 145/291 [00:00<00:00, 2376.06it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights:  50%|█████████████████████████████████████████████▎                                             | 145/291 [00:00<00:00, 2371.86it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights:  50%|█████████████████████████████████████████████▋                                             | 146/291 [00:00<00:00, 2382.32it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights:  50%|█████████████████████████████████████████████▋                                             | 146/291 [00:00<00:00, 2378.55it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights:  51%|██████████████████████████████████████████████▍                                             | 147/291 [00:00<00:00, 2388.93it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights:  51%|██████████████████████████████████████████████▍                                             | 147/291 [00:00<00:00, 2385.07it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights:  51%|███████████████████████████████████████████████▊                                              | 148/291 [00:00<00:00, 2395.18it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights:  51%|███████████████████████████████████████████████▊                                              | 148/291 [00:00<00:00, 2391.23it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights:  51%|████████████████████████████████████████████████▏                                             | 149/291 [00:00<00:00, 2383.02it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights:  51%|████████████████████████████████████████████████▏                                             | 149/291 [00:00<00:00, 2379.04it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights:  52%|█████████████████████████████████████████████████▍                                              | 150/291 [00:00<00:00, 2384.70it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights:  52%|█████████████████████████████████████████████████▍                                              | 150/291 [00:00<00:00, 2360.80it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights:  52%|███████████████████████████████████████████                                        | 151/291 [00:00<00:00, 2361.50it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights:  52%|███████████████████████████████████████████                                        | 151/291 [00:00<00:00, 2357.52it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights:  52%|███████████████████████████████████████████████▌                                           | 152/291 [00:00<00:00, 2367.23it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights:  52%|███████████████████████████████████████████████▌                                           | 152/291 [00:00<00:00, 2333.83it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights:  53%|███████████████████████████████████████████████▊                                           | 153/291 [00:00<00:00, 2342.49it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights:  53%|███████████████████████████████████████████████▊                                           | 153/291 [00:00<00:00, 2337.19it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights:  53%|████████████████████████████████████████████████▏                                          | 154/291 [00:00<00:00, 2346.77it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights:  53%|████████████████████████████████████████████████▏                                          | 154/291 [00:00<00:00, 2343.27it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights:  53%|████████████████████████████████████████████████▍                                          | 155/291 [00:00<00:00, 2352.62it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights:  53%|████████████████████████████████████████████████▍                                          | 155/291 [00:00<00:00, 2348.36it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights:  54%|█████████████████████████████████████████████████▎                                          | 156/291 [00:00<00:00, 2357.40it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights:  54%|█████████████████████████████████████████████████▎                                          | 156/291 [00:00<00:00, 2353.84it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights:  54%|██████████████████████████████████████████████████▋                                           | 157/291 [00:00<00:00, 2342.79it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights:  54%|██████████████████████████████████████████████████▋                                           | 157/291 [00:00<00:00, 2338.63it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights:  54%|███████████████████████████████████████████████████                                           | 158/291 [00:00<00:00, 2348.03it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights:  54%|███████████████████████████████████████████████████                                           | 158/291 [00:00<00:00, 2344.64it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights:  55%|████████████████████████████████████████████████████▍                                           | 159/291 [00:00<00:00, 2353.70it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights:  55%|████████████████████████████████████████████████████▍                                           | 159/291 [00:00<00:00, 2350.28it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights:  55%|█████████████████████████████████████████████▋                                     | 160/291 [00:00<00:00, 2358.92it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights:  55%|█████████████████████████████████████████████▋                                     | 160/291 [00:00<00:00, 2355.45it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights:  55%|██████████████████████████████████████████████████▎                                        | 161/291 [00:00<00:00, 2364.93it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights:  55%|██████████████████████████████████████████████████▎                                        | 161/291 [00:00<00:00, 2360.69it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights:  56%|██████████████████████████████████████████████████▋                                        | 162/291 [00:00<00:00, 2369.18it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights:  56%|██████████████████████████████████████████████████▋                                        | 162/291 [00:00<00:00, 2365.32it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights:  56%|██████████████████████████████████████████████████▉                                        | 163/291 [00:00<00:00, 2372.84it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights:  56%|██████████████████████████████████████████████████▉                                        | 163/291 [00:00<00:00, 2354.78it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights:  56%|███████████████████████████████████████████████████▎                                       | 164/291 [00:00<00:00, 2362.57it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights:  56%|███████████████████████████████████████████████████▎                                       | 164/291 [00:00<00:00, 2358.58it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights:  57%|████████████████████████████████████████████████████▏                                       | 165/291 [00:00<00:00, 2366.03it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights:  57%|████████████████████████████████████████████████████▏                                       | 165/291 [00:00<00:00, 2357.58it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights:  57%|█████████████████████████████████████████████████████▌                                        | 166/291 [00:00<00:00, 2355.20it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights:  57%|█████████████████████████████████████████████████████▌                                        | 166/291 [00:00<00:00, 2351.12it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights:  57%|█████████████████████████████████████████████████████▉                                        | 167/291 [00:00<00:00, 2358.69it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights:  57%|█████████████████████████████████████████████████████▉                                        | 167/291 [00:00<00:00, 2354.15it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights:  58%|███████████████████████████████████████████████████████▍                                        | 168/291 [00:00<00:00, 2361.98it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights:  58%|███████████████████████████████████████████████████████▍                                        | 168/291 [00:00<00:00, 2356.31it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights:  58%|████████████████████████████████████████████████▏                                  | 169/291 [00:00<00:00, 2363.47it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights:  58%|████████████████████████████████████████████████▏                                  | 169/291 [00:00<00:00, 2358.27it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights:  58%|█████████████████████████████████████████████████████▏                                     | 170/291 [00:00<00:00, 2355.16it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights:  58%|█████████████████████████████████████████████████████▏                                     | 170/291 [00:00<00:00, 2350.37it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights:  59%|█████████████████████████████████████████████████████▍                                     | 171/291 [00:00<00:00, 2357.16it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights:  59%|█████████████████████████████████████████████████████▍                                     | 171/291 [00:00<00:00, 2352.86it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights:  59%|█████████████████████████████████████████████████████▊                                     | 172/291 [00:00<00:00, 2359.32it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights:  59%|█████████████████████████████████████████████████████▊                                     | 172/291 [00:00<00:00, 2354.95it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights:  59%|██████████████████████████████████████████████████████                                     | 173/291 [00:00<00:00, 2354.78it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights:  59%|██████████████████████████████████████████████████████                                     | 173/291 [00:00<00:00, 2350.10it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights:  60%|███████████████████████████████████████████████████████                                     | 174/291 [00:00<00:00, 2356.77it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights:  60%|███████████████████████████████████████████████████████                                     | 174/291 [00:00<00:00, 2352.30it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights:  60%|████████████████████████████████████████████████████████▌                                     | 175/291 [00:00<00:00, 2358.94it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights:  60%|████████████████████████████████████████████████████████▌                                     | 175/291 [00:00<00:00, 2354.59it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights:  60%|████████████████████████████████████████████████████████▊                                     | 176/291 [00:00<00:00, 2361.26it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights:  60%|████████████████████████████████████████████████████████▊                                     | 176/291 [00:00<00:00, 2327.50it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights:  61%|██████████████████████████████████████████████████████████▍                                     | 177/291 [00:00<00:00, 2328.96it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights:  61%|██████████████████████████████████████████████████████████▍                                     | 177/291 [00:00<00:00, 2324.63it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights:  61%|██████████████████████████████████████████████████▊                                | 178/291 [00:00<00:00, 2331.20it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights:  61%|██████████████████████████████████████████████████▊                                | 178/291 [00:00<00:00, 2327.01it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights:  62%|███████████████████████████████████████████████████████▉                                   | 179/291 [00:00<00:00, 2333.70it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights:  62%|███████████████████████████████████████████████████████▉                                   | 179/291 [00:00<00:00, 2329.08it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights:  62%|████████████████████████████████████████████████████████▎                                  | 180/291 [00:00<00:00, 2334.85it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights:  62%|████████████████████████████████████████████████████████▎                                  | 180/291 [00:00<00:00, 2330.18it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights:  62%|████████████████████████████████████████████████████████▌                                  | 181/291 [00:00<00:00, 2336.14it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights:  62%|████████████████████████████████████████████████████████▌                                  | 181/291 [00:00<00:00, 2331.69it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights:  63%|████████████████████████████████████████████████████████▉                                  | 182/291 [00:00<00:00, 2332.97it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights:  63%|████████████████████████████████████████████████████████▉                                  | 182/291 [00:00<00:00, 2314.67it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights:  63%|█████████████████████████████████████████████████████████▊                                  | 183/291 [00:00<00:00, 2316.34it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights:  63%|█████████████████████████████████████████████████████████▊                                  | 183/291 [00:00<00:00, 2311.58it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights:  63%|███████████████████████████████████████████████████████████▍                                  | 184/291 [00:00<00:00, 2317.27it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights:  63%|███████████████████████████████████████████████████████████▍                                  | 184/291 [00:00<00:00, 2313.99it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights:  64%|███████████████████████████████████████████████████████████▊                                  | 185/291 [00:00<00:00, 2321.84it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights:  64%|███████████████████████████████████████████████████████████▊                                  | 185/291 [00:00<00:00, 2292.61it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights:  64%|█████████████████████████████████████████████████████████████▎                                  | 186/291 [00:00<00:00, 2298.94it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights:  64%|█████████████████████████████████████████████████████████████▎                                  | 186/291 [00:00<00:00, 2295.77it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights:  64%|█████████████████████████████████████████████████████▎                             | 187/291 [00:00<00:00, 2303.10it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights:  64%|█████████████████████████████████████████████████████▎                             | 187/291 [00:00<00:00, 2300.02it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights:  65%|██████████████████████████████████████████████████████████▊                                | 188/291 [00:00<00:00, 2307.69it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights:  65%|██████████████████████████████████████████████████████████▊                                | 188/291 [00:00<00:00, 2304.71it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights:  65%|███████████████████████████████████████████████████████████                                | 189/291 [00:00<00:00, 2308.39it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights:  65%|███████████████████████████████████████████████████████████                                | 189/291 [00:00<00:00, 2305.38it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights:  65%|███████████████████████████████████████████████████████████▍                               | 190/291 [00:00<00:00, 2312.80it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights:  65%|███████████████████████████████████████████████████████████▍                               | 190/291 [00:00<00:00, 2309.91it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights:  66%|███████████████████████████████████████████████████████████▋                               | 191/291 [00:00<00:00, 2317.74it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights:  66%|███████████████████████████████████████████████████████████▋                               | 191/291 [00:00<00:00, 2314.94it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights:  66%|████████████████████████████████████████████████████████████▋                               | 192/291 [00:00<00:00, 2323.02it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights:  66%|████████████████████████████████████████████████████████████▋                               | 192/291 [00:00<00:00, 2320.21it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights:  66%|██████████████████████████████████████████████████████████████▎                               | 193/291 [00:00<00:00, 2327.67it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights:  66%|██████████████████████████████████████████████████████████████▎                               | 193/291 [00:00<00:00, 2324.80it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights:  67%|██████████████████████████████████████████████████████████████▋                               | 194/291 [00:00<00:00, 2332.41it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights:  67%|██████████████████████████████████████████████████████████████▋                               | 194/291 [00:00<00:00, 2326.55it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights:  67%|████████████████████████████████████████████████████████████████▎                               | 195/291 [00:00<00:00, 2329.39it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights:  67%|████████████████████████████████████████████████████████████████▎                               | 195/291 [00:00<00:00, 2326.24it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights:  67%|███████████████████████████████████████████████████████▉                           | 196/291 [00:00<00:00, 2333.62it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights:  67%|███████████████████████████████████████████████████████▉                           | 196/291 [00:00<00:00, 2330.48it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights:  68%|█████████████████████████████████████████████████████████████▌                             | 197/291 [00:00<00:00, 2337.39it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights:  68%|█████████████████████████████████████████████████████████████▌                             | 197/291 [00:00<00:00, 2334.43it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights:  68%|█████████████████████████████████████████████████████████████▉                             | 198/291 [00:00<00:00, 2318.34it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights:  68%|█████████████████████████████████████████████████████████████▉                             | 198/291 [00:00<00:00, 2314.38it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights:  68%|██████████████████████████████████████████████████████████████▏                            | 199/291 [00:00<00:00, 2321.65it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights:  68%|██████████████████████████████████████████████████████████████▏                            | 199/291 [00:00<00:00, 2318.70it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights:  69%|██████████████████████████████████████████████████████████████▌                            | 200/291 [00:00<00:00, 2326.03it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights:  69%|██████████████████████████████████████████████████████████████▌                            | 200/291 [00:00<00:00, 2323.17it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights:  69%|███████████████████████████████████████████████████████████████▌                            | 201/291 [00:00<00:00, 2330.35it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights:  69%|███████████████████████████████████████████████████████████████▌                            | 201/291 [00:00<00:00, 2327.65it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights:  69%|█████████████████████████████████████████████████████████████████▎                            | 202/291 [00:00<00:00, 2334.95it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights:  69%|█████████████████████████████████████████████████████████████████▎                            | 202/291 [00:00<00:00, 2332.16it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights:  70%|█████████████████████████████████████████████████████████████████▌                            | 203/291 [00:00<00:00, 2338.77it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights:  70%|█████████████████████████████████████████████████████████████████▌                            | 203/291 [00:00<00:00, 2335.97it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights:  70%|███████████████████████████████████████████████████████████████████▎                            | 204/291 [00:00<00:00, 2343.35it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights:  70%|███████████████████████████████████████████████████████████████████▎                            | 204/291 [00:00<00:00, 2340.49it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights:  70%|██████████████████████████████████████████████████████████▍                        | 205/291 [00:00<00:00, 2347.79it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights:  70%|██████████████████████████████████████████████████████████▍                        | 205/291 [00:00<00:00, 2345.12it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights:  71%|████████████████████████████████████████████████████████████████▍                          | 206/291 [00:00<00:00, 2352.33it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights:  71%|████████████████████████████████████████████████████████████████▍                          | 206/291 [00:00<00:00, 2349.54it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights:  71%|████████████████████████████████████████████████████████████████▋                          | 207/291 [00:00<00:00, 2356.58it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights:  71%|████████████████████████████████████████████████████████████████▋                          | 207/291 [00:00<00:00, 2353.82it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights:  71%|█████████████████████████████████████████████████████████████████                          | 208/291 [00:00<00:00, 2332.54it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights:  71%|█████████████████████████████████████████████████████████████████                          | 208/291 [00:00<00:00, 2329.64it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights:  72%|█████████████████████████████████████████████████████████████████▎                         | 209/291 [00:00<00:00, 2336.69it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights:  72%|█████████████████████████████████████████████████████████████████▎                         | 209/291 [00:00<00:00, 2334.12it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights:  72%|██████████████████████████████████████████████████████████████████▍                         | 210/291 [00:00<00:00, 2340.93it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights:  72%|██████████████████████████████████████████████████████████████████▍                         | 210/291 [00:00<00:00, 2337.94it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights:  73%|████████████████████████████████████████████████████████████████████▏                         | 211/291 [00:00<00:00, 2344.58it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights:  73%|████████████████████████████████████████████████████████████████████▏                         | 211/291 [00:00<00:00, 2341.83it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights:  73%|████████████████████████████████████████████████████████████████████▍                         | 212/291 [00:00<00:00, 2346.75it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights:  73%|████████████████████████████████████████████████████████████████████▍                         | 212/291 [00:00<00:00, 2343.16it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights:  73%|██████████████████████████████████████████████████████████████████████▎                         | 213/291 [00:00<00:00, 2348.42it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights:  73%|██████████████████████████████████████████████████████████████████████▎                         | 213/291 [00:00<00:00, 2344.31it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights:  74%|█████████████████████████████████████████████████████████████                      | 214/291 [00:00<00:00, 2349.73it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights:  74%|█████████████████████████████████████████████████████████████                      | 214/291 [00:00<00:00, 2345.97it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights:  74%|███████████████████████████████████████████████████████████████████▏                       | 215/291 [00:00<00:00, 2351.42it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights:  74%|███████████████████████████████████████████████████████████████████▏                       | 215/291 [00:00<00:00, 2347.96it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights:  74%|███████████████████████████████████████████████████████████████████▌                       | 216/291 [00:00<00:00, 2333.48it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights:  74%|███████████████████████████████████████████████████████████████████▌                       | 216/291 [00:00<00:00, 2325.01it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights:  75%|███████████████████████████████████████████████████████████████████▊                       | 217/291 [00:00<00:00, 2322.42it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights:  75%|███████████████████████████████████████████████████████████████████▊                       | 217/291 [00:00<00:00, 2318.58it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights:  75%|████████████████████████████████████████████████████████████████████▏                      | 218/291 [00:00<00:00, 2323.52it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights:  75%|████████████████████████████████████████████████████████████████████▏                      | 218/291 [00:00<00:00, 2320.07it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights:  75%|█████████████████████████████████████████████████████████████████████▏                      | 219/291 [00:00<00:00, 2325.03it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights:  75%|█████████████████████████████████████████████████████████████████████▏                      | 219/291 [00:00<00:00, 2321.56it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights:  76%|███████████████████████████████████████████████████████████████████████                       | 220/291 [00:00<00:00, 2311.04it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights:  76%|███████████████████████████████████████████████████████████████████████                       | 220/291 [00:00<00:00, 2307.39it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights:  76%|███████████████████████████████████████████████████████████████████████▍                      | 221/291 [00:00<00:00, 2312.44it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights:  76%|███████████████████████████████████████████████████████████████████████▍                      | 221/291 [00:00<00:00, 2309.11it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights:  76%|█████████████████████████████████████████████████████████████████████████▏                      | 222/291 [00:00<00:00, 2302.18it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights:  76%|█████████████████████████████████████████████████████████████████████████▏                      | 222/291 [00:00<00:00, 2298.57it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights:  77%|███████████████████████████████████████████████████████████████▌                   | 223/291 [00:00<00:00, 2296.49it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights:  77%|███████████████████████████████████████████████████████████████▌                   | 223/291 [00:00<00:00, 2292.81it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights:  77%|██████████████████████████████████████████████████████████████████████                     | 224/291 [00:00<00:00, 2297.78it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights:  77%|██████████████████████████████████████████████████████████████████████                     | 224/291 [00:00<00:00, 2294.54it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights:  77%|██████████████████████████████████████████████████████████████████████▎                    | 225/291 [00:00<00:00, 2299.40it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights:  77%|██████████████████████████████████████████████████████████████████████▎                    | 225/291 [00:00<00:00, 2296.06it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights:  78%|██████████████████████████████████████████████████████████████████████▋                    | 226/291 [00:00<00:00, 2301.25it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights:  78%|██████████████████████████████████████████████████████████████████████▋                    | 226/291 [00:00<00:00, 2298.01it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights:  78%|██████████████████████████████████████████████████████████████████████▉                    | 227/291 [00:00<00:00, 2295.75it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights:  78%|██████████████████████████████████████████████████████████████████████▉                    | 227/291 [00:00<00:00, 2292.50it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights:  78%|████████████████████████████████████████████████████████████████████████                    | 228/291 [00:00<00:00, 2297.53it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights:  78%|████████████████████████████████████████████████████████████████████████                    | 228/291 [00:00<00:00, 2294.29it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights:  79%|█████████████████████████████████████████████████████████████████████████▉                    | 229/291 [00:00<00:00, 2299.44it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights:  79%|█████████████████████████████████████████████████████████████████████████▉                    | 229/291 [00:00<00:00, 2296.20it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights:  79%|██████████████████████████████████████████████████████████████████████████▎                   | 230/291 [00:00<00:00, 2300.86it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights:  79%|██████████████████████████████████████████████████████████████████████████▎                   | 230/291 [00:00<00:00, 2297.43it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights:  79%|██████████████████████████████████████████████████████████████████████████▌                   | 231/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights:  79%|████████████████████████████████████████████████████████████████████████████▏                   | 231/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights:  80%|██████████████████████████████████████████████████████████████████▏                | 232/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights:  80%|████████████████████████████████████████████████████████████████████████▊                  | 233/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights:  80%|█████████████████████████████████████████████████████████████████████████▏                 | 234/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights:  81%|█████████████████████████████████████████████████████████████████████████▍                 | 235/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights:  81%|█████████████████████████████████████████████████████████████████████████▊                 | 236/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights:  81%|██████████████████████████████████████████████████████████████████████████▉                 | 237/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights:  82%|████████████████████████████████████████████████████████████████████████████▉                 | 238/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights:  82%|█████████████████████████████████████████████████████████████████████████████▏                | 239/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights:  82%|███████████████████████████████████████████████████████████████████████████████▏                | 240/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights:  83%|████████████████████████████████████████████████████████████████████▋              | 241/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights:  83%|███████████████████████████████████████████████████████████████████████████▋               | 242/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights:  84%|███████████████████████████████████████████████████████████████████████████▉               | 243/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights:  84%|████████████████████████████████████████████████████████████████████████████▎              | 244/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights:  84%|████████████████████████████████████████████████████████████████████████████▌              | 245/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights:  85%|█████████████████████████████████████████████████████████████████████████████▊              | 246/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights:  85%|███████████████████████████████████████████████████████████████████████████████▊              | 247/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights:  85%|████████████████████████████████████████████████████████████████████████████████              | 248/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights:  86%|██████████████████████████████████████████████████████████████████████████████████▏             | 249/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights:  86%|███████████████████████████████████████████████████████████████████████▎           | 250/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights:  86%|██████████████████████████████████████████████████████████████████████████████▍            | 251/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights:  87%|██████████████████████████████████████████████████████████████████████████████▊            | 252/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights:  87%|███████████████████████████████████████████████████████████████████████████████            | 253/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights:  87%|███████████████████████████████████████████████████████████████████████████████▍           | 254/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights:  88%|████████████████████████████████████████████████████████████████████████████████▌           | 255/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights:  88%|██████████████████████████████████████████████████████████████████████████████████▋           | 256/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights:  88%|███████████████████████████████████████████████████████████████████████████████████           | 257/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights:  89%|█████████████████████████████████████████████████████████████████████████████████████           | 258/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights:  89%|█████████████████████████████████████████████████████████████████████████▊         | 259/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights:  89%|█████████████████████████████████████████████████████████████████████████████████▎         | 260/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights:  90%|█████████████████████████████████████████████████████████████████████████████████▌         | 261/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights:  90%|█████████████████████████████████████████████████████████████████████████████████▉         | 262/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights:  90%|██████████████████████████████████████████████████████████████████████████████████▏        | 263/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights:  91%|███████████████████████████████████████████████████████████████████████████████████▍        | 264/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights:  91%|█████████████████████████████████████████████████████████████████████████████████████▌        | 265/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights:  91%|█████████████████████████████████████████████████████████████████████████████████████▉        | 266/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights:  92%|████████████████████████████████████████████████████████████████████████████████████████        | 267/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights:  92%|████████████████████████████████████████████████████████████████████████████▍      | 268/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights:  92%|████████████████████████████████████████████████████████████████████████████████████       | 269/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights:  93%|████████████████████████████████████████████████████████████████████████████████████▍      | 270/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights:  93%|████████████████████████████████████████████████████████████████████████████████████▋      | 271/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights:  93%|█████████████████████████████████████████████████████████████████████████████████████      | 272/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights:  94%|██████████████████████████████████████████████████████████████████████████████████████▎     | 273/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights:  94%|████████████████████████████████████████████████████████████████████████████████████████▌     | 274/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights:  95%|████████████████████████████████████████████████████████████████████████████████████████▊     | 275/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights:  95%|███████████████████████████████████████████████████████████████████████████████████████████     | 276/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights:  95%|███████████████████████████████████████████████████████████████████████████████    | 277/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights:  96%|██████████████████████████████████████████████████████████████████████████████████████▉    | 278/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights:  96%|███████████████████████████████████████████████████████████████████████████████████████▏   | 279/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights:  96%|███████████████████████████████████████████████████████████████████████████████████████▌   | 280/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights:  97%|███████████████████████████████████████████████████████████████████████████████████████▊   | 281/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights:  97%|█████████████████████████████████████████████████████████████████████████████████████████▏  | 282/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights:  97%|███████████████████████████████████████████████████████████████████████████████████████████▍  | 283/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights:  98%|███████████████████████████████████████████████████████████████████████████████████████████▋  | 284/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights:  98%|██████████████████████████████████████████████████████████████████████████████████████████████  | 285/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights:  98%|█████████████████████████████████████████████████████████████████████████████████▌ | 286/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights:  99%|█████████████████████████████████████████████████████████████████████████████████████████▋ | 287/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights:  99%|██████████████████████████████████████████████████████████████████████████████████████████ | 288/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights:  99%|██████████████████████████████████████████████████████████████████████████████████████████▎| 289/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 100%|██████████████████████████████████████████████████████████████████████████████████████████▋| 290/291 [00:00<00:00, 2280.38it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [00:00<00:00, 2280.38it/s, Materializing param=model.norm.weight]
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [00:00<00:00, 2332.89it/s, Materializing param=model.norm.weight]
[2026-02-15 04:01:21,136] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:6181] sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
Multiple deprecation warnings due to FSDP2 conversion:
sync_module_states is obsolete in FSDP2, as it is not needed anymore. Setting sync_module_states to None.
[2026-02-15 04:01:24,329] [WARNING] [py.warnings._showwarnmsg:110] [PID:6181] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user. 
  warnings.warn(  # warn only once