File size: 121,082 Bytes
6c2a440

Loading dataset from disk:   0%|                                                                                                  | 0/224 [00:00<?, ?it/s]
Loading dataset from disk: 100%|█████████████████████████████████████████████████████████████████████████████████████| 224/224 [00:00<00:00, 19756.58it/s]

Loading weights:   0%|                                                                                                            | 0/311 [00:00<?, ?it/s]
Loading weights:   0%|▏                                                            | 1/311 [00:00<00:00, 18724.57it/s, Materializing param=lm_head.weight]
Loading weights:   0%|▏                                                             | 1/311 [00:00<00:00, 6278.90it/s, Materializing param=lm_head.weight]
Loading weights:   1%|▎                                                  | 2/311 [00:00<00:00, 2680.07it/s, Materializing param=model.embed_tokens.weight]
Loading weights:   1%|▎                                                  | 2/311 [00:00<00:00, 2132.88it/s, Materializing param=model.embed_tokens.weight]
Loading weights:   1%|▍                                      | 3/311 [00:00<00:00, 2615.99it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights:   1%|▍                                      | 3/311 [00:00<00:00, 2394.46it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights:   1%|▌                                        | 4/311 [00:00<00:00, 2873.30it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights:   1%|▌                                        | 4/311 [00:00<00:00, 2706.88it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights:   2%|▋                                        | 5/311 [00:00<00:00, 3080.88it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights:   2%|▋                                        | 5/311 [00:00<00:00, 2926.12it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights:   2%|▊                                          | 6/311 [00:00<00:00, 2899.62it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights:   2%|▊                                          | 6/311 [00:00<00:00, 2328.88it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights:   2%|▋                             | 7/311 [00:00<00:00, 2429.07it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights:   2%|▋                             | 7/311 [00:00<00:00, 2364.89it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights:   3%|▉                                     | 8/311 [00:00<00:00, 2572.80it/s, Materializing param=model.layers.0.self_attn.k_norm.weight]
Loading weights:   3%|▉                                     | 8/311 [00:00<00:00, 2338.29it/s, Materializing param=model.layers.0.self_attn.k_norm.weight]
Loading weights:   3%|█                                     | 9/311 [00:00<00:00, 2540.46it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights:   3%|█                                     | 9/311 [00:00<00:00, 2486.74it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights:   3%|█▏                                   | 10/311 [00:00<00:00, 2597.57it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights:   3%|█▏                                   | 10/311 [00:00<00:00, 2545.09it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights:   4%|█▎                                   | 11/311 [00:00<00:00, 2721.97it/s, Materializing param=model.layers.0.self_attn.q_norm.weight]
Loading weights:   4%|█▎                                   | 11/311 [00:00<00:00, 1941.07it/s, Materializing param=model.layers.0.self_attn.q_norm.weight]
Loading weights:   4%|█▍                                   | 12/311 [00:00<00:00, 1814.54it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights:   4%|█▍                                   | 12/311 [00:00<00:00, 1792.12it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights:   4%|█▌                                   | 13/311 [00:00<00:00, 1617.64it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights:   4%|█▌                                   | 13/311 [00:00<00:00, 1495.25it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights:   5%|█▋                                    | 14/311 [00:00<00:00, 1572.54it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights:   5%|█▋                                    | 14/311 [00:00<00:00, 1556.37it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights:   5%|█▉                                      | 15/311 [00:00<00:00, 1644.22it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights:   5%|█▉                                      | 15/311 [00:00<00:00, 1629.66it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights:   5%|██                                      | 16/311 [00:00<00:00, 1712.97it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights:   5%|██                                      | 16/311 [00:00<00:00, 1697.71it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights:   5%|██▎                                       | 17/311 [00:00<00:00, 1778.31it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights:   5%|██▎                                       | 17/311 [00:00<00:00, 1763.97it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights:   6%|█▋                           | 18/311 [00:00<00:00, 1803.05it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights:   6%|█▋                           | 18/311 [00:00<00:00, 1546.16it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights:   6%|██▎                                  | 19/311 [00:00<00:00, 1556.45it/s, Materializing param=model.layers.1.self_attn.k_norm.weight]
Loading weights:   6%|██▎                                  | 19/311 [00:00<00:00, 1542.98it/s, Materializing param=model.layers.1.self_attn.k_norm.weight]
Loading weights:   6%|██▍                                  | 20/311 [00:00<00:00, 1606.71it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights:   6%|██▍                                  | 20/311 [00:00<00:00, 1596.04it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights:   7%|██▍                                  | 21/311 [00:00<00:00, 1657.39it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights:   7%|██▍                                  | 21/311 [00:00<00:00, 1647.16it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights:   7%|██▌                                  | 22/311 [00:00<00:00, 1698.91it/s, Materializing param=model.layers.1.self_attn.q_norm.weight]
Loading weights:   7%|██▌                                  | 22/311 [00:00<00:00, 1688.56it/s, Materializing param=model.layers.1.self_attn.q_norm.weight]
Loading weights:   7%|██▋                                  | 23/311 [00:00<00:00, 1749.78it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights:   7%|██▋                                  | 23/311 [00:00<00:00, 1739.53it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights:   8%|██▊                                  | 24/311 [00:00<00:00, 1799.81it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights:   8%|██▊                                  | 24/311 [00:00<00:00, 1789.57it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights:   8%|███                                   | 25/311 [00:00<00:00, 1831.51it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights:   8%|███                                   | 25/311 [00:00<00:00, 1820.57it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights:   8%|███▎                                    | 26/311 [00:00<00:00, 1864.10it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights:   8%|███▎                                    | 26/311 [00:00<00:00, 1853.46it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights:   9%|███▍                                    | 27/311 [00:00<00:00, 1874.25it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights:   9%|███▍                                    | 27/311 [00:00<00:00, 1864.01it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights:   9%|███▊                                      | 28/311 [00:00<00:00, 1917.83it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights:   9%|███▊                                      | 28/311 [00:00<00:00, 1908.14it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights:   9%|██▋                          | 29/311 [00:00<00:00, 1958.47it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights:   9%|██▋                          | 29/311 [00:00<00:00, 1947.28it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights:  10%|███▌                                 | 30/311 [00:00<00:00, 1990.31it/s, Materializing param=model.layers.2.self_attn.k_norm.weight]
Loading weights:  10%|███▌                                 | 30/311 [00:00<00:00, 1979.85it/s, Materializing param=model.layers.2.self_attn.k_norm.weight]
Loading weights:  10%|███▋                                 | 31/311 [00:00<00:00, 2030.03it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights:  10%|███▋                                 | 31/311 [00:00<00:00, 2019.56it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights:  10%|███▊                                 | 32/311 [00:00<00:00, 2068.68it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights:  10%|███▊                                 | 32/311 [00:00<00:00, 2058.62it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights:  11%|███▉                                 | 33/311 [00:00<00:00, 2105.29it/s, Materializing param=model.layers.2.self_attn.q_norm.weight]
Loading weights:  11%|███▉                                 | 33/311 [00:00<00:00, 2094.87it/s, Materializing param=model.layers.2.self_attn.q_norm.weight]
Loading weights:  11%|████                                 | 34/311 [00:00<00:00, 2140.56it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights:  11%|████                                 | 34/311 [00:00<00:00, 2130.11it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights:  11%|████▏                                | 35/311 [00:00<00:00, 2177.21it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights:  11%|████▏                                | 35/311 [00:00<00:00, 2166.86it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights:  12%|████▍                                 | 36/311 [00:00<00:00, 2213.09it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights:  12%|████▍                                 | 36/311 [00:00<00:00, 2202.83it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights:  12%|████▊                                   | 37/311 [00:00<00:00, 2245.83it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights:  12%|████▊                                   | 37/311 [00:00<00:00, 2234.61it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights:  12%|████▉                                   | 38/311 [00:00<00:00, 2277.04it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights:  12%|████▉                                   | 38/311 [00:00<00:00, 2266.35it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights:  13%|█████▎                                    | 39/311 [00:00<00:00, 2310.32it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights:  13%|█████▎                                    | 39/311 [00:00<00:00, 2299.99it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights:  13%|███▋                         | 40/311 [00:00<00:00, 2340.38it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights:  13%|███▋                         | 40/311 [00:00<00:00, 2329.52it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights:  13%|████▉                                | 41/311 [00:00<00:00, 2371.79it/s, Materializing param=model.layers.3.self_attn.k_norm.weight]
Loading weights:  13%|████▉                                | 41/311 [00:00<00:00, 2361.07it/s, Materializing param=model.layers.3.self_attn.k_norm.weight]
Loading weights:  14%|████▉                                | 42/311 [00:00<00:00, 2399.59it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights:  14%|████▉                                | 42/311 [00:00<00:00, 2388.81it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights:  14%|█████                                | 43/311 [00:00<00:00, 2429.48it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights:  14%|█████                                | 43/311 [00:00<00:00, 2417.56it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights:  14%|█████▏                               | 44/311 [00:00<00:00, 2455.75it/s, Materializing param=model.layers.3.self_attn.q_norm.weight]
Loading weights:  14%|█████▏                               | 44/311 [00:00<00:00, 2445.08it/s, Materializing param=model.layers.3.self_attn.q_norm.weight]
Loading weights:  14%|█████▎                               | 45/311 [00:00<00:00, 2482.59it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights:  14%|█████▎                               | 45/311 [00:00<00:00, 2471.60it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights:  15%|█████▍                               | 46/311 [00:00<00:00, 2509.11it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights:  15%|█████▍                               | 46/311 [00:00<00:00, 2498.45it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights:  15%|█████▋                                | 47/311 [00:00<00:00, 2424.45it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights:  15%|█████▋                                | 47/311 [00:00<00:00, 2413.91it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights:  15%|██████▏                                 | 48/311 [00:00<00:00, 2450.48it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights:  15%|██████▏                                 | 48/311 [00:00<00:00, 2440.77it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights:  16%|██████▎                                 | 49/311 [00:00<00:00, 2466.65it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights:  16%|██████▎                                 | 49/311 [00:00<00:00, 2452.72it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights:  16%|██████▊                                   | 50/311 [00:00<00:00, 2479.93it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights:  16%|██████▊                                   | 50/311 [00:00<00:00, 2469.94it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights:  16%|████▊                        | 51/311 [00:00<00:00, 2504.97it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights:  16%|████▊                        | 51/311 [00:00<00:00, 2495.36it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights:  17%|██████▏                              | 52/311 [00:00<00:00, 2521.02it/s, Materializing param=model.layers.4.self_attn.k_norm.weight]
Loading weights:  17%|██████▏                              | 52/311 [00:00<00:00, 2494.04it/s, Materializing param=model.layers.4.self_attn.k_norm.weight]
Loading weights:  17%|██████▎                              | 53/311 [00:00<00:00, 2524.80it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights:  17%|██████▎                              | 53/311 [00:00<00:00, 2515.37it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights:  17%|██████▍                              | 54/311 [00:00<00:00, 2548.78it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights:  17%|██████▍                              | 54/311 [00:00<00:00, 2539.52it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights:  18%|██████▌                              | 55/311 [00:00<00:00, 2570.84it/s, Materializing param=model.layers.4.self_attn.q_norm.weight]
Loading weights:  18%|██████▌                              | 55/311 [00:00<00:00, 2561.28it/s, Materializing param=model.layers.4.self_attn.q_norm.weight]
Loading weights:  18%|██████▋                              | 56/311 [00:00<00:00, 2593.85it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights:  18%|██████▋                              | 56/311 [00:00<00:00, 2584.72it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights:  18%|██████▊                              | 57/311 [00:00<00:00, 2556.87it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights:  18%|██████▊                              | 57/311 [00:00<00:00, 2547.26it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights:  19%|███████                               | 58/311 [00:00<00:00, 2578.40it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights:  19%|███████                               | 58/311 [00:00<00:00, 2569.63it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights:  19%|███████▌                                | 59/311 [00:00<00:00, 2600.70it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights:  19%|███████▌                                | 59/311 [00:00<00:00, 2592.06it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights:  19%|███████▋                                | 60/311 [00:00<00:00, 2603.76it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights:  19%|███████▋                                | 60/311 [00:00<00:00, 2594.82it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights:  20%|████████▏                                 | 61/311 [00:00<00:00, 2625.07it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights:  20%|████████▏                                 | 61/311 [00:00<00:00, 2616.43it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights:  20%|█████▊                       | 62/311 [00:00<00:00, 2645.93it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights:  20%|█████▊                       | 62/311 [00:00<00:00, 2637.04it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights:  20%|███████▍                             | 63/311 [00:00<00:00, 2662.27it/s, Materializing param=model.layers.5.self_attn.k_norm.weight]
Loading weights:  20%|███████▍                             | 63/311 [00:00<00:00, 2652.92it/s, Materializing param=model.layers.5.self_attn.k_norm.weight]
Loading weights:  21%|███████▌                             | 64/311 [00:00<00:00, 2681.94it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights:  21%|███████▌                             | 64/311 [00:00<00:00, 2673.45it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights:  21%|███████▋                             | 65/311 [00:00<00:00, 2700.56it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights:  21%|███████▋                             | 65/311 [00:00<00:00, 2691.66it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights:  21%|███████▊                             | 66/311 [00:00<00:00, 2719.77it/s, Materializing param=model.layers.5.self_attn.q_norm.weight]
Loading weights:  21%|███████▊                             | 66/311 [00:00<00:00, 2711.14it/s, Materializing param=model.layers.5.self_attn.q_norm.weight]
Loading weights:  22%|███████▉                             | 67/311 [00:00<00:00, 2737.13it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights:  22%|███████▉                             | 67/311 [00:00<00:00, 2728.04it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights:  22%|████████                             | 68/311 [00:00<00:00, 2755.44it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights:  22%|████████                             | 68/311 [00:00<00:00, 2746.50it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights:  22%|████████▍                             | 69/311 [00:00<00:00, 2771.73it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights:  22%|████████▍                             | 69/311 [00:00<00:00, 2762.81it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights:  23%|█████████                               | 70/311 [00:00<00:00, 2789.88it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights:  23%|█████████                               | 70/311 [00:00<00:00, 2781.26it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights:  23%|█████████▏                              | 71/311 [00:00<00:00, 2806.22it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights:  23%|█████████▏                              | 71/311 [00:00<00:00, 2797.33it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights:  23%|█████████▋                                | 72/311 [00:00<00:00, 2823.68it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights:  23%|█████████▋                                | 72/311 [00:00<00:00, 2815.26it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights:  23%|██████▊                      | 73/311 [00:00<00:00, 2839.72it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights:  23%|██████▊                      | 73/311 [00:00<00:00, 2830.14it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights:  24%|████████▊                            | 74/311 [00:00<00:00, 2855.60it/s, Materializing param=model.layers.6.self_attn.k_norm.weight]
Loading weights:  24%|████████▊                            | 74/311 [00:00<00:00, 2847.07it/s, Materializing param=model.layers.6.self_attn.k_norm.weight]
Loading weights:  24%|████████▉                            | 75/311 [00:00<00:00, 2862.69it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights:  24%|████████▉                            | 75/311 [00:00<00:00, 2854.12it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights:  24%|█████████                            | 76/311 [00:00<00:00, 2879.32it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights:  24%|█████████                            | 76/311 [00:00<00:00, 2870.90it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights:  25%|█████████▏                           | 77/311 [00:00<00:00, 2894.13it/s, Materializing param=model.layers.6.self_attn.q_norm.weight]
Loading weights:  25%|█████████▏                           | 77/311 [00:00<00:00, 2885.49it/s, Materializing param=model.layers.6.self_attn.q_norm.weight]
Loading weights:  25%|█████████▎                           | 78/311 [00:00<00:00, 2870.92it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights:  25%|█████████▎                           | 78/311 [00:00<00:00, 2862.03it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights:  25%|█████████▍                           | 79/311 [00:00<00:00, 2886.05it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights:  25%|█████████▍                           | 79/311 [00:00<00:00, 2877.90it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights:  26%|█████████▊                            | 80/311 [00:00<00:00, 2900.10it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights:  26%|█████████▊                            | 80/311 [00:00<00:00, 2891.80it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights:  26%|██████████▍                             | 81/311 [00:00<00:00, 2915.79it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights:  26%|██████████▍                             | 81/311 [00:00<00:00, 2907.75it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights:  26%|██████████▌                             | 82/311 [00:00<00:00, 2917.26it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights:  26%|██████████▌                             | 82/311 [00:00<00:00, 2908.84it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights:  27%|███████████▏                              | 83/311 [00:00<00:00, 2931.89it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights:  27%|███████████▏                              | 83/311 [00:00<00:00, 2923.99it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights:  27%|███████▊                     | 84/311 [00:00<00:00, 2945.44it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights:  27%|███████▊                     | 84/311 [00:00<00:00, 2937.19it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights:  27%|██████████                           | 85/311 [00:00<00:00, 2960.09it/s, Materializing param=model.layers.7.self_attn.k_norm.weight]
Loading weights:  27%|██████████                           | 85/311 [00:00<00:00, 2952.00it/s, Materializing param=model.layers.7.self_attn.k_norm.weight]
Loading weights:  28%|██████████▏                          | 86/311 [00:00<00:00, 2936.30it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights:  28%|██████████▏                          | 86/311 [00:00<00:00, 2928.03it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights:  28%|██████████▎                          | 87/311 [00:00<00:00, 2950.13it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights:  28%|██████████▎                          | 87/311 [00:00<00:00, 2942.45it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights:  28%|██████████▍                          | 88/311 [00:00<00:00, 2930.26it/s, Materializing param=model.layers.7.self_attn.q_norm.weight]
Loading weights:  28%|██████████▍                          | 88/311 [00:00<00:00, 2922.10it/s, Materializing param=model.layers.7.self_attn.q_norm.weight]
Loading weights:  29%|██████████▌                          | 89/311 [00:00<00:00, 2943.93it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights:  29%|██████████▌                          | 89/311 [00:00<00:00, 2936.49it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights:  29%|██████████▋                          | 90/311 [00:00<00:00, 2956.51it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights:  29%|██████████▋                          | 90/311 [00:00<00:00, 2948.77it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights:  29%|███████████                           | 91/311 [00:00<00:00, 2970.03it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights:  29%|███████████                           | 91/311 [00:00<00:00, 2962.70it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights:  30%|███████████▊                            | 92/311 [00:00<00:00, 2982.41it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights:  30%|███████████▊                            | 92/311 [00:00<00:00, 2974.78it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights:  30%|███████████▉                            | 93/311 [00:00<00:00, 2995.63it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights:  30%|███████████▉                            | 93/311 [00:00<00:00, 2988.04it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights:  30%|████████████▋                             | 94/311 [00:00<00:00, 3007.24it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights:  30%|████████████▋                             | 94/311 [00:00<00:00, 2999.74it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights:  31%|████████▊                    | 95/311 [00:00<00:00, 3020.53it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights:  31%|████████▊                    | 95/311 [00:00<00:00, 3013.06it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights:  31%|███████████▍                         | 96/311 [00:00<00:00, 3024.99it/s, Materializing param=model.layers.8.self_attn.k_norm.weight]
Loading weights:  31%|███████████▍                         | 96/311 [00:00<00:00, 3017.35it/s, Materializing param=model.layers.8.self_attn.k_norm.weight]
Loading weights:  31%|███████████▌                         | 97/311 [00:00<00:00, 3001.26it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights:  31%|███████████▌                         | 97/311 [00:00<00:00, 2993.86it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights:  32%|███████████▋                         | 98/311 [00:00<00:00, 3013.86it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights:  32%|███████████▋                         | 98/311 [00:00<00:00, 3006.69it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights:  32%|███████████▊                         | 99/311 [00:00<00:00, 3010.75it/s, Materializing param=model.layers.8.self_attn.q_norm.weight]
Loading weights:  32%|███████████▊                         | 99/311 [00:00<00:00, 3003.39it/s, Materializing param=model.layers.8.self_attn.q_norm.weight]
Loading weights:  32%|███████████▌                        | 100/311 [00:00<00:00, 3023.10it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights:  32%|███████████▌                        | 100/311 [00:00<00:00, 3016.14it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights:  32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                        | 101/311 [00:00<00:00, 3034.30it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights:  32%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                        | 101/311 [00:00<00:00, 3027.17it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                        | 102/311 [00:00<00:00, 3033.12it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                        | 102/311 [00:00<00:00, 3025.74it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                          | 103/311 [00:00<00:00, 2960.19it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                          | 103/311 [00:00<00:00, 2953.27it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                          | 104/311 [00:00<00:00, 2971.99it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights:  33%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                          | 104/311 [00:00<00:00, 2965.40it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                           | 105/311 [00:00<00:00, 2982.64it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                           | 105/311 [00:00<00:00, 2976.15it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                  | 106/311 [00:00<00:00, 2994.74it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                  | 106/311 [00:00<00:00, 2988.28it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                       | 107/311 [00:00<00:00, 2969.29it/s, Materializing param=model.layers.9.self_attn.k_norm.weight]
Loading weights:  34%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                       | 107/311 [00:00<00:00, 2962.61it/s, Materializing param=model.layers.9.self_attn.k_norm.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                       | 108/311 [00:00<00:00, 2980.46it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                       | 108/311 [00:00<00:00, 2973.96it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                       | 109/311 [00:00<00:00, 2971.94it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                       | 109/311 [00:00<00:00, 2965.23it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                       | 110/311 [00:00<00:00, 2982.90it/s, Materializing param=model.layers.9.self_attn.q_norm.weight]
Loading weights:  35%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                       | 110/311 [00:00<00:00, 2976.68it/s, Materializing param=model.layers.9.self_attn.q_norm.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                       | 111/311 [00:00<00:00, 2994.43it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                       | 111/311 [00:00<00:00, 2988.11it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                       | 112/311 [00:00<00:00, 3002.77it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                       | 112/311 [00:00<00:00, 2996.24it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                       | 113/311 [00:00<00:00, 3012.08it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights:  36%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                       | 113/311 [00:00<00:00, 3005.72it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                        | 114/311 [00:00<00:00, 3022.96it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                        | 114/311 [00:00<00:00, 3016.53it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                        | 115/311 [00:00<00:00, 3032.19it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                        | 115/311 [00:00<00:00, 3025.78it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                         | 116/311 [00:00<00:00, 3042.75it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights:  37%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                         | 116/311 [00:00<00:00, 3036.69it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 117/311 [00:00<00:00, 3052.26it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 117/311 [00:00<00:00, 3045.78it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                     | 118/311 [00:00<00:00, 3062.54it/s, Materializing param=model.layers.10.self_attn.k_norm.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                     | 118/311 [00:00<00:00, 3056.37it/s, Materializing param=model.layers.10.self_attn.k_norm.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                     | 119/311 [00:00<00:00, 3071.58it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                     | 119/311 [00:00<00:00, 3065.28it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 120/311 [00:00<00:00, 3075.47it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 120/311 [00:00<00:00, 3069.25it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 121/311 [00:00<00:00, 3085.53it/s, Materializing param=model.layers.10.self_attn.q_norm.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 121/311 [00:00<00:00, 3079.31it/s, Materializing param=model.layers.10.self_attn.q_norm.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                     | 122/311 [00:00<00:00, 3070.04it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights:  39%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                     | 122/311 [00:00<00:00, 3063.66it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                     | 123/311 [00:00<00:00, 3078.12it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                     | 123/311 [00:00<00:00, 3071.89it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                     | 124/311 [00:00<00:00, 3087.86it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                     | 124/311 [00:00<00:00, 3081.87it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                      | 125/311 [00:00<00:00, 3096.47it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights:  40%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                      | 125/311 [00:00<00:00, 3090.43it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                      | 126/311 [00:00<00:00, 3105.94it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                      | 126/311 [00:00<00:00, 3100.09it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                       | 127/311 [00:00<00:00, 3114.23it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                       | 127/311 [00:00<00:00, 3108.16it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                | 128/311 [00:00<00:00, 3064.91it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                | 128/311 [00:00<00:00, 3058.71it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                    | 129/311 [00:00<00:00, 3073.82it/s, Materializing param=model.layers.11.self_attn.k_norm.weight]
Loading weights:  41%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                    | 129/311 [00:00<00:00, 3068.03it/s, Materializing param=model.layers.11.self_attn.k_norm.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 130/311 [00:00<00:00, 3083.16it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 130/311 [00:00<00:00, 3077.47it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 131/311 [00:00<00:00, 3091.09it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 131/311 [00:00<00:00, 3085.26it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                    | 132/311 [00:00<00:00, 3100.16it/s, Materializing param=model.layers.11.self_attn.q_norm.weight]
Loading weights:  42%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                    | 132/311 [00:00<00:00, 3094.53it/s, Materializing param=model.layers.11.self_attn.q_norm.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                    | 133/311 [00:00<00:00, 3108.12it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                    | 133/311 [00:00<00:00, 3102.47it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                    | 134/311 [00:00<00:00, 3117.35it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                    | 134/311 [00:00<00:00, 3111.91it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 135/311 [00:00<00:00, 3125.52it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights:  43%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                    | 135/311 [00:00<00:00, 3119.85it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 136/311 [00:00<00:00, 3134.67it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                     | 136/311 [00:00<00:00, 3129.13it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                     | 137/311 [00:00<00:00, 3142.40it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                     | 137/311 [00:00<00:00, 3136.78it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                      | 138/311 [00:00<00:00, 3151.29it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights:  44%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                      | 138/311 [00:00<00:00, 3145.76it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ               | 139/311 [00:00<00:00, 3158.87it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ               | 139/311 [00:00<00:00, 3153.13it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                   | 140/311 [00:00<00:00, 3167.49it/s, Materializing param=model.layers.12.self_attn.k_norm.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                   | 140/311 [00:00<00:00, 3162.09it/s, Materializing param=model.layers.12.self_attn.k_norm.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                   | 141/311 [00:00<00:00, 3175.13it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights:  45%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                   | 141/311 [00:00<00:00, 3169.33it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                   | 142/311 [00:00<00:00, 3183.12it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                   | 142/311 [00:00<00:00, 3177.55it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                   | 143/311 [00:00<00:00, 3190.25it/s, Materializing param=model.layers.12.self_attn.q_norm.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                   | 143/311 [00:00<00:00, 3184.52it/s, Materializing param=model.layers.12.self_attn.q_norm.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                  | 144/311 [00:00<00:00, 3198.35it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights:  46%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                  | 144/311 [00:00<00:00, 3192.82it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                  | 145/311 [00:00<00:00, 3205.56it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                  | 145/311 [00:00<00:00, 3199.96it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                   | 146/311 [00:00<00:00, 3213.65it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                   | 146/311 [00:00<00:00, 3208.22it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                    | 147/311 [00:00<00:00, 3180.37it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights:  47%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                    | 147/311 [00:00<00:00, 3174.44it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                    | 148/311 [00:00<00:00, 3187.90it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                    | 148/311 [00:00<00:00, 3182.67it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                    | 149/311 [00:00<00:00, 3196.35it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                    | 149/311 [00:00<00:00, 3191.20it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ              | 150/311 [00:00<00:00, 3203.70it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights:  48%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ              | 150/311 [00:00<00:00, 3198.26it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                  | 151/311 [00:00<00:00, 3211.71it/s, Materializing param=model.layers.13.self_attn.k_norm.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                  | 151/311 [00:00<00:00, 3206.51it/s, Materializing param=model.layers.13.self_attn.k_norm.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                  | 152/311 [00:00<00:00, 3218.67it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                  | 152/311 [00:00<00:00, 3213.39it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 153/311 [00:00<00:00, 3226.53it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights:  49%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 153/311 [00:00<00:00, 3221.43it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                 | 154/311 [00:00<00:00, 3179.94it/s, Materializing param=model.layers.13.self_attn.q_norm.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                 | 154/311 [00:00<00:00, 3174.54it/s, Materializing param=model.layers.13.self_attn.q_norm.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 155/311 [00:00<00:00, 3187.46it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 155/311 [00:00<00:00, 3182.46it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                 | 156/311 [00:00<00:00, 3194.43it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                 | 156/311 [00:00<00:00, 3189.27it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 157/311 [00:00<00:00, 3202.09it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights:  50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                 | 157/311 [00:00<00:00, 3197.06it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                  | 158/311 [00:00<00:00, 3208.83it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                  | 158/311 [00:00<00:00, 3203.74it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                  | 159/311 [00:00<00:00, 3216.44it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                  | 159/311 [00:00<00:00, 3211.52it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                   | 160/311 [00:00<00:00, 3223.13it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights:  51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                   | 160/311 [00:00<00:00, 3218.03it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 161/311 [00:00<00:00, 3230.46it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰             | 161/311 [00:00<00:00, 3225.60it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 162/311 [00:00<00:00, 3237.16it/s, Materializing param=model.layers.14.self_attn.k_norm.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 162/311 [00:00<00:00, 3232.07it/s, Materializing param=model.layers.14.self_attn.k_norm.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                | 163/311 [00:00<00:00, 3230.32it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights:  52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                | 163/311 [00:00<00:00, 3224.96it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 164/311 [00:00<00:00, 3237.15it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 164/311 [00:00<00:00, 3232.21it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                | 165/311 [00:00<00:00, 3238.86it/s, Materializing param=model.layers.14.self_attn.q_norm.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                | 165/311 [00:00<00:00, 3233.70it/s, Materializing param=model.layers.14.self_attn.q_norm.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                | 166/311 [00:00<00:00, 3245.91it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights:  53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                | 166/311 [00:00<00:00, 3240.96it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                | 167/311 [00:00<00:00, 3252.12it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                | 167/311 [00:00<00:00, 3247.27it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 168/311 [00:00<00:00, 3259.49it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–                | 168/311 [00:00<00:00, 3254.66it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                 | 169/311 [00:00<00:00, 3265.76it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights:  54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                 | 169/311 [00:00<00:00, 3260.76it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                 | 170/311 [00:00<00:00, 3260.21it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š                 | 170/311 [00:00<00:00, 3255.14it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                  | 171/311 [00:00<00:00, 3247.04it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰                  | 171/311 [00:00<00:00, 3241.99it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰            | 172/311 [00:00<00:00, 3253.63it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights:  55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰            | 172/311 [00:00<00:00, 3248.78it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–               | 173/311 [00:00<00:00, 3259.49it/s, Materializing param=model.layers.15.self_attn.k_norm.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–               | 173/311 [00:00<00:00, 3254.56it/s, Materializing param=model.layers.15.self_attn.k_norm.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ               | 174/311 [00:00<00:00, 3265.92it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ               | 174/311 [00:00<00:00, 3261.18it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹               | 175/311 [00:00<00:00, 3220.67it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights:  56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹               | 175/311 [00:00<00:00, 3215.56it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š               | 176/311 [00:00<00:00, 3226.78it/s, Materializing param=model.layers.15.self_attn.q_norm.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š               | 176/311 [00:00<00:00, 3222.33it/s, Materializing param=model.layers.15.self_attn.q_norm.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰               | 177/311 [00:00<00:00, 3227.92it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰               | 177/311 [00:00<00:00, 3223.24it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ               | 178/311 [00:00<00:00, 3234.51it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights:  57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ               | 178/311 [00:00<00:00, 3230.04it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights:  58% | 179/311 [00:00<00:00, 3228.76it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights:  58% | 179/311 [00:00<00:00, 3224.06it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights:  58% | 180/311 [00:00<00:00, 3235.22it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights:  58% | 180/311 [00:00<00:00, 3230.72it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights:  58% | 181/311 [00:00<00:00, 3241.92it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights:  58% | 181/311 [00:00<00:00, 3237.45it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights:  59% | 182/311 [00:00<00:00, 3247.61it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights:  59% | 182/311 [00:00<00:00, 3242.96it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights:  59% | 183/311 [00:00<00:00, 3254.05it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights:  59% | 183/311 [00:00<00:00, 3249.42it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights:  59% | 184/311 [00:00<00:00, 3259.27it/s, Materializing param=model.layers.16.self_attn.k_norm.weight]
Loading weights:  59% | 184/311 [00:00<00:00, 3249.18it/s, Materializing param=model.layers.16.self_attn.k_norm.weight]
Loading weights:  59% | 185/311 [00:00<00:00, 3259.31it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights:  59% | 185/311 [00:00<00:00, 3254.79it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights:  60% | 186/311 [00:00<00:00, 3264.53it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights:  60% | 186/311 [00:00<00:00, 3259.99it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights:  60% | 187/311 [00:00<00:00, 3270.77it/s, Materializing param=model.layers.16.self_attn.q_norm.weight]
Loading weights:  60% | 187/311 [00:00<00:00, 3266.35it/s, Materializing param=model.layers.16.self_attn.q_norm.weight]
Loading weights:  60% | 188/311 [00:00<00:00, 3276.13it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights:  60% | 188/311 [00:00<00:00, 3271.58it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights:  61% | 189/311 [00:00<00:00, 3282.15it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights:  61% | 189/311 [00:00<00:00, 3277.61it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights:  61% | 190/311 [00:00<00:00, 3286.94it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights:  61% | 190/311 [00:00<00:00, 3282.37it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights:  61% | 191/311 [00:00<00:00, 3293.03it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights:  61% | 191/311 [00:00<00:00, 3288.62it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights:  62% | 192/311 [00:00<00:00, 3298.11it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights:  62% | 192/311 [00:00<00:00, 3293.70it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights:  62% | 193/311 [00:00<00:00, 3290.64it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights:  62% | 193/311 [00:00<00:00, 3285.96it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights:  62% | 194/311 [00:00<00:00, 3296.31it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights:  62% | 194/311 [00:00<00:00, 3291.90it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights:  63% | 195/311 [00:00<00:00, 3301.34it/s, Materializing param=model.layers.17.self_attn.k_norm.weight]
Loading weights:  63% | 195/311 [00:00<00:00, 3296.86it/s, Materializing param=model.layers.17.self_attn.k_norm.weight]
Loading weights:  63% | 196/311 [00:00<00:00, 3307.09it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights:  63% | 196/311 [00:00<00:00, 3302.81it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights:  63% | 197/311 [00:00<00:00, 3279.27it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights:  63% | 197/311 [00:00<00:00, 3274.62it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights:  64% | 198/311 [00:00<00:00, 3284.59it/s, Materializing param=model.layers.17.self_attn.q_norm.weight]
Loading weights:  64% | 198/311 [00:00<00:00, 3280.22it/s, Materializing param=model.layers.17.self_attn.q_norm.weight]
Loading weights:  64% | 199/311 [00:00<00:00, 3289.31it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights:  64% | 199/311 [00:00<00:00, 3285.00it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights:  64% | 200/311 [00:00<00:00, 3294.96it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights:  64% | 200/311 [00:00<00:00, 3290.75it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights:  65% | 201/311 [00:00<00:00, 3300.02it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights:  65% | 201/311 [00:00<00:00, 3295.73it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights:  65% | 202/311 [00:00<00:00, 3293.17it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights:  65% | 202/311 [00:00<00:00, 3288.68it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights:  65% | 203/311 [00:00<00:00, 3298.39it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights:  65% | 203/311 [00:00<00:00, 3294.24it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights:  66% | 204/311 [00:00<00:00, 3303.44it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights:  66% | 204/311 [00:00<00:00, 3299.24it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights:  66% | 205/311 [00:00<00:00, 3309.02it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights:  66% | 205/311 [00:00<00:00, 3304.98it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights:  66% | 206/311 [00:00<00:00, 3314.28it/s, Materializing param=model.layers.18.self_attn.k_norm.weight]
Loading weights:  66% | 206/311 [00:00<00:00, 3310.12it/s, Materializing param=model.layers.18.self_attn.k_norm.weight]
Loading weights:  67% | 207/311 [00:00<00:00, 3319.86it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights:  67% | 207/311 [00:00<00:00, 3315.78it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights:  67% | 208/311 [00:00<00:00, 3309.44it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights:  67% | 208/311 [00:00<00:00, 3305.10it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights:  67% | 209/311 [00:00<00:00, 3281.01it/s, Materializing param=model.layers.18.self_attn.q_norm.weight]
Loading weights:  67% | 209/311 [00:00<00:00, 3276.63it/s, Materializing param=model.layers.18.self_attn.q_norm.weight]
Loading weights:  68% | 210/311 [00:00<00:00, 3286.12it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights:  68% | 210/311 [00:00<00:00, 3281.93it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights:  68% | 211/311 [00:00<00:00, 3290.62it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights:  68% | 211/311 [00:00<00:00, 3286.47it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights:  68% | 212/311 [00:00<00:00, 3295.98it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights:  68% | 212/311 [00:00<00:00, 3291.94it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights:  68% | 213/311 [00:00<00:00, 3300.60it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights:  68% | 213/311 [00:00<00:00, 3296.47it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights:  69% | 214/311 [00:00<00:00, 3305.85it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights:  69% | 214/311 [00:00<00:00, 3301.92it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights:  69% | 215/311 [00:00<00:00, 3310.36it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights:  69% | 215/311 [00:00<00:00, 3306.33it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights:  69% | 216/311 [00:00<00:00, 3315.67it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights:  69% | 216/311 [00:00<00:00, 3311.64it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights:  70% | 217/311 [00:00<00:00, 3320.12it/s, Materializing param=model.layers.19.self_attn.k_norm.weight]
Loading weights:  70% | 217/311 [00:00<00:00, 3315.92it/s, Materializing param=model.layers.19.self_attn.k_norm.weight]
Loading weights:  70% | 218/311 [00:00<00:00, 3324.99it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights:  70% | 218/311 [00:00<00:00, 3320.93it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights:  70% | 219/311 [00:00<00:00, 3325.55it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights:  70% | 219/311 [00:00<00:00, 3321.40it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights:  71% | 220/311 [00:00<00:00, 3329.83it/s, Materializing param=model.layers.19.self_attn.q_norm.weight]
Loading weights:  71% | 220/311 [00:00<00:00, 3325.90it/s, Materializing param=model.layers.19.self_attn.q_norm.weight]
Loading weights:  71% | 221/311 [00:00<00:00, 3303.59it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights:  71% | 221/311 [00:00<00:00, 3299.46it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights:  71% | 222/311 [00:00<00:00, 3293.87it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights:  71% | 222/311 [00:00<00:00, 3289.92it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights:  72% | 223/311 [00:00<00:00, 3299.08it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights:  72% | 223/311 [00:00<00:00, 3295.36it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights:  72% | 224/311 [00:00<00:00, 3303.76it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights:  72% | 224/311 [00:00<00:00, 3299.93it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights:  72% | 225/311 [00:00<00:00, 3308.15it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights:  72% | 225/311 [00:00<00:00, 3304.35it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights:  73% | 226/311 [00:00<00:00, 3312.56it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights:  73% | 226/311 [00:00<00:00, 3308.83it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights:  73% | 227/311 [00:00<00:00, 3317.10it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights:  73% | 227/311 [00:00<00:00, 3313.26it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights:  73% | 228/311 [00:00<00:00, 3321.40it/s, Materializing param=model.layers.20.self_attn.k_norm.weight]
Loading weights:  73% | 228/311 [00:00<00:00, 3317.66it/s, Materializing param=model.layers.20.self_attn.k_norm.weight]
Loading weights:  74% | 229/311 [00:00<00:00, 3326.53it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights:  74% | 229/311 [00:00<00:00, 3322.81it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights:  74% | 230/311 [00:00<00:00, 3330.84it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights:  74% | 230/311 [00:00<00:00, 3327.06it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights:  74% | 231/311 [00:00<00:00, 3335.85it/s, Materializing param=model.layers.20.self_attn.q_norm.weight]
Loading weights:  74% | 231/311 [00:00<00:00, 3332.13it/s, Materializing param=model.layers.20.self_attn.q_norm.weight]
Loading weights:  75% | 232/311 [00:00<00:00, 3333.28it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights:  75% | 232/311 [00:00<00:00, 3329.43it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights:  75% | 233/311 [00:00<00:00, 3338.00it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights:  75% | 233/311 [00:00<00:00, 3334.38it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights:  75% | 234/311 [00:00<00:00, 3342.36it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights:  75% | 234/311 [00:00<00:00, 3338.65it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights:  76% | 235/311 [00:00<00:00, 3338.06it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights:  76% | 235/311 [00:00<00:00, 3334.17it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights:  76% | 236/311 [00:00<00:00, 3342.77it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights:  76% | 236/311 [00:00<00:00, 3339.15it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights:  76% | 237/311 [00:00<00:00, 3331.50it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights:  76% | 237/311 [00:00<00:00, 3327.73it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights:  77% | 238/311 [00:00<00:00, 3336.36it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights:  77% | 238/311 [00:00<00:00, 3332.69it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights:  77% | 239/311 [00:00<00:00, 3340.57it/s, Materializing param=model.layers.21.self_attn.k_norm.weight]
Loading weights:  77% | 239/311 [00:00<00:00, 3336.90it/s, Materializing param=model.layers.21.self_attn.k_norm.weight]
Loading weights:  77% | 240/311 [00:00<00:00, 3338.35it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights:  77% | 240/311 [00:00<00:00, 3334.58it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights:  77% | 241/311 [00:00<00:00, 3342.12it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights:  77% | 241/311 [00:00<00:00, 3338.43it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights:  78% | 242/311 [00:00<00:00, 3346.03it/s, Materializing param=model.layers.21.self_attn.q_norm.weight]
Loading weights:  78% | 242/311 [00:00<00:00, 3342.43it/s, Materializing param=model.layers.21.self_attn.q_norm.weight]
Loading weights:  78% | 243/311 [00:00<00:00, 3350.88it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights:  78% | 243/311 [00:00<00:00, 3347.40it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights:  78% | 244/311 [00:00<00:00, 3316.82it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights:  78% | 244/311 [00:00<00:00, 3313.03it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights:  79% | 245/311 [00:00<00:00, 3321.09it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights:  79% | 245/311 [00:00<00:00, 3317.56it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights:  79% | 246/311 [00:00<00:00, 3325.17it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights:  79% | 246/311 [00:00<00:00, 3321.69it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights:  79% | 247/311 [00:00<00:00, 3329.93it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights:  79% | 247/311 [00:00<00:00, 3326.53it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights:  80% | 248/311 [00:00<00:00, 3334.28it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights:  80% | 248/311 [00:00<00:00, 3330.69it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights:  80% | 249/311 [00:00<00:00, 3338.87it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights:  80% | 249/311 [00:00<00:00, 3335.32it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights:  80% | 250/311 [00:00<00:00, 3343.01it/s, Materializing param=model.layers.22.self_attn.k_norm.weight]
Loading weights:  80% | 250/311 [00:00<00:00, 3339.52it/s, Materializing param=model.layers.22.self_attn.k_norm.weight]
Loading weights:  81% | 251/311 [00:00<00:00, 3347.53it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights:  81% | 251/311 [00:00<00:00, 3344.13it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights:  81% | 252/311 [00:00<00:00, 3351.78it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights:  81% | 252/311 [00:00<00:00, 3348.24it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights:  81% | 253/311 [00:00<00:00, 3356.29it/s, Materializing param=model.layers.22.self_attn.q_norm.weight]
Loading weights:  81% | 253/311 [00:00<00:00, 3352.82it/s, Materializing param=model.layers.22.self_attn.q_norm.weight]
Loading weights:  82% | 254/311 [00:00<00:00, 3360.33it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights:  82% | 254/311 [00:00<00:00, 3356.52it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights:  82% | 255/311 [00:00<00:00, 3355.32it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights:  82% | 255/311 [00:00<00:00, 3351.78it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights:  82% | 256/311 [00:00<00:00, 3336.59it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights:  82% | 256/311 [00:00<00:00, 3332.98it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights:  83% | 257/311 [00:00<00:00, 3340.84it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights:  83% | 257/311 [00:00<00:00, 3337.48it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights:  83% | 258/311 [00:00<00:00, 3322.84it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights:  83% | 258/311 [00:00<00:00, 3319.35it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights:  83% | 259/311 [00:00<00:00, 3327.03it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights:  83% | 259/311 [00:00<00:00, 3323.86it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights:  84% | 260/311 [00:00<00:00, 3331.12it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights:  84% | 260/311 [00:00<00:00, 3327.74it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights:  84% | 261/311 [00:00<00:00, 3335.63it/s, Materializing param=model.layers.23.self_attn.k_norm.weight]
Loading weights:  84% | 261/311 [00:00<00:00, 3332.27it/s, Materializing param=model.layers.23.self_attn.k_norm.weight]
Loading weights:  84% | 262/311 [00:00<00:00, 3339.32it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights:  84% | 262/311 [00:00<00:00, 3336.05it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights:  85% | 263/311 [00:00<00:00, 3343.92it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights:  85% | 263/311 [00:00<00:00, 3340.65it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights:  85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 264/311 [00:00<00:00, 3347.58it/s, Materializing param=model.layers.23.self_attn.q_norm.weight]
Loading weights:  85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 264/311 [00:00<00:00, 3344.18it/s, Materializing param=model.layers.23.self_attn.q_norm.weight]
Loading weights:  85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 265/311 [00:00<00:00, 3351.88it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights:  85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 265/311 [00:00<00:00, 3348.61it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 266/311 [00:00<00:00, 3355.49it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 266/311 [00:00<00:00, 3352.14it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 267/311 [00:00<00:00, 3359.84it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰     | 267/311 [00:00<00:00, 3356.57it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 268/311 [00:00<00:00, 3363.50it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 268/311 [00:00<00:00, 3360.17it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 269/311 [00:00<00:00, 3332.47it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights:  86%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š     | 269/311 [00:00<00:00, 3329.07it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 270/311 [00:00<00:00, 3336.62it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹     | 270/311 [00:00<00:00, 3333.48it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ   | 271/311 [00:00<00:00, 3340.26it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ   | 271/311 [00:00<00:00, 3336.87it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 272/311 [00:00<00:00, 3344.32it/s, Materializing param=model.layers.24.self_attn.k_norm.weight]
Loading weights:  87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 272/311 [00:00<00:00, 3341.14it/s, Materializing param=model.layers.24.self_attn.k_norm.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹    | 273/311 [00:00<00:00, 3347.89it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹    | 273/311 [00:00<00:00, 3344.42it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 274/311 [00:00<00:00, 3351.72it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š    | 274/311 [00:00<00:00, 3348.46it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰    | 275/311 [00:00<00:00, 3355.19it/s, Materializing param=model.layers.24.self_attn.q_norm.weight]
Loading weights:  88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰    | 275/311 [00:00<00:00, 3351.90it/s, Materializing param=model.layers.24.self_attn.q_norm.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 276/311 [00:00<00:00, 3359.30it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 276/311 [00:00<00:00, 3356.20it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 277/311 [00:00<00:00, 3362.99it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 277/311 [00:00<00:00, 3359.71it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 278/311 [00:00<00:00, 3367.02it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights:  89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 278/311 [00:00<00:00, 3363.89it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 279/311 [00:00<00:00, 3370.58it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 279/311 [00:00<00:00, 3367.28it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 280/311 [00:00<00:00, 3374.52it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 280/311 [00:00<00:00, 3371.35it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 281/311 [00:00<00:00, 3377.93it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights:  90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   | 281/311 [00:00<00:00, 3374.75it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 282/311 [00:00<00:00, 3374.35it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 282/311 [00:00<00:00, 3370.98it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š   | 283/311 [00:00<00:00, 3378.23it/s, Materializing param=model.layers.25.self_attn.k_norm.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š   | 283/311 [00:00<00:00, 3375.11it/s, Materializing param=model.layers.25.self_attn.k_norm.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 284/311 [00:00<00:00, 3381.63it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights:  91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰   | 284/311 [00:00<00:00, 3378.45it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 285/311 [00:00<00:00, 3361.73it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ   | 285/311 [00:00<00:00, 3358.48it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 286/311 [00:00<00:00, 3365.54it/s, Materializing param=model.layers.25.self_attn.q_norm.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 286/311 [00:00<00:00, 3362.51it/s, Materializing param=model.layers.25.self_attn.q_norm.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 287/311 [00:00<00:00, 3368.88it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 287/311 [00:00<00:00, 3365.75it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 288/311 [00:00<00:00, 3372.84it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 288/311 [00:00<00:00, 3369.83it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 289/311 [00:00<00:00, 3376.31it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 289/311 [00:00<00:00, 3373.20it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 290/311 [00:00<00:00, 3380.17it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 290/311 [00:00<00:00, 3377.15it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 291/311 [00:00<00:00, 3383.54it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 291/311 [00:00<00:00, 3380.51it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 292/311 [00:00<00:00, 3387.44it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ  | 292/311 [00:00<00:00, 3384.46it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 293/311 [00:00<00:00, 3369.56it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights:  94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 293/311 [00:00<00:00, 3366.28it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  | 294/311 [00:00<00:00, 3373.14it/s, Materializing param=model.layers.26.self_attn.k_norm.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  | 294/311 [00:00<00:00, 3370.14it/s, Materializing param=model.layers.26.self_attn.k_norm.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 295/311 [00:00<00:00, 3376.43it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 295/311 [00:00<00:00, 3373.38it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 296/311 [00:00<00:00, 3380.22it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 296/311 [00:00<00:00, 3377.31it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/311 [00:00<00:00, 3383.58it/s, Materializing param=model.layers.26.self_attn.q_norm.weight]
Loading weights:  95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 297/311 [00:00<00:00, 3380.56it/s, Materializing param=model.layers.26.self_attn.q_norm.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 298/311 [00:00<00:00, 3387.44it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 298/311 [00:00<00:00, 3384.51it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 299/311 [00:00<00:00, 3375.01it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 299/311 [00:00<00:00, 3371.86it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 300/311 [00:00<00:00, 3378.53it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 300/311 [00:00<00:00, 3375.54it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 301/311 [00:00<00:00, 3376.54it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 301/311 [00:00<00:00, 3373.55it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 302/311 [00:00<00:00, 3380.54it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 302/311 [00:00<00:00, 3377.68it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 303/311 [00:00<00:00, 3384.74it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights:  97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 303/311 [00:00<00:00, 3381.89it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 304/311 [00:00<00:00, 3389.04it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 304/311 [00:00<00:00, 3385.98it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 305/311 [00:00<00:00, 3393.06it/s, Materializing param=model.layers.27.self_attn.k_norm.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 305/311 [00:00<00:00, 3390.17it/s, Materializing param=model.layers.27.self_attn.k_norm.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 306/311 [00:00<00:00, 3397.25it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights:  98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 306/311 [00:00<00:00, 3394.46it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 307/311 [00:00<00:00, 3401.58it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 307/311 [00:00<00:00, 3398.78it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 308/311 [00:00<00:00, 3405.90it/s, Materializing param=model.layers.27.self_attn.q_norm.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 308/311 [00:00<00:00, 3403.11it/s, Materializing param=model.layers.27.self_attn.q_norm.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 309/311 [00:00<00:00, 3410.25it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 309/311 [00:00<00:00, 3407.48it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 310/311 [00:00<00:00, 3414.51it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 310/311 [00:00<00:00, 3411.73it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 100%|█████████████████████████████████████████████████████████| 311/311 [00:00<00:00, 3410.59it/s, Materializing param=model.norm.weight]
[2026-02-05 12:27:24,916] [WARNING] [torchao.<module>:39] [PID:40655] Skipping import of cpp extensions due to incompatible torch version 2.9.1+cu128 for torchao version 0.13.0
[2026-02-05 12:27:30,719] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:40655] sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
[2026-02-05 16:18:59,605] [WARNING] [py.warnings._showwarnmsg:110] [PID:40655] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(