Loading dataset from disk: 0%| | 0/224 [00:00<?, ?it/s]
Loading dataset from disk: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 224/224 [00:00<00:00, 19756.58it/s]
Loading weights: 0%| | 0/311 [00:00<?, ?it/s]
Loading weights: 0%|β | 1/311 [00:00<00:00, 18724.57it/s, Materializing param=lm_head.weight]
Loading weights: 0%|β | 1/311 [00:00<00:00, 6278.90it/s, Materializing param=lm_head.weight]
Loading weights: 1%|β | 2/311 [00:00<00:00, 2680.07it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 2/311 [00:00<00:00, 2132.88it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 3/311 [00:00<00:00, 2615.99it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 3/311 [00:00<00:00, 2394.46it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 4/311 [00:00<00:00, 2873.30it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|β | 4/311 [00:00<00:00, 2706.88it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 2%|β | 5/311 [00:00<00:00, 3080.88it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|β | 5/311 [00:00<00:00, 2926.12it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|β | 6/311 [00:00<00:00, 2899.62it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 6/311 [00:00<00:00, 2328.88it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 7/311 [00:00<00:00, 2429.07it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|β | 7/311 [00:00<00:00, 2364.89it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 3%|β | 8/311 [00:00<00:00, 2572.80it/s, Materializing param=model.layers.0.self_attn.k_norm.weight]
Loading weights: 3%|β | 8/311 [00:00<00:00, 2338.29it/s, Materializing param=model.layers.0.self_attn.k_norm.weight]
Loading weights: 3%|β | 9/311 [00:00<00:00, 2540.46it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 3%|β | 9/311 [00:00<00:00, 2486.74it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 3%|ββ | 10/311 [00:00<00:00, 2597.57it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|ββ | 10/311 [00:00<00:00, 2545.09it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 4%|ββ | 11/311 [00:00<00:00, 2721.97it/s, Materializing param=model.layers.0.self_attn.q_norm.weight]
Loading weights: 4%|ββ | 11/311 [00:00<00:00, 1941.07it/s, Materializing param=model.layers.0.self_attn.q_norm.weight]
Loading weights: 4%|ββ | 12/311 [00:00<00:00, 1814.54it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 4%|ββ | 12/311 [00:00<00:00, 1792.12it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 4%|ββ | 13/311 [00:00<00:00, 1617.64it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 4%|ββ | 13/311 [00:00<00:00, 1495.25it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 5%|ββ | 14/311 [00:00<00:00, 1572.54it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 5%|ββ | 14/311 [00:00<00:00, 1556.37it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 5%|ββ | 15/311 [00:00<00:00, 1644.22it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 5%|ββ | 15/311 [00:00<00:00, 1629.66it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 5%|ββ | 16/311 [00:00<00:00, 1712.97it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 5%|ββ | 16/311 [00:00<00:00, 1697.71it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 5%|βββ | 17/311 [00:00<00:00, 1778.31it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 5%|βββ | 17/311 [00:00<00:00, 1763.97it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 6%|ββ | 18/311 [00:00<00:00, 1803.05it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 6%|ββ | 18/311 [00:00<00:00, 1546.16it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 6%|βββ | 19/311 [00:00<00:00, 1556.45it/s, Materializing param=model.layers.1.self_attn.k_norm.weight]
Loading weights: 6%|βββ | 19/311 [00:00<00:00, 1542.98it/s, Materializing param=model.layers.1.self_attn.k_norm.weight]
Loading weights: 6%|βββ | 20/311 [00:00<00:00, 1606.71it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 6%|βββ | 20/311 [00:00<00:00, 1596.04it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 7%|βββ | 21/311 [00:00<00:00, 1657.39it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 7%|βββ | 21/311 [00:00<00:00, 1647.16it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 7%|βββ | 22/311 [00:00<00:00, 1698.91it/s, Materializing param=model.layers.1.self_attn.q_norm.weight]
Loading weights: 7%|βββ | 22/311 [00:00<00:00, 1688.56it/s, Materializing param=model.layers.1.self_attn.q_norm.weight]
Loading weights: 7%|βββ | 23/311 [00:00<00:00, 1749.78it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 7%|βββ | 23/311 [00:00<00:00, 1739.53it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 8%|βββ | 24/311 [00:00<00:00, 1799.81it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 8%|βββ | 24/311 [00:00<00:00, 1789.57it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 8%|βββ | 25/311 [00:00<00:00, 1831.51it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 8%|βββ | 25/311 [00:00<00:00, 1820.57it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 8%|ββββ | 26/311 [00:00<00:00, 1864.10it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 8%|ββββ | 26/311 [00:00<00:00, 1853.46it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 9%|ββββ | 27/311 [00:00<00:00, 1874.25it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 9%|ββββ | 27/311 [00:00<00:00, 1864.01it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 9%|ββββ | 28/311 [00:00<00:00, 1917.83it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 9%|ββββ | 28/311 [00:00<00:00, 1908.14it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 9%|βββ | 29/311 [00:00<00:00, 1958.47it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 9%|βββ | 29/311 [00:00<00:00, 1947.28it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 10%|ββββ | 30/311 [00:00<00:00, 1990.31it/s, Materializing param=model.layers.2.self_attn.k_norm.weight]
Loading weights: 10%|ββββ | 30/311 [00:00<00:00, 1979.85it/s, Materializing param=model.layers.2.self_attn.k_norm.weight]
Loading weights: 10%|ββββ | 31/311 [00:00<00:00, 2030.03it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 10%|ββββ | 31/311 [00:00<00:00, 2019.56it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 10%|ββββ | 32/311 [00:00<00:00, 2068.68it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 10%|ββββ | 32/311 [00:00<00:00, 2058.62it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 11%|ββββ | 33/311 [00:00<00:00, 2105.29it/s, Materializing param=model.layers.2.self_attn.q_norm.weight]
Loading weights: 11%|ββββ | 33/311 [00:00<00:00, 2094.87it/s, Materializing param=model.layers.2.self_attn.q_norm.weight]
Loading weights: 11%|ββββ | 34/311 [00:00<00:00, 2140.56it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 11%|ββββ | 34/311 [00:00<00:00, 2130.11it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 11%|βββββ | 35/311 [00:00<00:00, 2177.21it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 11%|βββββ | 35/311 [00:00<00:00, 2166.86it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 12%|βββββ | 36/311 [00:00<00:00, 2213.09it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 12%|βββββ | 36/311 [00:00<00:00, 2202.83it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 12%|βββββ | 37/311 [00:00<00:00, 2245.83it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 12%|βββββ | 37/311 [00:00<00:00, 2234.61it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 12%|βββββ | 38/311 [00:00<00:00, 2277.04it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 12%|βββββ | 38/311 [00:00<00:00, 2266.35it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 13%|ββββββ | 39/311 [00:00<00:00, 2310.32it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 13%|ββββββ | 39/311 [00:00<00:00, 2299.99it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 13%|ββββ | 40/311 [00:00<00:00, 2340.38it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 13%|ββββ | 40/311 [00:00<00:00, 2329.52it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 13%|βββββ | 41/311 [00:00<00:00, 2371.79it/s, Materializing param=model.layers.3.self_attn.k_norm.weight]
Loading weights: 13%|βββββ | 41/311 [00:00<00:00, 2361.07it/s, Materializing param=model.layers.3.self_attn.k_norm.weight]
Loading weights: 14%|βββββ | 42/311 [00:00<00:00, 2399.59it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 14%|βββββ | 42/311 [00:00<00:00, 2388.81it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 14%|βββββ | 43/311 [00:00<00:00, 2429.48it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 14%|βββββ | 43/311 [00:00<00:00, 2417.56it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 14%|ββββββ | 44/311 [00:00<00:00, 2455.75it/s, Materializing param=model.layers.3.self_attn.q_norm.weight]
Loading weights: 14%|ββββββ | 44/311 [00:00<00:00, 2445.08it/s, Materializing param=model.layers.3.self_attn.q_norm.weight]
Loading weights: 14%|ββββββ | 45/311 [00:00<00:00, 2482.59it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 14%|ββββββ | 45/311 [00:00<00:00, 2471.60it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 15%|ββββββ | 46/311 [00:00<00:00, 2509.11it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 15%|ββββββ | 46/311 [00:00<00:00, 2498.45it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 15%|ββββββ | 47/311 [00:00<00:00, 2424.45it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 15%|ββββββ | 47/311 [00:00<00:00, 2413.91it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 15%|βββββββ | 48/311 [00:00<00:00, 2450.48it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 15%|βββββββ | 48/311 [00:00<00:00, 2440.77it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 16%|βββββββ | 49/311 [00:00<00:00, 2466.65it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 16%|βββββββ | 49/311 [00:00<00:00, 2452.72it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 16%|βββββββ | 50/311 [00:00<00:00, 2479.93it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 16%|βββββββ | 50/311 [00:00<00:00, 2469.94it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 16%|βββββ | 51/311 [00:00<00:00, 2504.97it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 16%|βββββ | 51/311 [00:00<00:00, 2495.36it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 17%|βββββββ | 52/311 [00:00<00:00, 2521.02it/s, Materializing param=model.layers.4.self_attn.k_norm.weight]
Loading weights: 17%|βββββββ | 52/311 [00:00<00:00, 2494.04it/s, Materializing param=model.layers.4.self_attn.k_norm.weight]
Loading weights: 17%|βββββββ | 53/311 [00:00<00:00, 2524.80it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 17%|βββββββ | 53/311 [00:00<00:00, 2515.37it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 17%|βββββββ | 54/311 [00:00<00:00, 2548.78it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 17%|βββββββ | 54/311 [00:00<00:00, 2539.52it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 18%|βββββββ | 55/311 [00:00<00:00, 2570.84it/s, Materializing param=model.layers.4.self_attn.q_norm.weight]
Loading weights: 18%|βββββββ | 55/311 [00:00<00:00, 2561.28it/s, Materializing param=model.layers.4.self_attn.q_norm.weight]
Loading weights: 18%|βββββββ | 56/311 [00:00<00:00, 2593.85it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 18%|βββββββ | 56/311 [00:00<00:00, 2584.72it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 18%|βββββββ | 57/311 [00:00<00:00, 2556.87it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 18%|βββββββ | 57/311 [00:00<00:00, 2547.26it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 19%|βββββββ | 58/311 [00:00<00:00, 2578.40it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 19%|βββββββ | 58/311 [00:00<00:00, 2569.63it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 19%|ββββββββ | 59/311 [00:00<00:00, 2600.70it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 19%|ββββββββ | 59/311 [00:00<00:00, 2592.06it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 19%|ββββββββ | 60/311 [00:00<00:00, 2603.76it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 19%|ββββββββ | 60/311 [00:00<00:00, 2594.82it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 20%|βββββββββ | 61/311 [00:00<00:00, 2625.07it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 20%|βββββββββ | 61/311 [00:00<00:00, 2616.43it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 20%|ββββββ | 62/311 [00:00<00:00, 2645.93it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 20%|ββββββ | 62/311 [00:00<00:00, 2637.04it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 20%|ββββββββ | 63/311 [00:00<00:00, 2662.27it/s, Materializing param=model.layers.5.self_attn.k_norm.weight]
Loading weights: 20%|ββββββββ | 63/311 [00:00<00:00, 2652.92it/s, Materializing param=model.layers.5.self_attn.k_norm.weight]
Loading weights: 21%|ββββββββ | 64/311 [00:00<00:00, 2681.94it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 21%|ββββββββ | 64/311 [00:00<00:00, 2673.45it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 21%|ββββββββ | 65/311 [00:00<00:00, 2700.56it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 21%|ββββββββ | 65/311 [00:00<00:00, 2691.66it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 21%|ββββββββ | 66/311 [00:00<00:00, 2719.77it/s, Materializing param=model.layers.5.self_attn.q_norm.weight]
Loading weights: 21%|ββββββββ | 66/311 [00:00<00:00, 2711.14it/s, Materializing param=model.layers.5.self_attn.q_norm.weight]
Loading weights: 22%|ββββββββ | 67/311 [00:00<00:00, 2737.13it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 22%|ββββββββ | 67/311 [00:00<00:00, 2728.04it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 22%|ββββββββ | 68/311 [00:00<00:00, 2755.44it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 22%|ββββββββ | 68/311 [00:00<00:00, 2746.50it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 22%|βββββββββ | 69/311 [00:00<00:00, 2771.73it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 22%|βββββββββ | 69/311 [00:00<00:00, 2762.81it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 23%|βββββββββ | 70/311 [00:00<00:00, 2789.88it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 23%|βββββββββ | 70/311 [00:00<00:00, 2781.26it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 23%|ββββββββββ | 71/311 [00:00<00:00, 2806.22it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 23%|ββββββββββ | 71/311 [00:00<00:00, 2797.33it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 23%|ββββββββββ | 72/311 [00:00<00:00, 2823.68it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 23%|ββββββββββ | 72/311 [00:00<00:00, 2815.26it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 23%|βββββββ | 73/311 [00:00<00:00, 2839.72it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 23%|βββββββ | 73/311 [00:00<00:00, 2830.14it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 24%|βββββββββ | 74/311 [00:00<00:00, 2855.60it/s, Materializing param=model.layers.6.self_attn.k_norm.weight]
Loading weights: 24%|βββββββββ | 74/311 [00:00<00:00, 2847.07it/s, Materializing param=model.layers.6.self_attn.k_norm.weight]
Loading weights: 24%|βββββββββ | 75/311 [00:00<00:00, 2862.69it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 24%|βββββββββ | 75/311 [00:00<00:00, 2854.12it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 24%|βββββββββ | 76/311 [00:00<00:00, 2879.32it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 24%|βββββββββ | 76/311 [00:00<00:00, 2870.90it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 25%|ββββββββββ | 77/311 [00:00<00:00, 2894.13it/s, Materializing param=model.layers.6.self_attn.q_norm.weight]
Loading weights: 25%|ββββββββββ | 77/311 [00:00<00:00, 2885.49it/s, Materializing param=model.layers.6.self_attn.q_norm.weight]
Loading weights: 25%|ββββββββββ | 78/311 [00:00<00:00, 2870.92it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 25%|ββββββββββ | 78/311 [00:00<00:00, 2862.03it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 25%|ββββββββββ | 79/311 [00:00<00:00, 2886.05it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 25%|ββββββββββ | 79/311 [00:00<00:00, 2877.90it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 26%|ββββββββββ | 80/311 [00:00<00:00, 2900.10it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 26%|ββββββββββ | 80/311 [00:00<00:00, 2891.80it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 26%|βββββββββββ | 81/311 [00:00<00:00, 2915.79it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 26%|βββββββββββ | 81/311 [00:00<00:00, 2907.75it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 26%|βββββββββββ | 82/311 [00:00<00:00, 2917.26it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 26%|βββββββββββ | 82/311 [00:00<00:00, 2908.84it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 27%|ββββββββββββ | 83/311 [00:00<00:00, 2931.89it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 27%|ββββββββββββ | 83/311 [00:00<00:00, 2923.99it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 27%|ββββββββ | 84/311 [00:00<00:00, 2945.44it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 27%|ββββββββ | 84/311 [00:00<00:00, 2937.19it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 27%|ββββββββββ | 85/311 [00:00<00:00, 2960.09it/s, Materializing param=model.layers.7.self_attn.k_norm.weight]
Loading weights: 27%|ββββββββββ | 85/311 [00:00<00:00, 2952.00it/s, Materializing param=model.layers.7.self_attn.k_norm.weight]
Loading weights: 28%|βββββββββββ | 86/311 [00:00<00:00, 2936.30it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 28%|βββββββββββ | 86/311 [00:00<00:00, 2928.03it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 28%|βββββββββββ | 87/311 [00:00<00:00, 2950.13it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 28%|βββββββββββ | 87/311 [00:00<00:00, 2942.45it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 28%|βββββββββββ | 88/311 [00:00<00:00, 2930.26it/s, Materializing param=model.layers.7.self_attn.q_norm.weight]
Loading weights: 28%|βββββββββββ | 88/311 [00:00<00:00, 2922.10it/s, Materializing param=model.layers.7.self_attn.q_norm.weight]
Loading weights: 29%|βββββββββββ | 89/311 [00:00<00:00, 2943.93it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 29%|βββββββββββ | 89/311 [00:00<00:00, 2936.49it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 29%|βββββββββββ | 90/311 [00:00<00:00, 2956.51it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 29%|βββββββββββ | 90/311 [00:00<00:00, 2948.77it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 29%|βββββββββββ | 91/311 [00:00<00:00, 2970.03it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 29%|βββββββββββ | 91/311 [00:00<00:00, 2962.70it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 30%|ββββββββββββ | 92/311 [00:00<00:00, 2982.41it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 30%|ββββββββββββ | 92/311 [00:00<00:00, 2974.78it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 30%|ββββββββββββ | 93/311 [00:00<00:00, 2995.63it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 30%|ββββββββββββ | 93/311 [00:00<00:00, 2988.04it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 30%|βββββββββββββ | 94/311 [00:00<00:00, 3007.24it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 30%|βββββββββββββ | 94/311 [00:00<00:00, 2999.74it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 31%|βββββββββ | 95/311 [00:00<00:00, 3020.53it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 31%|βββββββββ | 95/311 [00:00<00:00, 3013.06it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 31%|ββββββββββββ | 96/311 [00:00<00:00, 3024.99it/s, Materializing param=model.layers.8.self_attn.k_norm.weight]
Loading weights: 31%|ββββββββββββ | 96/311 [00:00<00:00, 3017.35it/s, Materializing param=model.layers.8.self_attn.k_norm.weight]
Loading weights: 31%|ββββββββββββ | 97/311 [00:00<00:00, 3001.26it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 31%|ββββββββββββ | 97/311 [00:00<00:00, 2993.86it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 32%|ββββββββββββ | 98/311 [00:00<00:00, 3013.86it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 32%|ββββββββββββ | 98/311 [00:00<00:00, 3006.69it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 32%|ββββββββββββ | 99/311 [00:00<00:00, 3010.75it/s, Materializing param=model.layers.8.self_attn.q_norm.weight]
Loading weights: 32%|ββββββββββββ | 99/311 [00:00<00:00, 3003.39it/s, Materializing param=model.layers.8.self_attn.q_norm.weight]
Loading weights: 32%|ββββββββββββ | 100/311 [00:00<00:00, 3023.10it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 32%|ββββββββββββ | 100/311 [00:00<00:00, 3016.14it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 32%|ββββββββββββ | 101/311 [00:00<00:00, 3034.30it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 32%|ββββββββββββ | 101/311 [00:00<00:00, 3027.17it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 33%|βββββββββββββ | 102/311 [00:00<00:00, 3033.12it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 33%|βββββββββββββ | 102/311 [00:00<00:00, 3025.74it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 33%|βββββββββββββ | 103/311 [00:00<00:00, 2960.19it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 33%|βββββββββββββ | 103/311 [00:00<00:00, 2953.27it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 33%|βββββββββββββ | 104/311 [00:00<00:00, 2971.99it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 33%|βββββββββββββ | 104/311 [00:00<00:00, 2965.40it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 34%|ββββββββββββββ | 105/311 [00:00<00:00, 2982.64it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 34%|ββββββββββββββ | 105/311 [00:00<00:00, 2976.15it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 34%|ββββββββββ | 106/311 [00:00<00:00, 2994.74it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 34%|ββββββββββ | 106/311 [00:00<00:00, 2988.28it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 34%|βββββββββββββ | 107/311 [00:00<00:00, 2969.29it/s, Materializing param=model.layers.9.self_attn.k_norm.weight]
Loading weights: 34%|βββββββββββββ | 107/311 [00:00<00:00, 2962.61it/s, Materializing param=model.layers.9.self_attn.k_norm.weight]
Loading weights: 35%|βββββββββββββ | 108/311 [00:00<00:00, 2980.46it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 35%|βββββββββββββ | 108/311 [00:00<00:00, 2973.96it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 35%|βββββββββββββ | 109/311 [00:00<00:00, 2971.94it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 35%|βββββββββββββ | 109/311 [00:00<00:00, 2965.23it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 35%|βββββββββββββ | 110/311 [00:00<00:00, 2982.90it/s, Materializing param=model.layers.9.self_attn.q_norm.weight]
Loading weights: 35%|βββββββββββββ | 110/311 [00:00<00:00, 2976.68it/s, Materializing param=model.layers.9.self_attn.q_norm.weight]
Loading weights: 36%|βββββββββββββ | 111/311 [00:00<00:00, 2994.43it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 36%|βββββββββββββ | 111/311 [00:00<00:00, 2988.11it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 36%|βββββββββββββ | 112/311 [00:00<00:00, 3002.77it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 36%|βββββββββββββ | 112/311 [00:00<00:00, 2996.24it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 36%|βββββββββββββ | 113/311 [00:00<00:00, 3012.08it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 36%|βββββββββββββ | 113/311 [00:00<00:00, 3005.72it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 37%|ββββββββββββββ | 114/311 [00:00<00:00, 3022.96it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 37%|ββββββββββββββ | 114/311 [00:00<00:00, 3016.53it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 37%|ββββββββββββββ | 115/311 [00:00<00:00, 3032.19it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 37%|ββββββββββββββ | 115/311 [00:00<00:00, 3025.78it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 37%|βββββββββββββββ | 116/311 [00:00<00:00, 3042.75it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 37%|βββββββββββββββ | 116/311 [00:00<00:00, 3036.69it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 38%|βββββββββββ | 117/311 [00:00<00:00, 3052.26it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 38%|βββββββββββ | 117/311 [00:00<00:00, 3045.78it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 38%|ββββββββββββββ | 118/311 [00:00<00:00, 3062.54it/s, Materializing param=model.layers.10.self_attn.k_norm.weight]
Loading weights: 38%|ββββββββββββββ | 118/311 [00:00<00:00, 3056.37it/s, Materializing param=model.layers.10.self_attn.k_norm.weight]
Loading weights: 38%|ββββββββββββββ | 119/311 [00:00<00:00, 3071.58it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 38%|ββββββββββββββ | 119/311 [00:00<00:00, 3065.28it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 39%|ββββββββββββββ | 120/311 [00:00<00:00, 3075.47it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 39%|ββββββββββββββ | 120/311 [00:00<00:00, 3069.25it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 39%|ββββββββββββββ | 121/311 [00:00<00:00, 3085.53it/s, Materializing param=model.layers.10.self_attn.q_norm.weight]
Loading weights: 39%|ββββββββββββββ | 121/311 [00:00<00:00, 3079.31it/s, Materializing param=model.layers.10.self_attn.q_norm.weight]
Loading weights: 39%|ββββββββββββββ | 122/311 [00:00<00:00, 3070.04it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 39%|ββββββββββββββ | 122/311 [00:00<00:00, 3063.66it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 40%|ββββββββββββββ | 123/311 [00:00<00:00, 3078.12it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 40%|ββββββββββββββ | 123/311 [00:00<00:00, 3071.89it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 40%|βββββββββββββββ | 124/311 [00:00<00:00, 3087.86it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 40%|βββββββββββββββ | 124/311 [00:00<00:00, 3081.87it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 40%|ββββββββββββββββ | 125/311 [00:00<00:00, 3096.47it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 40%|ββββββββββββββββ | 125/311 [00:00<00:00, 3090.43it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 41%|ββββββββββββββββ | 126/311 [00:00<00:00, 3105.94it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 41%|ββββββββββββββββ | 126/311 [00:00<00:00, 3100.09it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 41%|βββββββββββββββββ | 127/311 [00:00<00:00, 3114.23it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 41%|βββββββββββββββββ | 127/311 [00:00<00:00, 3108.16it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 41%|βββββββββββ | 128/311 [00:00<00:00, 3064.91it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 41%|βββββββββββ | 128/311 [00:00<00:00, 3058.71it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 41%|βββββββββββββββ | 129/311 [00:00<00:00, 3073.82it/s, Materializing param=model.layers.11.self_attn.k_norm.weight]
Loading weights: 41%|βββββββββββββββ | 129/311 [00:00<00:00, 3068.03it/s, Materializing param=model.layers.11.self_attn.k_norm.weight]
Loading weights: 42%|βββββββββββββββ | 130/311 [00:00<00:00, 3083.16it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 42%|βββββββββββββββ | 130/311 [00:00<00:00, 3077.47it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 42%|βββββββββββββββ | 131/311 [00:00<00:00, 3091.09it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 42%|βββββββββββββββ | 131/311 [00:00<00:00, 3085.26it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 42%|βββββββββββββββ | 132/311 [00:00<00:00, 3100.16it/s, Materializing param=model.layers.11.self_attn.q_norm.weight]
Loading weights: 42%|βββββββββββββββ | 132/311 [00:00<00:00, 3094.53it/s, Materializing param=model.layers.11.self_attn.q_norm.weight]
Loading weights: 43%|βββββββββββββββ | 133/311 [00:00<00:00, 3108.12it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 43%|βββββββββββββββ | 133/311 [00:00<00:00, 3102.47it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 43%|βββββββββββββββ | 134/311 [00:00<00:00, 3117.35it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 43%|βββββββββββββββ | 134/311 [00:00<00:00, 3111.91it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 43%|ββββββββββββββββ | 135/311 [00:00<00:00, 3125.52it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 43%|ββββββββββββββββ | 135/311 [00:00<00:00, 3119.85it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 44%|βββββββββββββββββ | 136/311 [00:00<00:00, 3134.67it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 44%|βββββββββββββββββ | 136/311 [00:00<00:00, 3129.13it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 44%|βββββββββββββββββ | 137/311 [00:00<00:00, 3142.40it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 44%|βββββββββββββββββ | 137/311 [00:00<00:00, 3136.78it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 44%|ββββββββββββββββββ | 138/311 [00:00<00:00, 3151.29it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 44%|ββββββββββββββββββ | 138/311 [00:00<00:00, 3145.76it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 45%|ββββββββββββ | 139/311 [00:00<00:00, 3158.87it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 45%|ββββββββββββ | 139/311 [00:00<00:00, 3153.13it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 45%|ββββββββββββββββ | 140/311 [00:00<00:00, 3167.49it/s, Materializing param=model.layers.12.self_attn.k_norm.weight]
Loading weights: 45%|ββββββββββββββββ | 140/311 [00:00<00:00, 3162.09it/s, Materializing param=model.layers.12.self_attn.k_norm.weight]
Loading weights: 45%|ββββββββββββββββ | 141/311 [00:00<00:00, 3175.13it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 45%|ββββββββββββββββ | 141/311 [00:00<00:00, 3169.33it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 46%|ββββββββββββββββ | 142/311 [00:00<00:00, 3183.12it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 46%|ββββββββββββββββ | 142/311 [00:00<00:00, 3177.55it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 46%|ββββββββββββββββ | 143/311 [00:00<00:00, 3190.25it/s, Materializing param=model.layers.12.self_attn.q_norm.weight]
Loading weights: 46%|ββββββββββββββββ | 143/311 [00:00<00:00, 3184.52it/s, Materializing param=model.layers.12.self_attn.q_norm.weight]
Loading weights: 46%|βββββββββββββββββ | 144/311 [00:00<00:00, 3198.35it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 46%|βββββββββββββββββ | 144/311 [00:00<00:00, 3192.82it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 47%|βββββββββββββββββ | 145/311 [00:00<00:00, 3205.56it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 47%|βββββββββββββββββ | 145/311 [00:00<00:00, 3199.96it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 47%|βββββββββββββββββ | 146/311 [00:00<00:00, 3213.65it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 47%|βββββββββββββββββ | 146/311 [00:00<00:00, 3208.22it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 47%|ββββββββββββββββββ | 147/311 [00:00<00:00, 3180.37it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 47%|ββββββββββββββββββ | 147/311 [00:00<00:00, 3174.44it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 48%|ββββββββββββββββββ | 148/311 [00:00<00:00, 3187.90it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 48%|ββββββββββββββββββ | 148/311 [00:00<00:00, 3182.67it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 48%|ββββββββββββββββββββ | 149/311 [00:00<00:00, 3196.35it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 48%|ββββββββββββββββββββ | 149/311 [00:00<00:00, 3191.20it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 48%|βββββββββββββ | 150/311 [00:00<00:00, 3203.70it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 48%|βββββββββββββ | 150/311 [00:00<00:00, 3198.26it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 49%|βββββββββββββββββ | 151/311 [00:00<00:00, 3211.71it/s, Materializing param=model.layers.13.self_attn.k_norm.weight]
Loading weights: 49%|βββββββββββββββββ | 151/311 [00:00<00:00, 3206.51it/s, Materializing param=model.layers.13.self_attn.k_norm.weight]
Loading weights: 49%|βββββββββββββββββ | 152/311 [00:00<00:00, 3218.67it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 49%|βββββββββββββββββ | 152/311 [00:00<00:00, 3213.39it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 49%|ββββββββββββββββββ | 153/311 [00:00<00:00, 3226.53it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 49%|ββββββββββββββββββ | 153/311 [00:00<00:00, 3221.43it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 50%|ββββββββββββββββββ | 154/311 [00:00<00:00, 3179.94it/s, Materializing param=model.layers.13.self_attn.q_norm.weight]
Loading weights: 50%|ββββββββββββββββββ | 154/311 [00:00<00:00, 3174.54it/s, Materializing param=model.layers.13.self_attn.q_norm.weight]
Loading weights: 50%|ββββββββββββββββββ | 155/311 [00:00<00:00, 3187.46it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 50%|ββββββββββββββββββ | 155/311 [00:00<00:00, 3182.46it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 50%|ββββββββββββββββββ | 156/311 [00:00<00:00, 3194.43it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 50%|ββββββββββββββββββ | 156/311 [00:00<00:00, 3189.27it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 50%|βββββββββββββββββββ | 157/311 [00:00<00:00, 3202.09it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 50%|βββββββββββββββββββ | 157/311 [00:00<00:00, 3197.06it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 51%|ββββββββββββββββββββ | 158/311 [00:00<00:00, 3208.83it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 51%|ββββββββββββββββββββ | 158/311 [00:00<00:00, 3203.74it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 51%|ββββββββββββββββββββ | 159/311 [00:00<00:00, 3216.44it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 51%|ββββββββββββββββββββ | 159/311 [00:00<00:00, 3211.52it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 51%|βββββββββββββββββββββ | 160/311 [00:00<00:00, 3223.13it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 51%|βββββββββββββββββββββ | 160/311 [00:00<00:00, 3218.03it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 52%|ββββββββββββββ | 161/311 [00:00<00:00, 3230.46it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 52%|ββββββββββββββ | 161/311 [00:00<00:00, 3225.60it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 52%|βββββββββββββββββββ | 162/311 [00:00<00:00, 3237.16it/s, Materializing param=model.layers.14.self_attn.k_norm.weight]
Loading weights: 52%|βββββββββββββββββββ | 162/311 [00:00<00:00, 3232.07it/s, Materializing param=model.layers.14.self_attn.k_norm.weight]
Loading weights: 52%|βββββββββββββββββββ | 163/311 [00:00<00:00, 3230.32it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 52%|βββββββββββββββββββ | 163/311 [00:00<00:00, 3224.96it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 53%|βββββββββββββββββββ | 164/311 [00:00<00:00, 3237.15it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 53%|βββββββββββββββββββ | 164/311 [00:00<00:00, 3232.21it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 53%|βββββββββββββββββββ | 165/311 [00:00<00:00, 3238.86it/s, Materializing param=model.layers.14.self_attn.q_norm.weight]
Loading weights: 53%|βββββββββββββββββββ | 165/311 [00:00<00:00, 3233.70it/s, Materializing param=model.layers.14.self_attn.q_norm.weight]
Loading weights: 53%|βββββββββββββββββββ | 166/311 [00:00<00:00, 3245.91it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 53%|βββββββββββββββββββ | 166/311 [00:00<00:00, 3240.96it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 54%|βββββββββββββββββββ | 167/311 [00:00<00:00, 3252.12it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 54%|βββββββββββββββββββ | 167/311 [00:00<00:00, 3247.27it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 54%|ββββββββββββββββββββ | 168/311 [00:00<00:00, 3259.49it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 54%|ββββββββββββββββββββ | 168/311 [00:00<00:00, 3254.66it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 54%|βββββββββββββββββββββ | 169/311 [00:00<00:00, 3265.76it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 54%|βββββββββββββββββββββ | 169/311 [00:00<00:00, 3260.76it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 55%|βββββββββββββββββββββ | 170/311 [00:00<00:00, 3260.21it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 55%|βββββββββββββββββββββ | 170/311 [00:00<00:00, 3255.14it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββ | 171/311 [00:00<00:00, 3247.04it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββ | 171/311 [00:00<00:00, 3241.99it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 55%|βββββββββββββββ | 172/311 [00:00<00:00, 3253.63it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 55%|βββββββββββββββ | 172/311 [00:00<00:00, 3248.78it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 56%|ββββββββββββββββββββ | 173/311 [00:00<00:00, 3259.49it/s, Materializing param=model.layers.15.self_attn.k_norm.weight]
Loading weights: 56%|ββββββββββββββββββββ | 173/311 [00:00<00:00, 3254.56it/s, Materializing param=model.layers.15.self_attn.k_norm.weight]
Loading weights: 56%|ββββββββββββββββββββ | 174/311 [00:00<00:00, 3265.92it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 56%|ββββββββββββββββββββ | 174/311 [00:00<00:00, 3261.18it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 56%|ββββββββββββββββββββ | 175/311 [00:00<00:00, 3220.67it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 56%|ββββββββββββββββββββ | 175/311 [00:00<00:00, 3215.56it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 57%|ββββββββββββββββββββ | 176/311 [00:00<00:00, 3226.78it/s, Materializing param=model.layers.15.self_attn.q_norm.weight]
Loading weights: 57%|ββββββββββββββββββββ | 176/311 [00:00<00:00, 3222.33it/s, Materializing param=model.layers.15.self_attn.q_norm.weight]
Loading weights: 57%|ββββββββββββββββββββ | 177/311 [00:00<00:00, 3227.92it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 57%|ββββββββββββββββββββ | 177/311 [00:00<00:00, 3223.24it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 57%|ββββββββββββββββββββ | 178/311 [00:00<00:00, 3234.51it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 57%|ββββββββββββββββββββ | 178/311 [00:00<00:00, 3230.04it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 58%|βββββββββββββββββββββ | 179/311 [00:00<00:00, 3228.76it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 58%|βββββββββββββββββββββ | 179/311 [00:00<00:00, 3224.06it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 58%|ββββββββββββββββββββββ | 180/311 [00:00<00:00, 3235.22it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββ | 180/311 [00:00<00:00, 3230.72it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββ | 181/311 [00:00<00:00, 3241.92it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββ | 181/311 [00:00<00:00, 3237.45it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 59%|ββββββββββββββββββββββββ | 182/311 [00:00<00:00, 3247.61it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 59%|ββββββββββββββββββββββββ | 182/311 [00:00<00:00, 3242.96it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 59%|ββββββββββββββββ | 183/311 [00:00<00:00, 3254.05it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 59%|ββββββββββββββββ | 183/311 [00:00<00:00, 3249.42it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 59%|βββββββββββββββββββββ | 184/311 [00:00<00:00, 3259.27it/s, Materializing param=model.layers.16.self_attn.k_norm.weight]
Loading weights: 59%|βββββββββββββββββββββ | 184/311 [00:00<00:00, 3249.18it/s, Materializing param=model.layers.16.self_attn.k_norm.weight]
Loading weights: 59%|βββββββββββββββββββββ | 185/311 [00:00<00:00, 3259.31it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 59%|βββββββββββββββββββββ | 185/311 [00:00<00:00, 3254.79it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 60%|βββββββββββββββββββββ | 186/311 [00:00<00:00, 3264.53it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 60%|βββββββββββββββββββββ | 186/311 [00:00<00:00, 3259.99it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 60%|βββββββββββββββββββββ | 187/311 [00:00<00:00, 3270.77it/s, Materializing param=model.layers.16.self_attn.q_norm.weight]
Loading weights: 60%|βββββββββββββββββββββ | 187/311 [00:00<00:00, 3266.35it/s, Materializing param=model.layers.16.self_attn.q_norm.weight]
Loading weights: 60%|ββββββββββββββββββββββ | 188/311 [00:00<00:00, 3276.13it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββ | 188/311 [00:00<00:00, 3271.58it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββ | 189/311 [00:00<00:00, 3282.15it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββ | 189/311 [00:00<00:00, 3277.61it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββ | 190/311 [00:00<00:00, 3286.94it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 61%|ββββββββββββββββββββββ | 190/311 [00:00<00:00, 3282.37it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 61%|ββββββββββββββββββββββββ | 191/311 [00:00<00:00, 3293.03it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββ | 191/311 [00:00<00:00, 3288.62it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββ | 192/311 [00:00<00:00, 3298.11it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββ | 192/311 [00:00<00:00, 3293.70it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 62%|βββββββββββββββββββββββββ | 193/311 [00:00<00:00, 3290.64it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 62%|βββββββββββββββββββββββββ | 193/311 [00:00<00:00, 3285.96it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 62%|βββββββββββββββββ | 194/311 [00:00<00:00, 3296.31it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 62%|βββββββββββββββββ | 194/311 [00:00<00:00, 3291.90it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 63%|ββββββββββββββββββββββ | 195/311 [00:00<00:00, 3301.34it/s, Materializing param=model.layers.17.self_attn.k_norm.weight]
Loading weights: 63%|ββββββββββββββββββββββ | 195/311 [00:00<00:00, 3296.86it/s, Materializing param=model.layers.17.self_attn.k_norm.weight]
Loading weights: 63%|ββββββββββββββββββββββ | 196/311 [00:00<00:00, 3307.09it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββ | 196/311 [00:00<00:00, 3302.81it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββ | 197/311 [00:00<00:00, 3279.27it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββ | 197/311 [00:00<00:00, 3274.62it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 198/311 [00:00<00:00, 3284.59it/s, Materializing param=model.layers.17.self_attn.q_norm.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 198/311 [00:00<00:00, 3280.22it/s, Materializing param=model.layers.17.self_attn.q_norm.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 199/311 [00:00<00:00, 3289.31it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 199/311 [00:00<00:00, 3285.00it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 200/311 [00:00<00:00, 3294.96it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββ | 200/311 [00:00<00:00, 3290.75it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββ | 201/311 [00:00<00:00, 3300.02it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 65%|ββββββββββββββββββββββββ | 201/311 [00:00<00:00, 3295.73it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 65%|βββββββββββββββββββββββββ | 202/311 [00:00<00:00, 3293.17it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββ | 202/311 [00:00<00:00, 3288.68it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββ | 203/311 [00:00<00:00, 3298.39it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββ | 203/311 [00:00<00:00, 3294.24it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββ | 204/311 [00:00<00:00, 3303.44it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββ | 204/311 [00:00<00:00, 3299.24it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 66%|ββββββββββββββββββ | 205/311 [00:00<00:00, 3309.02it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 66%|ββββββββββββββββββ | 205/311 [00:00<00:00, 3304.98it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 66%|ββββββββββββββββββββββββ | 206/311 [00:00<00:00, 3314.28it/s, Materializing param=model.layers.18.self_attn.k_norm.weight]
Loading weights: 66%|ββββββββββββββββββββββββ | 206/311 [00:00<00:00, 3310.12it/s, Materializing param=model.layers.18.self_attn.k_norm.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 207/311 [00:00<00:00, 3319.86it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 207/311 [00:00<00:00, 3315.78it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 208/311 [00:00<00:00, 3309.44it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 208/311 [00:00<00:00, 3305.10it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 209/311 [00:00<00:00, 3281.01it/s, Materializing param=model.layers.18.self_attn.q_norm.weight]
Loading weights: 67%|ββββββββββββββββββββββββ | 209/311 [00:00<00:00, 3276.63it/s, Materializing param=model.layers.18.self_attn.q_norm.weight]
Loading weights: 68%|ββββββββββββββββββββββββ | 210/311 [00:00<00:00, 3286.12it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 68%|ββββββββββββββββββββββββ | 210/311 [00:00<00:00, 3281.93it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 68%|ββββββββββββββββββββββββ | 211/311 [00:00<00:00, 3290.62it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 68%|ββββββββββββββββββββββββ | 211/311 [00:00<00:00, 3286.47it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββ | 212/311 [00:00<00:00, 3295.98it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 68%|βββββββββββββββββββββββββ | 212/311 [00:00<00:00, 3291.94it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 68%|ββββββββββββββββββββββββββ | 213/311 [00:00<00:00, 3300.60it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 68%|ββββββββββββββββββββββββββ | 213/311 [00:00<00:00, 3296.47it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 69%|βββββββββββββββββββββββββββ | 214/311 [00:00<00:00, 3305.85it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 69%|βββββββββββββββββββββββββββ | 214/311 [00:00<00:00, 3301.92it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββ | 215/311 [00:00<00:00, 3310.36it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββ | 215/311 [00:00<00:00, 3306.33it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 69%|βββββββββββββββββββ | 216/311 [00:00<00:00, 3315.67it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 69%|βββββββββββββββββββ | 216/311 [00:00<00:00, 3311.64it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 217/311 [00:00<00:00, 3320.12it/s, Materializing param=model.layers.19.self_attn.k_norm.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 217/311 [00:00<00:00, 3315.92it/s, Materializing param=model.layers.19.self_attn.k_norm.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 218/311 [00:00<00:00, 3324.99it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 218/311 [00:00<00:00, 3320.93it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 219/311 [00:00<00:00, 3325.55it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββ | 219/311 [00:00<00:00, 3321.40it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 220/311 [00:00<00:00, 3329.83it/s, Materializing param=model.layers.19.self_attn.q_norm.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 220/311 [00:00<00:00, 3325.90it/s, Materializing param=model.layers.19.self_attn.q_norm.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 221/311 [00:00<00:00, 3303.59it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 221/311 [00:00<00:00, 3299.46it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 222/311 [00:00<00:00, 3293.87it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββ | 222/311 [00:00<00:00, 3289.92it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββ | 223/311 [00:00<00:00, 3299.08it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 72%|ββββββββββββββββββββββββββ | 223/311 [00:00<00:00, 3295.36it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββ | 224/311 [00:00<00:00, 3303.76it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββ | 224/311 [00:00<00:00, 3299.93it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββ | 225/311 [00:00<00:00, 3308.15it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββ | 225/311 [00:00<00:00, 3304.35it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 73%|βββββββββββββββββββββββββββββ | 226/311 [00:00<00:00, 3312.56it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 73%|βββββββββββββββββββββββββββββ | 226/311 [00:00<00:00, 3308.83it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 73%|ββββββββββββββββββββ | 227/311 [00:00<00:00, 3317.10it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 73%|ββββββββββββββββββββ | 227/311 [00:00<00:00, 3313.26it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 73%|ββββββββββββββββββββββββββ | 228/311 [00:00<00:00, 3321.40it/s, Materializing param=model.layers.20.self_attn.k_norm.weight]
Loading weights: 73%|ββββββββββββββββββββββββββ | 228/311 [00:00<00:00, 3317.66it/s, Materializing param=model.layers.20.self_attn.k_norm.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 229/311 [00:00<00:00, 3326.53it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 229/311 [00:00<00:00, 3322.81it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 230/311 [00:00<00:00, 3330.84it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 230/311 [00:00<00:00, 3327.06it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 231/311 [00:00<00:00, 3335.85it/s, Materializing param=model.layers.20.self_attn.q_norm.weight]
Loading weights: 74%|ββββββββββββββββββββββββββ | 231/311 [00:00<00:00, 3332.13it/s, Materializing param=model.layers.20.self_attn.q_norm.weight]
Loading weights: 75%|ββββββββββββββββββββββββββ | 232/311 [00:00<00:00, 3333.28it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 75%|ββββββββββββββββββββββββββ | 232/311 [00:00<00:00, 3329.43it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββ | 233/311 [00:00<00:00, 3338.00it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββ | 233/311 [00:00<00:00, 3334.38it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββ | 234/311 [00:00<00:00, 3342.36it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 75%|βββββββββββββββββββββββββββ | 234/311 [00:00<00:00, 3338.65it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 100%|█████████████████████████████████████████████████████████| 311/311 [00:00<00:00, 3410.59it/s, Materializing param=model.norm.weight]
[2026-02-05 12:27:24,916] [WARNING] [torchao.<module>:39] [PID:40655] Skipping import of cpp extensions due to incompatible torch version 2.9.1+cu128 for torchao version 0.13.0
[2026-02-05 12:27:30,719] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:40655] sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
[2026-02-05 16:18:59,605] [WARNING] [py.warnings._showwarnmsg:110] [PID:40655] /root/miniconda3/envs/py3.11/lib/python3.11/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
warnings.warn(