Loading weights: 0%| | 0/146 [00:00<?, ?it/s]
Loading weights: 1%| | 1/146 [00:00<00:00, 7423.55it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%| | 1/146 [00:00<00:00, 3728.27it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%| | 2/146 [00:00<00:00, 3985.09it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%| | 2/146 [00:00<00:00, 1457.11it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 2%| | 3/146 [00:00<00:00, 1893.59it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 2%| | 3/146 [00:00<00:00, 1740.13it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 3%|β | 4/146 [00:00<00:00, 2094.01it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 3%|β | 4/146 [00:00<00:00, 1979.85it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 3%|β | 5/146 [00:00<00:00, 2270.87it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 3%|β | 5/146 [00:00<00:00, 2033.11it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 4%| | 6/146 [00:00<00:00, 2282.41it/s, Materializing param=model.layers.0.post_attention_layernorm.
Loading weights: 4%| | 6/146 [00:00<00:00, 2201.54it/s, Materializing param=model.layers.0.post_attention_layernorm.
Loading weights: 5%| | 7/146 [00:00<00:00, 2415.68it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 5%| | 7/146 [00:00<00:00, 2336.66it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 5%| | 8/146 [00:00<00:00, 2534.90it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 5%| | 8/146 [00:00<00:00, 2457.48it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 6%| | 9/146 [00:00<00:00, 2362.99it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 6%| | 9/146 [00:00<00:00, 2304.98it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 7%| | 10/146 [00:00<00:00, 2443.09it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 7%| | 10/146 [00:00<00:00, 2386.79it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 8%|β | 11/146 [00:00<00:00, 2534.46it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 8%|β | 11/146 [00:00<00:00, 2477.17it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 8%|β | 12/146 [00:00<00:00, 2605.16it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 8%|β | 12/146 [00:00<00:00, 2554.13it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 9%|β | 13/146 [00:00<00:00, 2584.17it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 9%|β | 13/146 [00:00<00:00, 2533.38it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 10%|β | 14/146 [00:00<00:00, 2450.35it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 10%|β | 14/146 [00:00<00:00, 2408.44it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 10%| | 15/146 [00:00<00:00, 2509.16it/s, Materializing param=model.layers.1.post_attention_layernorm
Loading weights: 10%| | 15/146 [00:00<00:00, 2468.59it/s, Materializing param=model.layers.1.post_attention_layernorm
Loading weights: 11%| | 16/146 [00:00<00:00, 2538.44it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 11%| | 16/146 [00:00<00:00, 2320.90it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 12%| | 17/146 [00:00<00:00, 2356.12it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 12%| | 17/146 [00:00<00:00, 2325.61it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 12%| | 18/146 [00:00<00:00, 2360.11it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 12%| | 18/146 [00:00<00:00, 2295.24it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 13%|β| 19/146 [00:00<00:00, 2315.41it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 13%|β| 19/146 [00:00<00:00, 2248.07it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 14%|β | 20/146 [00:00<00:00, 2114.44it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 14%|β | 20/146 [00:00<00:00, 2091.61it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 14%|β | 21/146 [00:00<00:00, 2156.14it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 14%|β | 21/146 [00:00<00:00, 2133.63it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 15%|β | 22/146 [00:00<00:00, 2176.34it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 15%|β | 22/146 [00:00<00:00, 2155.95it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 16%|β | 23/146 [00:00<00:00, 2219.87it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 16%|β | 23/146 [00:00<00:00, 2199.93it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 16%|β| 24/146 [00:00<00:00, 2259.10it/s, Materializing param=model.layers.2.post_attention_layernorm
Loading weights: 16%|β| 24/146 [00:00<00:00, 2236.47it/s, Materializing param=model.layers.2.post_attention_layernorm
Loading weights: 17%|β| 25/146 [00:00<00:00, 2246.50it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 17%|β| 25/146 [00:00<00:00, 2226.28it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 18%|β| 26/146 [00:00<00:00, 2284.38it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 18%|β| 26/146 [00:00<00:00, 2265.31it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 18%|β| 27/146 [00:00<00:00, 2308.27it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 18%|β| 27/146 [00:00<00:00, 2288.87it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 19%|β| 28/146 [00:00<00:00, 2342.67it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 19%|β| 28/146 [00:00<00:00, 2288.80it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 20%|β | 29/146 [00:00<00:00, 2285.34it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 20%|β | 29/146 [00:00<00:00, 2254.58it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 21%|β | 30/146 [00:00<00:00, 2288.47it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 21%|β | 30/146 [00:00<00:00, 2270.79it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 21%|β | 31/146 [00:00<00:00, 2270.24it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 21%|β | 31/146 [00:00<00:00, 2253.63it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 22%|ββ | 32/146 [00:00<00:00, 2279.82it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 22%|ββ | 32/146 [00:00<00:00, 2263.22it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 23%|β| 33/146 [00:00<00:00, 2305.98it/s, Materializing param=model.layers.3.post_attention_layernorm
Loading weights: 23%|β| 33/146 [00:00<00:00, 2270.32it/s, Materializing param=model.layers.3.post_attention_layernorm
Loading weights: 23%|β| 34/146 [00:00<00:00, 2311.77it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 23%|β| 34/146 [00:00<00:00, 2296.95it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 24%|β| 35/146 [00:00<00:00, 2327.44it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 24%|β| 35/146 [00:00<00:00, 2312.44it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 25%|β| 36/146 [00:00<00:00, 2352.46it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 25%|β| 36/146 [00:00<00:00, 2330.78it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 25%|β| 37/146 [00:00<00:00, 2344.89it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 25%|β| 37/146 [00:00<00:00, 2330.38it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 26%|β | 38/146 [00:00<00:00, 2361.13it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 26%|β | 38/146 [00:00<00:00, 2347.33it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 27%|β | 39/146 [00:00<00:00, 2386.88it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 27%|β | 39/146 [00:00<00:00, 2373.31it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 27%|β | 40/146 [00:00<00:00, 2412.04it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 27%|β | 40/146 [00:00<00:00, 2398.66it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 28%|ββ | 41/146 [00:00<00:00, 2435.20it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 28%|ββ | 41/146 [00:00<00:00, 2421.24it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 29%|β| 42/146 [00:00<00:00, 2458.18it/s, Materializing param=model.layers.4.post_attention_layernorm
Loading weights: 29%|β| 42/146 [00:00<00:00, 2443.52it/s, Materializing param=model.layers.4.post_attention_layernorm
Loading weights: 29%|β| 43/146 [00:00<00:00, 2478.09it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 29%|β| 43/146 [00:00<00:00, 2464.94it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 30%|β| 44/146 [00:00<00:00, 2500.77it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 30%|β| 44/146 [00:00<00:00, 2487.56it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 31%|β| 45/146 [00:00<00:00, 2521.26it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 31%|β| 45/146 [00:00<00:00, 2507.92it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 32%|β| 46/146 [00:00<00:00, 2541.83it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 32%|β| 46/146 [00:00<00:00, 2528.28it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 32%|β | 47/146 [00:00<00:00, 2560.09it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 32%|β | 47/146 [00:00<00:00, 2546.57it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 33%|ββ | 48/146 [00:00<00:00, 2579.69it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 33%|ββ | 48/146 [00:00<00:00, 2566.53it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 34%|ββ | 49/146 [00:00<00:00, 2597.45it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 34%|ββ | 49/146 [00:00<00:00, 2584.10it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 34%|ββ | 50/146 [00:00<00:00, 2615.49it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 34%|ββ | 50/146 [00:00<00:00, 2582.45it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 35%|β| 51/146 [00:00<00:00, 2610.40it/s, Materializing param=model.layers.5.post_attention_layernorm
Loading weights: 35%|β| 51/146 [00:00<00:00, 2597.03it/s, Materializing param=model.layers.5.post_attention_layernorm
Loading weights: 36%|β| 52/146 [00:00<00:00, 2627.41it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 36%|β| 52/146 [00:00<00:00, 2614.56it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 36%|β| 53/146 [00:00<00:00, 2642.07it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 36%|β| 53/146 [00:00<00:00, 2628.76it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 37%|β| 54/146 [00:00<00:00, 2650.37it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 37%|β| 54/146 [00:00<00:00, 2638.05it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 38%|β| 55/146 [00:00<00:00, 2665.33it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 38%|β| 55/146 [00:00<00:00, 2653.16it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 38%|β | 56/146 [00:00<00:00, 2681.87it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 38%|β | 56/146 [00:00<00:00, 2669.83it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 39%|ββ | 57/146 [00:00<00:00, 2696.94it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 39%|ββ | 57/146 [00:00<00:00, 2684.70it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 40%|ββ | 58/146 [00:00<00:00, 2710.25it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 40%|ββ | 58/146 [00:00<00:00, 2698.08it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 40%|βββ | 59/146 [00:00<00:00, 2724.23it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 40%|βββ | 59/146 [00:00<00:00, 2711.93it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 41%|β| 60/146 [00:00<00:00, 2738.54it/s, Materializing param=model.layers.6.post_attention_layernorm
Loading weights: 41%|β| 60/146 [00:00<00:00, 2726.79it/s, Materializing param=model.layers.6.post_attention_layernorm
Loading weights: 42%|β| 61/146 [00:00<00:00, 2751.55it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 42%|β| 61/146 [00:00<00:00, 2739.85it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 42%|β| 62/146 [00:00<00:00, 2765.96it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 42%|β| 62/146 [00:00<00:00, 2754.27it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 43%|β| 63/146 [00:00<00:00, 2778.74it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 43%|β| 63/146 [00:00<00:00, 2766.55it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 44%|β| 64/146 [00:00<00:00, 2769.83it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 44%|β| 64/146 [00:00<00:00, 2758.62it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 45%|β | 65/146 [00:00<00:00, 2781.77it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 45%|β | 65/146 [00:00<00:00, 2770.85it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 45%|ββ | 66/146 [00:00<00:00, 2786.86it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 45%|ββ | 66/146 [00:00<00:00, 2775.29it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 46%|ββ | 67/146 [00:00<00:00, 2701.19it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 46%|ββ | 67/146 [00:00<00:00, 2689.92it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 47%|βββ | 68/146 [00:00<00:00, 2705.08it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 47%|βββ | 68/146 [00:00<00:00, 2694.80it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 47%|β| 69/146 [00:00<00:00, 2690.18it/s, Materializing param=model.layers.7.post_attention_layernorm
Loading weights: 47%|β| 69/146 [00:00<00:00, 2679.87it/s, Materializing param=model.layers.7.post_attention_layernorm
Loading weights: 48%|β| 70/146 [00:00<00:00, 2703.71it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 48%|β| 70/146 [00:00<00:00, 2694.21it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 49%|β| 71/146 [00:00<00:00, 2718.20it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 49%|β| 71/146 [00:00<00:00, 2708.81it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 49%|β| 72/146 [00:00<00:00, 2732.79it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 49%|β| 72/146 [00:00<00:00, 2723.70it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 50%|β| 73/146 [00:00<00:00, 2747.28it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 50%|β| 73/146 [00:00<00:00, 2737.62it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 51%|β | 74/146 [00:00<00:00, 2760.59it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 51%|β | 74/146 [00:00<00:00, 2751.29it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 51%|ββ | 75/146 [00:00<00:00, 2774.60it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 51%|ββ | 75/146 [00:00<00:00, 2765.09it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 52%|ββ | 76/146 [00:00<00:00, 2787.47it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 52%|ββ | 76/146 [00:00<00:00, 2778.44it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 53%|ββββ | 77/146 [00:00<00:00, 2800.98it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 53%|ββββ | 77/146 [00:00<00:00, 2791.80it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 53%|β| 78/146 [00:00<00:00, 2814.12it/s, Materializing param=model.layers.8.post_attention_layernorm
Loading weights: 53%|β| 78/146 [00:00<00:00, 2804.74it/s, Materializing param=model.layers.8.post_attention_layernorm
Loading weights: 54%|β| 79/146 [00:00<00:00, 2826.57it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 54%|β| 79/146 [00:00<00:00, 2817.00it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 55%|β| 80/146 [00:00<00:00, 2838.35it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 55%|β| 80/146 [00:00<00:00, 2829.33it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 55%|β| 81/146 [00:00<00:00, 2850.63it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 55%|β| 81/146 [00:00<00:00, 2841.67it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 56%|β| 82/146 [00:00<00:00, 2862.84it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 56%|β| 82/146 [00:00<00:00, 2853.95it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 57%|ββ| 83/146 [00:00<00:00, 2875.11it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 57%|ββ| 83/146 [00:00<00:00, 2865.93it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 58%|βββ | 84/146 [00:00<00:00, 2886.18it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 58%|βββ | 84/146 [00:00<00:00, 2877.06it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 58%|βββ | 85/146 [00:00<00:00, 2897.30it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 58%|βββ | 85/146 [00:00<00:00, 2888.05it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 59%|ββββ | 86/146 [00:00<00:00, 2908.37it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 59%|ββββ | 86/146 [00:00<00:00, 2899.34it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 60%|β| 87/146 [00:00<00:00, 2919.24it/s, Materializing param=model.layers.9.post_attention_layernorm
Loading weights: 60%|β| 87/146 [00:00<00:00, 2909.62it/s, Materializing param=model.layers.9.post_attention_layernorm
Loading weights: 60%|β| 88/146 [00:00<00:00, 2929.08it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 60%|β| 88/146 [00:00<00:00, 2920.39it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 61%|β| 89/146 [00:00<00:00, 2939.92it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 61%|β| 89/146 [00:00<00:00, 2931.03it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 62%|β| 90/146 [00:00<00:00, 2950.36it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 62%|β| 90/146 [00:00<00:00, 2941.79it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 62%|β| 91/146 [00:00<00:00, 2961.27it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 62%|β| 91/146 [00:00<00:00, 2952.80it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 63%|β| 92/146 [00:00<00:00, 2971.84it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 63%|β| 92/146 [00:00<00:00, 2963.38it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 64%|ββ | 93/146 [00:00<00:00, 2982.94it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 64%|ββ | 93/146 [00:00<00:00, 2974.50it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 64%|ββ | 94/146 [00:00<00:00, 2993.00it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 64%|ββ | 94/146 [00:00<00:00, 2984.32it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 65%|ββββ | 95/146 [00:00<00:00, 3002.48it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 65%|ββββ | 95/146 [00:00<00:00, 2993.77it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 66%|β| 96/146 [00:00<00:00, 3012.47it/s, Materializing param=model.layers.10.post_attention_layernor
Loading weights: 66%|β| 96/146 [00:00<00:00, 3003.26it/s, Materializing param=model.layers.10.post_attention_layernor
Loading weights: 66%|β| 97/146 [00:00<00:00, 3020.46it/s, Materializing param=model.layers.10.self_attn.k_proj.weight
Loading weights: 66%|β| 97/146 [00:00<00:00, 3012.03it/s, Materializing param=model.layers.10.self_attn.k_proj.weight
Loading weights: 67%|β| 98/146 [00:00<00:00, 3029.78it/s, Materializing param=model.layers.10.self_attn.o_proj.weight
Loading weights: 67%|β| 98/146 [00:00<00:00, 3021.06it/s, Materializing param=model.layers.10.self_attn.o_proj.weight
Loading weights: 68%|β| 99/146 [00:00<00:00, 3038.91it/s, Materializing param=model.layers.10.self_attn.q_proj.weight
Loading weights: 68%|β| 99/146 [00:00<00:00, 3030.63it/s, Materializing param=model.layers.10.self_attn.q_proj.weight
Loading weights: 68%|β| 100/146 [00:00<00:00, 3048.54it/s, Materializing param=model.layers.10.self_attn.v_proj.weigh
Loading weights: 68%|β| 100/146 [00:00<00:00, 3039.48it/s, Materializing param=model.layers.10.self_attn.v_proj.weigh
Loading weights: 69%|β| 101/146 [00:00<00:00, 3057.14it/s, Materializing param=model.layers.11.input_layernorm.weight
Loading weights: 69%|β| 101/146 [00:00<00:00, 3049.06it/s, Materializing param=model.layers.11.input_layernorm.weight
Loading weights: 70%|ββ| 102/146 [00:00<00:00, 3066.91it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 70%|ββ| 102/146 [00:00<00:00, 3058.60it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 71%|ββ| 103/146 [00:00<00:00, 3076.10it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 71%|ββ| 103/146 [00:00<00:00, 3068.04it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 71%|βββ | 104/146 [00:00<00:00, 3085.16it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 71%|βββ | 104/146 [00:00<00:00, 3076.72it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 72%|β| 105/146 [00:00<00:00, 3093.64it/s, Materializing param=model.layers.11.post_attention_layerno
Loading weights: 72%|β| 105/146 [00:00<00:00, 3085.04it/s, Materializing param=model.layers.11.post_attention_layerno
Loading weights: 73%|β| 106/146 [00:00<00:00, 3101.34it/s, Materializing param=model.layers.11.self_attn.k_proj.weigh
Loading weights: 73%|β| 106/146 [00:00<00:00, 3093.14it/s, Materializing param=model.layers.11.self_attn.k_proj.weigh
Loading weights: 73%|β| 107/146 [00:00<00:00, 3109.88it/s, Materializing param=model.layers.11.self_attn.o_proj.weigh
Loading weights: 73%|β| 107/146 [00:00<00:00, 3101.46it/s, Materializing param=model.layers.11.self_attn.o_proj.weigh
Loading weights: 74%|β| 108/146 [00:00<00:00, 3118.06it/s, Materializing param=model.layers.11.self_attn.q_proj.weigh
Loading weights: 74%|β| 108/146 [00:00<00:00, 3109.71it/s, Materializing param=model.layers.11.self_attn.q_proj.weigh
Loading weights: 75%|β| 109/146 [00:00<00:00, 3125.82it/s, Materializing param=model.layers.11.self_attn.v_proj.weigh
Loading weights: 75%|β| 109/146 [00:00<00:00, 3117.55it/s, Materializing param=model.layers.11.self_attn.v_proj.weigh
Loading weights: 75%|β| 110/146 [00:00<00:00, 3133.29it/s, Materializing param=model.layers.12.input_layernorm.weight
Loading weights: 75%|β| 110/146 [00:00<00:00, 3125.52it/s, Materializing param=model.layers.12.input_layernorm.weight
Loading weights: 76%|ββ| 111/146 [00:00<00:00, 3142.06it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 76%|ββ| 111/146 [00:00<00:00, 3133.79it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 77%|ββ| 112/146 [00:00<00:00, 3149.47it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 77%|ββ| 112/146 [00:00<00:00, 3141.57it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 77%|βββ | 113/146 [00:00<00:00, 3157.46it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 77%|βββ | 113/146 [00:00<00:00, 3149.40it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 78%|β| 114/146 [00:00<00:00, 3164.90it/s, Materializing param=model.layers.12.post_attention_layerno
Loading weights: 78%|β| 114/146 [00:00<00:00, 3156.57it/s, Materializing param=model.layers.12.post_attention_layerno
Loading weights: 79%|β| 115/146 [00:00<00:00, 3172.36it/s, Materializing param=model.layers.12.self_attn.k_proj.weigh
Loading weights: 79%|β| 115/146 [00:00<00:00, 3164.83it/s, Materializing param=model.layers.12.self_attn.k_proj.weigh
Loading weights: 79%|β| 116/146 [00:00<00:00, 3180.56it/s, Materializing param=model.layers.12.self_attn.o_proj.weigh
Loading weights: 79%|β| 116/146 [00:00<00:00, 3172.70it/s, Materializing param=model.layers.12.self_attn.o_proj.weigh
Loading weights: 80%|β| 117/146 [00:00<00:00, 3188.15it/s, Materializing param=model.layers.12.self_attn.q_proj.weigh
Loading weights: 80%|β| 117/146 [00:00<00:00, 3179.99it/s, Materializing param=model.layers.12.self_attn.q_proj.weigh
Loading weights: 81%|β| 118/146 [00:00<00:00, 3195.62it/s, Materializing param=model.layers.12.self_attn.v_proj.weigh
Loading weights: 81%|β| 118/146 [00:00<00:00, 3188.02it/s, Materializing param=model.layers.12.self_attn.v_proj.weigh
Loading weights: 82%|β| 119/146 [00:00<00:00, 3203.57it/s, Materializing param=model.layers.13.input_layernorm.weight
Loading weights: 82%|β| 119/146 [00:00<00:00, 3195.94it/s, Materializing param=model.layers.13.input_layernorm.weight
Loading weights: 82%|ββ| 120/146 [00:00<00:00, 3211.09it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 82%|ββ| 120/146 [00:00<00:00, 3203.47it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 83%|ββ| 121/146 [00:00<00:00, 3218.57it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 83%|ββ| 121/146 [00:00<00:00, 3210.73it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 84%|ββββ| 122/146 [00:00<00:00, 3225.59it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 84%|ββββ| 122/146 [00:00<00:00, 3218.09it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 84%|β| 123/146 [00:00<00:00, 3232.78it/s, Materializing param=model.layers.13.post_attention_layerno
Loading weights: 84%|β| 123/146 [00:00<00:00, 3224.59it/s, Materializing param=model.layers.13.post_attention_layerno
Loading weights: 85%|β| 124/146 [00:00<00:00, 3239.05it/s, Materializing param=model.layers.13.self_attn.k_proj.weigh
Loading weights: 85%|β| 124/146 [00:00<00:00, 3231.12it/s, Materializing param=model.layers.13.self_attn.k_proj.weigh
Loading weights: 86%|β| 125/146 [00:00<00:00, 3244.34it/s, Materializing param=model.layers.13.self_attn.o_proj.weigh
Loading weights: 86%|β| 125/146 [00:00<00:00, 3236.71it/s, Materializing param=model.layers.13.self_attn.o_proj.weigh
Loading weights: 86%|β| 126/146 [00:00<00:00, 3250.90it/s, Materializing param=model.layers.13.self_attn.q_proj.weigh
Loading weights: 86%|β| 126/146 [00:00<00:00, 3242.90it/s, Materializing param=model.layers.13.self_attn.q_proj.weigh
Loading weights: 87%|β| 127/146 [00:00<00:00, 3256.13it/s, Materializing param=model.layers.13.self_attn.v_proj.weigh
Loading weights: 87%|β| 127/146 [00:00<00:00, 3248.23it/s, Materializing param=model.layers.13.self_attn.v_proj.weigh
Loading weights: 88%|β| 128/146 [00:00<00:00, 3262.11it/s, Materializing param=model.layers.14.input_layernorm.weight
Loading weights: 88%|β| 128/146 [00:00<00:00, 3254.26it/s, Materializing param=model.layers.14.input_layernorm.weight
Loading weights: 88%|ββ| 129/146 [00:00<00:00, 3268.17it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 88%|ββ| 129/146 [00:00<00:00, 3260.49it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 89%|ββ| 130/146 [00:00<00:00, 3274.08it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 89%|ββ| 130/146 [00:00<00:00, 3266.51it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 90%|ββββ| 131/146 [00:00<00:00, 3280.07it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 90%|ββββ| 131/146 [00:00<00:00, 3271.71it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 90%|β| 132/146 [00:00<00:00, 3284.61it/s, Materializing param=model.layers.14.post_attention_layerno
Loading weights: 90%|β| 132/146 [00:00<00:00, 3276.53it/s, Materializing param=model.layers.14.post_attention_layerno
Loading weights: 91%|β| 133/146 [00:00<00:00, 3289.65it/s, Materializing param=model.layers.14.self_attn.k_proj.weigh
Loading weights: 91%|β| 133/146 [00:00<00:00, 3282.10it/s, Materializing param=model.layers.14.self_attn.k_proj.weigh
Loading weights: 92%|β| 134/146 [00:00<00:00, 3294.99it/s, Materializing param=model.layers.14.self_attn.o_proj.weigh
Loading weights: 92%|β| 134/146 [00:00<00:00, 3287.73it/s, Materializing param=model.layers.14.self_attn.o_proj.weigh
Loading weights: 92%|β| 135/146 [00:00<00:00, 3300.98it/s, Materializing param=model.layers.14.self_attn.q_proj.weigh
Loading weights: 92%|β| 135/146 [00:00<00:00, 3293.59it/s, Materializing param=model.layers.14.self_attn.q_proj.weigh
Loading weights: 93%|β| 136/146 [00:00<00:00, 3306.60it/s, Materializing param=model.layers.14.self_attn.v_proj.weigh
Loading weights: 93%|β| 136/146 [00:00<00:00, 3299.51it/s, Materializing param=model.layers.14.self_attn.v_proj.weigh
Loading weights: 94%|β| 137/146 [00:00<00:00, 3312.77it/s, Materializing param=model.layers.15.input_layernorm.weight
Loading weights: 94%|β| 137/146 [00:00<00:00, 3305.32it/s, Materializing param=model.layers.15.input_layernorm.weight
Loading weights: 95%|ββ| 138/146 [00:00<00:00, 3318.39it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 95%|ββ| 138/146 [00:00<00:00, 3311.14it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 95%|ββ| 139/146 [00:00<00:00, 3324.35it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 95%|ββ| 139/146 [00:00<00:00, 3316.99it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 96%|ββββ| 140/146 [00:00<00:00, 3329.81it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 96%|ββββ| 140/146 [00:00<00:00, 3322.95it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 97%|β| 141/146 [00:00<00:00, 3335.99it/s, Materializing param=model.layers.15.post_attention_layerno
Loading weights: 97%|β| 141/146 [00:00<00:00, 3328.64it/s, Materializing param=model.layers.15.post_attention_layerno
Loading weights: 97%|β| 142/146 [00:00<00:00, 3341.25it/s, Materializing param=model.layers.15.self_attn.k_proj.weigh
Loading weights: 97%|β| 142/146 [00:00<00:00, 3334.14it/s, Materializing param=model.layers.15.self_attn.k_proj.weigh
Loading weights: 98%|β| 143/146 [00:00<00:00, 3346.87it/s, Materializing param=model.layers.15.self_attn.o_proj.weigh
Loading weights: 98%|β| 143/146 [00:00<00:00, 3339.73it/s, Materializing param=model.layers.15.self_attn.o_proj.weigh
Loading weights: 99%|β| 144/146 [00:00<00:00, 3352.24it/s, Materializing param=model.layers.15.self_attn.q_proj.weigh
Loading weights: 99%|β| 144/146 [00:00<00:00, 3345.20it/s, Materializing param=model.layers.15.self_attn.q_proj.weigh
Loading weights: 99%|β| 145/146 [00:00<00:00, 3357.85it/s, Materializing param=model.layers.15.self_attn.v_proj.weigh
Loading weights: 99%|β| 145/146 [00:00<00:00, 3350.73it/s, Materializing param=model.layers.15.self_attn.v_proj.weigh
Loading weights: 100%|βββββββββββββββββββββ| 146/146 [00:00<00:00, 3363.07it/s, Materializing param=model.norm.weight]
Loading weights: 100%|βββββββββββββββββββββ| 146/146 [00:00<00:00, 3355.99it/s, Materializing param=model.norm.weight]
Loading weights: 100%|βββββββββββββββββββββ| 146/146 [00:00<00:00, 3341.97it/s, Materializing param=model.norm.weight]