Fizzarolli
Upload folder using huggingface_hub
4423878 verified
[2026-02-15 03:59:23,355] [WARNING] [py.warnings._showwarnmsg:110] [PID:6181] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Loading weights: 44%|████████████████████████████████████████▊ | 129/291 [00:00<00:00, 2317.32it/s, Materializing 
param=model.layers.14.input_layernorm.weight] Loading weights: 45%|█████████████████████████████████████████▉ | 130/291 [00:00<00:00, 2318.62it/s, Materializing param=model.layers.14.mlp.down_proj.weight] Loading weights: 45%|█████████████████████████████████████████▉ | 130/291 [00:00<00:00, 2312.85it/s, Materializing param=model.layers.14.mlp.down_proj.weight] Loading weights: 45%|██████████████████████████████████████████▎ | 131/291 [00:00<00:00, 2322.11it/s, Materializing param=model.layers.14.mlp.gate_proj.weight] Loading weights: 45%|██████████████████████████████████████████▎ | 131/291 [00:00<00:00, 2316.51it/s, Materializing param=model.layers.14.mlp.gate_proj.weight] Loading weights: 45%|███████████████████████████████████████████▌ | 132/291 [00:00<00:00, 2325.81it/s, Materializing param=model.layers.14.mlp.up_proj.weight] Loading weights: 45%|███████████████████████████████████████████▌ | 132/291 [00:00<00:00, 2320.07it/s, Materializing param=model.layers.14.mlp.up_proj.weight] Loading weights: 46%|█████████████████████████████████████▉ | 133/291 [00:00<00:00, 2331.43it/s, Materializing param=model.layers.14.post_attention_layernorm.weight] Loading weights: 46%|█████████████████████████████████████▉ | 133/291 [00:00<00:00, 2327.23it/s, Materializing param=model.layers.14.post_attention_layernorm.weight] Loading weights: 46%|█████████████████████████████████████████▉ | 134/291 [00:00<00:00, 2337.66it/s, Materializing param=model.layers.14.self_attn.k_proj.weight] Loading weights: 46%|█████████████████████████████████████████▉ | 134/291 [00:00<00:00, 2333.59it/s, Materializing param=model.layers.14.self_attn.k_proj.weight] Loading weights: 46%|██████████████████████████████████████████▏ | 135/291 [00:00<00:00, 2344.10it/s, Materializing param=model.layers.14.self_attn.o_proj.weight] Loading weights: 46%|██████████████████████████████████████████▏ | 135/291 [00:00<00:00, 2339.98it/s, Materializing param=model.layers.14.self_attn.o_proj.weight] Loading 
weights: 47%|██████████████████████████████████████████▌ | 136/291 [00:00<00:00, 2350.66it/s, Materializing param=model.layers.14.self_attn.q_proj.weight] Loading weights: 47%|██████████████████████████████████████████▌ | 136/291 [00:00<00:00, 2346.45it/s, Materializing param=model.layers.14.self_attn.q_proj.weight] Loading weights: 47%|██████████████████████████████████████████▊ | 137/291 [00:00<00:00, 2357.07it/s, Materializing param=model.layers.14.self_attn.v_proj.weight] Loading weights: 47%|██████████████████████████████████████████▊ | 137/291 [00:00<00:00, 2353.15it/s, Materializing param=model.layers.14.self_attn.v_proj.weight] Loading weights: 47%|███████████████████████████████████████████▋ | 138/291 [00:00<00:00, 2364.21it/s, Materializing param=model.layers.15.input_layernorm.weight] Loading weights: 47%|███████████████████████████████████████████▋ | 138/291 [00:00<00:00, 2360.21it/s, Materializing param=model.layers.15.input_layernorm.weight] Loading weights: 48%|████████████████████████████████████████████▉ | 139/291 [00:00<00:00, 2371.16it/s, Materializing param=model.layers.15.mlp.down_proj.weight] Loading weights: 48%|████████████████████████████████████████████▉ | 139/291 [00:00<00:00, 2367.24it/s, Materializing param=model.layers.15.mlp.down_proj.weight] Loading weights: 48%|█████████████████████████████████████████████▏ | 140/291 [00:00<00:00, 2377.68it/s, Materializing param=model.layers.15.mlp.gate_proj.weight] Loading weights: 48%|█████████████████████████████████████████████▏ | 140/291 [00:00<00:00, 2373.59it/s, Materializing param=model.layers.15.mlp.gate_proj.weight] Loading weights: 48%|██████████████████████████████████████████████▌ | 141/291 [00:00<00:00, 2368.74it/s, Materializing param=model.layers.15.mlp.up_proj.weight] Loading weights: 48%|██████████████████████████████████████████████▌ | 141/291 [00:00<00:00, 2364.58it/s, Materializing param=model.layers.15.mlp.up_proj.weight] Loading weights: 
49%|████████████████████████████████████████▌ | 142/291 [00:00<00:00, 2357.54it/s, Materializing param=model.layers.15.post_attention_layernorm.weight] Loading weights: 49%|████████████████████████████████████████▌ | 142/291 [00:00<00:00, 2353.52it/s, Materializing param=model.layers.15.post_attention_layernorm.weight] Loading weights: 49%|████████████████████████████████████████████▋ | 143/291 [00:00<00:00, 2363.87it/s, Materializing param=model.layers.15.self_attn.k_proj.weight] Loading weights: 49%|████████████████████████████████████████████▋ | 143/291 [00:00<00:00, 2359.94it/s, Materializing param=model.layers.15.self_attn.k_proj.weight] Loading weights: 49%|█████████████████████████████████████████████ | 144/291 [00:00<00:00, 2370.47it/s, Materializing param=model.layers.15.self_attn.o_proj.weight] Loading weights: 49%|█████████████████████████████████████████████ | 144/291 [00:00<00:00, 2366.19it/s, Materializing param=model.layers.15.self_attn.o_proj.weight] Loading weights: 50%|█████████████████████████████████████████████▎ | 145/291 [00:00<00:00, 2376.06it/s, Materializing param=model.layers.15.self_attn.q_proj.weight] Loading weights: 50%|█████████████████████████████████████████████▎ | 145/291 [00:00<00:00, 2371.86it/s, Materializing param=model.layers.15.self_attn.q_proj.weight] Loading weights: 50%|█████████████████████████████████████████████▋ | 146/291 [00:00<00:00, 2382.32it/s, Materializing param=model.layers.15.self_attn.v_proj.weight] Loading weights: 50%|█████████████████████████████████████████████▋ | 146/291 [00:00<00:00, 2378.55it/s, Materializing param=model.layers.15.self_attn.v_proj.weight] Loading weights: 51%|██████████████████████████████████████████████▍ | 147/291 [00:00<00:00, 2388.93it/s, Materializing param=model.layers.16.input_layernorm.weight] Loading weights: 51%|██████████████████████████████████████████████▍ | 147/291 [00:00<00:00, 2385.07it/s, Materializing param=model.layers.16.input_layernorm.weight] Loading weights: 
51%|███████████████████████████████████████████████▊ | 148/291 [00:00<00:00, 2395.18it/s, Materializing param=model.layers.16.mlp.down_proj.weight] Loading weights: 51%|███████████████████████████████████████████████▊ | 148/291 [00:00<00:00, 2391.23it/s, Materializing param=model.layers.16.mlp.down_proj.weight] Loading weights: 51%|████████████████████████████████████████████████▏ | 149/291 [00:00<00:00, 2383.02it/s, Materializing param=model.layers.16.mlp.gate_proj.weight] Loading weights: 51%|████████████████████████████████████████████████▏ | 149/291 [00:00<00:00, 2379.04it/s, Materializing param=model.layers.16.mlp.gate_proj.weight] Loading weights: 52%|█████████████████████████████████████████████████▍ | 150/291 [00:00<00:00, 2384.70it/s, Materializing param=model.layers.16.mlp.up_proj.weight] Loading weights: 52%|█████████████████████████████████████████████████▍ | 150/291 [00:00<00:00, 2360.80it/s, Materializing param=model.layers.16.mlp.up_proj.weight] Loading weights: 52%|███████████████████████████████████████████ | 151/291 [00:00<00:00, 2361.50it/s, Materializing param=model.layers.16.post_attention_layernorm.weight] Loading weights: 52%|███████████████████████████████████████████ | 151/291 [00:00<00:00, 2357.52it/s, Materializing param=model.layers.16.post_attention_layernorm.weight] Loading weights: 52%|███████████████████████████████████████████████▌ | 152/291 [00:00<00:00, 2367.23it/s, Materializing param=model.layers.16.self_attn.k_proj.weight] Loading weights: 52%|███████████████████████████████████████████████▌ | 152/291 [00:00<00:00, 2333.83it/s, Materializing param=model.layers.16.self_attn.k_proj.weight] Loading weights: 53%|███████████████████████████████████████████████▊ | 153/291 [00:00<00:00, 2342.49it/s, Materializing param=model.layers.16.self_attn.o_proj.weight] Loading weights: 53%|███████████████████████████████████████████████▊ | 153/291 [00:00<00:00, 2337.19it/s, Materializing param=model.layers.16.self_attn.o_proj.weight] Loading 
weights: 53%|████████████████████████████████████████████████▏ | 154/291 [00:00<00:00, 2346.77it/s, Materializing param=model.layers.16.self_attn.q_proj.weight] Loading weights: 53%|████████████████████████████████████████████████▏ | 154/291 [00:00<00:00, 2343.27it/s, Materializing param=model.layers.16.self_attn.q_proj.weight] Loading weights: 53%|████████████████████████████████████████████████▍ | 155/291 [00:00<00:00, 2352.62it/s, Materializing param=model.layers.16.self_attn.v_proj.weight] Loading weights: 53%|████████████████████████████████████████████████▍ | 155/291 [00:00<00:00, 2348.36it/s, Materializing param=model.layers.16.self_attn.v_proj.weight] Loading weights: 54%|█████████████████████████████████████████████████▎ | 156/291 [00:00<00:00, 2357.40it/s, Materializing param=model.layers.17.input_layernorm.weight] Loading weights: 54%|█████████████████████████████████████████████████▎ | 156/291 [00:00<00:00, 2353.84it/s, Materializing param=model.layers.17.input_layernorm.weight] Loading weights: 54%|██████████████████████████████████████████████████▋ | 157/291 [00:00<00:00, 2342.79it/s, Materializing param=model.layers.17.mlp.down_proj.weight] Loading weights: 54%|██████████████████████████████████████████████████▋ | 157/291 [00:00<00:00, 2338.63it/s, Materializing param=model.layers.17.mlp.down_proj.weight] Loading weights: 54%|███████████████████████████████████████████████████ | 158/291 [00:00<00:00, 2348.03it/s, Materializing param=model.layers.17.mlp.gate_proj.weight] Loading weights: 54%|███████████████████████████████████████████████████ | 158/291 [00:00<00:00, 2344.64it/s, Materializing param=model.layers.17.mlp.gate_proj.weight] Loading weights: 55%|████████████████████████████████████████████████████▍ | 159/291 [00:00<00:00, 2353.70it/s, Materializing param=model.layers.17.mlp.up_proj.weight] Loading weights: 55%|████████████████████████████████████████████████████▍ | 159/291 [00:00<00:00, 2350.28it/s, Materializing 
param=model.layers.17.mlp.up_proj.weight] Loading weights: 55%|█████████████████████████████████████████████▋ | 160/291 [00:00<00:00, 2358.92it/s, Materializing param=model.layers.17.post_attention_layernorm.weight] Loading weights: 55%|█████████████████████████████████████████████▋ | 160/291 [00:00<00:00, 2355.45it/s, Materializing param=model.layers.17.post_attention_layernorm.weight] Loading weights: 55%|██████████████████████████████████████████████████▎ | 161/291 [00:00<00:00, 2364.93it/s, Materializing param=model.layers.17.self_attn.k_proj.weight] Loading weights: 55%|██████████████████████████████████████████████████▎ | 161/291 [00:00<00:00, 2360.69it/s, Materializing param=model.layers.17.self_attn.k_proj.weight] Loading weights: 56%|██████████████████████████████████████████████████▋ | 162/291 [00:00<00:00, 2369.18it/s, Materializing param=model.layers.17.self_attn.o_proj.weight] Loading weights: 56%|██████████████████████████████████████████████████▋ | 162/291 [00:00<00:00, 2365.32it/s, Materializing param=model.layers.17.self_attn.o_proj.weight] Loading weights: 56%|██████████████████████████████████████████████████▉ | 163/291 [00:00<00:00, 2372.84it/s, Materializing param=model.layers.17.self_attn.q_proj.weight] Loading weights: 56%|██████████████████████████████████████████████████▉ | 163/291 [00:00<00:00, 2354.78it/s, Materializing param=model.layers.17.self_attn.q_proj.weight] Loading weights: 56%|███████████████████████████████████████████████████▎ | 164/291 [00:00<00:00, 2362.57it/s, Materializing param=model.layers.17.self_attn.v_proj.weight] Loading weights: 56%|███████████████████████████████████████████████████▎ | 164/291 [00:00<00:00, 2358.58it/s, Materializing param=model.layers.17.self_attn.v_proj.weight] Loading weights: 57%|████████████████████████████████████████████████████▏ | 165/291 [00:00<00:00, 2366.03it/s, Materializing param=model.layers.18.input_layernorm.weight] Loading weights: 
57%|████████████████████████████████████████████████████▏ | 165/291 [00:00<00:00, 2357.58it/s, Materializing param=model.layers.18.input_layernorm.weight] Loading weights: 57%|█████████████████████████████████████████████████████▌ | 166/291 [00:00<00:00, 2355.20it/s, Materializing param=model.layers.18.mlp.down_proj.weight] Loading weights: 57%|█████████████████████████████████████████████████████▌ | 166/291 [00:00<00:00, 2351.12it/s, Materializing param=model.layers.18.mlp.down_proj.weight] Loading weights: 57%|█████████████████████████████████████████████████████▉ | 167/291 [00:00<00:00, 2358.69it/s, Materializing param=model.layers.18.mlp.gate_proj.weight] Loading weights: 57%|█████████████████████████████████████████████████████▉ | 167/291 [00:00<00:00, 2354.15it/s, Materializing param=model.layers.18.mlp.gate_proj.weight] Loading weights: 58%|███████████████████████████████████████████████████████▍ | 168/291 [00:00<00:00, 2361.98it/s, Materializing param=model.layers.18.mlp.up_proj.weight] Loading weights: 58%|███████████████████████████████████████████████████████▍ | 168/291 [00:00<00:00, 2356.31it/s, Materializing param=model.layers.18.mlp.up_proj.weight] Loading weights: 58%|████████████████████████████████████████████████▏ | 169/291 [00:00<00:00, 2363.47it/s, Materializing param=model.layers.18.post_attention_layernorm.weight] Loading weights: 58%|████████████████████████████████████████████████▏ | 169/291 [00:00<00:00, 2358.27it/s, Materializing param=model.layers.18.post_attention_layernorm.weight] Loading weights: 58%|█████████████████████████████████████████████████████▏ | 170/291 [00:00<00:00, 2355.16it/s, Materializing param=model.layers.18.self_attn.k_proj.weight] Loading weights: 58%|█████████████████████████████████████████████████████▏ | 170/291 [00:00<00:00, 2350.37it/s, Materializing param=model.layers.18.self_attn.k_proj.weight] Loading weights: 59%|█████████████████████████████████████████████████████▍ | 171/291 [00:00<00:00, 2357.16it/s, 
Materializing param=model.layers.18.self_attn.o_proj.weight] Loading weights: 59%|█████████████████████████████████████████████████████▍ | 171/291 [00:00<00:00, 2352.86it/s, Materializing param=model.layers.18.self_attn.o_proj.weight] Loading weights: 59%|█████████████████████████████████████████████████████▊ | 172/291 [00:00<00:00, 2359.32it/s, Materializing param=model.layers.18.self_attn.q_proj.weight] Loading weights: 59%|█████████████████████████████████████████████████████▊ | 172/291 [00:00<00:00, 2354.95it/s, Materializing param=model.layers.18.self_attn.q_proj.weight] Loading weights: 59%|██████████████████████████████████████████████████████ | 173/291 [00:00<00:00, 2354.78it/s, Materializing param=model.layers.18.self_attn.v_proj.weight] Loading weights: 59%|██████████████████████████████████████████████████████ | 173/291 [00:00<00:00, 2350.10it/s, Materializing param=model.layers.18.self_attn.v_proj.weight] Loading weights: 60%|███████████████████████████████████████████████████████ | 174/291 [00:00<00:00, 2356.77it/s, Materializing param=model.layers.19.input_layernorm.weight] Loading weights: 60%|███████████████████████████████████████████████████████ | 174/291 [00:00<00:00, 2352.30it/s, Materializing param=model.layers.19.input_layernorm.weight] Loading weights: 60%|████████████████████████████████████████████████████████▌ | 175/291 [00:00<00:00, 2358.94it/s, Materializing param=model.layers.19.mlp.down_proj.weight] Loading weights: 60%|████████████████████████████████████████████████████████▌ | 175/291 [00:00<00:00, 2354.59it/s, Materializing param=model.layers.19.mlp.down_proj.weight] Loading weights: 60%|████████████████████████████████████████████████████████▊ | 176/291 [00:00<00:00, 2361.26it/s, Materializing param=model.layers.19.mlp.gate_proj.weight] Loading weights: 60%|████████████████████████████████████████████████████████▊ | 176/291 [00:00<00:00, 2327.50it/s, Materializing param=model.layers.19.mlp.gate_proj.weight] Loading weights: 
61%|██████████████████████████████████████████████████████████▍ | 177/291 [00:00<00:00, 2328.96it/s, Materializing param=model.layers.19.mlp.up_proj.weight] Loading weights: 61%|██████████████████████████████████████████████████████████▍ | 177/291 [00:00<00:00, 2324.63it/s, Materializing param=model.layers.19.mlp.up_proj.weight] Loading weights: 61%|██████████████████████████████████████████████████▊ | 178/291 [00:00<00:00, 2331.20it/s, Materializing param=model.layers.19.post_attention_layernorm.weight] Loading weights: 61%|██████████████████████████████████████████████████▊ | 178/291 [00:00<00:00, 2327.01it/s, Materializing param=model.layers.19.post_attention_layernorm.weight] Loading weights: 62%|███████████████████████████████████████████████████████▉ | 179/291 [00:00<00:00, 2333.70it/s, Materializing param=model.layers.19.self_attn.k_proj.weight] Loading weights: 62%|███████████████████████████████████████████████████████▉ | 179/291 [00:00<00:00, 2329.08it/s, Materializing param=model.layers.19.self_attn.k_proj.weight] Loading weights: 62%|████████████████████████████████████████████████████████▎ | 180/291 [00:00<00:00, 2334.85it/s, Materializing param=model.layers.19.self_attn.o_proj.weight] Loading weights: 62%|████████████████████████████████████████████████████████▎ | 180/291 [00:00<00:00, 2330.18it/s, Materializing param=model.layers.19.self_attn.o_proj.weight] Loading weights: 62%|████████████████████████████████████████████████████████▌ | 181/291 [00:00<00:00, 2336.14it/s, Materializing param=model.layers.19.self_attn.q_proj.weight] Loading weights: 62%|████████████████████████████████████████████████████████▌ | 181/291 [00:00<00:00, 2331.69it/s, Materializing param=model.layers.19.self_attn.q_proj.weight] Loading weights: 63%|████████████████████████████████████████████████████████▉ | 182/291 [00:00<00:00, 2332.97it/s, Materializing param=model.layers.19.self_attn.v_proj.weight] Loading weights: 
63%|████████████████████████████████████████████████████████▉ | 182/291 [00:00<00:00, 2314.67it/s, Materializing param=model.layers.19.self_attn.v_proj.weight] Loading weights: 63%|█████████████████████████████████████████████████████████▊ | 183/291 [00:00<00:00, 2316.34it/s, Materializing param=model.layers.20.input_layernorm.weight] Loading weights: 63%|█████████████████████████████████████████████████████████▊ | 183/291 [00:00<00:00, 2311.58it/s, Materializing param=model.layers.20.input_layernorm.weight] Loading weights: 63%|███████████████████████████████████████████████████████████▍ | 184/291 [00:00<00:00, 2317.27it/s, Materializing param=model.layers.20.mlp.down_proj.weight] Loading weights: 63%|███████████████████████████████████████████████████████████▍ | 184/291 [00:00<00:00, 2313.99it/s, Materializing param=model.layers.20.mlp.down_proj.weight] Loading weights: 64%|███████████████████████████████████████████████████████████▊ | 185/291 [00:00<00:00, 2321.84it/s, Materializing param=model.layers.20.mlp.gate_proj.weight] Loading weights: 64%|███████████████████████████████████████████████████████████▊ | 185/291 [00:00<00:00, 2292.61it/s, Materializing param=model.layers.20.mlp.gate_proj.weight] Loading weights: 64%|█████████████████████████████████████████████████████████████▎ | 186/291 [00:00<00:00, 2298.94it/s, Materializing param=model.layers.20.mlp.up_proj.weight] Loading weights: 64%|█████████████████████████████████████████████████████████████▎ | 186/291 [00:00<00:00, 2295.77it/s, Materializing param=model.layers.20.mlp.up_proj.weight] Loading weights: 64%|█████████████████████████████████████████████████████▎ | 187/291 [00:00<00:00, 2303.10it/s, Materializing param=model.layers.20.post_attention_layernorm.weight] Loading weights: 64%|█████████████████████████████████████████████████████▎ | 187/291 [00:00<00:00, 2300.02it/s, Materializing param=model.layers.20.post_attention_layernorm.weight] Loading weights: 
65%|██████████████████████████████████████████████████████████▊ | 188/291 [00:00<00:00, 2307.69it/s, Materializing param=model.layers.20.self_attn.k_proj.weight] Loading weights: 65%|██████████████████████████████████████████████████████████▊ | 188/291 [00:00<00:00, 2304.71it/s, Materializing param=model.layers.20.self_attn.k_proj.weight] Loading weights: 65%|███████████████████████████████████████████████████████████ | 189/291 [00:00<00:00, 2308.39it/s, Materializing param=model.layers.20.self_attn.o_proj.weight] Loading weights: 65%|███████████████████████████████████████████████████████████ | 189/291 [00:00<00:00, 2305.38it/s, Materializing param=model.layers.20.self_attn.o_proj.weight] Loading weights: 65%|███████████████████████████████████████████████████████████▍ | 190/291 [00:00<00:00, 2312.80it/s, Materializing param=model.layers.20.self_attn.q_proj.weight] Loading weights: 65%|███████████████████████████████████████████████████████████▍ | 190/291 [00:00<00:00, 2309.91it/s, Materializing param=model.layers.20.self_attn.q_proj.weight] Loading weights: 66%|███████████████████████████████████████████████████████████▋ | 191/291 [00:00<00:00, 2317.74it/s, Materializing param=model.layers.20.self_attn.v_proj.weight] Loading weights: 66%|███████████████████████████████████████████████████████████▋ | 191/291 [00:00<00:00, 2314.94it/s, Materializing param=model.layers.20.self_attn.v_proj.weight] Loading weights: 66%|████████████████████████████████████████████████████████████▋ | 192/291 [00:00<00:00, 2323.02it/s, Materializing param=model.layers.21.input_layernorm.weight] Loading weights: 66%|████████████████████████████████████████████████████████████▋ | 192/291 [00:00<00:00, 2320.21it/s, Materializing param=model.layers.21.input_layernorm.weight] Loading weights: 66%|██████████████████████████████████████████████████████████████▎ | 193/291 [00:00<00:00, 2327.67it/s, Materializing param=model.layers.21.mlp.down_proj.weight] Loading weights: 
66%|██████████████████████████████████████████████████████████████▎ | 193/291 [00:00<00:00, 2324.80it/s, Materializing param=model.layers.21.mlp.down_proj.weight] Loading weights: 67%|██████████████████████████████████████████████████████████████▋ | 194/291 [00:00<00:00, 2332.41it/s, Materializing param=model.layers.21.mlp.gate_proj.weight] Loading weights: 67%|██████████████████████████████████████████████████████████████▋ | 194/291 [00:00<00:00, 2326.55it/s, Materializing param=model.layers.21.mlp.gate_proj.weight] Loading weights: 67%|████████████████████████████████████████████████████████████████▎ | 195/291 [00:00<00:00, 2329.39it/s, Materializing param=model.layers.21.mlp.up_proj.weight] Loading weights: 67%|████████████████████████████████████████████████████████████████▎ | 195/291 [00:00<00:00, 2326.24it/s, Materializing param=model.layers.21.mlp.up_proj.weight] Loading weights: 67%|███████████████████████████████████████████████████████▉ | 196/291 [00:00<00:00, 2333.62it/s, Materializing param=model.layers.21.post_attention_layernorm.weight] Loading weights: 67%|███████████████████████████████████████████████████████▉ | 196/291 [00:00<00:00, 2330.48it/s, Materializing param=model.layers.21.post_attention_layernorm.weight] Loading weights: 68%|█████████████████████████████████████████████████████████████▌ | 197/291 [00:00<00:00, 2337.39it/s, Materializing param=model.layers.21.self_attn.k_proj.weight] Loading weights: 68%|█████████████████████████████████████████████████████████████▌ | 197/291 [00:00<00:00, 2334.43it/s, Materializing param=model.layers.21.self_attn.k_proj.weight] Loading weights: 68%|█████████████████████████████████████████████████████████████▉ | 198/291 [00:00<00:00, 2318.34it/s, Materializing param=model.layers.21.self_attn.o_proj.weight] Loading weights: 68%|█████████████████████████████████████████████████████████████▉ | 198/291 [00:00<00:00, 2314.38it/s, Materializing param=model.layers.21.self_attn.o_proj.weight] Loading weights: 
68%|██████████████████████████████████████████████████████████████▏ | 199/291 [00:00<00:00, 2321.65it/s, Materializing param=model.layers.21.self_attn.q_proj.weight] Loading weights: 68%|██████████████████████████████████████████████████████████████▏ | 199/291 [00:00<00:00, 2318.70it/s, Materializing param=model.layers.21.self_attn.q_proj.weight] Loading weights: 69%|██████████████████████████████████████████████████████████████▌ | 200/291 [00:00<00:00, 2326.03it/s, Materializing param=model.layers.21.self_attn.v_proj.weight] Loading weights: 69%|██████████████████████████████████████████████████████████████▌ | 200/291 [00:00<00:00, 2323.17it/s, Materializing param=model.layers.21.self_attn.v_proj.weight] Loading weights: 69%|███████████████████████████████████████████████████████████████▌ | 201/291 [00:00<00:00, 2330.35it/s, Materializing param=model.layers.22.input_layernorm.weight] Loading weights: 69%|███████████████████████████████████████████████████████████████▌ | 201/291 [00:00<00:00, 2327.65it/s, Materializing param=model.layers.22.input_layernorm.weight] Loading weights: 69%|█████████████████████████████████████████████████████████████████▎ | 202/291 [00:00<00:00, 2334.95it/s, Materializing param=model.layers.22.mlp.down_proj.weight] Loading weights: 69%|█████████████████████████████████████████████████████████████████▎ | 202/291 [00:00<00:00, 2332.16it/s, Materializing param=model.layers.22.mlp.down_proj.weight] Loading weights: 70%|█████████████████████████████████████████████████████████████████▌ | 203/291 [00:00<00:00, 2338.77it/s, Materializing param=model.layers.22.mlp.gate_proj.weight] Loading weights: 70%|█████████████████████████████████████████████████████████████████▌ | 203/291 [00:00<00:00, 2335.97it/s, Materializing param=model.layers.22.mlp.gate_proj.weight] Loading weights: 70%|███████████████████████████████████████████████████████████████████▎ | 204/291 [00:00<00:00, 2343.35it/s, Materializing param=model.layers.22.mlp.up_proj.weight] 
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [00:00<00:00, 2332.89it/s, Materializing param=model.norm.weight]
[2026-02-15 04:01:21,136] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:6181] sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
Multiple deprecation warnings due to FSDP2 conversion:
sync_module_states is obsolete in FSDP2, as it is not needed anymore.
Setting sync_module_states to None.
[2026-02-15 04:01:24,329] [WARNING] [py.warnings._showwarnmsg:110] [PID:6181] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once