Loading weights: 0%| | 0/291 [00:00<?, ?it/s]
Loading weights: 0%| | 1/291 [00:00<00:00, 14364.05it/s, Materializing param=lm_head.weight]
Loading weights: 0%| | 1/291 [00:00<00:00, 7781.64it/s, Materializing param=lm_head.weight]
Loading weights: 1%| | 2/291 [00:00<00:38, 7.58it/s, Materializing param=lm_head.weight]
Loading weights: 1%| | 2/291 [00:00<00:38, 7.58it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%| | 2/291 [00:00<00:38, 7.58it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%| | 3/291 [00:00<00:38, 7.58it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%| | 3/291 [00:00<00:38, 7.58it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 4/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|β | 4/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 2%|β | 5/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|β | 5/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|β | 6/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 6/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 7/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|β | 7/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 3%|β | 8/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 3%|β | 8/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 3%|β | 9/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|β | 9/291 [00:00<00:37, 7.58it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|β | 10/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|β | 10/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 3%|β | 10/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 4%|β | 11/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 4%|β | 11/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 4%|β | 12/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 4%|β | 12/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 4%|β | 13/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 4%|β | 13/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 5%|β | 14/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 5%|β | 14/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 5%|β | 15/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 5%|β | 15/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 5%|β | 15/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 5%|β | 16/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 5%|β | 16/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 6%|β | 17/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 6%|β | 17/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 6%|β | 18/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 6%|β | 18/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 7%|β | 19/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 7%|β | 19/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 7%|β | 20/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 7%|β | 20/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 7%|β | 20/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 7%|β | 21/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 7%|β | 21/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 8%|β | 22/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 8%|β | 22/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 8%|β | 23/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 8%|β | 23/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 8%|β | 24/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 8%|β | 24/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 8%|β | 24/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 9%|β | 25/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 9%|β | 25/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 9%|β | 26/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 9%|β | 26/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 9%|β | 27/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 9%|β | 27/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 10%|β | 28/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 10%|β | 28/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 10%|β | 29/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 10%|β | 29/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 10%|β | 29/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 10%|β | 30/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 10%|β | 30/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 11%|β | 31/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 11%|β | 31/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 11%|β | 32/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 11%|β | 32/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 11%|ββ | 33/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 11%|ββ | 33/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 12%|ββ | 34/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 12%|ββ | 34/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 12%|ββ | 34/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 12%|ββ | 35/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 12%|ββ | 35/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 12%|ββ | 36/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 12%|ββ | 36/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 13%|ββ | 37/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 13%|ββ | 37/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 13%|ββ | 38/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 13%|ββ | 38/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 13%|ββ | 39/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 13%|ββ | 39/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 14%|ββ | 40/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 14%|ββ | 40/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 14%|ββ | 41/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 14%|ββ | 41/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 14%|ββ | 41/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 14%|ββ | 42/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 14%|ββ | 42/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 15%|ββ | 43/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 15%|ββ | 43/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 15%|ββ | 44/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 15%|ββ | 44/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 15%|ββ | 45/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 15%|ββ | 45/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 16%|ββ | 46/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 16%|ββ | 46/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 16%|ββ | 47/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 16%|ββ | 47/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 16%|ββ | 48/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 16%|ββ | 48/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 17%|ββ | 49/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 17%|ββ | 49/291 [00:01<00:06, 34.86it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 17%|ββ | 50/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 17%|ββ | 50/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 17%|ββ | 50/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 18%|ββ | 51/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 18%|ββ | 51/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 18%|ββ | 52/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 18%|ββ | 52/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 18%|ββ | 53/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 18%|ββ | 53/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 19%|ββ | 54/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 19%|ββ | 54/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 19%|ββ | 55/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 19%|ββ | 55/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 19%|ββ | 56/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 19%|ββ | 56/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 20%|ββ | 57/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 20%|ββ | 57/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 20%|ββ | 58/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 20%|ββ | 58/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 20%|ββ | 59/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 20%|ββ | 59/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 20%|ββ | 59/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 21%|ββ | 60/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 21%|ββ | 60/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 21%|ββ | 61/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 21%|ββ | 61/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 21%|βββ | 62/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 21%|βββ | 62/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 22%|βββ | 63/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 22%|βββ | 63/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 22%|βββ | 64/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 22%|βββ | 64/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 22%|βββ | 65/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 22%|βββ | 65/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 23%|βββ | 66/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 23%|βββ | 66/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 23%|βββ | 67/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 23%|βββ | 67/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 23%|βββ | 68/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 23%|βββ | 68/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 23%|βββ | 68/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 24%|βββ | 69/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 24%|βββ | 69/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 24%|βββ | 70/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 24%|βββ | 70/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 24%|βββ | 71/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 24%|βββ | 71/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 25%|βββ | 72/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 25%|βββ | 72/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 25%|βββ | 73/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 25%|βββ | 73/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 25%|βββ | 74/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 25%|βββ | 74/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 26%|βββ | 75/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 26%|βββ | 75/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 26%|βββ | 76/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 26%|βββ | 76/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 26%|βββ | 77/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 26%|βββ | 77/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 26%|βββ | 77/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 27%|βββ | 78/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 27%|βββ | 78/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 27%|βββ | 79/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 27%|βββ | 79/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 27%|βββ | 80/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 27%|βββ | 80/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 28%|βββ | 81/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 28%|βββ | 81/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 28%|βββ | 82/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 28%|βββ | 82/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 29%|βββ | 83/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 29%|βββ | 83/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 29%|βββ | 84/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 29%|βββ | 84/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 29%|βββ | 85/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 29%|βββ | 85/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 30%|βββ | 86/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 30%|βββ | 86/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 30%|βββ | 86/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 30%|βββ | 87/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 30%|βββ | 87/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 30%|βββ | 88/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 30%|βββ | 88/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 31%|βββ | 89/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 31%|βββ | 89/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 31%|βββ | 90/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 31%|βββ | 90/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 31%|ββββ | 91/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 31%|ββββ | 91/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 32%|ββββ | 92/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 32%|ββββ | 92/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 32%|ββββ | 93/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 32%|ββββ | 93/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 32%|ββββ | 94/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 32%|ββββ | 94/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 33%|ββββ | 95/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 33%|ββββ | 95/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 33%|ββββ | 95/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 33%|ββββ | 96/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 33%|ββββ | 96/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 33%|ββββ | 97/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 33%|ββββ | 97/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 34%|ββββ | 98/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 34%|ββββ | 98/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 34%|ββββ | 99/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 34%|ββββ | 99/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 34%|ββββ | 100/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 34%|ββββ | 100/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 35%|ββββ | 101/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 35%|ββββ | 101/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 35%|ββββ | 101/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 35%|ββββ | 102/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 35%|ββββ | 102/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 35%|ββββ | 103/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 35%|ββββ | 103/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 36%|ββββ | 104/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 36%|ββββ | 104/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 36%|ββββ | 105/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 36%|ββββ | 105/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 36%|ββββ | 106/291 [00:02<00:05, 31.69it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 36%|ββββ | 106/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 36%|ββββ | 106/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 37%|ββββ | 107/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 37%|ββββ | 107/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 37%|ββββ | 108/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 37%|ββββ | 108/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 37%|ββββ | 109/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 37%|ββββ | 109/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 38%|ββββ | 110/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 38%|ββββ | 110/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 38%|ββββ | 111/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 38%|ββββ | 111/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 38%|ββββ | 112/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 38%|ββββ | 112/291 [00:03<00:05, 31.69it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 39%|ββββ | 113/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 39%|ββββ | 113/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 39%|ββββ | 113/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 39%|ββββ | 114/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 39%|ββββ | 114/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 40%|ββββ | 115/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 40%|ββββ | 115/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 40%|ββββ | 116/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 40%|ββββ | 116/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 40%|ββββ | 117/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 40%|ββββ | 117/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 40%|ββββ | 117/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 41%|ββββ | 118/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 41%|ββββ | 118/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 41%|ββββ | 119/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 41%|ββββ | 119/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 41%|ββββ | 120/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 41%|ββββ | 120/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 42%|βββββ | 121/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 42%|βββββ | 121/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 42%|βββββ | 122/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 42%|βββββ | 122/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 42%|βββββ | 123/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 42%|βββββ | 123/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 42%|βββββ | 123/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 43%|βββββ | 124/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 43%|βββββ | 124/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 43%|βββββ | 125/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 43%|βββββ | 125/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 43%|βββββ | 126/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 43%|βββββ | 126/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 44%|βββββ | 127/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 44%|βββββ | 128/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 44%|βββββ | 129/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 45%|βββββ | 130/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 45%|βββββ | 131/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 45%|βββββ | 131/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 45%|βββββ | 132/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 46%|βββββ | 133/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 46%|βββββ | 134/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 46%|βββββ | 135/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 47%|βββββ | 136/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 47%|βββββ | 137/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 47%|βββββ | 138/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 47%|βββββ | 138/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 48%|βββββ | 139/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 48%|βββββ | 140/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 48%|βββββ | 141/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 49%|βββββ | 142/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 49%|βββββ | 142/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 49%|βββββ | 143/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 49%|βββββ | 144/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 50%|βββββ | 145/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 50%|βββββ | 146/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 51%|βββββ | 147/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 51%|βββββ | 148/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 51%|βββββ | 149/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 51%|βββββ | 149/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 52%|ββββββ | 150/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 52%|ββββββ | 151/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 52%|ββββββ | 152/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 52%|ββββββ | 152/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 53%|ββββββ | 153/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 53%|ββββββ | 154/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 53%|ββββββ | 155/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 54%|ββββββ | 156/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 54%|ββββββ | 157/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 54%|ββββββ | 158/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 54%|ββββββ | 158/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 55%|ββββββ | 159/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 55%|ββββββ | 160/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 55%|ββββββ | 160/291 [00:06<00:06, 19.86it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 55%|ββββββ | 161/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 55%|ββββββ | 161/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 56%|ββββββ | 162/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 56%|ββββββ | 163/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 56%|ββββββ | 164/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 57%|ββββββ | 165/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 57%|ββββββ | 166/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 57%|ββββββ | 167/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 58%|ββββββ | 168/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 58%|ββββββ | 168/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 58%|ββββββ | 169/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 58%|ββββββ | 170/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 59%|ββββββ | 171/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 59%|ββββββ | 172/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 59%|ββββββ | 173/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 60%|ββββββ | 174/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 60%|ββββββ | 175/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 60%|ββββββ | 176/291 [00:07<00:10, 11.48it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 60%|ββββββ | 176/291 [00:07<00:10, 11.48it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 61%|ββββββ | 177/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 61%|ββββββ | 178/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 62%|βββββββ | 179/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 62%|βββββββ | 180/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 62%|βββββββ | 181/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 63%|βββββββ | 182/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 63%|βββββββ | 182/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 63%|βββββββ | 183/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 63%|βββββββ | 184/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 64%|βββββββ | 185/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 64%|βββββββ | 185/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 64%|βββββββ | 186/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 64%|βββββββ | 187/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 65%|βββββββ | 188/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 65%|βββββββ | 189/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 65%|βββββββ | 190/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 66%|βββββββ | 191/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 66%|βββββββ | 192/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 66%|βββββββ | 193/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights: 67%|βββββββ | 194/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights: 67%|βββββββ | 194/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights: 67%|βββββββ | 195/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights: 67%|βββββββ | 196/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights: 68%|βββββββ | 197/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights: 68%|βββββββ | 198/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights: 68%|βββββββ | 198/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights: 68%|βββββββ | 199/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights: 69%|βββββββ | 200/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights: 69%|βββββββ | 201/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights: 69%|βββββββ | 202/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights: 70%|βββββββ | 203/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights: 70%|βββββββ | 204/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights: 70%|βββββββ | 205/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights: 71%|βββββββ | 206/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights: 71%|βββββββ | 206/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights: 71%|βββββββ | 207/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights: 71%|ββββββββ | 208/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights: 72%|ββββββββ | 209/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights: 72%|ββββββββ | 210/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights: 73%|ββββββββ | 211/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights: 73%|ββββββββ | 212/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights: 73%|ββββββββ | 212/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights: 73%|ββββββββ | 213/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights: 74%|ββββββββ | 214/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights: 74%|ββββββββ | 215/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights: 74%|ββββββββ | 216/291 [00:09<00:05, 13.58it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights: 74%|ββββββββ | 216/291 [00:09<00:05, 13.58it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights: 75%|ββββββββ | 217/291 [00:09<00:05, 13.58it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights: 75%|ββββββββ | 218/291 [00:09<00:05, 13.58it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 75%|ββββββββ | 219/291 [00:10<00:05, 12.64it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 75%|ββββββββ | 219/291 [00:10<00:05, 12.64it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 76%|ββββββββ | 220/291 [00:10<00:05, 12.64it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights: 76%|ββββββββ | 221/291 [00:10<00:08, 8.36it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights: 76%|ββββββββ | 221/291 [00:10<00:08, 8.36it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights: 76%|ββββββββ | 222/291 [00:10<00:08, 8.36it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights: 77%|ββββββββ | 223/291 [00:10<00:08, 8.36it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights: 77%|ββββββββ | 224/291 [00:10<00:08, 8.36it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights: 77%|ββββββββ | 225/291 [00:10<00:07, 8.36it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights: 78%|ββββββββ | 226/291 [00:10<00:07, 8.36it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights: 78%|ββββββββ | 227/291 [00:11<00:07, 8.00it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights: 78%|ββββββββ | 227/291 [00:11<00:07, 8.00it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights: 78%|ββββββββ | 228/291 [00:11<00:07, 8.00it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights: 79%|ββββββββ | 229/291 [00:11<00:07, 8.00it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights: 79%|ββββββββ | 230/291 [00:11<00:06, 9.51it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights: 79%|ββββββββ | 230/291 [00:11<00:06, 9.51it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights: 79%|ββββββββ | 231/291 [00:11<00:06, 9.51it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights: 80%|ββββββββ | 232/291 [00:11<00:06, 9.51it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights: 80%|ββββββββ | 233/291 [00:11<00:06, 9.51it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights: 80%|ββββββββ | 234/291 [00:11<00:05, 9.51it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights: 81%|ββββββββ | 235/291 [00:11<00:05, 11.11it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights: 81%|ββββββββ | 235/291 [00:11<00:05, 11.11it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights: 81%|ββββββββ | 236/291 [00:12<00:04, 11.11it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights: 81%|βββββββββ | 237/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights: 81%|βββββββββ | 237/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights: 82%|βββββββββ | 238/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights: 82%|βββββββββ | 239/291 [00:12<00:05, 9.81it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights: 82%|βββββββββ | 239/291 [00:12<00:05, 9.81it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights: 82%|βββββββββ | 240/291 [00:12<00:05, 9.81it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights: 83%|βββββββββ | 241/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights: 83%|βββββββββ | 241/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights: 83%|βββββββββ | 242/291 [00:12<00:04, 10.99it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights: 84%|βββββββββ | 243/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights: 84%|βββββββββ | 243/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights: 84%|βββββββββ | 244/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights: 84%|βββββββββ | 244/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights: 84%|βββββββββ | 245/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights: 84%|βββββββββ | 245/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights: 85%|βββββββββ | 246/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights: 85%|βββββββββ | 246/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights: 85%|βββββββββ | 247/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 85%|βββββββββ | 247/291 [00:12<00:04, 10.77it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 85%|βββββββββ | 248/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 85%|βββββββββ | 248/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights: 85%|βββββββββ | 248/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights: 86%|βββββββββ | 249/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights: 86%|βββββββββ | 249/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights: 86%|βββββββββ | 250/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights: 86%|βββββββββ | 250/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights: 86%|βββββββββ | 251/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights: 86%|βββββββββ | 251/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights: 87%|βββββββββ | 252/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 87%|βββββββββ | 252/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 87%|βββββββββ | 253/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 87%|βββββββββ | 253/291 [00:13<00:04, 9.38it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 87%|βββββββββ | 254/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 87%|βββββββββ | 254/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 87%|βββββββββ | 254/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 88%|βββββββββ | 255/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights: 88%|βββββββββ | 255/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights: 88%|βββββββββ | 256/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 88%|βββββββββ | 256/291 [00:13<00:03, 11.34it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 88%|βββββββββ | 257/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 88%|βββββββββ | 257/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights: 88%|βββββββββ | 257/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights: 89%|βββββββββ | 258/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights: 89%|βββββββββ | 258/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights: 89%|βββββββββ | 259/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights: 89%|βββββββββ | 259/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights: 89%|βββββββββ | 260/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights: 89%|βββββββββ | 260/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights: 90%|βββββββββ | 261/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights: 90%|βββββββββ | 261/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights: 90%|βββββββββ | 262/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 90%|βββββββββ | 262/291 [00:14<00:03, 8.57it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 90%|βββββββββ | 263/291 [00:14<00:03, 9.16it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 90%|βββββββββ | 263/291 [00:14<00:03, 9.16it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights: 90%|βββββββββ | 263/291 [00:14<00:03, 9.16it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights: 91%|βββββββββ | 264/291 [00:14<00:02, 9.16it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights: 91%|βββββββββ | 264/291 [00:14<00:02, 9.16it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights: 91%|βββββββββ | 265/291 [00:14<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights: 91%|βββββββββ | 265/291 [00:14<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights: 91%|ββββββββββ| 266/291 [00:15<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights: 91%|ββββββββββ| 266/291 [00:15<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights: 92%|ββββββββββ| 267/291 [00:15<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 92%|ββββββββββ| 267/291 [00:15<00:02, 9.16it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 92%|ββββββββββ| 268/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 92%|ββββββββββ| 268/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights: 92%|ββββββββββ| 268/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights: 92%|ββββββββββ| 269/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights: 92%|ββββββββββ| 269/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights: 93%|ββββββββββ| 270/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights: 93%|ββββββββββ| 270/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights: 93%|ββββββββββ| 271/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 93%|ββββββββββ| 271/291 [00:15<00:01, 12.40it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 93%|ββββββββββ| 272/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 93%|ββββββββββ| 272/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights: 93%|ββββββββββ| 272/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights: 94%|ββββββββββ| 273/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights: 94%|ββββββββββ| 273/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights: 94%|ββββββββββ| 274/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 94%|ββββββββββ| 274/291 [00:15<00:01, 15.23it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 95%|ββββββββββ| 275/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 95%|ββββββββββ| 275/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights: 95%|ββββββββββ| 275/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights: 95%|ββββββββββ| 276/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights: 95%|ββββββββββ| 276/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights: 95%|ββββββββββ| 277/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights: 95%|ββββββββββ| 277/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights: 96%|ββββββββββ| 278/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 96%|ββββββββββ| 278/291 [00:16<00:01, 8.51it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 96%|ββββββββββ| 279/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 96%|ββββββββββ| 279/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights: 96%|ββββββββββ| 279/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights: 96%|ββββββββββ| 280/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights: 96%|ββββββββββ| 280/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights: 97%|ββββββββββ| 281/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights: 97%|ββββββββββ| 281/291 [00:16<00:01, 9.58it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights: 97%|ββββββββββ| 282/291 [00:16<00:00, 9.58it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights: 97%|ββββββββββ| 282/291 [00:16<00:00, 9.58it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights: 97%|ββββββββββ| 283/291 [00:16<00:00, 9.58it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 97%|ββββββββββ| 283/291 [00:16<00:00, 9.58it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 98%|ββββββββββ| 284/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 98%|ββββββββββ| 284/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights: 98%|ββββββββββ| 284/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights: 98%|ββββββββββ| 285/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights: 98%|ββββββββββ| 285/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights: 98%|ββββββββββ| 286/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights: 98%|ββββββββββ| 286/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights: 99%|ββββββββββ| 287/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights: 99%|ββββββββββ| 287/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights: 99%|ββββββββββ| 288/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights: 99%|ββββββββββ| 288/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights: 99%|ββββββββββ| 289/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 99%|ββββββββββ| 289/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 100%|ββββββββββ| 290/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 100%|ββββββββββ| 290/291 [00:16<00:00, 9.77it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 100%|ββββββββββ| 291/291 [00:16<00:00, 9.77it/s, Materializing param=model.norm.weight]
Loading weights: 100%|ββββββββββ| 291/291 [00:16<00:00, 9.77it/s, Materializing param=model.norm.weight]
Loading weights: 100%|██████████| 291/291 [00:16<00:00, 17.25it/s, Materializing param=model.norm.weight] |