model / miner_8092.log
Upload merged Mistral-7B LoRA model
f0d7d0a verified
2026-03-06 01:42:49,774 - __main__ - INFO - ============================================================
2026-03-06 01:42:49,774 - __main__ - INFO - Starting Babelbit Miner Server
2026-03-06 01:42:49,774 - __main__ - INFO - Backend: Hugging Face
2026-03-06 01:42:49,774 - __main__ - INFO - ============================================================
2026-03-06 01:42:49,774 - __main__ - INFO -
2026-03-06 01:42:49,774 - __main__ - WARNING - 🔓 DEV MODE ENABLED - Bittensor verification DISABLED
2026-03-06 01:42:49,774 - __main__ - WARNING - This should ONLY be used for local testing!
2026-03-06 01:42:49,774 - __main__ - WARNING - Set MINER_DEV_MODE=0 for production use.
2026-03-06 01:42:49,774 - __main__ - INFO -
2026-03-06 01:42:49,774 - __main__ - INFO - ⚠️ Make sure you've registered your axon first:
2026-03-06 01:42:49,774 - __main__ - INFO - uv run python babelbit/miner/register_axon.py
2026-03-06 01:42:49,774 - __main__ - INFO -
2026-03-06 01:42:49,774 - __main__ - INFO - 🌐 Production mode: MINER_EXTERNAL_IP=216.81.245.239
2026-03-06 01:42:49,774 - __main__ - INFO - Server accessible at: http://216.81.245.239:8092
2026-03-06 01:42:49,775 - __main__ - INFO - 🚀 Miner serving predictions on port 8092
2026-03-06 01:42:49,775 - __main__ - INFO - Press Ctrl+C to stop.
2026-03-06 01:42:49,775 - __main__ - INFO -
INFO: Started server process [311701]
INFO: Waiting for application startup.
2026-03-06 01:42:49,782 - __main__ - INFO - Model: /root/workspace/babelbit_subnet/fine_tuning/fine_tuned_model/merged
2026-03-06 01:42:49,782 - __main__ - INFO - Revision: main
2026-03-06 01:42:49,782 - __main__ - INFO - Cache dir: /root/.babelbit/models
2026-03-06 01:42:49,782 - __main__ - INFO - Quantization: 8bit=False, 4bit=False
2026-03-06 01:42:49,782 - __main__ - INFO - Device: cuda
2026-03-06 01:42:49,782 - __main__ - INFO -
2026-03-06 01:42:49,782 - __main__ - INFO - Initialized BabelbitMiner with model: /root/workspace/babelbit_subnet/fine_tuning/fine_tuned_model/merged
2026-03-06 01:42:49,782 - __main__ - INFO - Target device: cuda, dtype: torch.float16
2026-03-06 01:42:49,782 - __main__ - INFO - Loading model...
2026-03-06 01:42:49,782 - __main__ - INFO - Loading model /root/workspace/babelbit_subnet/fine_tuning/fine_tuned_model/merged...
2026-03-06 01:42:49,783 - babelbit.miner.model_loader - INFO - Loading tokenizer for /root/workspace/babelbit_subnet/fine_tuning/fine_tuned_model/merged...
2026-03-06 01:42:50,252 - babelbit.miner.model_loader - INFO - Loading model /root/workspace/babelbit_subnet/fine_tuning/fine_tuned_model/merged...
Loading weights: 0%|          | 0/291 [00:00<?, ?it/s]
Loading weights: 0%|          | 1/291 [00:00<00:00, 14364.05it/s, Materializing param=lm_head.weight]
Loading weights: 0%|          | 1/291 [00:00<00:00, 7781.64it/s, Materializing param=lm_head.weight]
Loading weights: 1%|          | 2/291 [00:00<00:38, 7.58it/s, Materializing param=lm_head.weight]
Loading weights: 3%|▎         | 10/291 [00:00<00:08, 32.68it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 5%|▌         | 15/291 [00:00<00:09, 28.24it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 7%|▋         | 20/291 [00:00<00:09, 27.30it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 8%|▊         | 24/291 [00:00<00:10, 24.73it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 10%|▉         | 29/291 [00:01<00:08, 29.82it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 12%|█▏        | 34/291 [00:01<00:07, 34.42it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 14%|█▍        | 41/291 [00:01<00:07, 34.86it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 17%|█▋        | 50/291 [00:01<00:06, 38.66it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 20%|██        | 59/291 [00:01<00:05, 40.63it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 23%|██▎       | 68/291 [00:01<00:05, 42.17it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 26%|██▋       | 77/291 [00:02<00:04, 43.00it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 30%|██▉       | 86/291 [00:02<00:04, 43.79it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 33%|███▎      | 95/291 [00:02<00:03, 51.26it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 35%|███▍      | 101/291 [00:02<00:04, 45.03it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 36%|███▋      | 106/291 [00:02<00:05, 31.69it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 39%|███▉      | 113/291 [00:03<00:05, 33.31it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 40%|████      | 117/291 [00:03<00:05, 32.91it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 42%|████▏     | 123/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 44%|████▍     | 128/291 [00:03<00:05, 29.13it/s, Materializing
param=model.layers.13.self_attn.v_proj.weight] Loading weights: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 129/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.input_layernorm.weight] Loading weights: 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 129/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.input_layernorm.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 130/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.mlp.down_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 130/291 [00:03<00:05, 29.13it/s, Materializing param=model.layers.14.mlp.down_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 131/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.down_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 131/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.gate_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 131/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.gate_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 132/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.up_proj.weight] Loading weights: 45%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 132/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.mlp.up_proj.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 133/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.post_attention_layernorm.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 133/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.post_attention_layernorm.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 134/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.self_attn.k_proj.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 134/291 [00:04<00:12, 13.03it/s, Materializing param=model.layers.14.self_attn.k_proj.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 135/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.o_proj.weight] Loading weights: 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 135/291 [00:04<00:11, 13.03it/s, Materializing 
param=model.layers.14.self_attn.o_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 136/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.q_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 136/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.q_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 137/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.v_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 137/291 [00:04<00:11, 13.03it/s, Materializing param=model.layers.14.self_attn.v_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 138/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.14.self_attn.v_proj.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 138/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.input_layernorm.weight] Loading weights: 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 138/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.input_layernorm.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 139/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.down_proj.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 139/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.down_proj.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 140/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.gate_proj.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 140/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.gate_proj.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 141/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.up_proj.weight] Loading weights: 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 141/291 [00:04<00:09, 16.38it/s, Materializing param=model.layers.15.mlp.up_proj.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 142/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.mlp.up_proj.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 142/291 [00:05<00:08, 16.59it/s, Materializing 
param=model.layers.15.post_attention_layernorm.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 142/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.post_attention_layernorm.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 143/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.k_proj.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 143/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.k_proj.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 144/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.o_proj.weight] Loading weights: 49%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 144/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.o_proj.weight] Loading weights: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 145/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.q_proj.weight] Loading weights: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 145/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.q_proj.weight] Loading weights: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 146/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.v_proj.weight] Loading weights: 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 146/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.15.self_attn.v_proj.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 147/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.input_layernorm.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 147/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.input_layernorm.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 148/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.mlp.down_proj.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 148/291 [00:05<00:08, 16.59it/s, Materializing param=model.layers.16.mlp.down_proj.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 149/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.down_proj.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 149/291 [00:05<00:06, 20.69it/s, Materializing 
param=model.layers.16.mlp.gate_proj.weight] Loading weights: 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 149/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.gate_proj.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 150/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.up_proj.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 150/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.mlp.up_proj.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 151/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.post_attention_layernorm.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 151/291 [00:05<00:06, 20.69it/s, Materializing param=model.layers.16.post_attention_layernorm.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 152/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.post_attention_layernorm.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 152/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.k_proj.weight] Loading weights: 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 152/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.k_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 153/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.o_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 153/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.o_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 154/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.q_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 154/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.q_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 155/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.v_proj.weight] Loading weights: 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 155/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.16.self_attn.v_proj.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 
156/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.input_layernorm.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 156/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.input_layernorm.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 157/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.mlp.down_proj.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 157/291 [00:05<00:07, 18.85it/s, Materializing param=model.layers.17.mlp.down_proj.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 158/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.down_proj.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 158/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.gate_proj.weight] Loading weights: 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 158/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.gate_proj.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 159/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.up_proj.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 159/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.mlp.up_proj.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 160/291 [00:05<00:06, 19.86it/s, Materializing param=model.layers.17.post_attention_layernorm.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 160/291 [00:06<00:06, 19.86it/s, Materializing param=model.layers.17.post_attention_layernorm.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 161/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.post_attention_layernorm.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 161/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.k_proj.weight] Loading weights: 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 161/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.k_proj.weight] Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 162/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.o_proj.weight] 
Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 162/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.o_proj.weight] Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 163/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.q_proj.weight] Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 163/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.q_proj.weight] Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 164/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.v_proj.weight] Loading weights: 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 164/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.17.self_attn.v_proj.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 165/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.input_layernorm.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 165/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.input_layernorm.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 166/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.down_proj.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 166/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.down_proj.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 167/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.gate_proj.weight] Loading weights: 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 167/291 [00:06<00:06, 20.06it/s, Materializing param=model.layers.18.mlp.gate_proj.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 168/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.mlp.gate_proj.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 168/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.mlp.up_proj.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 168/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.mlp.up_proj.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 169/291 [00:06<00:09, 12.68it/s, Materializing 
param=model.layers.18.post_attention_layernorm.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 169/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.post_attention_layernorm.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 170/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.k_proj.weight] Loading weights: 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 170/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.k_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 171/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.o_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 171/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.o_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 172/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.q_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 172/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.q_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 173/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.v_proj.weight] Loading weights: 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 173/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.18.self_attn.v_proj.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 174/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.input_layernorm.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 174/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.input_layernorm.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 175/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.mlp.down_proj.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 175/291 [00:06<00:09, 12.68it/s, Materializing param=model.layers.19.mlp.down_proj.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 176/291 [00:07<00:10, 11.48it/s, Materializing param=model.layers.19.mlp.down_proj.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 
176/291 [00:07<00:10, 11.48it/s, Materializing param=model.layers.19.mlp.gate_proj.weight] Loading weights: 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 176/291 [00:07<00:10, 11.48it/s, Materializing param=model.layers.19.mlp.gate_proj.weight] Loading weights: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 177/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.mlp.up_proj.weight] Loading weights: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 177/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.mlp.up_proj.weight] Loading weights: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 178/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.post_attention_layernorm.weight] Loading weights: 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 178/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.post_attention_layernorm.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 179/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.k_proj.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 179/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.k_proj.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 180/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.o_proj.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 180/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.o_proj.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 181/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.q_proj.weight] Loading weights: 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 181/291 [00:07<00:09, 11.48it/s, Materializing param=model.layers.19.self_attn.q_proj.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 182/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.19.self_attn.q_proj.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 182/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.19.self_attn.v_proj.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 182/291 [00:07<00:07, 15.04it/s, Materializing 
param=model.layers.19.self_attn.v_proj.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 183/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.input_layernorm.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 183/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.input_layernorm.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 184/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.mlp.down_proj.weight] Loading weights: 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 184/291 [00:07<00:07, 15.04it/s, Materializing param=model.layers.20.mlp.down_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 185/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.down_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 185/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.gate_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 185/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.gate_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 186/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.up_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 186/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.mlp.up_proj.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 187/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.post_attention_layernorm.weight] Loading weights: 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 187/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.post_attention_layernorm.weight] Loading weights: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 188/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.k_proj.weight] Loading weights: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 188/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.k_proj.weight] Loading weights: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 189/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.o_proj.weight] Loading weights: 
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 189/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.o_proj.weight] Loading weights: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 190/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.q_proj.weight] Loading weights: 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 190/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.q_proj.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 191/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.v_proj.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 191/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.20.self_attn.v_proj.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 192/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.input_layernorm.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 192/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.input_layernorm.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 193/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.mlp.down_proj.weight] Loading weights: 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 193/291 [00:07<00:06, 15.15it/s, Materializing param=model.layers.21.mlp.down_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 194/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.down_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 194/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.gate_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 194/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.gate_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 195/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.up_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 195/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.mlp.up_proj.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 196/291 [00:08<00:04, 21.32it/s, Materializing 
param=model.layers.21.post_attention_layernorm.weight] Loading weights: 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 196/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.post_attention_layernorm.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 197/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.self_attn.k_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 197/291 [00:08<00:04, 21.32it/s, Materializing param=model.layers.21.self_attn.k_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 198/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.k_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 198/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.o_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 198/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.o_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 199/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.q_proj.weight] Loading weights: 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 199/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.q_proj.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 200/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.v_proj.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 200/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.21.self_attn.v_proj.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 201/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.input_layernorm.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 201/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.input_layernorm.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 202/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.down_proj.weight] Loading weights: 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 202/291 [00:08<00:05, 17.04it/s, Materializing 
param=model.layers.22.mlp.down_proj.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 203/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.gate_proj.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 203/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.gate_proj.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 204/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.up_proj.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 204/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.mlp.up_proj.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 205/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.post_attention_layernorm.weight] Loading weights: 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 205/291 [00:08<00:05, 17.04it/s, Materializing param=model.layers.22.post_attention_layernorm.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 206/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.post_attention_layernorm.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 206/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.k_proj.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 206/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.k_proj.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 207/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.o_proj.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 207/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.o_proj.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 208/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.q_proj.weight] Loading weights: 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 208/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.q_proj.weight] Loading weights: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 209/291 [00:08<00:03, 23.92it/s, Materializing 
param=model.layers.22.self_attn.v_proj.weight] Loading weights: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 209/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.22.self_attn.v_proj.weight] Loading weights: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 210/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.input_layernorm.weight] Loading weights: 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 210/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.input_layernorm.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.mlp.down_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 211/291 [00:08<00:03, 23.92it/s, Materializing param=model.layers.23.mlp.down_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 212/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.down_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 212/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.gate_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 212/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.gate_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 213/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.up_proj.weight] Loading weights: 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 213/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.mlp.up_proj.weight] Loading weights: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 214/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.post_attention_layernorm.weight] Loading weights: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 214/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.post_attention_layernorm.weight] Loading weights: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 215/291 [00:09<00:04, 18.79it/s, Materializing param=model.layers.23.self_attn.k_proj.weight] Loading weights: 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 215/291 [00:09<00:04, 18.79it/s, Materializing 
param=model.layers.23.self_attn.k_proj.weight] Loading weights: 100%|██████████| 291/291 [00:16<00:00, 17.25it/s, Materializing param=model.norm.weight]
2026-03-06 01:43:07,759 - babelbit.miner.model_loader - INFO - Model loaded successfully on cuda
2026-03-06 01:43:07,760 - babelbit.miner.model_loader - INFO - Model memory footprint: 28.99 GB
2026-03-06 01:43:07,761 - __main__ - INFO - Model loaded successfully on cuda
2026-03-06 01:43:07,761 - __main__ - INFO - βœ… Model loaded successfully
2026-03-06 01:43:07,761 - __main__ - INFO -
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8092 (Press CTRL+C to quit)
INFO: 127.0.0.1:45000 - "GET /healthz HTTP/1.1" 200 OK
2026-03-06 01:43:14,128 - __main__ - INFO - πŸ”“ Dev mode enabled - bypassing Bittensor verification
2026-03-06 01:43:14,128 - __main__ - INFO - Generating prediction for prefix: 'Clarity'
2026-03-06 01:43:14,128 - __main__ - INFO - Using context: ''
2026-03-06 01:43:21,332 - __main__ - ERROR - Error moving model to device: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 23.53 GiB of which 112.62 MiB is free. Including non-PyTorch memory, this process has 23.41 GiB memory in use. Of the allocated memory 23.03 GiB is allocated by PyTorch, and 1.16 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2026-03-06 01:43:33,940 - __main__ - INFO - Fell back to CPU device
2026-03-06 01:43:33,941 - __main__ - WARNING - Prediction timed out in dev mode; returning empty prediction
INFO: 127.0.0.1:43426 - "POST /predict HTTP/1.1" 200 OK
2026-03-06 01:43:33,946 - __main__ - INFO - πŸ”“ Dev mode enabled - bypassing Bittensor verification
2026-03-06 01:43:33,946 - __main__ - INFO - Generating prediction for prefix: 'Clarity and'
2026-03-06 01:43:33,946 - __main__ - INFO - Using context: ''
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
2026-03-06 01:43:43,946 - __main__ - WARNING - Prediction timed out in dev mode; returning empty prediction
INFO: 127.0.0.1:43426 - "POST /predict HTTP/1.1" 200 OK
2026-03-06 01:43:44,014 - __main__ - INFO - πŸ”“ Dev mode enabled - bypassing Bittensor verification
2026-03-06 01:43:44,014 - __main__ - INFO - Generating prediction for prefix: 'Clarity and empathy,'
2026-03-06 01:43:44,014 - __main__ - INFO - Using context: ''
2026-03-06 01:43:54,014 - __main__ - WARNING - Prediction timed out in dev mode; returning empty prediction
INFO: 127.0.0.1:43426 - "POST /predict HTTP/1.1" 200 OK
2026-03-06 01:43:54,019 - __main__ - INFO - πŸ”“ Dev mode enabled - bypassing Bittensor verification
2026-03-06 01:43:54,020 - __main__ - INFO - Generating prediction for prefix: 'Clarity and empathy, yes,'
2026-03-06 01:43:54,020 - __main__ - INFO - Using context: ''
2026-03-06 01:44:04,021 - __main__ - WARNING - Prediction timed out in dev mode; returning empty prediction
INFO: 127.0.0.1:43426 - "POST /predict HTTP/1.1" 200 OK
2026-03-06 01:44:04,030 - __main__ - INFO - πŸ”“ Dev mode enabled - bypassing Bittensor verification
2026-03-06 01:44:04,031 - __main__ - INFO - Generating prediction for prefix: 'Clarity and empathy, yes, but'
2026-03-06 01:44:04,031 - __main__ - INFO - Using context: ''
INFO: Shutting down
INFO: Finished server process [311701]