[2026-02-12 03:54:19,150] [WARNING] [py.warnings._showwarnmsg:110] [PID:6928] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[2026-02-12 03:54:22,709] [WARNING] [huggingface_hub.utils._http._warn_on_warning_headers:779] [PID:6928] Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
Fetching 6 files: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 5/6 [00:10<00:02, 2.19s/it]
Fetching 6 files: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 6/6 [00:10<00:00, 1.83s/it]
Download complete: : 0.00B [00:10, ?B/s]
Download complete: : 0.00B [00:11, ?B/s]
Loading weights: 0%| | 0/363 [00:00<?, ?it/s]
Loading weights: 0%|β | 1/363 [00:00<00:00, 4032.98it/s, Materializing param=lm_head.weight]
Loading weights: 0%|β | 1/363 [00:00<00:00, 1735.33it/s, Materializing param=lm_head.weight]
Loading weights: 1%|β | 2/363 [00:00<00:00, 1862.89it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 2/363 [00:00<00:00, 1465.77it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 3/363 [00:00<00:00, 1616.93it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 3/363 [00:00<00:00, 1352.27it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 4/363 [00:00<00:00, 1455.47it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|β | 4/363 [00:00<00:00, 1301.16it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|ββ | 5/363 [00:00<00:00, 1360.73it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 1%|ββ | 5/363 [00:00<00:00, 1171.72it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|ββ | 6/363 [00:00<00:00, 1240.55it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|ββ | 6/363 [00:00<00:00, 1150.65it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|ββ | 7/363 [00:00<00:00, 1202.25it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|ββ | 7/363 [00:00<00:00, 1108.14it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|ββ | 8/363 [00:00<00:00, 1136.28it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 2%|ββ | 8/363 [00:00<00:00, 1063.60it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 2%|βββ | 9/363 [00:00<00:00, 1100.67it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 2%|βββ | 9/363 [00:00<00:00, 1041.95it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|βββ | 10/363 [00:00<00:00, 1087.28it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 3%|βββ | 10/363 [00:00<00:00, 1046.61it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 3%|βββ | 11/363 [00:00<00:00, 1087.58it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 3%|βββ | 11/363 [00:00<00:00, 1037.05it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 3%|βββ | 12/363 [00:00<00:00, 1049.30it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 3%|βββ | 12/363 [00:00<00:00, 1019.85it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 4%|ββββ | 13/363 [00:00<00:00, 1061.99it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 4%|ββββ | 13/363 [00:00<00:00, 1034.28it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 4%|ββββ | 14/363 [00:00<00:00, 1073.79it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 4%|ββββ | 14/363 [00:00<00:00, 1048.35it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 4%|ββββ | 15/363 [00:00<00:00, 1078.10it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 4%|ββββ | 15/363 [00:00<00:00, 979.03it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 4%|ββββ | 16/363 [00:00<00:00, 997.40it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 4%|ββββ | 16/363 [00:00<00:00, 975.52it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 5%|βββββ | 17/363 [00:00<00:00, 1003.63it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 5%|βββββ | 17/363 [00:00<00:00, 984.95it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 5%|βββββ | 18/363 [00:00<00:00, 1010.46it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 5%|βββββ | 18/363 [00:00<00:00, 992.38it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 5%|βββββ | 19/363 [00:00<00:00, 1021.40it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 5%|βββββ | 19/363 [00:00<00:00, 1003.89it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 6%|βββββ | 20/363 [00:00<00:00, 1029.71it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 6%|βββββ | 20/363 [00:00<00:00, 1013.25it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 6%|ββββββ | 21/363 [00:00<00:00, 1039.00it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 6%|ββββββ | 21/363 [00:00<00:00, 1022.03it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 6%|ββββββ | 22/363 [00:00<00:00, 1046.76it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 6%|ββββββ | 22/363 [00:00<00:00, 1031.07it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 6%|ββββββ | 23/363 [00:00<00:00, 1053.55it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 6%|ββββββ | 23/363 [00:00<00:00, 1038.55it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 7%|βββββββ | 24/363 [00:00<00:00, 1017.36it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 7%|βββββββ | 24/363 [00:00<00:00, 975.28it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 7%|ββββββ | 25/363 [00:00<00:00, 997.49it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 7%|ββββββ | 25/363 [00:00<00:00, 985.38it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 7%|βββββββ | 26/363 [00:00<00:00, 1005.55it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 7%|βββββββ | 26/363 [00:00<00:00, 994.37it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 7%|βββββββ | 27/363 [00:00<00:00, 1014.63it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 7%|βββββββ | 27/363 [00:00<00:00, 1004.62it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 8%|ββββββββ | 28/363 [00:00<00:00, 1025.89it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 8%|ββββββββ | 28/363 [00:00<00:00, 1015.84it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 8%|ββββββββ | 29/363 [00:00<00:00, 1036.22it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 8%|ββββββββ | 29/363 [00:00<00:00, 1025.56it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 8%|ββββββββ | 30/363 [00:00<00:00, 1043.01it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 8%|ββββββββ | 30/363 [00:00<00:00, 1032.86it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 9%|βββββββββ | 31/363 [00:00<00:00, 1052.40it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 9%|βββββββββ | 31/363 [00:00<00:00, 1041.63it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 9%|βββββββββ | 32/363 [00:00<00:00, 1060.19it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 9%|βββββββββ | 32/363 [00:00<00:00, 1050.32it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 9%|βββββββββ | 33/363 [00:00<00:00, 1068.89it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 9%|βββββββββ | 33/363 [00:00<00:00, 1046.97it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 9%|ββββββββ | 34/363 [00:00<00:00, 1064.60it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 9%|ββββββββ | 34/363 [00:00<00:00, 1055.28it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 10%|βββββββββ | 35/363 [00:00<00:00, 1069.94it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 10%|βββββββββ | 35/363 [00:00<00:00, 1060.58it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 10%|ββββββββββ | 36/363 [00:00<00:00, 1014.72it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 10%|ββββββββββ | 36/363 [00:00<00:00, 1002.06it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 10%|ββββββββββ | 37/363 [00:00<00:00, 1009.74it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 10%|ββββββββββ | 37/363 [00:00<00:00, 1000.14it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 10%|ββββββββββ | 38/363 [00:00<00:00, 1012.19it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 10%|ββββββββββ | 38/363 [00:00<00:00, 1002.91it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 11%|ββββββββββ | 39/363 [00:00<00:00, 1014.73it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 11%|ββββββββββ | 39/363 [00:00<00:00, 1004.91it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 11%|βββββββββββ | 40/363 [00:00<00:00, 1016.99it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 11%|βββββββββββ | 40/363 [00:00<00:00, 1008.52it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 11%|βββββββββββ | 41/363 [00:00<00:00, 1023.98it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 11%|βββββββββββ | 41/363 [00:00<00:00, 1017.23it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 12%|ββββββββββββ | 42/363 [00:00<00:00, 999.45it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 12%|ββββββββββββ | 42/363 [00:00<00:00, 975.28it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 12%|βββββββββββ | 43/363 [00:00<00:00, 982.28it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 12%|βββββββββββ | 43/363 [00:00<00:00, 973.57it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 12%|ββββββββββββ | 44/363 [00:00<00:00, 959.56it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 12%|ββββββββββββ | 44/363 [00:00<00:00, 950.74it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 12%|ββββββββββββ | 45/363 [00:00<00:00, 959.11it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 12%|ββββββββββββ | 45/363 [00:00<00:00, 945.25it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 13%|ββββββββββββ | 46/363 [00:00<00:00, 951.31it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 13%|ββββββββββββ | 46/363 [00:00<00:00, 928.96it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 13%|βββββββββββββ | 47/363 [00:00<00:00, 937.34it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 13%|βββββββββββββ | 47/363 [00:00<00:00, 931.63it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 13%|βββββββββββββ | 48/363 [00:00<00:00, 933.18it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 13%|βββββββββββββ | 48/363 [00:00<00:00, 927.09it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 13%|βββββββββββββ | 49/363 [00:00<00:00, 936.50it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 13%|βββββββββββββ | 49/363 [00:00<00:00, 929.43it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 14%|ββββββββββββββ | 50/363 [00:00<00:00, 938.16it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 14%|ββββββββββββββ | 50/363 [00:00<00:00, 919.38it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 14%|ββββββββββββββ | 51/363 [00:00<00:00, 915.10it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 14%|ββββββββββββββ | 51/363 [00:00<00:00, 908.71it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 14%|βββββββββββββ | 52/363 [00:00<00:00, 916.63it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 14%|βββββββββββββ | 52/363 [00:00<00:00, 909.80it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 15%|ββββββββββββββ | 53/363 [00:00<00:00, 915.92it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 15%|ββββββββββββββ | 53/363 [00:00<00:00, 909.63it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 15%|ββββββββββββββ | 54/363 [00:00<00:00, 918.13it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 15%|ββββββββββββββ | 54/363 [00:00<00:00, 892.60it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 15%|βββββββββββββββ | 55/363 [00:00<00:00, 865.98it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 15%|βββββββββββββββ | 55/363 [00:00<00:00, 850.09it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 15%|βββββββββββββββ | 56/363 [00:00<00:00, 856.94it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 15%|βββββββββββββββ | 56/363 [00:00<00:00, 846.16it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 16%|βββββββββββββββ | 57/363 [00:00<00:00, 853.25it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 16%|βββββββββββββββ | 57/363 [00:00<00:00, 848.69it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 16%|ββββββββββββββββ | 58/363 [00:00<00:00, 855.42it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 16%|ββββββββββββββββ | 58/363 [00:00<00:00, 846.52it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 16%|ββββββββββββββββ | 59/363 [00:00<00:00, 826.58it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 16%|ββββββββββββββββ | 59/363 [00:00<00:00, 816.20it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 17%|βββββββββββββββββ | 60/363 [00:00<00:00, 824.26it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 17%|βββββββββββββββββ | 60/363 [00:00<00:00, 821.13it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 17%|βββββββββββββββ | 61/363 [00:00<00:00, 829.70it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 17%|βββββββββββββββ | 61/363 [00:00<00:00, 826.40it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 17%|ββββββββββββββββ | 62/363 [00:00<00:00, 834.99it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 17%|ββββββββββββββββ | 62/363 [00:00<00:00, 831.76it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 17%|βββββββββββββββββ | 63/363 [00:00<00:00, 840.21it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 17%|βββββββββββββββββ | 63/363 [00:00<00:00, 837.01it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 18%|βββββββββββββββββ | 64/363 [00:00<00:00, 845.31it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 18%|βββββββββββββββββ | 64/363 [00:00<00:00, 841.98it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 18%|βββββββββββββββββ | 65/363 [00:00<00:00, 850.02it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 18%|βββββββββββββββββ | 65/363 [00:00<00:00, 847.04it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 18%|ββββββββββββββββββ | 66/363 [00:00<00:00, 855.42it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 18%|ββββββββββββββββββ | 66/363 [00:00<00:00, 837.50it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 18%|ββββββββββββββββββ | 67/363 [00:00<00:00, 839.23it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 18%|ββββββββββββββββββ | 67/363 [00:00<00:00, 835.48it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 19%|βββββββββββββββββββ | 68/363 [00:00<00:00, 838.56it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 19%|βββββββββββββββββββ | 68/363 [00:00<00:00, 829.33it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 19%|βββββββββββββββββββ | 69/363 [00:00<00:00, 815.09it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 19%|βββββββββββββββββββ | 69/363 [00:00<00:00, 809.53it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 19%|βββββββββββββββββ | 70/363 [00:00<00:00, 814.96it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 19%|βββββββββββββββββ | 70/363 [00:00<00:00, 810.84it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 20%|βββββββββββββββββββ | 71/363 [00:00<00:00, 814.85it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 20%|βββββββββββββββββββ | 71/363 [00:00<00:00, 808.66it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 20%|βββββββββββββββββββ | 72/363 [00:00<00:00, 808.29it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 20%|βββββββββββββββββββ | 72/363 [00:00<00:00, 792.52it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 20%|βββββββββββββββββββ | 73/363 [00:00<00:00, 794.44it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 20%|βββββββββββββββββββ | 73/363 [00:00<00:00, 788.74it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 20%|ββββββββββββββββββββ | 74/363 [00:00<00:00, 791.36it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 20%|ββββββββββββββββββββ | 74/363 [00:00<00:00, 785.39it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 21%|ββββββββββββββββββββ | 75/363 [00:00<00:00, 789.96it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 21%|ββββββββββββββββββββ | 75/363 [00:00<00:00, 773.01it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 21%|βββββββββββββββββββββ | 76/363 [00:00<00:00, 772.32it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 21%|βββββββββββββββββββββ | 76/363 [00:00<00:00, 769.03it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 21%|βββββββββββββββββββββ | 77/363 [00:00<00:00, 774.52it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 21%|βββββββββββββββββββββ | 77/363 [00:00<00:00, 771.52it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 21%|βββββββββββββββββββββ | 78/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 21%|ββββββββββββββββββββββ | 78/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 21%|ββββββββββββββββββββββ | 78/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 22%|βββββββββββββββββββ | 79/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 22%|βββββββββββββββββββ | 79/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 22%|βββββββββββββββββββββ | 80/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 22%|βββββββββββββββββββββ | 80/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 22%|βββββββββββββββββββββ | 81/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 22%|βββββββββββββββββββββ | 81/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 82/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 82/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 83/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 83/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 84/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 23%|ββββββββββββββββββββββ | 84/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 23%|βββββββββββββββββββββββ | 85/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 23%|βββββββββββββββββββββββ | 85/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 24%|βββββββββββββββββββββββ | 86/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 24%|βββββββββββββββββββββββ | 86/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 24%|ββββββββββββββββββββββββ | 87/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 24%|ββββββββββββββββββββββββ | 87/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 24%|βββββββββββββββββββββ | 88/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 24%|βββββββββββββββββββββ | 88/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 25%|βββββββββββββββββββββββ | 89/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 25%|βββββββββββββββββββββββ | 89/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 90/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 90/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 91/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 91/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 92/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 25%|ββββββββββββββββββββββββ | 92/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 26%|ββββββββββββββββββββββββ | 93/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 26%|ββββββββββββββββββββββββ | 93/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 26%|βββββββββββββββββββββββββ | 94/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 26%|βββββββββββββββββββββββββ | 94/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 26%|βββββββββββββββββββββββββ | 95/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 26%|βββββββββββββββββββββββββ | 95/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 26%|ββββββββββββββββββββββββββ | 96/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 26%|ββββββββββββββββββββββββββ | 96/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 27%|βββββββββββββββββββββββ | 97/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 27%|βββββββββββββββββββββββ | 97/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 27%|βββββββββββββββββββββββββ | 98/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 27%|βββββββββββββββββββββββββ | 98/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 27%|ββββββββββββββββββββββββββ | 99/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 27%|ββββββββββββββββββββββββββ | 99/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 28%|ββββββββββββββββββββββββββ | 100/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 28%|ββββββββββββββββββββββββββ | 100/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 28%|ββββββββββββββββββββββββββ | 101/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 28%|ββββββββββββββββββββββββββ | 101/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 28%|βββββββββββββββββββββββββββ | 102/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 28%|βββββββββββββββββββββββββββ | 102/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 28%|βββββββββββββββββββββββββββ | 103/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 28%|βββββββββββββββββββββββββββ | 103/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 29%|ββββββββββββββββββββββββββββ | 104/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 29%|ββββββββββββββββββββββββββββ | 104/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 29%|ββββββββββββββββββββββββββββ | 105/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 29%|ββββββββββββββββββββββββββββ | 105/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 29%|βββββββββββββββββββββββββ | 106/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 29%|βββββββββββββββββββββββββ | 106/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 29%|βββββββββββββββββββββββββββ | 107/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 29%|βββββββββββββββββββββββββββ | 107/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 108/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 108/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 109/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 109/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 110/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 30%|ββββββββββββββββββββββββββββ | 110/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 31%|βββββββββββββββββββββββββββββ | 111/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 31%|βββββββββββββββββββββββββββββ | 111/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 31%|ββββββββββββββββββββββββββββββ | 112/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 31%|ββββββββββββββββββββββββββββββ | 112/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 31%|ββββββββββββββββββββββββββββββ | 113/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 31%|ββββββββββββββββββββββββββββββ | 113/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 31%|βββββββββββββββββββββββββββββββ | 114/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 31%|βββββββββββββββββββββββββββββββ | 114/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 32%|βββββββββββββββββββββββββββ | 115/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 32%|βββββββββββββββββββββββββββ | 115/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 32%|ββββββββββββββββββββββββββββββ | 116/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 32%|ββββββββββββββββββββββββββββββ | 116/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 32%|ββββββββββββββββββββββββββββββ | 117/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 33%|ββββββββββββββββββββββββββββββ | 118/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 33%|βββββββββββββββββββββββββββββββ | 119/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 33%|βββββββββββββββββββββββββββββββ | 120/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 33%|ββββββββββββββββββββββββββββββββ | 121/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 34%|ββββββββββββββββββββββββββββββββ | 122/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 34%|βββββββββββββββββββββββββββββββββ | 123/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 34%|βββββββββββββββββββββββββββββ | 124/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 34%|ββββββββββββββββββββββββββββββββ | 125/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 35%|ββββββββββββββββββββββββββββββββ | 126/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 35%|βββββββββββββββββββββββββββββββββ | 127/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 35%|βββββββββββββββββββββββββββββββββ | 128/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 36%|βββββββββββββββββββββββββββββββββ | 129/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 36%|ββββββββββββββββββββββββββββββββββ | 130/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 36%|βββββββββββββββββββββββββββββββββββ | 131/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 36%|ββββββββββββββββββββββββββββββββββββ | 132/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 37%|βββββββββββββββββββββββββββββββ | 133/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 37%|ββββββββββββββββββββββββββββββββββ | 134/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 37%|βββββββββββββββββββββββββββββββββββ | 135/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 37%|βββββββββββββββββββββββββββββββββββ | 136/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 38%|βββββββββββββββββββββββββββββββββββ | 137/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 38%|ββββββββββββββββββββββββββββββββββββ | 138/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 38%|βββββββββββββββββββββββββββββββββββββ | 139/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 39%|βββββββββββββββββββββββββββββββββββββ | 140/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 39%|ββββββββββββββββββββββββββββββββββββββ | 141/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 39%|βββββββββββββββββββββββββββββββββ | 142/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 39%|βββββββββββββββββββββββββββββββββββββ | 143/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββββββββββββββββ | 144/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββββββββββββββββ | 145/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββββββββββββββββ | 146/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 40%|ββββββββββββββββββββββββββββββββββββββ | 147/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 41%|βββββββββββββββββββββββββββββββββββββββ | 148/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 41%|βββββββββββββββββββββββββββββββββββββββ | 149/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 41%|ββββββββββββββββββββββββββββββββββββββββ | 150/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 42%|βββββββββββββββββββββββββββββββββββ | 151/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 42%|βββββββββββββββββββββββββββββββββββββββ | 152/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 42%|βββββββββββββββββββββββββββββββββββββββ | 153/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 42%|βββββββββββββββββββββββββββββββββββββββ | 154/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 43%|ββββββββββββββββββββββββββββββββββββββββ | 155/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 43%|ββββββββββββββββββββββββββββββββββββββββ | 156/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 43%|βββββββββββββββββββββββββββββββββββββββββ | 157/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββββββββββββββββββ | 158/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 44%|βββββββββββββββββββββββββββββββββββββββββββ | 159/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 44%|βββββββββββββββββββββββββββββββββββββ | 160/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 44%|βββββββββββββββββββββββββββββββββββββββββ | 161/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 45%|βββββββββββββββββββββββββββββββββββββββββ | 162/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββββββββββββββββββ | 163/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββββββββββββββββββ | 164/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 45%|βββββββββββββββββββββββββββββββββββββββββββ | 165/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββββββββββββββββββ | 166/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββββββββββββββββββ | 167/363 [00:00<00:00, 777.45it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββββββββββββββββββ | 168/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 46%|βββββββββββββββββββββββββββββββββββββββββββββ | 168/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββββββββββββββ | 169/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 47%|βββββββββββββββββββββββββββββββββββββββββββ | 170/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 47%|ββββββββββββββββββββββββββββββββββββββββββββ | 171/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 47%|ββββββββββββββββββββββββββββββββββββββββββββ | 172/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββββββββββββββββββ | 173/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 48%|βββββββββββββββββββββββββββββββββββββββββββββ | 174/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββββββββββββββββββββ | 175/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββββββββββββββββββββ | 176/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 49%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 177/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 49%|ββββββββββββββββββββββββββββββββββββββββββ | 178/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 49%|ββββββββββββββββββββββββββββββββββββββββββββββ | 179/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 50%|ββββββββββββββββββββββββββββββββββββββββββββββ | 180/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 50%|ββββββββββββββββββββββββββββββββββββββββββββββ | 181/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββββββββββββββββββββ | 182/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββββββββββββββββββββ | 183/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 184/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 185/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 51%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 186/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββββββββββββββββ | 187/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 188/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 189/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 52%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 190/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 53%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 191/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 192/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 53%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 193/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights: 53%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 194/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights: 54%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 195/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights: 54%|ββββββββββββββββββββββββββββββββββββββββββββββ | 196/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights: 54%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 197/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights: 55%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 198/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights: 55%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 199/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights: 55%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 200/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 201/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights: 56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 202/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 203/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights: 56%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 204/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 205/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 206/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 207/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 208/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 209/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 210/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights: 58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 211/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 212/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 213/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights: 59%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 214/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 215/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 216/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 216/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 217/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 217/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 218/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 218/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 219/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 219/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 220/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 220/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 221/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 221/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 222/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 222/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights: 62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights: 62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 224/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights: 62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 225/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights: 62%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 225/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 226/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 226/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 227/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 227/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 228/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights: 63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 228/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 229/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 229/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 230/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 230/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 231/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 231/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 232/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 232/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights: 64%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 233/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 233/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 234/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 234/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 235/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 235/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 236/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 236/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 237/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights: 65%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 237/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 238/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 238/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 239/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 239/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 240/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 240/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights: 66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 241/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights: 66%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 241/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 242/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 242/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 243/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 243/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 244/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 244/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 245/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 245/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 246/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 246/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 247/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 247/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 248/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 248/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights: 69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 249/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights: 69%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 249/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 250/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 250/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 251/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 251/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 252/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 252/363 [00:00<00:00, 847.67it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 253/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 253/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 253/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 254/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 254/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 255/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 255/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 256/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 256/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 257/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights: 71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 257/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 258/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 258/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights: 71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 259/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights: 71%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 259/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 260/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 260/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 261/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 261/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 262/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 262/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 263/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 263/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 264/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 264/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 265/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 265/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 266/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 266/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 267/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 267/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 268/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights: 74%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 268/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 269/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 269/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 270/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 270/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 271/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 271/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 272/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 272/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 273/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 273/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 274/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 274/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 275/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 275/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 276/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 276/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 277/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 277/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 278/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 278/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 279/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 279/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 280/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights: 77%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 280/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 281/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 281/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 282/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 282/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights: 78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 283/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 78%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 283/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 284/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 284/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 285/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 285/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 286/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 286/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 287/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 287/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 288/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 288/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 289/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 289/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 290/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 290/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 291/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.input_layernorm.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 291/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.input_layernorm.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 292/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.down_proj.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 292/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.down_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 293/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.gate_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 293/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.gate_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 294/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.up_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 294/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.mlp.up_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 295/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.post_attention_layernorm.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 295/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.post_attention_layernorm.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 296/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.k_proj.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 296/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.k_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 297/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.o_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 297/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.o_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 298/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.q_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 298/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.q_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 299/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.v_proj.weight]
Loading weights: 82%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 299/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.32.self_attn.v_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 300/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.input_layernorm.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 300/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.input_layernorm.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 301/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.down_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 301/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.down_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 302/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.gate_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 302/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.gate_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 303/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.up_proj.weight]
Loading weights: 83%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 303/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.mlp.up_proj.weight]
Loading weights: 84%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 304/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.post_attention_layernorm.weight]
Loading weights: 84%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 304/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.post_attention_layernorm.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 305/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.k_proj.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 305/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.k_proj.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 306/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.o_proj.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 306/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.o_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 307/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.q_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 307/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.q_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 308/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.v_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 308/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.33.self_attn.v_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 309/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.input_layernorm.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 309/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.input_layernorm.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 310/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.down_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 310/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.down_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 311/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.gate_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 311/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.gate_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 312/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.up_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 312/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.mlp.up_proj.weight]
Loading weights: 86%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 313/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.post_attention_layernorm.weight]
Loading weights: 86%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 313/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.post_attention_layernorm.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 314/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.k_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 314/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.k_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 315/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.o_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 315/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.o_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 316/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.q_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 316/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.q_proj.weight]
Loading weights: 87%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 317/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.v_proj.weight]
Loading weights: 87%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 317/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.34.self_attn.v_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 318/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.input_layernorm.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 318/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.input_layernorm.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 319/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.down_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 319/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.down_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 320/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.gate_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 320/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.gate_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 321/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.up_proj.weight]
Loading weights: 88%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 321/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.mlp.up_proj.weight]
Loading weights: 89%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 322/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.post_attention_layernorm.weight]
Loading weights: 89%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 322/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.post_attention_layernorm.weight]
Loading weights: 89%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 323/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.k_proj.weight]
Loading weights: 89%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 323/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.k_proj.weight]
Loading weights: 89%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 324/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.o_proj.weight]
Loading weights: 89%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 324/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.o_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 325/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.q_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 325/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.q_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 326/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.v_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 326/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.35.self_attn.v_proj.weight]
Loading weights: 90%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 327/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.input_layernorm.weight]
Loading weights: 90%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 327/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.input_layernorm.weight]
Loading weights: 90%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 328/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.down_proj.weight]
Loading weights: 90%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 328/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.down_proj.weight]
Loading weights: 91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 329/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.gate_proj.weight]
Loading weights: 91%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 329/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.gate_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 330/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.up_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 330/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.mlp.up_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 331/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.post_attention_layernorm.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 331/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.post_attention_layernorm.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 332/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.k_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 332/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.k_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 333/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.o_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 333/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.o_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 334/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.q_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 334/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.q_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 335/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.v_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 335/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.36.self_attn.v_proj.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 336/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.input_layernorm.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 336/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.input_layernorm.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 337/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.down_proj.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 337/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.down_proj.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 338/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.gate_proj.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 338/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.gate_proj.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 339/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.up_proj.weight]
Loading weights: 93%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 339/363 [00:00<00:00, 767.53it/s, Materializing param=model.layers.37.mlp.up_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 340/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.mlp.up_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 340/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.post_attention_layernorm.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 340/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.post_attention_layernorm.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 341/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.k_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 341/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.k_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 342/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.o_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 342/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.o_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 343/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.q_proj.weight]
Loading weights: 94%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 343/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.q_proj.weight]
Loading weights: 95%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 344/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.v_proj.weight]
Loading weights: 95%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 344/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.37.self_attn.v_proj.weight]
Loading weights: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 345/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.input_layernorm.weight]
Loading weights: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 345/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.input_layernorm.weight]
Loading weights: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 346/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.down_proj.weight]
Loading weights: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 346/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.down_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 347/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.gate_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 347/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.gate_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 348/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.up_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 348/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.mlp.up_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 349/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.post_attention_layernorm.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 349/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.post_attention_layernorm.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 350/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.k_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 350/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.k_proj.weight]
Loading weights: 97%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 351/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.o_proj.weight]
Loading weights: 97%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 351/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.o_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 352/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.q_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 352/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.q_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 353/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.v_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 353/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.38.self_attn.v_proj.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 354/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.input_layernorm.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 354/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.input_layernorm.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 355/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.down_proj.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 355/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.down_proj.weight]
Loading weights: 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 356/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.gate_proj.weight]
Loading weights: 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 356/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.gate_proj.weight]
Loading weights: 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 357/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.up_proj.weight]
Loading weights: 98%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 357/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.mlp.up_proj.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 358/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.post_attention_layernorm.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 358/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.post_attention_layernorm.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 359/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.k_proj.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 359/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.k_proj.weight]
Loading weights: 99%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 360/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.o_proj.weight]
Loading weights: 99%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 360/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.o_proj.weight]
Loading weights: 99%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 361/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.q_proj.weight]
Loading weights: 99%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 361/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.q_proj.weight]
Loading weights: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 362/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.v_proj.weight]
Loading weights: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 362/363 [00:00<00:00, 802.59it/s, Materializing param=model.layers.39.self_attn.v_proj.weight]
Loading weights: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 363/363 [00:00<00:00, 802.59it/s, Materializing param=model.norm.weight]
Loading weights: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 363/363 [00:00<00:00, 802.59it/s, Materializing param=model.norm.weight]
Loading weights: 100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 363/363 [00:00<00:00, 806.46it/s, Materializing param=model.norm.weight]
[2026-02-12 04:04:14,796] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:6928] sync_module_states is obsolete in FSDP2, as it is not needed anymore.Setting sync_module_states to None.Multiple deprecation warnings due to FSDP2 conversion:
sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
[2026-02-12 04:04:22,096] [WARNING] [py.warnings._showwarnmsg:110] [PID:6928] /root/axolotl/.venv/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
|