[2026-02-06 15:45:47,242] [WARNING] [huggingface_hub.utils._http._warn_on_warning_headers:779] [PID:7207] Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
[2026-02-06 15:45:48,737] [WARNING] [py.warnings._showwarnmsg:110] [PID:7207] /root/axolotl/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
Downloading (incomplete total...): 0.00B [00:00, ?B/s]
Fetching 5 files: 0%| | 0/5 [00:00<?, ?it/s]
Fetching 5 files: 20%|█████████████████████████ | 1/5 [00:30<02:01, 30.46s/it]
Fetching 5 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:30<00:00, 6.09s/it]
Download complete: : 0.00B [00:30, ?B/s]
Loading weights: 0%| | 0/363 [00:00<?, ?it/s]
Loading weights: 0%|β | 1/363 [00:00<00:00, 9731.56it/s, Materializing param=lm_head.weight]
Loading weights: 0%|β | 1/363 [00:00<00:00, 4219.62it/s, Materializing param=lm_head.weight]
Loading weights: 1%|β | 2/363 [00:00<00:00, 4454.92it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 2/363 [00:00<00:00, 3539.50it/s, Materializing param=model.embed_tokens.weight]
Loading weights: 1%|β | 3/363 [00:00<00:00, 4051.16it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 3/363 [00:00<00:00, 3553.49it/s, Materializing param=model.layers.0.input_layernorm.weight]
Loading weights: 1%|β | 4/363 [00:00<00:00, 3981.30it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|β | 4/363 [00:00<00:00, 3622.02it/s, Materializing param=model.layers.0.mlp.down_proj.weight]
Loading weights: 1%|β | 5/363 [00:00<00:00, 3811.62it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 1%|β | 5/363 [00:00<00:00, 3549.68it/s, Materializing param=model.layers.0.mlp.gate_proj.weight]
Loading weights: 2%|β | 6/363 [00:00<00:00, 3430.92it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 6/363 [00:00<00:00, 3243.44it/s, Materializing param=model.layers.0.mlp.up_proj.weight]
Loading weights: 2%|β | 7/363 [00:00<00:00, 3500.67it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|β | 7/363 [00:00<00:00, 3339.03it/s, Materializing param=model.layers.0.post_attention_layernorm.weight]
Loading weights: 2%|ββ | 8/363 [00:00<00:00, 2977.06it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 2%|ββ | 8/363 [00:00<00:00, 2752.39it/s, Materializing param=model.layers.0.self_attn.k_proj.weight]
Loading weights: 2%|ββ | 9/363 [00:00<00:00, 2928.30it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 2%|ββ | 9/363 [00:00<00:00, 2749.96it/s, Materializing param=model.layers.0.self_attn.o_proj.weight]
Loading weights: 3%|ββ | 10/363 [00:00<00:00, 2913.72it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 3%|ββ | 10/363 [00:00<00:00, 2780.45it/s, Materializing param=model.layers.0.self_attn.q_proj.weight]
Loading weights: 3%|ββ | 11/363 [00:00<00:00, 2935.32it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 3%|ββ | 11/363 [00:00<00:00, 2861.94it/s, Materializing param=model.layers.0.self_attn.v_proj.weight]
Loading weights: 3%|ββ | 12/363 [00:00<00:00, 3007.75it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 3%|ββ | 12/363 [00:00<00:00, 2937.87it/s, Materializing param=model.layers.1.input_layernorm.weight]
Loading weights: 4%|βββ | 13/363 [00:00<00:00, 3076.56it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 4%|βββ | 13/363 [00:00<00:00, 3009.66it/s, Materializing param=model.layers.1.mlp.down_proj.weight]
Loading weights: 4%|βββ | 14/363 [00:00<00:00, 3023.39it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 4%|βββ | 14/363 [00:00<00:00, 2959.24it/s, Materializing param=model.layers.1.mlp.gate_proj.weight]
Loading weights: 4%|βββ | 15/363 [00:00<00:00, 2960.55it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 4%|βββ | 15/363 [00:00<00:00, 2870.58it/s, Materializing param=model.layers.1.mlp.up_proj.weight]
Loading weights: 4%|βββ | 16/363 [00:00<00:00, 2975.74it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 4%|βββ | 16/363 [00:00<00:00, 2897.12it/s, Materializing param=model.layers.1.post_attention_layernorm.weight]
Loading weights: 5%|βββ | 17/363 [00:00<00:00, 2998.33it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 5%|βββ | 17/363 [00:00<00:00, 2949.21it/s, Materializing param=model.layers.1.self_attn.k_proj.weight]
Loading weights: 5%|βββ | 18/363 [00:00<00:00, 3045.97it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 5%|βββ | 18/363 [00:00<00:00, 2845.20it/s, Materializing param=model.layers.1.self_attn.o_proj.weight]
Loading weights: 5%|βββ | 19/363 [00:00<00:00, 2905.81it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 5%|βββ | 19/363 [00:00<00:00, 2794.34it/s, Materializing param=model.layers.1.self_attn.q_proj.weight]
Loading weights: 6%|ββββ | 20/363 [00:00<00:00, 2876.16it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 6%|ββββ | 20/363 [00:00<00:00, 2836.86it/s, Materializing param=model.layers.1.self_attn.v_proj.weight]
Loading weights: 6%|ββββ | 21/363 [00:00<00:00, 2917.63it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 6%|ββββ | 21/363 [00:00<00:00, 2878.82it/s, Materializing param=model.layers.2.input_layernorm.weight]
Loading weights: 6%|ββββ | 22/363 [00:00<00:00, 2896.62it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 6%|ββββ | 22/363 [00:00<00:00, 2859.55it/s, Materializing param=model.layers.2.mlp.down_proj.weight]
Loading weights: 6%|ββββ | 23/363 [00:00<00:00, 2921.62it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 6%|ββββ | 23/363 [00:00<00:00, 2865.30it/s, Materializing param=model.layers.2.mlp.gate_proj.weight]
Loading weights: 7%|βββββ | 24/363 [00:00<00:00, 2937.44it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 7%|βββββ | 24/363 [00:00<00:00, 2886.90it/s, Materializing param=model.layers.2.mlp.up_proj.weight]
Loading weights: 7%|ββββ | 25/363 [00:00<00:00, 2954.57it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 7%|ββββ | 25/363 [00:00<00:00, 2919.52it/s, Materializing param=model.layers.2.post_attention_layernorm.weight]
Loading weights: 7%|βββββ | 26/363 [00:00<00:00, 2964.01it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 7%|βββββ | 26/363 [00:00<00:00, 2931.35it/s, Materializing param=model.layers.2.self_attn.k_proj.weight]
Loading weights: 7%|βββββ | 27/363 [00:00<00:00, 2994.98it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 7%|βββββ | 27/363 [00:00<00:00, 2964.02it/s, Materializing param=model.layers.2.self_attn.o_proj.weight]
Loading weights: 8%|βββββ | 28/363 [00:00<00:00, 2860.15it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 8%|βββββ | 28/363 [00:00<00:00, 2830.92it/s, Materializing param=model.layers.2.self_attn.q_proj.weight]
Loading weights: 8%|βββββ | 29/363 [00:00<00:00, 2890.70it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 8%|βββββ | 29/363 [00:00<00:00, 2808.40it/s, Materializing param=model.layers.2.self_attn.v_proj.weight]
Loading weights: 8%|βββββ | 30/363 [00:00<00:00, 2840.00it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 8%|βββββ | 30/363 [00:00<00:00, 2755.42it/s, Materializing param=model.layers.3.input_layernorm.weight]
Loading weights: 9%|ββββββ | 31/363 [00:00<00:00, 2809.01it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 9%|ββββββ | 31/363 [00:00<00:00, 2784.88it/s, Materializing param=model.layers.3.mlp.down_proj.weight]
Loading weights: 9%|ββββββ | 32/363 [00:00<00:00, 2839.32it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 9%|ββββββ | 32/363 [00:00<00:00, 2815.91it/s, Materializing param=model.layers.3.mlp.gate_proj.weight]
Loading weights: 9%|ββββββ | 33/363 [00:00<00:00, 2865.79it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 9%|ββββββ | 33/363 [00:00<00:00, 2842.78it/s, Materializing param=model.layers.3.mlp.up_proj.weight]
Loading weights: 9%|βββββ | 34/363 [00:00<00:00, 2894.09it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 9%|βββββ | 34/363 [00:00<00:00, 2871.19it/s, Materializing param=model.layers.3.post_attention_layernorm.weight]
Loading weights: 10%|ββββββ | 35/363 [00:00<00:00, 2921.12it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 10%|ββββββ | 35/363 [00:00<00:00, 2897.93it/s, Materializing param=model.layers.3.self_attn.k_proj.weight]
Loading weights: 10%|ββββββ | 36/363 [00:00<00:00, 2946.70it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 10%|ββββββ | 36/363 [00:00<00:00, 2924.67it/s, Materializing param=model.layers.3.self_attn.o_proj.weight]
Loading weights: 10%|ββββββ | 37/363 [00:00<00:00, 2970.36it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 10%|ββββββ | 37/363 [00:00<00:00, 2947.17it/s, Materializing param=model.layers.3.self_attn.q_proj.weight]
Loading weights: 10%|βββββββ | 38/363 [00:00<00:00, 2989.92it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 10%|βββββββ | 38/363 [00:00<00:00, 2967.87it/s, Materializing param=model.layers.3.self_attn.v_proj.weight]
Loading weights: 11%|βββββββ | 39/363 [00:00<00:00, 3012.59it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 11%|βββββββ | 39/363 [00:00<00:00, 2990.89it/s, Materializing param=model.layers.4.input_layernorm.weight]
Loading weights: 11%|βββββββ | 40/363 [00:00<00:00, 3035.39it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 11%|βββββββ | 40/363 [00:00<00:00, 3014.23it/s, Materializing param=model.layers.4.mlp.down_proj.weight]
Loading weights: 11%|βββββββ | 41/363 [00:00<00:00, 2935.43it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 11%|βββββββ | 41/363 [00:00<00:00, 2913.90it/s, Materializing param=model.layers.4.mlp.gate_proj.weight]
Loading weights: 12%|ββββββββ | 42/363 [00:00<00:00, 2955.02it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 12%|ββββββββ | 42/363 [00:00<00:00, 2928.84it/s, Materializing param=model.layers.4.mlp.up_proj.weight]
Loading weights: 12%|ββββββ | 43/363 [00:00<00:00, 2954.70it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 12%|ββββββ | 43/363 [00:00<00:00, 2934.85it/s, Materializing param=model.layers.4.post_attention_layernorm.weight]
Loading weights: 12%|ββββββββ | 44/363 [00:00<00:00, 2973.73it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 12%|ββββββββ | 44/363 [00:00<00:00, 2949.30it/s, Materializing param=model.layers.4.self_attn.k_proj.weight]
Loading weights: 12%|ββββββββ | 45/363 [00:00<00:00, 2977.64it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 12%|ββββββββ | 45/363 [00:00<00:00, 2959.34it/s, Materializing param=model.layers.4.self_attn.o_proj.weight]
Loading weights: 13%|ββββββββ | 46/363 [00:00<00:00, 2997.89it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 13%|ββββββββ | 46/363 [00:00<00:00, 2969.46it/s, Materializing param=model.layers.4.self_attn.q_proj.weight]
Loading weights: 13%|ββββββββ | 47/363 [00:00<00:00, 2979.04it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 13%|ββββββββ | 47/363 [00:00<00:00, 2960.97it/s, Materializing param=model.layers.4.self_attn.v_proj.weight]
Loading weights: 13%|ββββββββ | 48/363 [00:00<00:00, 2979.17it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 13%|ββββββββ | 48/363 [00:00<00:00, 2961.56it/s, Materializing param=model.layers.5.input_layernorm.weight]
Loading weights: 13%|βββββββββ | 49/363 [00:00<00:00, 2996.67it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 13%|βββββββββ | 49/363 [00:00<00:00, 2977.92it/s, Materializing param=model.layers.5.mlp.down_proj.weight]
Loading weights: 14%|βββββββββ | 50/363 [00:00<00:00, 2922.25it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 14%|βββββββββ | 50/363 [00:00<00:00, 2905.49it/s, Materializing param=model.layers.5.mlp.gate_proj.weight]
Loading weights: 14%|βββββββββ | 51/363 [00:00<00:00, 2902.91it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 14%|βββββββββ | 51/363 [00:00<00:00, 2877.87it/s, Materializing param=model.layers.5.mlp.up_proj.weight]
Loading weights: 14%|ββββββββ | 52/363 [00:00<00:00, 2909.14it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 14%|ββββββββ | 52/363 [00:00<00:00, 2893.47it/s, Materializing param=model.layers.5.post_attention_layernorm.weight]
Loading weights: 15%|βββββββββ | 53/363 [00:00<00:00, 2925.98it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 15%|βββββββββ | 53/363 [00:00<00:00, 2910.96it/s, Materializing param=model.layers.5.self_attn.k_proj.weight]
Loading weights: 15%|βββββββββ | 54/363 [00:00<00:00, 2942.99it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 15%|βββββββββ | 54/363 [00:00<00:00, 2926.52it/s, Materializing param=model.layers.5.self_attn.o_proj.weight]
Loading weights: 15%|βββββββββ | 55/363 [00:00<00:00, 2957.45it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 15%|βββββββββ | 55/363 [00:00<00:00, 2942.43it/s, Materializing param=model.layers.5.self_attn.q_proj.weight]
Loading weights: 15%|βββββββββ | 56/363 [00:00<00:00, 2973.63it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 15%|βββββββββ | 56/363 [00:00<00:00, 2958.72it/s, Materializing param=model.layers.5.self_attn.v_proj.weight]
Loading weights: 16%|ββββββββββ | 57/363 [00:00<00:00, 2988.52it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 16%|ββββββββββ | 57/363 [00:00<00:00, 2973.72it/s, Materializing param=model.layers.6.input_layernorm.weight]
Loading weights: 16%|ββββββββββ | 58/363 [00:00<00:00, 3003.03it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 16%|ββββββββββ | 58/363 [00:00<00:00, 2988.35it/s, Materializing param=model.layers.6.mlp.down_proj.weight]
Loading weights: 16%|ββββββββββ | 59/363 [00:00<00:00, 2920.45it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 16%|ββββββββββ | 59/363 [00:00<00:00, 2906.01it/s, Materializing param=model.layers.6.mlp.gate_proj.weight]
Loading weights: 17%|βββββββββββ | 60/363 [00:00<00:00, 2933.15it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 17%|βββββββββββ | 60/363 [00:00<00:00, 2919.10it/s, Materializing param=model.layers.6.mlp.up_proj.weight]
Loading weights: 17%|βββββββββ | 61/363 [00:00<00:00, 2946.32it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 17%|βββββββββ | 61/363 [00:00<00:00, 2930.76it/s, Materializing param=model.layers.6.post_attention_layernorm.weight]
Loading weights: 17%|ββββββββββ | 62/363 [00:00<00:00, 2775.91it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 17%|ββββββββββ | 62/363 [00:00<00:00, 2721.01it/s, Materializing param=model.layers.6.self_attn.k_proj.weight]
Loading weights: 17%|βββββββββββ | 63/363 [00:00<00:00, 2744.28it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 17%|βββββββββββ | 63/363 [00:00<00:00, 2731.40it/s, Materializing param=model.layers.6.self_attn.o_proj.weight]
Loading weights: 18%|βββββββββββ | 64/363 [00:00<00:00, 2756.04it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 18%|βββββββββββ | 64/363 [00:00<00:00, 2699.85it/s, Materializing param=model.layers.6.self_attn.q_proj.weight]
Loading weights: 18%|βββββββββββ | 65/363 [00:00<00:00, 2723.03it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 18%|βββββββββββ | 65/363 [00:00<00:00, 2710.85it/s, Materializing param=model.layers.6.self_attn.v_proj.weight]
Loading weights: 18%|βββββββββββ | 66/363 [00:00<00:00, 2734.50it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 18%|βββββββββββ | 66/363 [00:00<00:00, 2722.61it/s, Materializing param=model.layers.7.input_layernorm.weight]
Loading weights: 18%|ββββββββββββ | 67/363 [00:00<00:00, 2745.98it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 18%|ββββββββββββ | 67/363 [00:00<00:00, 2733.35it/s, Materializing param=model.layers.7.mlp.down_proj.weight]
Loading weights: 19%|ββββββββββββ | 68/363 [00:00<00:00, 2756.18it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 19%|ββββββββββββ | 68/363 [00:00<00:00, 2744.38it/s, Materializing param=model.layers.7.mlp.gate_proj.weight]
Loading weights: 19%|βββββββββββββ | 69/363 [00:00<00:00, 2767.09it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 19%|βββββββββββββ | 69/363 [00:00<00:00, 2755.52it/s, Materializing param=model.layers.7.mlp.up_proj.weight]
Loading weights: 19%|ββββββββββ | 70/363 [00:00<00:00, 2777.90it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 19%|ββββββββββ | 70/363 [00:00<00:00, 2765.96it/s, Materializing param=model.layers.7.post_attention_layernorm.weight]
Loading weights: 20%|ββββββββββββ | 71/363 [00:00<00:00, 2787.33it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 20%|ββββββββββββ | 71/363 [00:00<00:00, 2775.38it/s, Materializing param=model.layers.7.self_attn.k_proj.weight]
Loading weights: 20%|ββββββββββββ | 72/363 [00:00<00:00, 2695.26it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 20%|ββββββββββββ | 72/363 [00:00<00:00, 2679.16it/s, Materializing param=model.layers.7.self_attn.o_proj.weight]
Loading weights: 20%|ββββββββββββ | 73/363 [00:00<00:00, 2698.78it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 20%|ββββββββββββ | 73/363 [00:00<00:00, 2688.14it/s, Materializing param=model.layers.7.self_attn.q_proj.weight]
Loading weights: 20%|ββββββββββββ | 74/363 [00:00<00:00, 2709.10it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 20%|ββββββββββββ | 74/363 [00:00<00:00, 2698.71it/s, Materializing param=model.layers.7.self_attn.v_proj.weight]
Loading weights: 21%|βββββββββββββ | 75/363 [00:00<00:00, 2718.54it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 21%|βββββββββββββ | 75/363 [00:00<00:00, 2708.22it/s, Materializing param=model.layers.8.input_layernorm.weight]
Loading weights: 21%|βββββββββββββ | 76/363 [00:00<00:00, 2729.33it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 21%|βββββββββββββ | 76/363 [00:00<00:00, 2718.77it/s, Materializing param=model.layers.8.mlp.down_proj.weight]
Loading weights: 21%|ββββββββββββββ | 77/363 [00:00<00:00, 2703.67it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 21%|ββββββββββββββ | 77/363 [00:00<00:00, 2692.71it/s, Materializing param=model.layers.8.mlp.gate_proj.weight]
Loading weights: 21%|ββββββββββββββ | 78/363 [00:00<00:00, 2712.67it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 21%|ββββββββββββββ | 78/363 [00:00<00:00, 2702.69it/s, Materializing param=model.layers.8.mlp.up_proj.weight]
Loading weights: 22%|βββββββββββ | 79/363 [00:00<00:00, 2722.41it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 22%|βββββββββββ | 79/363 [00:00<00:00, 2712.18it/s, Materializing param=model.layers.8.post_attention_layernorm.weight]
Loading weights: 22%|βββββββββββββ | 80/363 [00:00<00:00, 2731.62it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 22%|βββββββββββββ | 80/363 [00:00<00:00, 2721.43it/s, Materializing param=model.layers.8.self_attn.k_proj.weight]
Loading weights: 22%|ββββββββββββββ | 81/363 [00:00<00:00, 2701.66it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 22%|ββββββββββββββ | 81/363 [00:00<00:00, 2691.34it/s, Materializing param=model.layers.8.self_attn.o_proj.weight]
Loading weights: 23%|ββββββββββββββ | 82/363 [00:00<00:00, 2688.38it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 23%|ββββββββββββββ | 82/363 [00:00<00:00, 2622.08it/s, Materializing param=model.layers.8.self_attn.q_proj.weight]
Loading weights: 23%|ββββββββββββββ | 83/363 [00:00<00:00, 2637.55it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 23%|ββββββββββββββ | 83/363 [00:00<00:00, 2628.11it/s, Materializing param=model.layers.8.self_attn.v_proj.weight]
Loading weights: 23%|ββββββββββββββ | 84/363 [00:00<00:00, 2646.07it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 23%|ββββββββββββββ | 84/363 [00:00<00:00, 2637.22it/s, Materializing param=model.layers.9.input_layernorm.weight]
Loading weights: 23%|βββββββββββββββ | 85/363 [00:00<00:00, 2655.37it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 23%|βββββββββββββββ | 85/363 [00:00<00:00, 2646.58it/s, Materializing param=model.layers.9.mlp.down_proj.weight]
Loading weights: 24%|βββββββββββββββ | 86/363 [00:00<00:00, 2664.70it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 24%|βββββββββββββββ | 86/363 [00:00<00:00, 2656.09it/s, Materializing param=model.layers.9.mlp.gate_proj.weight]
Loading weights: 24%|ββββββββββββββββ | 87/363 [00:00<00:00, 2655.47it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 24%|ββββββββββββββββ | 87/363 [00:00<00:00, 2646.52it/s, Materializing param=model.layers.9.mlp.up_proj.weight]
Loading weights: 24%|βββββββββββββ | 88/363 [00:00<00:00, 2579.49it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 24%|βββββββββββββ | 88/363 [00:00<00:00, 2570.92it/s, Materializing param=model.layers.9.post_attention_layernorm.weight]
Loading weights: 25%|βββββββββββββββ | 89/363 [00:00<00:00, 2588.09it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 25%|βββββββββββββββ | 89/363 [00:00<00:00, 2579.49it/s, Materializing param=model.layers.9.self_attn.k_proj.weight]
Loading weights: 25%|βββββββββββββββ | 90/363 [00:00<00:00, 2596.31it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 25%|βββββββββββββββ | 90/363 [00:00<00:00, 2588.10it/s, Materializing param=model.layers.9.self_attn.o_proj.weight]
Loading weights: 25%|βββββββββββββββ | 91/363 [00:00<00:00, 2577.05it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 25%|βββββββββββββββ | 91/363 [00:00<00:00, 2568.29it/s, Materializing param=model.layers.9.self_attn.q_proj.weight]
Loading weights: 25%|βββββββββββββββ | 92/363 [00:00<00:00, 2584.34it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 25%|βββββββββββββββ | 92/363 [00:00<00:00, 2571.51it/s, Materializing param=model.layers.9.self_attn.v_proj.weight]
Loading weights: 26%|βββββββββββββββ | 93/363 [00:00<00:00, 2586.90it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 26%|βββββββββββββββ | 93/363 [00:00<00:00, 2578.82it/s, Materializing param=model.layers.10.input_layernorm.weight]
Loading weights: 26%|ββββββββββββββββ | 94/363 [00:00<00:00, 2593.81it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 26%|ββββββββββββββββ | 94/363 [00:00<00:00, 2585.95it/s, Materializing param=model.layers.10.mlp.down_proj.weight]
Loading weights: 26%|ββββββββββββββββ | 95/363 [00:00<00:00, 2545.66it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 26%|ββββββββββββββββ | 95/363 [00:00<00:00, 2536.06it/s, Materializing param=model.layers.10.mlp.gate_proj.weight]
Loading weights: 26%|βββββββββββββββββ | 96/363 [00:00<00:00, 2550.63it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 26%|βββββββββββββββββ | 96/363 [00:00<00:00, 2542.39it/s, Materializing param=model.layers.10.mlp.up_proj.weight]
Loading weights: 27%|ββββββββββββββ | 97/363 [00:00<00:00, 2554.84it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 27%|ββββββββββββββ | 97/363 [00:00<00:00, 2541.67it/s, Materializing param=model.layers.10.post_attention_layernorm.weight]
Loading weights: 27%|ββββββββββββββββ | 98/363 [00:00<00:00, 2551.82it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 27%|ββββββββββββββββ | 98/363 [00:00<00:00, 2542.51it/s, Materializing param=model.layers.10.self_attn.k_proj.weight]
Loading weights: 27%|ββββββββββββββββ | 99/363 [00:00<00:00, 2518.49it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 27%|ββββββββββββββββ | 99/363 [00:00<00:00, 2509.25it/s, Materializing param=model.layers.10.self_attn.o_proj.weight]
Loading weights: 28%|ββββββββββββββββ | 100/363 [00:00<00:00, 2523.04it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 28%|ββββββββββββββββ | 100/363 [00:00<00:00, 2515.63it/s, Materializing param=model.layers.10.self_attn.q_proj.weight]
Loading weights: 28%|ββββββββββββββββ | 101/363 [00:00<00:00, 2529.34it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 28%|ββββββββββββββββ | 101/363 [00:00<00:00, 2522.19it/s, Materializing param=model.layers.10.self_attn.v_proj.weight]
Loading weights: 28%|βββββββββββββββββ | 102/363 [00:00<00:00, 2515.13it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 28%|βββββββββββββββββ | 102/363 [00:00<00:00, 2488.43it/s, Materializing param=model.layers.11.input_layernorm.weight]
Loading weights: 28%|βββββββββββββββββ | 103/363 [00:00<00:00, 2501.84it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 28%|βββββββββββββββββ | 103/363 [00:00<00:00, 2495.10it/s, Materializing param=model.layers.11.mlp.down_proj.weight]
Loading weights: 29%|ββββββββββββββββββ | 104/363 [00:00<00:00, 2507.81it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 29%|ββββββββββββββββββ | 104/363 [00:00<00:00, 2500.87it/s, Materializing param=model.layers.11.mlp.gate_proj.weight]
Loading weights: 29%|ββββββββββββββββββ | 105/363 [00:00<00:00, 2514.04it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 29%|ββββββββββββββββββ | 105/363 [00:00<00:00, 2506.70it/s, Materializing param=model.layers.11.mlp.up_proj.weight]
Loading weights: 29%|βββββββββββββββ | 106/363 [00:00<00:00, 2475.76it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 29%|βββββββββββββββ | 106/363 [00:00<00:00, 2466.79it/s, Materializing param=model.layers.11.post_attention_layernorm.weight]
Loading weights: 29%|βββββββββββββββββ | 107/363 [00:00<00:00, 2478.86it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 29%|βββββββββββββββββ | 107/363 [00:00<00:00, 2471.87it/s, Materializing param=model.layers.11.self_attn.k_proj.weight]
Loading weights: 30%|βββββββββββββββββ | 108/363 [00:00<00:00, 2483.29it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 30%|βββββββββββββββββ | 108/363 [00:00<00:00, 2476.50it/s, Materializing param=model.layers.11.self_attn.o_proj.weight]
Loading weights: 30%|βββββββββββββββββ | 109/363 [00:00<00:00, 2451.65it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 30%|βββββββββββββββββ | 109/363 [00:00<00:00, 2445.24it/s, Materializing param=model.layers.11.self_attn.q_proj.weight]
Loading weights: 30%|ββββββββββββββββββ | 110/363 [00:00<00:00, 2457.46it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 30%|ββββββββββββββββββ | 110/363 [00:00<00:00, 2451.12it/s, Materializing param=model.layers.11.self_attn.v_proj.weight]
Loading weights: 31%|ββββββββββββββββββ | 111/363 [00:00<00:00, 2463.63it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 31%|ββββββββββββββββββ | 111/363 [00:00<00:00, 2457.12it/s, Materializing param=model.layers.12.input_layernorm.weight]
Loading weights: 31%|βββββββββββββββββββ | 112/363 [00:00<00:00, 2432.95it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 31%|βββββββββββββββββββ | 112/363 [00:00<00:00, 2423.23it/s, Materializing param=model.layers.12.mlp.down_proj.weight]
Loading weights: 31%|βββββββββββββββββββ | 113/363 [00:00<00:00, 2435.17it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 31%|βββββββββββββββββββ | 113/363 [00:00<00:00, 2427.83it/s, Materializing param=model.layers.12.mlp.gate_proj.weight]
Loading weights: 31%|ββββββββββββββββββββ | 114/363 [00:00<00:00, 2440.27it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 31%|ββββββββββββββββββββ | 114/363 [00:00<00:00, 2434.06it/s, Materializing param=model.layers.12.mlp.up_proj.weight]
Loading weights: 32%|ββββββββββββββββ | 115/363 [00:00<00:00, 2421.75it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 32%|ββββββββββββββββ | 115/363 [00:00<00:00, 2415.15it/s, Materializing param=model.layers.12.post_attention_layernorm.weight]
Loading weights: 32%|βββββββββββββββββββ | 116/363 [00:00<00:00, 2426.94it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 32%|βββββββββββββββββββ | 116/363 [00:00<00:00, 2421.03it/s, Materializing param=model.layers.12.self_attn.k_proj.weight]
Loading weights: 32%|βββββββββββββββββββ | 117/363 [00:00<00:00, 2431.25it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 32%|βββββββββββββββββββ | 117/363 [00:00<00:00, 2422.81it/s, Materializing param=model.layers.12.self_attn.o_proj.weight]
Loading weights: 33%|βββββββββββββββββββ | 118/363 [00:00<00:00, 2355.76it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 33%|βββββββββββββββββββ | 118/363 [00:00<00:00, 2333.57it/s, Materializing param=model.layers.12.self_attn.q_proj.weight]
Loading weights: 33%|βββββββββββββββββββ | 119/363 [00:00<00:00, 2339.09it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 33%|βββββββββββββββββββ | 119/363 [00:00<00:00, 2333.60it/s, Materializing param=model.layers.12.self_attn.v_proj.weight]
Loading weights: 33%|ββββββββββββββββββββ | 120/363 [00:00<00:00, 2344.31it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 33%|ββββββββββββββββββββ | 120/363 [00:00<00:00, 2330.05it/s, Materializing param=model.layers.13.input_layernorm.weight]
Loading weights: 33%|ββββββββββββββββββββ | 121/363 [00:00<00:00, 2338.87it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 33%|ββββββββββββββββββββ | 121/363 [00:00<00:00, 2313.48it/s, Materializing param=model.layers.13.mlp.down_proj.weight]
Loading weights: 34%|βββββββββββββββββββββ | 122/363 [00:00<00:00, 2314.87it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 34%|βββββββββββββββββββββ | 122/363 [00:00<00:00, 2309.14it/s, Materializing param=model.layers.13.mlp.gate_proj.weight]
Loading weights: 34%|βββββββββββββββββββββ | 123/363 [00:00<00:00, 2319.62it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 34%|βββββββββββββββββββββ | 123/363 [00:00<00:00, 2314.03it/s, Materializing param=model.layers.13.mlp.up_proj.weight]
Loading weights: 34%|βββββββββββββββββ | 124/363 [00:00<00:00, 2324.95it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 34%|βββββββββββββββββ | 124/363 [00:00<00:00, 2318.73it/s, Materializing param=model.layers.13.post_attention_layernorm.weight]
Loading weights: 34%|ββββββββββββββββββββ | 125/363 [00:00<00:00, 2329.12it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 34%|ββββββββββββββββββββ | 125/363 [00:00<00:00, 2323.99it/s, Materializing param=model.layers.13.self_attn.k_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 126/363 [00:00<00:00, 2334.45it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 126/363 [00:00<00:00, 2329.33it/s, Materializing param=model.layers.13.self_attn.o_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 127/363 [00:00<00:00, 2339.53it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 127/363 [00:00<00:00, 2334.34it/s, Materializing param=model.layers.13.self_attn.q_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 128/363 [00:00<00:00, 2344.17it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 35%|ββββββββββββββββββββ | 128/363 [00:00<00:00, 2338.79it/s, Materializing param=model.layers.13.self_attn.v_proj.weight]
Loading weights: 36%|βββββββββββββββββββββ | 129/363 [00:00<00:00, 2348.81it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 36%|βββββββββββββββββββββ | 129/363 [00:00<00:00, 2343.72it/s, Materializing param=model.layers.14.input_layernorm.weight]
Loading weights: 36%|ββββββββββββββββββββββ | 130/363 [00:00<00:00, 2354.32it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 36%|ββββββββββββββββββββββ | 130/363 [00:00<00:00, 2348.72it/s, Materializing param=model.layers.14.mlp.down_proj.weight]
Loading weights: 36%|ββββββββββββββββββββββ | 131/363 [00:00<00:00, 2358.11it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 36%|ββββββββββββββββββββββ | 131/363 [00:00<00:00, 2352.82it/s, Materializing param=model.layers.14.mlp.gate_proj.weight]
Loading weights: 36%|βββββββββββββββββββββββ | 132/363 [00:00<00:00, 2362.65it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 36%|βββββββββββββββββββββββ | 132/363 [00:00<00:00, 2355.59it/s, Materializing param=model.layers.14.mlp.up_proj.weight]
Loading weights: 37%|ββββββββββββββββββ | 133/363 [00:00<00:00, 2352.63it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 37%|ββββββββββββββββββ | 133/363 [00:00<00:00, 2344.30it/s, Materializing param=model.layers.14.post_attention_layernorm.weight]
Loading weights: 37%|βββββββββββββββββββββ | 134/363 [00:00<00:00, 2339.77it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 37%|βββββββββββββββββββββ | 134/363 [00:00<00:00, 2334.45it/s, Materializing param=model.layers.14.self_attn.k_proj.weight]
Loading weights: 37%|ββββββββββββββββββββββ | 135/363 [00:00<00:00, 2337.36it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 37%|ββββββββββββββββββββββ | 135/363 [00:00<00:00, 2330.02it/s, Materializing param=model.layers.14.self_attn.o_proj.weight]
Loading weights: 37%|ββββββββββββββββββββββ | 136/363 [00:00<00:00, 2338.97it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 37%|ββββββββββββββββββββββ | 136/363 [00:00<00:00, 2332.58it/s, Materializing param=model.layers.14.self_attn.q_proj.weight]
Loading weights: 38%|ββββββββββββββββββββββ | 137/363 [00:00<00:00, 2325.07it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 38%|ββββββββββββββββββββββ | 137/363 [00:00<00:00, 2319.43it/s, Materializing param=model.layers.14.self_attn.v_proj.weight]
Loading weights: 38%|ββββββββββββββββββββββ | 138/363 [00:00<00:00, 2329.11it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 38%|ββββββββββββββββββββββ | 138/363 [00:00<00:00, 2324.36it/s, Materializing param=model.layers.15.input_layernorm.weight]
Loading weights: 38%|βββββββββββββββββββββββ | 139/363 [00:00<00:00, 2326.19it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 38%|βββββββββββββββββββββββ | 139/363 [00:00<00:00, 2321.30it/s, Materializing param=model.layers.15.mlp.down_proj.weight]
Loading weights: 39%|ββββββββββββββββββββββββ | 140/363 [00:00<00:00, 2328.88it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 39%|ββββββββββββββββββββββββ | 140/363 [00:00<00:00, 2317.12it/s, Materializing param=model.layers.15.mlp.gate_proj.weight]
Loading weights: 39%|ββββββββββββββββββββββββ | 141/363 [00:00<00:00, 2307.38it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 39%|ββββββββββββββββββββββββ | 141/363 [00:00<00:00, 2302.64it/s, Materializing param=model.layers.15.mlp.up_proj.weight]
Loading weights: 39%|ββββββββββββββββββββ | 142/363 [00:00<00:00, 2311.88it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 39%|ββββββββββββββββββββ | 142/363 [00:00<00:00, 2306.45it/s, Materializing param=model.layers.15.post_attention_layernorm.weight]
Loading weights: 39%|βββββββββββββββββββββββ | 143/363 [00:00<00:00, 2304.73it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 39%|βββββββββββββββββββββββ | 143/363 [00:00<00:00, 2299.59it/s, Materializing param=model.layers.15.self_attn.k_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 144/363 [00:00<00:00, 2301.91it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 144/363 [00:00<00:00, 2296.72it/s, Materializing param=model.layers.15.self_attn.o_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 145/363 [00:00<00:00, 2304.21it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 145/363 [00:00<00:00, 2295.23it/s, Materializing param=model.layers.15.self_attn.q_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 146/363 [00:00<00:00, 2300.95it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 40%|βββββββββββββββββββββββ | 146/363 [00:00<00:00, 2293.88it/s, Materializing param=model.layers.15.self_attn.v_proj.weight]
Loading weights: 40%|ββββββββββββββββββββββββ | 147/363 [00:00<00:00, 2283.40it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 40%|ββββββββββββββββββββββββ | 147/363 [00:00<00:00, 2269.04it/s, Materializing param=model.layers.16.input_layernorm.weight]
Loading weights: 41%|βββββββββββββββββββββββββ | 148/363 [00:00<00:00, 2277.69it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 41%|βββββββββββββββββββββββββ | 148/363 [00:00<00:00, 2265.02it/s, Materializing param=model.layers.16.mlp.down_proj.weight]
Loading weights: 41%|βββββββββββββββββββββββββ | 149/363 [00:00<00:00, 2270.49it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 41%|βββββββββββββββββββββββββ | 149/363 [00:00<00:00, 2260.58it/s, Materializing param=model.layers.16.mlp.gate_proj.weight]
Loading weights: 41%|ββββββββββββββββββββββββββ | 150/363 [00:00<00:00, 2269.25it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 41%|ββββββββββββββββββββββββββ | 150/363 [00:00<00:00, 2264.13it/s, Materializing param=model.layers.16.mlp.up_proj.weight]
Loading weights: 42%|βββββββββββββββββββββ | 151/363 [00:00<00:00, 2264.13it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 42%|βββββββββββββββββββββ | 151/363 [00:00<00:00, 2259.97it/s, Materializing param=model.layers.16.post_attention_layernorm.weight]
Loading weights: 42%|ββββββββββββββββββββββββ | 152/363 [00:00<00:00, 2268.66it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 42%|ββββββββββββββββββββββββ | 152/363 [00:00<00:00, 2262.84it/s, Materializing param=model.layers.16.self_attn.k_proj.weight]
Loading weights: 42%|ββββββββββββββββββββββββ | 153/363 [00:00<00:00, 2271.57it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 42%|ββββββββββββββββββββββββ | 153/363 [00:00<00:00, 2265.24it/s, Materializing param=model.layers.16.self_attn.o_proj.weight]
Loading weights: 42%|βββββββββββββββββββββββββ | 154/363 [00:00<00:00, 2273.39it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 42%|βββββββββββββββββββββββββ | 154/363 [00:00<00:00, 2255.96it/s, Materializing param=model.layers.16.self_attn.q_proj.weight]
Loading weights: 43%|βββββββββββββββββββββββββ | 155/363 [00:00<00:00, 2254.12it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 43%|βββββββββββββββββββββββββ | 155/363 [00:00<00:00, 2246.72it/s, Materializing param=model.layers.16.self_attn.v_proj.weight]
Loading weights: 43%|βββββββββββββββββββββββββ | 156/363 [00:00<00:00, 2244.22it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 43%|βββββββββββββββββββββββββ | 156/363 [00:00<00:00, 2238.32it/s, Materializing param=model.layers.17.input_layernorm.weight]
Loading weights: 43%|ββββββββββββββββββββββββββ | 157/363 [00:00<00:00, 2246.64it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 43%|ββββββββββββββββββββββββββ | 157/363 [00:00<00:00, 2236.27it/s, Materializing param=model.layers.17.mlp.down_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββ | 158/363 [00:00<00:00, 2244.55it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββ | 158/363 [00:00<00:00, 2233.64it/s, Materializing param=model.layers.17.mlp.gate_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββββ | 159/363 [00:00<00:00, 2241.38it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββββ | 159/363 [00:00<00:00, 2237.30it/s, Materializing param=model.layers.17.mlp.up_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββ | 160/363 [00:00<00:00, 2245.65it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 44%|ββββββββββββββββββββββ | 160/363 [00:00<00:00, 2233.31it/s, Materializing param=model.layers.17.post_attention_layernorm.weight]
Loading weights: 44%|ββββββββββββββββββββββββββ | 161/363 [00:00<00:00, 2238.82it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 44%|ββββββββββββββββββββββββββ | 161/363 [00:00<00:00, 2231.98it/s, Materializing param=model.layers.17.self_attn.k_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 162/363 [00:00<00:00, 2237.30it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 162/363 [00:00<00:00, 2224.11it/s, Materializing param=model.layers.17.self_attn.o_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 163/363 [00:00<00:00, 2230.71it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 163/363 [00:00<00:00, 2224.93it/s, Materializing param=model.layers.17.self_attn.q_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 164/363 [00:00<00:00, 2230.51it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 45%|ββββββββββββββββββββββββββ | 164/363 [00:00<00:00, 2226.78it/s, Materializing param=model.layers.17.self_attn.v_proj.weight]
Loading weights: 45%|βββββββββββββββββββββββββββ | 165/363 [00:00<00:00, 2235.00it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 45%|βββββββββββββββββββββββββββ | 165/363 [00:00<00:00, 2229.87it/s, Materializing param=model.layers.18.input_layernorm.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββ | 166/363 [00:00<00:00, 2238.00it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββ | 166/363 [00:00<00:00, 2234.43it/s, Materializing param=model.layers.18.mlp.down_proj.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββ | 167/363 [00:00<00:00, 2238.58it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 46%|ββββββββββββββββββββββββββββ | 167/363 [00:00<00:00, 2234.75it/s, Materializing param=model.layers.18.mlp.gate_proj.weight]
Loading weights: 46%|βββββββββββββββββββββββββββββ | 168/363 [00:00<00:00, 2242.74it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 46%|βββββββββββββββββββββββββββββ | 168/363 [00:00<00:00, 2239.14it/s, Materializing param=model.layers.18.mlp.up_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββ | 169/363 [00:00<00:00, 2247.08it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 47%|βββββββββββββββββββββββ | 169/363 [00:00<00:00, 2243.43it/s, Materializing param=model.layers.18.post_attention_layernorm.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 170/363 [00:00<00:00, 2250.82it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 170/363 [00:00<00:00, 2247.24it/s, Materializing param=model.layers.18.self_attn.k_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 171/363 [00:00<00:00, 2255.19it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 171/363 [00:00<00:00, 2251.16it/s, Materializing param=model.layers.18.self_attn.o_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 172/363 [00:00<00:00, 2216.67it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 47%|βββββββββββββββββββββββββββ | 172/363 [00:00<00:00, 2213.01it/s, Materializing param=model.layers.18.self_attn.q_proj.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββ | 173/363 [00:00<00:00, 2220.91it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββ | 173/363 [00:00<00:00, 2217.66it/s, Materializing param=model.layers.18.self_attn.v_proj.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββ | 174/363 [00:00<00:00, 2225.64it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 48%|ββββββββββββββββββββββββββββ | 174/363 [00:00<00:00, 2222.42it/s, Materializing param=model.layers.19.input_layernorm.weight]
Loading weights: 48%|βββββββββββββββββββββββββββββ | 175/363 [00:00<00:00, 2230.10it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 48%|βββββββββββββββββββββββββββββ | 175/363 [00:00<00:00, 2226.74it/s, Materializing param=model.layers.19.mlp.down_proj.weight]
Loading weights: 48%|βββββββββββββββββββββββββββββ | 176/363 [00:00<00:00, 2234.29it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 48%|βββββββββββββββββββββββββββββ | 176/363 [00:00<00:00, 2231.05it/s, Materializing param=model.layers.19.mlp.gate_proj.weight]
Loading weights: 49%|βββββββββββββββββββββββββββββββ | 177/363 [00:00<00:00, 2238.95it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 49%|βββββββββββββββββββββββββββββββ | 177/363 [00:00<00:00, 2235.78it/s, Materializing param=model.layers.19.mlp.up_proj.weight]
Loading weights: 49%|ββββββββββββββββββββββββ | 178/363 [00:00<00:00, 2243.61it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 49%|ββββββββββββββββββββββββ | 178/363 [00:00<00:00, 2240.02it/s, Materializing param=model.layers.19.post_attention_layernorm.weight]
Loading weights: 49%|ββββββββββββββββββββββββββββ | 179/363 [00:00<00:00, 2247.54it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 49%|ββββββββββββββββββββββββββββ | 179/363 [00:00<00:00, 2244.29it/s, Materializing param=model.layers.19.self_attn.k_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 180/363 [00:00<00:00, 2252.08it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 180/363 [00:00<00:00, 2248.84it/s, Materializing param=model.layers.19.self_attn.o_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 181/363 [00:00<00:00, 2256.54it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 181/363 [00:00<00:00, 2253.32it/s, Materializing param=model.layers.19.self_attn.q_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 182/363 [00:00<00:00, 2246.59it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 50%|βββββββββββββββββββββββββββββ | 182/363 [00:00<00:00, 2243.17it/s, Materializing param=model.layers.19.self_attn.v_proj.weight]
Loading weights: 50%|ββββββββββββββββββββββββββββββ | 183/363 [00:00<00:00, 2250.72it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 50%|ββββββββββββββββββββββββββββββ | 183/363 [00:00<00:00, 2247.18it/s, Materializing param=model.layers.20.input_layernorm.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββ | 184/363 [00:00<00:00, 2254.51it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββ | 184/363 [00:00<00:00, 2251.27it/s, Materializing param=model.layers.20.mlp.down_proj.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββ | 185/363 [00:00<00:00, 2254.89it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 51%|βββββββββββββββββββββββββββββββ | 185/363 [00:00<00:00, 2251.52it/s, Materializing param=model.layers.20.mlp.gate_proj.weight]
Loading weights: 51%|ββββββββββββββββββββββββββββββββ | 186/363 [00:00<00:00, 2246.86it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 51%|ββββββββββββββββββββββββββββββββ | 186/363 [00:00<00:00, 2238.10it/s, Materializing param=model.layers.20.mlp.up_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββ | 187/363 [00:00<00:00, 2245.24it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 52%|ββββββββββββββββββββββββββ | 187/363 [00:00<00:00, 2242.05it/s, Materializing param=model.layers.20.post_attention_layernorm.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 188/363 [00:00<00:00, 2248.91it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 188/363 [00:00<00:00, 2245.72it/s, Materializing param=model.layers.20.self_attn.k_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 189/363 [00:00<00:00, 2252.90it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 189/363 [00:00<00:00, 2249.74it/s, Materializing param=model.layers.20.self_attn.o_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 190/363 [00:00<00:00, 2244.62it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 52%|ββββββββββββββββββββββββββββββ | 190/363 [00:00<00:00, 2241.26it/s, Materializing param=model.layers.20.self_attn.q_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββ | 191/363 [00:00<00:00, 2248.20it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββ | 191/363 [00:00<00:00, 2245.02it/s, Materializing param=model.layers.20.self_attn.v_proj.weight]
Loading weights: 53%|βββββββββββββββββββββββββββββββ | 192/363 [00:00<00:00, 2238.95it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 53%|βββββββββββββββββββββββββββββββ | 192/363 [00:00<00:00, 2235.30it/s, Materializing param=model.layers.21.input_layernorm.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββββ | 193/363 [00:00<00:00, 2242.12it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββββ | 193/363 [00:00<00:00, 2238.87it/s, Materializing param=model.layers.21.mlp.down_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββββ | 194/363 [00:00<00:00, 2228.58it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights: 53%|ββββββββββββββββββββββββββββββββ | 194/363 [00:00<00:00, 2225.27it/s, Materializing param=model.layers.21.mlp.gate_proj.weight]
Loading weights: 54%|ββββββββββββββββββββββββββββββββββ | 195/363 [00:00<00:00, 2228.23it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights: 54%|ββββββββββββββββββββββββββββββββββ | 195/363 [00:00<00:00, 2225.13it/s, Materializing param=model.layers.21.mlp.up_proj.weight]
Loading weights: 54%|βββββββββββββββββββββββββββ | 196/363 [00:00<00:00, 2220.50it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights: 54%|βββββββββββββββββββββββββββ | 196/363 [00:00<00:00, 2217.42it/s, Materializing param=model.layers.21.post_attention_layernorm.weight]
Loading weights: 54%|βββββββββββββββββββββββββββββββ | 197/363 [00:00<00:00, 2223.89it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights: 54%|βββββββββββββββββββββββββββββββ | 197/363 [00:00<00:00, 2220.86it/s, Materializing param=model.layers.21.self_attn.k_proj.weight]
Loading weights: 55%|βββββββββββββββββββββββββββββββ | 198/363 [00:00<00:00, 2227.63it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights: 55%|βββββββββββββββββββββββββββββββ | 198/363 [00:00<00:00, 2224.62it/s, Materializing param=model.layers.21.self_attn.o_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 199/363 [00:00<00:00, 2231.43it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 199/363 [00:00<00:00, 2228.48it/s, Materializing param=model.layers.21.self_attn.q_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 200/363 [00:00<00:00, 2235.06it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 200/363 [00:00<00:00, 2231.96it/s, Materializing param=model.layers.21.self_attn.v_proj.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 201/363 [00:00<00:00, 2218.95it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights: 55%|ββββββββββββββββββββββββββββββββ | 201/363 [00:00<00:00, 2215.40it/s, Materializing param=model.layers.22.input_layernorm.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββ | 202/363 [00:00<00:00, 2221.83it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββ | 202/363 [00:00<00:00, 2218.94it/s, Materializing param=model.layers.22.mlp.down_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββ | 203/363 [00:00<00:00, 2210.92it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββββββββ | 203/363 [00:00<00:00, 2207.87it/s, Materializing param=model.layers.22.mlp.gate_proj.weight]
Loading weights: 56%|βββββββββββββββββββββββββββββββββββ | 204/363 [00:00<00:00, 2214.38it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights: 56%|βββββββββββββββββββββββββββββββββββ | 204/363 [00:00<00:00, 2211.53it/s, Materializing param=model.layers.22.mlp.up_proj.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββ | 205/363 [00:00<00:00, 2218.11it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights: 56%|ββββββββββββββββββββββββββββ | 205/363 [00:00<00:00, 2214.96it/s, Materializing param=model.layers.22.post_attention_layernorm.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 206/363 [00:00<00:00, 2208.39it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 206/363 [00:00<00:00, 2205.48it/s, Materializing param=model.layers.22.self_attn.k_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 207/363 [00:00<00:00, 2211.11it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 207/363 [00:00<00:00, 2208.25it/s, Materializing param=model.layers.22.self_attn.o_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 208/363 [00:00<00:00, 2214.81it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights: 57%|βββββββββββββββββββββββββββββββββ | 208/363 [00:00<00:00, 2212.05it/s, Materializing param=model.layers.22.self_attn.q_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββ | 209/363 [00:00<00:00, 2218.58it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββ | 209/363 [00:00<00:00, 2215.80it/s, Materializing param=model.layers.22.self_attn.v_proj.weight]
Loading weights: 58%|ββββββββββββββββββββββββββββββββββ | 210/363 [00:00<00:00, 2217.62it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights: 58%|ββββββββββββββββββββββββββββββββββ | 210/363 [00:00<00:00, 2214.60it/s, Materializing param=model.layers.23.input_layernorm.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββββ | 211/363 [00:00<00:00, 2189.96it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββββ | 211/363 [00:00<00:00, 2186.74it/s, Materializing param=model.layers.23.mlp.down_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββββ | 212/363 [00:00<00:00, 2192.86it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights: 58%|βββββββββββββββββββββββββββββββββββ | 212/363 [00:00<00:00, 2190.11it/s, Materializing param=model.layers.23.mlp.gate_proj.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββββββββββ | 213/363 [00:00<00:00, 2196.43it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββββββββββ | 213/363 [00:00<00:00, 2193.75it/s, Materializing param=model.layers.23.mlp.up_proj.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββ | 214/363 [00:00<00:00, 2200.03it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights: 59%|βββββββββββββββββββββββββββββ | 214/363 [00:00<00:00, 2197.27it/s, Materializing param=model.layers.23.post_attention_layernorm.weight]
Loading weights: 59%|ββββββββββββββββββββββββββββββββββ | 215/363 [00:00<00:00, 2202.73it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights: 59%|ββββββββββββββββββββββββββββββββββ | 215/363 [00:00<00:00, 2200.00it/s, Materializing param=model.layers.23.self_attn.k_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββ | 216/363 [00:00<00:00, 2206.24it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββ | 216/363 [00:00<00:00, 2203.57it/s, Materializing param=model.layers.23.self_attn.o_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββ | 217/363 [00:00<00:00, 2209.76it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights: 60%|ββββββββββββββββββββββββββββββββββ | 217/363 [00:00<00:00, 2207.05it/s, Materializing param=model.layers.23.self_attn.q_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββ | 218/363 [00:00<00:00, 2198.17it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββ | 218/363 [00:00<00:00, 2195.37it/s, Materializing param=model.layers.23.self_attn.v_proj.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββ | 219/363 [00:00<00:00, 2201.17it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 60%|βββββββββββββββββββββββββββββββββββ | 219/363 [00:00<00:00, 2198.40it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββ | 220/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.input_layernorm.weight]
Loading weights: 61%|βββββββββββββββββββββββββββββββββββββ | 220/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.mlp.down_proj.weight]
Loading weights: 61%|βββββββββββββββββββββββββββββββββββββ | 221/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.mlp.gate_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββββββββββ | 222/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.mlp.up_proj.weight]
Loading weights: 61%|ββββββββββββββββββββββββββββββ | 223/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.post_attention_layernorm.weight]
Loading weights: 62%|ββββββββββββββββββββββββββββββββββββ | 224/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.self_attn.k_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββββββββββββββ | 225/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.self_attn.o_proj.weight]
Loading weights: 62%|ββββββββββββββββββββββββββββββββββββ | 226/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.self_attn.q_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββ | 227/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.24.self_attn.v_proj.weight]
Loading weights: 63%|βββββββββββββββββββββββββββββββββββββ | 228/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.input_layernorm.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββ | 229/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.mlp.down_proj.weight]
Loading weights: 63%|ββββββββββββββββββββββββββββββββββββββ | 230/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.mlp.gate_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββββββββββ | 231/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.mlp.up_proj.weight]
Loading weights: 64%|ββββββββββββββββββββββββββββββββ | 232/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.post_attention_layernorm.weight]
Loading weights: 64%|βββββββββββββββββββββββββββββββββββββ | 233/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.self_attn.k_proj.weight]
Loading weights: 64%|βββββββββββββββββββββββββββββββββββββ | 234/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.self_attn.o_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββββββββββββββ | 235/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.self_attn.q_proj.weight]
Loading weights: 65%|βββββββββββββββββββββββββββββββββββββ | 236/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.25.self_attn.v_proj.weight]
Loading weights: 65%|ββββββββββββββββββββββββββββββββββββββ | 237/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.input_layernorm.weight]
Loading weights: 66%|ββββββββββββββββββββββββββββββββββββββββ | 238/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.mlp.down_proj.weight]
Loading weights: 66%|ββββββββββββββββββββββββββββββββββββββββ | 239/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.mlp.gate_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββββββββββ | 240/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.mlp.up_proj.weight]
Loading weights: 66%|βββββββββββββββββββββββββββββββββ | 241/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.post_attention_layernorm.weight]
Loading weights: 67%|ββββββββββββββββββββββββββββββββββββββ | 242/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.self_attn.k_proj.weight]
Loading weights: 67%|βββββββββββββββββββββββββββββββββββββββ | 243/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.self_attn.o_proj.weight]
Loading weights: 67%|βββββββββββββββββββββββββββββββββββββββ | 244/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.self_attn.q_proj.weight]
Loading weights: 67%|βββββββββββββββββββββββββββββββββββββββ | 245/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.26.self_attn.v_proj.weight]
Loading weights: 68%|ββββββββββββββββββββββββββββββββββββββββ | 246/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.input_layernorm.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββ | 247/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.mlp.down_proj.weight]
Loading weights: 68%|βββββββββββββββββββββββββββββββββββββββββ | 248/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.mlp.gate_proj.weight]
Loading weights: 69%|βββββββββββββββββββββββββββββββββββββββββββ | 249/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.mlp.up_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββ | 250/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.post_attention_layernorm.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββ | 251/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.self_attn.k_proj.weight]
Loading weights: 69%|ββββββββββββββββββββββββββββββββββββββββ | 252/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.self_attn.o_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββ | 253/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.self_attn.q_proj.weight]
Loading weights: 70%|ββββββββββββββββββββββββββββββββββββββββ | 254/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.27.self_attn.v_proj.weight]
Loading weights: 70%|βββββββββββββββββββββββββββββββββββββββββ | 255/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.input_layernorm.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββ | 256/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.mlp.down_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββββββββββ | 257/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.mlp.gate_proj.weight]
Loading weights: 71%|ββββββββββββββββββββββββββββββββββββββββββββ | 258/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.mlp.up_proj.weight]
Loading weights: 71%|βββββββββββββββββββββββββββββββββββ | 259/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.post_attention_layernorm.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββ | 260/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.self_attn.k_proj.weight]
Loading weights: 72%|βββββββββββββββββββββββββββββββββββββββββ | 261/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.self_attn.o_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββββββββββββββββ | 262/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.self_attn.q_proj.weight]
Loading weights: 72%|ββββββββββββββββββββββββββββββββββββββββββ | 263/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.28.self_attn.v_proj.weight]
Loading weights: 73%|βββββββββββββββββββββββββββββββββββββββββββ | 264/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.input_layernorm.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββ | 265/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.mlp.down_proj.weight]
Loading weights: 73%|ββββββββββββββββββββββββββββββββββββββββββββ | 266/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.mlp.gate_proj.weight]
Loading weights: 74%|ββββββββββββββββββββββββββββββββββββββββββββββ | 267/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.mlp.up_proj.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββ | 268/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.post_attention_layernorm.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββ | 269/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.self_attn.k_proj.weight]
Loading weights: 74%|βββββββββββββββββββββββββββββββββββββββββββ | 270/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.self_attn.o_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββ | 271/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.self_attn.q_proj.weight]
Loading weights: 75%|βββββββββββββββββββββββββββββββββββββββββββ | 272/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.29.self_attn.v_proj.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββ | 273/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.input_layernorm.weight]
Loading weights: 75%|ββββββββββββββββββββββββββββββββββββββββββββββ | 274/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.mlp.down_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββ | 275/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.mlp.gate_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 276/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.mlp.up_proj.weight]
Loading weights: 76%|ββββββββββββββββββββββββββββββββββββββ | 277/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.post_attention_layernorm.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββ | 278/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.self_attn.k_proj.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββ | 279/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.self_attn.o_proj.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββ | 280/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.self_attn.q_proj.weight]
Loading weights: 77%|ββββββββββββββββββββββββββββββββββββββββββββ | 281/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.30.self_attn.v_proj.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββ | 282/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.input_layernorm.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββ | 283/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.mlp.down_proj.weight]
Loading weights: 78%|βββββββββββββββββββββββββββββββββββββββββββββββ | 284/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.mlp.gate_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 285/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.mlp.up_proj.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββ | 286/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.post_attention_layernorm.weight]
Loading weights: 79%|βββββββββββββββββββββββββββββββββββββββββββββ | 287/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.self_attn.k_proj.weight]
Loading weights: 79%|ββββββββββββββββββββββββββββββββββββββββββββββ | 288/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.self_attn.o_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββ | 289/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.self_attn.q_proj.weight]
Loading weights: 80%|ββββββββββββββββββββββββββββββββββββββββββββββ | 290/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.31.self_attn.v_proj.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββ | 291/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.input_layernorm.weight]
Loading weights: 80%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 292/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.mlp.down_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 293/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.mlp.gate_proj.weight]
Loading weights: 81%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 294/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.mlp.up_proj.weight]
Loading weights: 81%|ββββββββββββββββββββββββββββββββββββββββ | 295/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.post_attention_layernorm.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββ | 296/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.self_attn.k_proj.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββ | 297/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.self_attn.o_proj.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββ | 298/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.self_attn.q_proj.weight]
Loading weights: 82%|βββββββββββββββββββββββββββββββββββββββββββββββ | 299/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.32.self_attn.v_proj.weight]
Loading weights: 83%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 300/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.input_layernorm.weight]
Loading weights: 83%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 301/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.mlp.down_proj.weight]
Loading weights: 83%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 302/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.mlp.gate_proj.weight]
Loading weights: 83%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 303/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.mlp.up_proj.weight]
Loading weights: 84%|βββββββββββββββββββββββββββββββββββββββββ | 304/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.post_attention_layernorm.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 305/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.self_attn.k_proj.weight]
Loading weights: 84%|ββββββββββββββββββββββββββββββββββββββββββββββββ | 306/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.self_attn.o_proj.weight]
Loading weights: 85%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 307/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.self_attn.q_proj.weight]
Loading weights: 85%|βββββββββββββββββββββββββββββββββββββββββββββββββ | 308/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.33.self_attn.v_proj.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 309/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.input_layernorm.weight]
Loading weights: 85%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 310/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.mlp.down_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 311/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.mlp.gate_proj.weight]
Loading weights: 86%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 312/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.mlp.up_proj.weight]
Loading weights: 86%|βββββββββββββββββββββββββββββββββββββββββββ | 313/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.post_attention_layernorm.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 314/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.self_attn.k_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 315/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.self_attn.o_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 316/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.self_attn.q_proj.weight]
Loading weights: 87%|ββββββββββββββββββββββββββββββββββββββββββββββββββ | 317/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.34.self_attn.v_proj.weight]
Loading weights: 88%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 318/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.input_layernorm.weight]
Loading weights: 88%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 319/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.mlp.down_proj.weight]
Loading weights: 88%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 320/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.mlp.gate_proj.weight]
Loading weights: 88%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 321/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.mlp.up_proj.weight]
Loading weights: 89%|ββββββββββββββββββββββββββββββββββββββββββββ | 322/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.post_attention_layernorm.weight]
Loading weights: 89%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 323/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.self_attn.k_proj.weight]
Loading weights: 89%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 324/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.self_attn.o_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββ | 325/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.self_attn.q_proj.weight]
Loading weights: 90%|ββββββββββββββββββββββββββββββββββββββββββββββββββββ | 326/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.35.self_attn.v_proj.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 327/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.input_layernorm.weight]
Loading weights: 90%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 328/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.mlp.down_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 329/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.mlp.gate_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 330/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.mlp.up_proj.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββ | 331/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.post_attention_layernorm.weight]
Loading weights: 91%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 332/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.self_attn.k_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 333/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.self_attn.o_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 334/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.self_attn.q_proj.weight]
Loading weights: 92%|βββββββββββββββββββββββββββββββββββββββββββββββββββββ | 335/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.36.self_attn.v_proj.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 336/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.input_layernorm.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 337/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.mlp.down_proj.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 338/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.mlp.gate_proj.weight]
Loading weights: 93%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 339/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.mlp.up_proj.weight]
Loading weights: 94%|ββββββββββββββββββββββββββββββββββββββββββββββ | 340/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.post_attention_layernorm.weight]
Loading weights: 94%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 341/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.self_attn.k_proj.weight]
Loading weights: 94%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 342/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.self_attn.o_proj.weight]
Loading weights: 94%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 343/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.self_attn.q_proj.weight]
Loading weights: 95%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 344/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.37.self_attn.v_proj.weight]
Loading weights: 95%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 345/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.input_layernorm.weight]
Loading weights: 95%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 346/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.mlp.down_proj.weight]
Loading weights: 96%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 347/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.mlp.gate_proj.weight]
Loading weights: 96%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 348/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.mlp.up_proj.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββ | 349/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.post_attention_layernorm.weight]
Loading weights: 96%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 350/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.self_attn.k_proj.weight]
Loading weights: 97%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 351/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.self_attn.o_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 352/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.self_attn.q_proj.weight]
Loading weights: 97%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 353/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.38.self_attn.v_proj.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 354/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.input_layernorm.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 355/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.mlp.down_proj.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 356/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.mlp.gate_proj.weight]
Loading weights: 98%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 357/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.mlp.up_proj.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββ| 358/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.post_attention_layernorm.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 359/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.self_attn.k_proj.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 360/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.self_attn.o_proj.weight]
Loading weights: 99%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 361/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.self_attn.q_proj.weight]
Loading weights: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 362/363 [00:00<00:00, 2182.81it/s, Materializing param=model.layers.39.self_attn.v_proj.weight]
Loading weights: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 363/363 [00:00<00:00, 2182.81it/s, Materializing param=model.norm.weight]
Loading weights: 100%|βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 363/363 [00:00<00:00, 2040.45it/s, Materializing param=model.norm.weight]
[2026-02-06 15:51:29,234] [WARNING] [accelerate.utils.dataclasses.__post_init__:1962] [PID:7207] sync_module_states is obsolete in FSDP2, as it is not needed anymore.Setting sync_module_states to None.Multiple deprecation warnings due to FSDP2 conversion:
sharding_strategy is deprecated in favor of reshard_after_forward. This will be removed in a future version of Accelerate.
[2026-02-06 15:51:34,875] [WARNING] [py.warnings._showwarnmsg:110] [PID:7207] /root/axolotl/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4807: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
|