| python -m mlc_chat gen_config /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2 --quantization q4f16_1 --conv-template phi-2 --output /tmp/tmpiszj5ycn |
| [2023-12-28 06:50:40] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/config.json |
| [2023-12-28 06:50:40] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override. |
| [2023-12-28 06:50:40] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048) |
| [2023-12-28 06:50:40] INFO gen_config.py:129: Not found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/tokenizer.model |
| [2023-12-28 06:50:40] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/tokenizer.json. Copying to /tmp/tmpiszj5ycn/tokenizer.json |
| [2023-12-28 06:50:40] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/vocab.json. Copying to /tmp/tmpiszj5ycn/vocab.json |
| [2023-12-28 06:50:40] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/merges.txt. Copying to /tmp/tmpiszj5ycn/merges.txt |
| [2023-12-28 06:50:40] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/added_tokens.json. Copying to /tmp/tmpiszj5ycn/added_tokens.json |
| [2023-12-28 06:50:40] INFO gen_config.py:127: Found tokenizer config: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/tokenizer_config.json. Copying to /tmp/tmpiszj5ycn/tokenizer_config.json |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting pad_token_id: 0 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting bos_token_id: 1 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting eos_token_id: 2 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting temperature: 0.7 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting repetition_penalty: 1.0 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting top_p: 0.95 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting mean_gen_len: 128 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting max_gen_len: 512 |
| [2023-12-28 06:50:40] INFO gen_config.py:69: [System default] Setting shift_fill_factor: 0.3 |
| [2023-12-28 06:50:40] INFO gen_config.py:157: Dumping configuration file to: /tmp/tmpiszj5ycn/mlc-chat-config.json |
| python -m mlc_chat convert_weight /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2 --quantization q4f16_1 --source-format auto --output /tmp/tmpiszj5ycn |
| [2023-12-28 06:50:40] INFO auto_config.py:115: Found model configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/config.json |
| [2023-12-28 06:50:41] INFO auto_device.py:75: Found device: cuda:0 |
| [2023-12-28 06:50:41] INFO auto_device.py:75: Found device: cuda:1 |
| [2023-12-28 06:50:41] INFO auto_device.py:84: Not found device: rocm:0 |
| [2023-12-28 06:50:41] INFO auto_device.py:84: Not found device: metal:0 |
| [2023-12-28 06:50:41] INFO auto_device.py:75: Found device: vulkan:0 |
| [2023-12-28 06:50:41] INFO auto_device.py:75: Found device: vulkan:1 |
| [2023-12-28 06:50:41] INFO auto_device.py:75: Found device: vulkan:2 |
| [2023-12-28 06:50:42] INFO auto_device.py:84: Not found device: opencl:0 |
| [2023-12-28 06:50:42] INFO auto_device.py:33: Using device: cuda:0 |
| [2023-12-28 06:50:42] INFO auto_weight.py:70: Finding weights in: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2 |
| [2023-12-28 06:50:42] INFO auto_weight.py:136: Not found Huggingface PyTorch |
| [2023-12-28 06:50:42] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model.safetensors.index.json |
| [2023-12-28 06:50:42] INFO auto_weight.py:106: Using source weight configuration: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model.safetensors.index.json. Use `--source` to override. |
| [2023-12-28 06:50:42] INFO auto_weight.py:110: Using source weight format: huggingface-safetensor. Use `--source-format` to override. |
| [2023-12-28 06:50:42] INFO auto_config.py:151: Found model type: phi-msft. Use `--model-type` to override. |
| [2023-12-28 06:50:42] INFO phi_model.py:59: context_window_size not found in config.json. Falling back to n_positions (2048) |
| Weight conversion with arguments: |
| --config /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/config.json |
| --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7) |
| --model-type phi-msft |
| --device cuda:0 |
| --source /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model.safetensors.index.json |
| --source-format huggingface-safetensor |
| --output /tmp/tmpiszj5ycn |
| [2023-12-28 06:50:45] INFO huggingface_loader.py:169: Loading HF parameters from: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model-00002-of-00002.safetensors |
| [2023-12-28 06:50:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.linear.bias", shape: (51200,), dtype: float16 |
| [2023-12-28 06:50:46] INFO group_quantization.py:200: Compiling quantize function for key: (51200, 2560, 'float16', 'cuda') |
| [2023-12-28 06:50:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.linear.q_weight", shape: (51200, 320), dtype: uint32 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.linear.q_scale", shape: (51200, 80), dtype: float16 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "lm_head.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:47] INFO group_quantization.py:200: Compiling quantize function for key: (7680, 2560, 'float16', 'cuda') |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:47] INFO group_quantization.py:200: Compiling quantize function for key: (2560, 2560, 'float16', 'cuda') |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:47] INFO group_quantization.py:200: Compiling quantize function for key: (10240, 2560, 'float16', 'cuda') |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:48] INFO group_quantization.py:200: Compiling quantize function for key: (2560, 10240, 'float16', 'cuda') |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.30.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.31.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.31.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:179: Unloading HF weight file: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model-00002-of-00002.safetensors |
| [2023-12-28 06:50:48] INFO huggingface_loader.py:169: Loading HF parameters from: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model-00001-of-00002.safetensors |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.embd.q_weight", shape: (51200, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.embd.q_scale", shape: (51200, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.0.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.0.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.1.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.1.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.10.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.10.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.11.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.11.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.ln.bias[0m", shape: (2560,), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.ln.weight[0m", shape: (2560,), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mixer.Wqkv.bias[0m", shape: (7680,), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mixer.Wqkv.q_weight[0m", shape: (7680, 320), dtype: uint32 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mixer.Wqkv.q_scale[0m", shape: (7680, 80), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mixer.out_proj.bias[0m", shape: (2560,), dtype: float16 |
|
14%|ββββββββββββββββββββββββββββββββ | 47/325 [00:04<00:16, 17.00it/s]
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mixer.out_proj.q_weight[0m", shape: (2560, 320), dtype: uint32 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mixer.out_proj.q_scale[0m", shape: (2560, 80), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mlp.fc1.bias[0m", shape: (10240,), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mlp.fc1.q_weight[0m", shape: (10240, 320), dtype: uint32 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mlp.fc1.q_scale[0m", shape: (10240, 80), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.12.mlp.fc2.bias[0m", shape: (2560,), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mlp.fc2.q_weight[0m", shape: (2560, 1280), dtype: uint32 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.12.mlp.fc2.q_scale[0m", shape: (2560, 320), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.13.ln.bias[0m", shape: (2560,), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.13.ln.weight[0m", shape: (2560,), dtype: float16 |
|
21%|ββββββββββββββββββββββββββββββββββββββββββββββ | 68/325 [00:04<00:08, 29.61it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.13.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.13.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
27%|████████████████████████████████████████████████████████████ | 89/325 [00:04<00:05, 44.94it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.14.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.14.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.15.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.15.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
34%|██████████████████████████████████████████████████████████████████████████ | 111/325 [00:05<00:03, 63.17it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.16.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.16.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.17.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.17.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.18.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.18.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
41%|█████████████████████████████████████████████████████████████████████████████████████████ | 133/325 [00:05<00:02, 82.48it/s]
[2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.19.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.19.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.2.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.2.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
48%|████████████████████████████████████████████████████████████████████████████████████████████████████████ | 157/325 [00:05<00:01, 105.42it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.20.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.20.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.21.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.21.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
55% 179/325 [00:05<00:01, 125.78it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.22.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.22.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.23.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.23.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
62% 201/325 [00:05<00:00, 141.79it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.24.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.24.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.25.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.25.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.26.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.26.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
69% 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.mlp.fc1.bias", shape: (10240,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.27.mlp.fc2.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.27.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.28.ln.bias", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.28.ln.weight", shape: (2560,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.28.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.28.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.28.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
|
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.28.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mixer.out_proj.q_weight[0m", shape: (2560, 320), dtype: uint32 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mixer.out_proj.q_scale[0m", shape: (2560, 80), dtype: float16 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.28.mlp.fc1.bias[0m", shape: (10240,), dtype: float16 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mlp.fc1.q_weight[0m", shape: (10240, 320), dtype: uint32 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mlp.fc1.q_scale[0m", shape: (10240, 80), dtype: float16 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mtransformer.h.28.mlp.fc2.bias[0m", shape: (2560,), dtype: float16 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mlp.fc2.q_weight[0m", shape: (2560, 1280), dtype: uint32 |
|
69%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ | 223/325 [00:05<00:00, 154.51it/s]
[2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mtransformer.h.28.mlp.fc2.q_scale[0m", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| 76% 247/325 [00:05<00:00, 170.26it/s] |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.29.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.29.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.3.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.3.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.30.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| 83% 270/325 [00:05<00:00, 184.95it/s] |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.4.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.4.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.5.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.5.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| 90% 291/325 [00:05<00:00, 189.37it/s] |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.6.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.6.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.7.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.7.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.8.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.8.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.ln.weight", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.Wqkv.bias", shape: (7680,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.Wqkv.q_weight", shape: (7680, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.Wqkv.q_scale", shape: (7680, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mixer.out_proj.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.out_proj.q_weight", shape: (2560, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mixer.out_proj.q_scale", shape: (2560, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc1.bias", shape: (10240,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc1.q_weight", shape: (10240, 320), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc1.q_scale", shape: (10240, 80), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "transformer.h.9.mlp.fc2.bias", shape: (2560,), dtype: float16 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc2.q_weight", shape: (2560, 1280), dtype: uint32 |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "transformer.h.9.mlp.fc2.q_scale", shape: (2560, 320), dtype: float16 |
| 100%|██████████| 325/325 [00:06<00:00, 53.34it/s] |
| [2023-12-28 06:50:51] INFO huggingface_loader.py:179: Unloading HF weight file: /ssd1/cfruan/mlc-llm-repos/mlc-llm-head/dist/models/phi-2/model-00001-of-00002.safetensors |
| [2023-12-28 06:50:52] INFO stats.py:71: Time usage: HF loading: 2.170 sec; Pre-quantization mapping: 0.733 sec; Quantization: 2.350 sec |
| [2023-12-28 06:50:52] INFO stats.py:85: RAM usage: Peak RAM: 4.640 GB. Total bytes loaded from disk: 5.178 GB |
| [2023-12-28 06:50:52] INFO convert_weight.py:110: Parameter size after quantization: 1.457 GB |
| [2023-12-28 06:50:52] INFO convert_weight.py:115: Total parameters: 2,779,683,840 |
| [2023-12-28 06:50:52] INFO convert_weight.py:116: Bits per parameter: 4.504 |
| Start storing to cache /tmp/tmpiszj5ycn |
|
[0001/0455] saving lm_head.linear.bias
[0002/0455] saving lm_head.linear.q_weight
[0003/0455] saving lm_head.linear.q_scale
[0004/0455] saving lm_head.ln.bias
[0005/0455] saving lm_head.ln.weight
[0006/0455] saving transformer.h.30.mixer.Wqkv.bias
[0007/0455] saving transformer.h.30.mixer.Wqkv.q_weight
[0008/0455] saving transformer.h.30.mixer.Wqkv.q_scale
[0009/0455] saving transformer.h.30.mixer.out_proj.bias
[0010/0455] saving transformer.h.30.mixer.out_proj.q_weight
[0011/0455] saving transformer.h.30.mixer.out_proj.q_scale
[0012/0455] saving transformer.h.30.mlp.fc1.bias
[0013/0455] saving transformer.h.30.mlp.fc1.q_weight
[0014/0455] saving transformer.h.30.mlp.fc1.q_scale
[0015/0455] saving transformer.h.30.mlp.fc2.bias
[0016/0455] saving transformer.h.30.mlp.fc2.q_weight
[0017/0455] saving transformer.h.30.mlp.fc2.q_scale
[0018/0455] saving transformer.h.31.ln.bias
[0019/0455] saving transformer.h.31.ln.weight
[0020/0455] saving transformer.h.31.mixer.Wqkv.bias
[0021/0455] saving transformer.h.31.mixer.Wqkv.q_weight
[0022/0455] saving transformer.h.31.mixer.Wqkv.q_scale
[0023/0455] saving transformer.h.31.mixer.out_proj.bias
[0024/0455] saving transformer.h.31.mixer.out_proj.q_weight
[0025/0455] saving transformer.h.31.mixer.out_proj.q_scale
[0026/0455] saving transformer.h.31.mlp.fc1.bias
[0027/0455] saving transformer.h.31.mlp.fc1.q_weight
[0028/0455] saving transformer.h.31.mlp.fc1.q_scale
[0029/0455] saving transformer.h.31.mlp.fc2.bias
[0030/0455] saving transformer.h.31.mlp.fc2.q_weight
[0031/0455] saving transformer.h.31.mlp.fc2.q_scale
[0032/0455] saving transformer.embd.q_weight
[0033/0455] saving transformer.embd.q_scale
[0034/0455] saving transformer.h.0.ln.bias
[0035/0455] saving transformer.h.0.ln.weight
[0036/0455] saving transformer.h.0.mixer.Wqkv.bias
[0037/0455] saving transformer.h.0.mixer.Wqkv.q_weight
[0038/0455] saving transformer.h.0.mixer.Wqkv.q_scale
[0039/0455] saving transformer.h.0.mixer.out_proj.bias
[0040/0455] saving transformer.h.0.mixer.out_proj.q_weight
[0041/0455] saving transformer.h.0.mixer.out_proj.q_scale
[0042/0455] saving transformer.h.0.mlp.fc1.bias
[0043/0455] saving transformer.h.0.mlp.fc1.q_weight
[0044/0455] saving transformer.h.0.mlp.fc1.q_scale
[0045/0455] saving transformer.h.0.mlp.fc2.bias
[0046/0455] saving transformer.h.0.mlp.fc2.q_weight
[0047/0455] saving transformer.h.0.mlp.fc2.q_scale
[0048/0455] saving transformer.h.1.ln.bias
[0049/0455] saving transformer.h.1.ln.weight
[0050/0455] saving transformer.h.1.mixer.Wqkv.bias
[0051/0455] saving transformer.h.1.mixer.Wqkv.q_weight
[0052/0455] saving transformer.h.1.mixer.Wqkv.q_scale
[0053/0455] saving transformer.h.1.mixer.out_proj.bias
[0054/0455] saving transformer.h.1.mixer.out_proj.q_weight
[0055/0455] saving transformer.h.1.mixer.out_proj.q_scale
[0056/0455] saving transformer.h.1.mlp.fc1.bias
[0057/0455] saving transformer.h.1.mlp.fc1.q_weight
[0058/0455] saving transformer.h.1.mlp.fc1.q_scale
[0059/0455] saving transformer.h.1.mlp.fc2.bias
[0060/0455] saving transformer.h.1.mlp.fc2.q_weight
[0061/0455] saving transformer.h.1.mlp.fc2.q_scale
[0062/0455] saving transformer.h.10.ln.bias
[0063/0455] saving transformer.h.10.ln.weight
[0064/0455] saving transformer.h.10.mixer.Wqkv.bias
[0065/0455] saving transformer.h.10.mixer.Wqkv.q_weight
[0066/0455] saving transformer.h.10.mixer.Wqkv.q_scale
[0067/0455] saving transformer.h.10.mixer.out_proj.bias
[0068/0455] saving transformer.h.10.mixer.out_proj.q_weight
[0069/0455] saving transformer.h.10.mixer.out_proj.q_scale
[0070/0455] saving transformer.h.10.mlp.fc1.bias
[0071/0455] saving transformer.h.10.mlp.fc1.q_weight
[0072/0455] saving transformer.h.10.mlp.fc1.q_scale
[0073/0455] saving transformer.h.10.mlp.fc2.bias
[0074/0455] saving transformer.h.10.mlp.fc2.q_weight
[0075/0455] saving transformer.h.10.mlp.fc2.q_scale
[0076/0455] saving transformer.h.11.ln.bias
[0077/0455] saving transformer.h.11.ln.weight
[0078/0455] saving transformer.h.11.mixer.Wqkv.bias
[0079/0455] saving transformer.h.11.mixer.Wqkv.q_weight
[0080/0455] saving transformer.h.11.mixer.Wqkv.q_scale
[0081/0455] saving transformer.h.11.mixer.out_proj.bias
[0082/0455] saving transformer.h.11.mixer.out_proj.q_weight
[0083/0455] saving transformer.h.11.mixer.out_proj.q_scale
[0084/0455] saving transformer.h.11.mlp.fc1.bias
[0085/0455] saving transformer.h.11.mlp.fc1.q_weight
[0086/0455] saving transformer.h.11.mlp.fc1.q_scale
[0087/0455] saving transformer.h.11.mlp.fc2.bias
[0088/0455] saving transformer.h.11.mlp.fc2.q_weight
[0089/0455] saving transformer.h.11.mlp.fc2.q_scale
[0090/0455] saving transformer.h.12.ln.bias
[0091/0455] saving transformer.h.12.ln.weight
[0092/0455] saving transformer.h.12.mixer.Wqkv.bias
[0093/0455] saving transformer.h.12.mixer.Wqkv.q_weight
[0094/0455] saving transformer.h.12.mixer.Wqkv.q_scale
[0095/0455] saving transformer.h.12.mixer.out_proj.bias
[0096/0455] saving transformer.h.12.mixer.out_proj.q_weight
[0097/0455] saving transformer.h.12.mixer.out_proj.q_scale
[0098/0455] saving transformer.h.12.mlp.fc1.bias
[0099/0455] saving transformer.h.12.mlp.fc1.q_weight
[0100/0455] saving transformer.h.12.mlp.fc1.q_scale
[0101/0455] saving transformer.h.12.mlp.fc2.bias
[0102/0455] saving transformer.h.12.mlp.fc2.q_weight
[0103/0455] saving transformer.h.12.mlp.fc2.q_scale
[0104/0455] saving transformer.h.13.ln.bias
[0105/0455] saving transformer.h.13.ln.weight
[0106/0455] saving transformer.h.13.mixer.Wqkv.bias
[0107/0455] saving transformer.h.13.mixer.Wqkv.q_weight
[0108/0455] saving transformer.h.13.mixer.Wqkv.q_scale
[0109/0455] saving transformer.h.13.mixer.out_proj.bias
[0110/0455] saving transformer.h.13.mixer.out_proj.q_weight
[0111/0455] saving transformer.h.13.mixer.out_proj.q_scale
[0112/0455] saving transformer.h.13.mlp.fc1.bias
[0113/0455] saving transformer.h.13.mlp.fc1.q_weight
[0114/0455] saving transformer.h.13.mlp.fc1.q_scale
[0115/0455] saving transformer.h.13.mlp.fc2.bias
[0116/0455] saving transformer.h.13.mlp.fc2.q_weight
[0117/0455] saving transformer.h.13.mlp.fc2.q_scale
[0118/0455] saving transformer.h.14.ln.bias
[0119/0455] saving transformer.h.14.ln.weight
[0120/0455] saving transformer.h.14.mixer.Wqkv.bias
[0121/0455] saving transformer.h.14.mixer.Wqkv.q_weight
[0122/0455] saving transformer.h.14.mixer.Wqkv.q_scale
[0123/0455] saving transformer.h.14.mixer.out_proj.bias
[0124/0455] saving transformer.h.14.mixer.out_proj.q_weight
[0125/0455] saving transformer.h.14.mixer.out_proj.q_scale
[0126/0455] saving transformer.h.14.mlp.fc1.bias
[0127/0455] saving transformer.h.14.mlp.fc1.q_weight
[0128/0455] saving transformer.h.14.mlp.fc1.q_scale
[0129/0455] saving transformer.h.14.mlp.fc2.bias
[0130/0455] saving transformer.h.14.mlp.fc2.q_weight
[0131/0455] saving transformer.h.14.mlp.fc2.q_scale
[0132/0455] saving transformer.h.15.ln.bias
[0133/0455] saving transformer.h.15.ln.weight
[0134/0455] saving transformer.h.15.mixer.Wqkv.bias
[0135/0455] saving transformer.h.15.mixer.Wqkv.q_weight
[0136/0455] saving transformer.h.15.mixer.Wqkv.q_scale
[0137/0455] saving transformer.h.15.mixer.out_proj.bias
[0138/0455] saving transformer.h.15.mixer.out_proj.q_weight
[0139/0455] saving transformer.h.15.mixer.out_proj.q_scale
[0140/0455] saving transformer.h.15.mlp.fc1.bias
[0141/0455] saving transformer.h.15.mlp.fc1.q_weight
[0142/0455] saving transformer.h.15.mlp.fc1.q_scale
[0143/0455] saving transformer.h.15.mlp.fc2.bias
[0144/0455] saving transformer.h.15.mlp.fc2.q_weight
[0145/0455] saving transformer.h.15.mlp.fc2.q_scale
[0146/0455] saving transformer.h.16.ln.bias
[0147/0455] saving transformer.h.16.ln.weight
[0148/0455] saving transformer.h.16.mixer.Wqkv.bias
[0149/0455] saving transformer.h.16.mixer.Wqkv.q_weight
[0150/0455] saving transformer.h.16.mixer.Wqkv.q_scale
[0151/0455] saving transformer.h.16.mixer.out_proj.bias
[0152/0455] saving transformer.h.16.mixer.out_proj.q_weight
[0153/0455] saving transformer.h.16.mixer.out_proj.q_scale
[0154/0455] saving transformer.h.16.mlp.fc1.bias
[0155/0455] saving transformer.h.16.mlp.fc1.q_weight
[0156/0455] saving transformer.h.16.mlp.fc1.q_scale
[0157/0455] saving transformer.h.16.mlp.fc2.bias
[0158/0455] saving transformer.h.16.mlp.fc2.q_weight
[0159/0455] saving transformer.h.16.mlp.fc2.q_scale
[0160/0455] saving transformer.h.17.ln.bias
[0161/0455] saving transformer.h.17.ln.weight
[0162/0455] saving transformer.h.17.mixer.Wqkv.bias
[0163/0455] saving transformer.h.17.mixer.Wqkv.q_weight
[0164/0455] saving transformer.h.17.mixer.Wqkv.q_scale
[0165/0455] saving transformer.h.17.mixer.out_proj.bias
[0166/0455] saving transformer.h.17.mixer.out_proj.q_weight
[0167/0455] saving transformer.h.17.mixer.out_proj.q_scale
[0168/0455] saving transformer.h.17.mlp.fc1.bias
[0169/0455] saving transformer.h.17.mlp.fc1.q_weight
[0170/0455] saving transformer.h.17.mlp.fc1.q_scale
[0171/0455] saving transformer.h.17.mlp.fc2.bias
[0172/0455] saving transformer.h.17.mlp.fc2.q_weight
[0173/0455] saving transformer.h.17.mlp.fc2.q_scale
[0174/0455] saving transformer.h.18.ln.bias
[0175/0455] saving transformer.h.18.ln.weight
[0176/0455] saving transformer.h.18.mixer.Wqkv.bias
[0177/0455] saving transformer.h.18.mixer.Wqkv.q_weight
[0178/0455] saving transformer.h.18.mixer.Wqkv.q_scale
[0179/0455] saving transformer.h.18.mixer.out_proj.bias
[0180/0455] saving transformer.h.18.mixer.out_proj.q_weight
[0181/0455] saving transformer.h.18.mixer.out_proj.q_scale
[0182/0455] saving transformer.h.18.mlp.fc1.bias
[0183/0455] saving transformer.h.18.mlp.fc1.q_weight
[0184/0455] saving transformer.h.18.mlp.fc1.q_scale
[0185/0455] saving transformer.h.18.mlp.fc2.bias
[0186/0455] saving transformer.h.18.mlp.fc2.q_weight
[0187/0455] saving transformer.h.18.mlp.fc2.q_scale
[0188/0455] saving transformer.h.19.ln.bias
[0189/0455] saving transformer.h.19.ln.weight
[0190/0455] saving transformer.h.19.mixer.Wqkv.bias
[0191/0455] saving transformer.h.19.mixer.Wqkv.q_weight
[0192/0455] saving transformer.h.19.mixer.Wqkv.q_scale
[0193/0455] saving transformer.h.19.mixer.out_proj.bias
[0194/0455] saving transformer.h.19.mixer.out_proj.q_weight
[0195/0455] saving transformer.h.19.mixer.out_proj.q_scale
[0196/0455] saving transformer.h.19.mlp.fc1.bias
[0197/0455] saving transformer.h.19.mlp.fc1.q_weight
[0198/0455] saving transformer.h.19.mlp.fc1.q_scale
[0199/0455] saving transformer.h.19.mlp.fc2.bias
[0200/0455] saving transformer.h.19.mlp.fc2.q_weight
[0201/0455] saving transformer.h.19.mlp.fc2.q_scale
[0202/0455] saving transformer.h.2.ln.bias
[0203/0455] saving transformer.h.2.ln.weight
[0204/0455] saving transformer.h.2.mixer.Wqkv.bias
[0205/0455] saving transformer.h.2.mixer.Wqkv.q_weight
[0206/0455] saving transformer.h.2.mixer.Wqkv.q_scale
[0207/0455] saving transformer.h.2.mixer.out_proj.bias
[0208/0455] saving transformer.h.2.mixer.out_proj.q_weight
[0209/0455] saving transformer.h.2.mixer.out_proj.q_scale
[0210/0455] saving transformer.h.2.mlp.fc1.bias
[0211/0455] saving transformer.h.2.mlp.fc1.q_weight
[0212/0455] saving transformer.h.2.mlp.fc1.q_scale
[0213/0455] saving transformer.h.2.mlp.fc2.bias
[0214/0455] saving transformer.h.2.mlp.fc2.q_weight
[0215/0455] saving transformer.h.2.mlp.fc2.q_scale
[0216/0455] saving transformer.h.20.ln.bias
[0217/0455] saving transformer.h.20.ln.weight
[0218/0455] saving transformer.h.20.mixer.Wqkv.bias
[0219/0455] saving transformer.h.20.mixer.Wqkv.q_weight
[0220/0455] saving transformer.h.20.mixer.Wqkv.q_scale
[0221/0455] saving transformer.h.20.mixer.out_proj.bias
[0222/0455] saving transformer.h.20.mixer.out_proj.q_weight
[0223/0455] saving transformer.h.20.mixer.out_proj.q_scale
[0224/0455] saving transformer.h.20.mlp.fc1.bias
[0225/0455] saving transformer.h.20.mlp.fc1.q_weight
[0226/0455] saving transformer.h.20.mlp.fc1.q_scale
[0227/0455] saving transformer.h.20.mlp.fc2.bias
[0228/0455] saving transformer.h.20.mlp.fc2.q_weight
[0229/0455] saving transformer.h.20.mlp.fc2.q_scale
[0230/0455] saving transformer.h.21.ln.bias
[0231/0455] saving transformer.h.21.ln.weight
[0232/0455] saving transformer.h.21.mixer.Wqkv.bias
[0233/0455] saving transformer.h.21.mixer.Wqkv.q_weight
[0234/0455] saving transformer.h.21.mixer.Wqkv.q_scale
[0235/0455] saving transformer.h.21.mixer.out_proj.bias
[0236/0455] saving transformer.h.21.mixer.out_proj.q_weight
[0237/0455] saving transformer.h.21.mixer.out_proj.q_scale
[0238/0455] saving transformer.h.21.mlp.fc1.bias
[0239/0455] saving transformer.h.21.mlp.fc1.q_weight
[0240/0455] saving transformer.h.21.mlp.fc1.q_scale
[0241/0455] saving transformer.h.21.mlp.fc2.bias
[0242/0455] saving transformer.h.21.mlp.fc2.q_weight
[0243/0455] saving transformer.h.21.mlp.fc2.q_scale
[0244/0455] saving transformer.h.22.ln.bias
[0245/0455] saving transformer.h.22.ln.weight
[0246/0455] saving transformer.h.22.mixer.Wqkv.bias
[0247/0455] saving transformer.h.22.mixer.Wqkv.q_weight
[0248/0455] saving transformer.h.22.mixer.Wqkv.q_scale
[0249/0455] saving transformer.h.22.mixer.out_proj.bias
[0250/0455] saving transformer.h.22.mixer.out_proj.q_weight
[0251/0455] saving transformer.h.22.mixer.out_proj.q_scale
[0252/0455] saving transformer.h.22.mlp.fc1.bias
[0253/0455] saving transformer.h.22.mlp.fc1.q_weight
[0254/0455] saving transformer.h.22.mlp.fc1.q_scale
[0255/0455] saving transformer.h.22.mlp.fc2.bias
[0256/0455] saving transformer.h.22.mlp.fc2.q_weight
[0257/0455] saving transformer.h.22.mlp.fc2.q_scale
[0258/0455] saving transformer.h.23.ln.bias
[0259/0455] saving transformer.h.23.ln.weight
[0260/0455] saving transformer.h.23.mixer.Wqkv.bias
[0261/0455] saving transformer.h.23.mixer.Wqkv.q_weight
[0262/0455] saving transformer.h.23.mixer.Wqkv.q_scale
[0263/0455] saving transformer.h.23.mixer.out_proj.bias
[0264/0455] saving transformer.h.23.mixer.out_proj.q_weight
[0265/0455] saving transformer.h.23.mixer.out_proj.q_scale
[0266/0455] saving transformer.h.23.mlp.fc1.bias
[0267/0455] saving transformer.h.23.mlp.fc1.q_weight
[0268/0455] saving transformer.h.23.mlp.fc1.q_scale
[0269/0455] saving transformer.h.23.mlp.fc2.bias
[0270/0455] saving transformer.h.23.mlp.fc2.q_weight
[0271/0455] saving transformer.h.23.mlp.fc2.q_scale
[0272/0455] saving transformer.h.24.ln.bias
[0273/0455] saving transformer.h.24.ln.weight
[0274/0455] saving transformer.h.24.mixer.Wqkv.bias
[0275/0455] saving transformer.h.24.mixer.Wqkv.q_weight
[0276/0455] saving transformer.h.24.mixer.Wqkv.q_scale
[0277/0455] saving transformer.h.24.mixer.out_proj.bias
[0278/0455] saving transformer.h.24.mixer.out_proj.q_weight
[0279/0455] saving transformer.h.24.mixer.out_proj.q_scale
[0280/0455] saving transformer.h.24.mlp.fc1.bias
[0281/0455] saving transformer.h.24.mlp.fc1.q_weight
[0282/0455] saving transformer.h.24.mlp.fc1.q_scale
[0283/0455] saving transformer.h.24.mlp.fc2.bias
[0284/0455] saving transformer.h.24.mlp.fc2.q_weight
[0285/0455] saving transformer.h.24.mlp.fc2.q_scale
[0286/0455] saving transformer.h.25.ln.bias
[0287/0455] saving transformer.h.25.ln.weight
[0288/0455] saving transformer.h.25.mixer.Wqkv.bias
[0289/0455] saving transformer.h.25.mixer.Wqkv.q_weight
[0290/0455] saving transformer.h.25.mixer.Wqkv.q_scale
[0291/0455] saving transformer.h.25.mixer.out_proj.bias
[0292/0455] saving transformer.h.25.mixer.out_proj.q_weight
[0293/0455] saving transformer.h.25.mixer.out_proj.q_scale
[0294/0455] saving transformer.h.25.mlp.fc1.bias
[0295/0455] saving transformer.h.25.mlp.fc1.q_weight
[0296/0455] saving transformer.h.25.mlp.fc1.q_scale
[0297/0455] saving transformer.h.25.mlp.fc2.bias
[0298/0455] saving transformer.h.25.mlp.fc2.q_weight
[0299/0455] saving transformer.h.25.mlp.fc2.q_scale
[0300/0455] saving transformer.h.26.ln.bias
[0301/0455] saving transformer.h.26.ln.weight
[0302/0455] saving transformer.h.26.mixer.Wqkv.bias
[0303/0455] saving transformer.h.26.mixer.Wqkv.q_weight
[0304/0455] saving transformer.h.26.mixer.Wqkv.q_scale
[0305/0455] saving transformer.h.26.mixer.out_proj.bias
[0306/0455] saving transformer.h.26.mixer.out_proj.q_weight
[0307/0455] saving transformer.h.26.mixer.out_proj.q_scale
[0308/0455] saving transformer.h.26.mlp.fc1.bias
[0309/0455] saving transformer.h.26.mlp.fc1.q_weight
[0310/0455] saving transformer.h.26.mlp.fc1.q_scale
[0311/0455] saving transformer.h.26.mlp.fc2.bias
[0312/0455] saving transformer.h.26.mlp.fc2.q_weight
[0313/0455] saving transformer.h.26.mlp.fc2.q_scale
[0314/0455] saving transformer.h.27.ln.bias
[0315/0455] saving transformer.h.27.ln.weight
[0316/0455] saving transformer.h.27.mixer.Wqkv.bias
[0317/0455] saving transformer.h.27.mixer.Wqkv.q_weight
[0318/0455] saving transformer.h.27.mixer.Wqkv.q_scale
[0319/0455] saving transformer.h.27.mixer.out_proj.bias
[0320/0455] saving transformer.h.27.mixer.out_proj.q_weight
[0321/0455] saving transformer.h.27.mixer.out_proj.q_scale
[0322/0455] saving transformer.h.27.mlp.fc1.bias
[0323/0455] saving transformer.h.27.mlp.fc1.q_weight
[0324/0455] saving transformer.h.27.mlp.fc1.q_scale
[0325/0455] saving transformer.h.27.mlp.fc2.bias
[0326/0455] saving transformer.h.27.mlp.fc2.q_weight
[0327/0455] saving transformer.h.27.mlp.fc2.q_scale
[0328/0455] saving transformer.h.28.ln.bias
[0329/0455] saving transformer.h.28.ln.weight
[0330/0455] saving transformer.h.28.mixer.Wqkv.bias
[0331/0455] saving transformer.h.28.mixer.Wqkv.q_weight
[0332/0455] saving transformer.h.28.mixer.Wqkv.q_scale
[0333/0455] saving transformer.h.28.mixer.out_proj.bias
[0334/0455] saving transformer.h.28.mixer.out_proj.q_weight
[0335/0455] saving transformer.h.28.mixer.out_proj.q_scale
[0336/0455] saving transformer.h.28.mlp.fc1.bias
[0337/0455] saving transformer.h.28.mlp.fc1.q_weight
[0338/0455] saving transformer.h.28.mlp.fc1.q_scale
[0339/0455] saving transformer.h.28.mlp.fc2.bias
[0340/0455] saving transformer.h.28.mlp.fc2.q_weight
[0341/0455] saving transformer.h.28.mlp.fc2.q_scale
[0342/0455] saving transformer.h.29.ln.bias
[0343/0455] saving transformer.h.29.ln.weight
[0344/0455] saving transformer.h.29.mixer.Wqkv.bias
[0345/0455] saving transformer.h.29.mixer.Wqkv.q_weight
[0346/0455] saving transformer.h.29.mixer.Wqkv.q_scale
[0347/0455] saving transformer.h.29.mixer.out_proj.bias
[0348/0455] saving transformer.h.29.mixer.out_proj.q_weight
[0349/0455] saving transformer.h.29.mixer.out_proj.q_scale
[0350/0455] saving transformer.h.29.mlp.fc1.bias
[0351/0455] saving transformer.h.29.mlp.fc1.q_weight
[0352/0455] saving transformer.h.29.mlp.fc1.q_scale
[0353/0455] saving transformer.h.29.mlp.fc2.bias
[0354/0455] saving transformer.h.29.mlp.fc2.q_weight
[0355/0455] saving transformer.h.29.mlp.fc2.q_scale
[0356/0455] saving transformer.h.3.ln.bias
[0357/0455] saving transformer.h.3.ln.weight
[0358/0455] saving transformer.h.3.mixer.Wqkv.bias
[0359/0455] saving transformer.h.3.mixer.Wqkv.q_weight
[0360/0455] saving transformer.h.3.mixer.Wqkv.q_scale
[0361/0455] saving transformer.h.3.mixer.out_proj.bias
[0362/0455] saving transformer.h.3.mixer.out_proj.q_weight
[0363/0455] saving transformer.h.3.mixer.out_proj.q_scale
[0364/0455] saving transformer.h.3.mlp.fc1.bias
[0365/0455] saving transformer.h.3.mlp.fc1.q_weight
[0366/0455] saving transformer.h.3.mlp.fc1.q_scale
[0367/0455] saving transformer.h.3.mlp.fc2.bias
[0368/0455] saving transformer.h.3.mlp.fc2.q_weight
[0369/0455] saving transformer.h.3.mlp.fc2.q_scale
[0370/0455] saving transformer.h.30.ln.bias
[0371/0455] saving transformer.h.30.ln.weight
[0372/0455] saving transformer.h.4.ln.bias
[0373/0455] saving transformer.h.4.ln.weight
[0374/0455] saving transformer.h.4.mixer.Wqkv.bias
[0375/0455] saving transformer.h.4.mixer.Wqkv.q_weight
[0376/0455] saving transformer.h.4.mixer.Wqkv.q_scale
[0377/0455] saving transformer.h.4.mixer.out_proj.bias
[0378/0455] saving transformer.h.4.mixer.out_proj.q_weight
[0379/0455] saving transformer.h.4.mixer.out_proj.q_scale
[0380/0455] saving transformer.h.4.mlp.fc1.bias
[0381/0455] saving transformer.h.4.mlp.fc1.q_weight
[0382/0455] saving transformer.h.4.mlp.fc1.q_scale
[0383/0455] saving transformer.h.4.mlp.fc2.bias
[0384/0455] saving transformer.h.4.mlp.fc2.q_weight
[0385/0455] saving transformer.h.4.mlp.fc2.q_scale
[0386/0455] saving transformer.h.5.ln.bias
[0387/0455] saving transformer.h.5.ln.weight
[0388/0455] saving transformer.h.5.mixer.Wqkv.bias
[0389/0455] saving transformer.h.5.mixer.Wqkv.q_weight
[0390/0455] saving transformer.h.5.mixer.Wqkv.q_scale
[0391/0455] saving transformer.h.5.mixer.out_proj.bias
[0392/0455] saving transformer.h.5.mixer.out_proj.q_weight
[0393/0455] saving transformer.h.5.mixer.out_proj.q_scale
[0394/0455] saving transformer.h.5.mlp.fc1.bias
[0395/0455] saving transformer.h.5.mlp.fc1.q_weight
[0396/0455] saving transformer.h.5.mlp.fc1.q_scale
[0397/0455] saving transformer.h.5.mlp.fc2.bias
[0398/0455] saving transformer.h.5.mlp.fc2.q_weight
[0399/0455] saving transformer.h.5.mlp.fc2.q_scale
[0400/0455] saving transformer.h.6.ln.bias
[0401/0455] saving transformer.h.6.ln.weight
[0402/0455] saving transformer.h.6.mixer.Wqkv.bias
[0403/0455] saving transformer.h.6.mixer.Wqkv.q_weight
[0404/0455] saving transformer.h.6.mixer.Wqkv.q_scale
[0405/0455] saving transformer.h.6.mixer.out_proj.bias
[0406/0455] saving transformer.h.6.mixer.out_proj.q_weight
[0407/0455] saving transformer.h.6.mixer.out_proj.q_scale
[0408/0455] saving transformer.h.6.mlp.fc1.bias
[0409/0455] saving transformer.h.6.mlp.fc1.q_weight
[0410/0455] saving transformer.h.6.mlp.fc1.q_scale
[0411/0455] saving transformer.h.6.mlp.fc2.bias
[0412/0455] saving transformer.h.6.mlp.fc2.q_weight
[0413/0455] saving transformer.h.6.mlp.fc2.q_scale
[0414/0455] saving transformer.h.7.ln.bias
[0415/0455] saving transformer.h.7.ln.weight
[0416/0455] saving transformer.h.7.mixer.Wqkv.bias
[0417/0455] saving transformer.h.7.mixer.Wqkv.q_weight
[0418/0455] saving transformer.h.7.mixer.Wqkv.q_scale
[0419/0455] saving transformer.h.7.mixer.out_proj.bias
[0420/0455] saving transformer.h.7.mixer.out_proj.q_weight
[0421/0455] saving transformer.h.7.mixer.out_proj.q_scale
[0422/0455] saving transformer.h.7.mlp.fc1.bias
[0423/0455] saving transformer.h.7.mlp.fc1.q_weight
[0424/0455] saving transformer.h.7.mlp.fc1.q_scale
[0425/0455] saving transformer.h.7.mlp.fc2.bias
[0426/0455] saving transformer.h.7.mlp.fc2.q_weight
[0427/0455] saving transformer.h.7.mlp.fc2.q_scale
[0428/0455] saving transformer.h.8.ln.bias
[0429/0455] saving transformer.h.8.ln.weight
[0430/0455] saving transformer.h.8.mixer.Wqkv.bias
[0431/0455] saving transformer.h.8.mixer.Wqkv.q_weight
[0432/0455] saving transformer.h.8.mixer.Wqkv.q_scale
[0433/0455] saving transformer.h.8.mixer.out_proj.bias
[0434/0455] saving transformer.h.8.mixer.out_proj.q_weight
[0435/0455] saving transformer.h.8.mixer.out_proj.q_scale
[2023-12-28 06:50:55] INFO convert_weight.py:132: Saved to directory: /tmp/tmpiszj5ycn |
|
[0436/0455] saving transformer.h.8.mlp.fc1.bias
[0437/0455] saving transformer.h.8.mlp.fc1.q_weight
[0438/0455] saving transformer.h.8.mlp.fc1.q_scale
[0439/0455] saving transformer.h.8.mlp.fc2.bias
[0440/0455] saving transformer.h.8.mlp.fc2.q_weight
[0441/0455] saving transformer.h.8.mlp.fc2.q_scale
[0442/0455] saving transformer.h.9.ln.bias
[0443/0455] saving transformer.h.9.ln.weight
[0444/0455] saving transformer.h.9.mixer.Wqkv.bias
[0445/0455] saving transformer.h.9.mixer.Wqkv.q_weight
[0446/0455] saving transformer.h.9.mixer.Wqkv.q_scale
[0447/0455] saving transformer.h.9.mixer.out_proj.bias
[0448/0455] saving transformer.h.9.mixer.out_proj.q_weight
[0449/0455] saving transformer.h.9.mixer.out_proj.q_scale
[0450/0455] saving transformer.h.9.mlp.fc1.bias
[0451/0455] saving transformer.h.9.mlp.fc1.q_weight
[0452/0455] saving transformer.h.9.mlp.fc1.q_scale
[0453/0455] saving transformer.h.9.mlp.fc2.bias
[0454/0455] saving transformer.h.9.mlp.fc2.q_weight
[0455/0455] saving transformer.h.9.mlp.fc2.q_scale |
| All finished, 51 total shards committed, record saved to /tmp/tmpiszj5ycn/ndarray-cache.json |
|
|