/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat gen_config /home/junrushao/tmp/tmp5aeqhx5w/repo --quantization q3f16_1 --conv-template wizard_coder_or_math --output /home/junrushao/tmp/tmplvq40lw6 --context-window-size 8192
[2023-12-29 13:45:39] INFO auto_config.py:115: Found model configuration: /home/junrushao/tmp/tmp5aeqhx5w/repo/config.json
[2023-12-29 13:45:39] INFO auto_config.py:151: Found model type: gpt_bigcode. Use `--model-type` to override.
[2023-12-29 13:45:39] INFO gpt_bigcode_model.py:41: context_window_size not found in config.json. Falling back to n_positions (8192)
[2023-12-29 13:45:39] INFO flags_model_config_override.py:63: Default prefill_chunk_size to context_window_size (8192) because it is not provided
[2023-12-29 13:45:39] INFO flags_model_config_override.py:112: Overriding context_window_size from 8192 to 8192
[2023-12-29 13:45:39] INFO flags_model_config_override.py:112: Overriding prefill_chunk_size from 8192 to 8192
[2023-12-29 13:45:39] INFO gen_config.py:115: [generation_config.json] Setting bos_token_id: 0
[2023-12-29 13:45:39] INFO gen_config.py:115: [generation_config.json] Setting eos_token_id: 0
[2023-12-29 13:45:39] INFO gen_config.py:129: Not found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/tokenizer.model
[2023-12-29 13:45:39] INFO gen_config.py:127: Found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/tokenizer.json. Copying to /home/junrushao/tmp/tmplvq40lw6/tokenizer.json
[2023-12-29 13:45:39] INFO gen_config.py:127: Found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/vocab.json. Copying to /home/junrushao/tmp/tmplvq40lw6/vocab.json
[2023-12-29 13:45:39] INFO gen_config.py:127: Found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/merges.txt. Copying to /home/junrushao/tmp/tmplvq40lw6/merges.txt
[2023-12-29 13:45:39] INFO gen_config.py:127: Found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/added_tokens.json. Copying to /home/junrushao/tmp/tmplvq40lw6/added_tokens.json
[2023-12-29 13:45:39] INFO gen_config.py:127: Found tokenizer config: /home/junrushao/tmp/tmp5aeqhx5w/repo/tokenizer_config.json. Copying to /home/junrushao/tmp/tmplvq40lw6/tokenizer_config.json
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting pad_token_id: 0
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting temperature: 0.7
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting repetition_penalty: 1.0
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting top_p: 0.95
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting mean_gen_len: 128
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting max_gen_len: 512
[2023-12-29 13:45:39] INFO gen_config.py:69: [System default] Setting shift_fill_factor: 0.3
[2023-12-29 13:45:39] INFO gen_config.py:157: Dumping configuration file to: /home/junrushao/tmp/tmplvq40lw6/mlc-chat-config.json
/home/junrushao/micromamba/envs/python311/bin/python -m mlc_chat convert_weight /home/junrushao/tmp/tmp5aeqhx5w/repo --quantization q3f16_1 --source-format auto --output /home/junrushao/tmp/tmplvq40lw6
[2023-12-29 13:45:40] INFO auto_config.py:115: Found model configuration: /home/junrushao/tmp/tmp5aeqhx5w/repo/config.json
[2023-12-29 13:45:41] INFO auto_device.py:76: Found device: cuda:0
[2023-12-29 13:45:41] INFO auto_device.py:76: Found device: cuda:1
[2023-12-29 13:45:41] INFO auto_device.py:76: Found device: cuda:2
[2023-12-29 13:45:41] INFO auto_device.py:76: Found device: cuda:3
[2023-12-29 13:45:41] INFO auto_device.py:85: Not found device: rocm:0
[2023-12-29 13:45:41] INFO auto_device.py:85: Not found device: metal:0
[2023-12-29 13:45:42] INFO auto_device.py:85: Not found device: vulkan:0
[2023-12-29 13:45:42] INFO auto_device.py:85: Not found device: opencl:0
[2023-12-29 13:45:42] INFO auto_device.py:33: Using device: cuda:0
[2023-12-29 13:45:42] INFO auto_weight.py:70: Finding weights in: /home/junrushao/tmp/tmp5aeqhx5w/repo
[2023-12-29 13:45:42] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: /home/junrushao/tmp/tmp5aeqhx5w/repo/pytorch_model.bin
[2023-12-29 13:45:42] INFO auto_weight.py:149: Not found Huggingface Safetensor
[2023-12-29 13:45:42] INFO auto_weight.py:106: Using source weight configuration: /home/junrushao/tmp/tmp5aeqhx5w/repo/pytorch_model.bin. Use `--source` to override.
[2023-12-29 13:45:42] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2023-12-29 13:45:42] INFO auto_config.py:151: Found model type: gpt_bigcode. Use `--model-type` to override.
[2023-12-29 13:45:42] INFO gpt_bigcode_model.py:41: context_window_size not found in config.json. Falling back to n_positions (8192)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/__main__.py", line 39, in <module>
    main()
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/__main__.py", line 28, in main
    cli.main(sys.argv[2:])
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/interface/convert_weight.py", line 147, in convert_weight
    _convert_args(args)
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/interface/convert_weight.py", line 64, in _convert_args
    model, quantize_map = args.model.quantize[args.quantization.kind](
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/model/gpt_bigcode/gpt_bigcode_quantization.py", line 21, in group_quant
    model = quantization.quantize_model(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 117, in quantize_model
    model = mutator.visit(name_prefix, model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 140, in visit
    setattr(node, key, self.visit_module(_get_child_name(name, key), value))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 113, in visit_module
    return self.visit(name, node)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 138, in visit
    setattr(node, key, self.visit_modulelist(_get_child_name(name, key), value))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 98, in visit_modulelist
    return self.visit(name, node)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 130, in visit
    node[i] = self.visit_module(f"{name}.{i}", node[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 113, in visit_module
    return self.visit(name, node)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 140, in visit
    setattr(node, key, self.visit_module(_get_child_name(name, key), value))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 113, in visit_module
    return self.visit(name, node)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/tvm-dev/python/tvm/relax/frontend/nn/visitor.py", line 140, in visit
    setattr(node, key, self.visit_module(_get_child_name(name, key), value))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 107, in visit_module
    return GroupQuantizeLinear.from_linear(node, self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 323, in from_linear
    _apply_sharding(shard, f"{shard.name}_q_weight", quantized_linear.q_weight)
  File "/home/junrushao/Projects/mlc-llm/python/mlc_chat/quantization/group_quantization.py", line 444, in _apply_sharding
    assert weight.shape[0] == sum(shard.rows)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
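For anyone triaging this: the failing check in `_apply_sharding` asserts that the first dimension of the quantized weight equals the total row count declared by the tensor's shard spec. The sketch below is a minimal, self-contained illustration of that invariant only; the shapes and row splits are made up for illustration (not the actual WizardCoder/gpt_bigcode dimensions), and it does not claim to identify the root cause.

```python
# Hypothetical sketch of the invariant asserted in _apply_sharding
# (mlc_chat/quantization/group_quantization.py):
#     assert weight.shape[0] == sum(shard.rows)
# The shard spec declares how many rows each logical slice (e.g. the
# fused Q/K/V slices of a combined attention projection) contributes.

def rows_match(weight_shape, shard_rows):
    """True when the shard spec accounts for every row of the tensor."""
    return weight_shape[0] == sum(shard_rows)

# When the declared rows sum to the tensor's row count, the check passes.
assert rows_match((96, 48), [64, 16, 16])

# If the quantized tensor's first dimension disagrees with the declared
# rows (for whatever reason), the assertion fires -- as in the traceback.
assert not rows_match((10, 48), [64, 16, 16])
print("invariant demonstrated")
```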
Weight conversion with arguments:
  --config          /home/junrushao/tmp/tmp5aeqhx5w/repo/config.json
  --quantization    GroupQuantize(name='q3f16_1', kind='group-quant', group_size=40, quantize_dtype='int3', storage_dtype='uint32', model_dtype='float16', num_elem_per_storage=10, num_storage_per_group=4, max_int_value=3)
  --model-type      gpt_bigcode
  --device          cuda:0
  --source          /home/junrushao/tmp/tmp5aeqhx5w/repo/pytorch_model.bin
  --source-format   huggingface-torch
  --output          /home/junrushao/tmp/tmplvq40lw6
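For reference, the q3f16_1 parameters printed above are internally consistent, so the failure is not a malformed quantization config. The arithmetic below just re-derives the packing numbers from the values in the dump (bit widths taken directly from the log, nothing assumed beyond them):

```python
# Re-deriving the q3f16_1 packing parameters reported in the log above.
storage_bits = 32   # storage_dtype='uint32'
quantize_bits = 3   # quantize_dtype='int3'
group_size = 40     # group_size=40

# A uint32 holds floor(32 / 3) = 10 int3 values.
num_elem_per_storage = storage_bits // quantize_bits

# A group of 40 elements therefore needs 40 / 10 = 4 storage words.
num_storage_per_group = group_size // num_elem_per_storage

# Signed 3-bit range tops out at 2^(3-1) - 1 = 3.
max_int_value = (1 << (quantize_bits - 1)) - 1

print(num_elem_per_storage, num_storage_per_group, max_int_value)  # 10 4 3
```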