AssertionError: assert module.weight.shape[1] == 1 in fix_4bit_weight_quant_state_from_module during first inference step
#2
by KeizoMiyazawa - opened
Hi,
I'm getting a consistent error during the first inference step with the NF4 Instruct-Distil v2 model. The model loads successfully (70–80s, 46.7GB on GPU), but fails at step 0 of generation.
Environment:
- GPU: NVIDIA RTX PRO 5000 Blackwell, 48GB VRAM
- ComfyUI: 0.15.1
- PyTorch: 2.10.0+cu130
- Python: 3.13.11
- bitsandbytes: 0.48.2
- transformers: 5.2.0
- accelerate: 1.12.0
- Model: HunyuanImage-3.0-Instruct-Distil-NF4-v2
- Node: Comfy_HunyuanImage3 (latest, `git pull` confirmed up to date)
Error:
```
UserWarning: FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.
AssertionError: assert module.weight.shape[1] == 1
  File "bitsandbytes/nn/modules.py", line 407, in fix_4bit_weight_quant_state_from_module
```
Stack trace path: `generate_image` → pipeline → model forward → `decoder_layer` → `mlp` → `shared_mlp` → `gate_and_up_proj` → `fix_4bit_weight_quant_state_from_module`
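For context on why that assert might fire: bitsandbytes stores 4-bit weights packed two values per byte in a flat column tensor, so a quantized layer's weight is expected to have shape `(N, 1)`. A layer that kept its full 2D weight (for example, one excluded from quantization) would fail the `shape[1] == 1` check. A minimal sketch of that packing arithmetic (illustrative only, not bitsandbytes source):

```python
# Illustration of 4-bit weight packing, NOT bitsandbytes code.
# A Linear(out_features, in_features) weight has out*in elements; two 4-bit
# values fit in one uint8, stored as a flat (N, 1) column. An unquantized
# layer instead keeps a 2D (out, in) weight, where shape[1] != 1.
def packed_4bit_shape(out_features, in_features):
    n = out_features * in_features
    # Round up when the element count is odd (last byte is half-used).
    return (n // 2 + n % 2, 1)

print(packed_4bit_shape(4096, 4096))  # -> (8388608, 1)
```

So one plausible reading of the error is that `gate_and_up_proj` reached the 4-bit fixup path while still holding an unpacked 2D weight, rather than that its quant state was merely lost.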
What I've tried:
- Isolated to a single GPU with `CUDA_VISIBLE_DEVICES=0`
- Confirmed `bnb_4bit_quant_type: nf4` and `bnb_4bit_quant_storage: uint8` in config.json
- Confirmed `shared_mlp` is in `llm_int8_skip_modules`
- Rolled back bitsandbytes to 0.48.2 (was 0.49.2)
- Cleared HuggingFace modules cache
- Confirmed no conflicting custom nodes
Any idea what's causing `shared_mlp.gate_and_up_proj` to have an uninitialized quant state?
Thanks!