AssertionError: assert module.weight.shape[1] == 1 in fix_4bit_weight_quant_state_from_module during first inference step

#2
by KeizoMiyazawa - opened

Hi,

I'm getting a consistent error during the first inference step with the NF4 Instruct-Distil v2 model. The model loads successfully (70–80s, 46.7GB on GPU), but fails at step 0 of generation.

Environment:

  • GPU: NVIDIA RTX PRO 5000 Blackwell, 48GB VRAM
  • ComfyUI: 0.15.1
  • PyTorch: 2.10.0+cu130
  • Python: 3.13.11
  • bitsandbytes: 0.48.2
  • transformers: 5.2.0
  • accelerate: 1.12.0
  • Model: HunyuanImage-3.0-Instruct-Distil-NF4-v2
  • Node: Comfy_HunyuanImage3 (latest, git pull confirmed up to date)

Error:

UserWarning: FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.

AssertionError: assert module.weight.shape[1] == 1
  File "bitsandbytes/nn/modules.py", line 407, in fix_4bit_weight_quant_state_from_module

Stack trace path:
generate_image β†’ pipeline β†’ model forward β†’ decoder_layer β†’ mlp β†’ shared_mlp β†’ gate_and_up_proj β†’ fix_4bit_weight_quant_state_from_module
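For context on what that assert is checking: bitsandbytes stores 4-bit quantized weights packed as a flat uint8 column of shape (num_bytes, 1), so a weight whose second dimension is not 1 suggests the layer still holds a full-size, unquantized tensor when the fixup runs. A minimal sketch of that shape pre-check, assuming this packed-storage convention (this is an illustration, not the library's code):

```python
def looks_packed_4bit(shape):
    # bitsandbytes packs two 4-bit values per byte, stored as a
    # (num_bytes, 1) uint8 column; any other 2-D shape means the
    # weight was never (re)quantized -- the case the assert guards.
    return len(shape) == 2 and shape[1] == 1

# a 64x64 layer packs to 64*64/2 = 2048 bytes in one column
print(looks_packed_4bit((2048, 1)))  # True: packed 4-bit layout
print(looks_packed_4bit((64, 64)))   # False: full-size weight, would trip the assert
```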

What I've tried:

  • Isolated to single GPU with CUDA_VISIBLE_DEVICES=0
  • Confirmed bnb_4bit_quant_type: nf4 and bnb_4bit_quant_storage: uint8 in config.json
  • Confirmed shared_mlp is in llm_int8_skip_modules
  • Rolled back bitsandbytes to 0.48.2 (was 0.49.2)
  • Cleared HuggingFace modules cache
  • Confirmed no conflicting custom nodes

Any idea what's causing shared_mlp.gate_and_up_proj to have an uninitialized quant state?
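In case it helps anyone debugging the same thing: a quick way to enumerate affected layers before generation, assuming (as the UserWarning implies) that bitsandbytes keeps quantization metadata in `weight.quant_state` and leaves it `None` until the layer is moved to a CUDA device. The `_W`/`_M` stand-ins below are hypothetical stubs just to make the sketch self-contained:

```python
def find_uninitialized_4bit(named_modules):
    """Yield names of modules whose weight carries a quant_state
    attribute that is still None -- the condition behind the
    'FP4 quantization state not initialized' warning.
    `named_modules` is an iterable of (name, module) pairs, e.g.
    from torch.nn.Module.named_modules()."""
    for name, module in named_modules:
        weight = getattr(module, "weight", None)
        if weight is None:
            continue
        if hasattr(weight, "quant_state") and weight.quant_state is None:
            yield name

# tiny stand-ins to illustrate (hypothetical, not real layers)
class _W:
    def __init__(self, qs): self.quant_state = qs

class _M:
    def __init__(self, qs): self.weight = _W(qs)

mods = [("ok_layer", _M(object())), ("bad_layer", _M(None))]
print(list(find_uninitialized_4bit(mods)))  # ['bad_layer']
```

Running this over the loaded model right after load (before step 0) would show whether `shared_mlp.gate_and_up_proj` is the only layer in this state or one of many.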

Thanks!
