AssertionError: assert module.weight.shape[1] == 1 in fix_4bit_weight_quant_state_from_module during first inference step

#2
by KeizoMiyazawa - opened

Hi,

I'm getting a consistent error during the first inference step with the NF4 Instruct-Distil v2 model. The model loads successfully (70–80s, 46.7GB on GPU), but fails at step 0 of generation.

Environment:

  • GPU: NVIDIA RTX PRO 5000 Blackwell, 48GB VRAM
  • ComfyUI: 0.15.1
  • PyTorch: 2.10.0+cu130
  • Python: 3.13.11
  • bitsandbytes: 0.48.2
  • transformers: 5.2.0
  • accelerate: 1.12.0
  • Model: HunyuanImage-3.0-Instruct-Distil-NF4-v2
  • Node: Comfy_HunyuanImage3 (latest, git pull confirmed up to date)

Error:

UserWarning: FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.

AssertionError: assert module.weight.shape[1] == 1
  File "bitsandbytes/nn/modules.py", line 407, in fix_4bit_weight_quant_state_from_module

Stack trace path:
generate_image β†’ pipeline β†’ model forward β†’ decoder_layer β†’ mlp β†’ shared_mlp β†’ gate_and_up_proj β†’ fix_4bit_weight_quant_state_from_module
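For context on what that assert is checking: bitsandbytes stores 4-bit quantized weights packed as a flat uint8 column of shape (num_bytes, 1), so a weight whose second dimension is not 1 suggests the layer still holds a full-size, unquantized tensor when the fixup runs. A minimal sketch of that shape pre-check, assuming this packed-storage convention (this is an illustration, not the library's code):

```python
def looks_packed_4bit(shape):
    # bitsandbytes packs two 4-bit values per byte, stored as a
    # (num_bytes, 1) uint8 column; any other 2-D shape means the
    # weight was never (re)quantized -- the case the assert guards.
    return len(shape) == 2 and shape[1] == 1

# a 64x64 layer packs to 64*64/2 = 2048 bytes in one column
print(looks_packed_4bit((2048, 1)))  # True: packed 4-bit layout
print(looks_packed_4bit((64, 64)))   # False: full-size weight, would trip the assert
```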

What I've tried:

  • Isolated to single GPU with CUDA_VISIBLE_DEVICES=0
  • Confirmed bnb_4bit_quant_type: nf4 and bnb_4bit_quant_storage: uint8 in config.json
  • Confirmed shared_mlp is in llm_int8_skip_modules
  • Rolled back bitsandbytes to 0.48.2 (was 0.49.2)
  • Cleared HuggingFace modules cache
  • Confirmed no conflicting custom nodes

Any idea what's causing shared_mlp.gate_and_up_proj to have an uninitialized quant state?
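In case it helps anyone debugging the same thing: a quick way to enumerate affected layers before generation, assuming (as the UserWarning implies) that bitsandbytes keeps quantization metadata in `weight.quant_state` and leaves it `None` until the layer is moved to a CUDA device. The `_W`/`_M` stand-ins below are hypothetical stubs just to make the sketch self-contained:

```python
def find_uninitialized_4bit(named_modules):
    """Yield names of modules whose weight carries a quant_state
    attribute that is still None -- the condition behind the
    'FP4 quantization state not initialized' warning.
    `named_modules` is an iterable of (name, module) pairs, e.g.
    from torch.nn.Module.named_modules()."""
    for name, module in named_modules:
        weight = getattr(module, "weight", None)
        if weight is None:
            continue
        if hasattr(weight, "quant_state") and weight.quant_state is None:
            yield name

# tiny stand-ins to illustrate (hypothetical, not real layers)
class _W:
    def __init__(self, qs): self.quant_state = qs

class _M:
    def __init__(self, qs): self.weight = _W(qs)

mods = [("ok_layer", _M(object())), ("bad_layer", _M(None))]
print(list(find_uninitialized_4bit(mods)))  # ['bad_layer']
```

Running this over the loaded model right after load (before step 0) would show whether `shared_mlp.gate_and_up_proj` is the only layer in this state or one of many.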

Thanks!
