model.language_model.layers.59.mlp.experts.213.down_proj.qweight is duplicated in shards 39 and 40

by pfn0 - opened Mar 27

Mar 27

The index says the weight should be in shard 38, but it actually appears in both shard 39 and shard 40. Gemini says this is not normal. vLLM fails to load the model and also complains about a duplicate weight.

Qwen3.5-397B-A17B-int4-AutoRound$ grep -ao "model.language_model.layers.59.mlp.experts.213.down_proj.qweight" *.safetensors | cut -d: -f1 | sort | uniq -c
      1 model-00039-of-00040.safetensors
      1 model-00040-of-00040.safetensors
Qwen3.5-397B-A17B-int4-AutoRound$ grep model.language_model.layers.59.mlp.experts.213.down_proj.qweight model.safetensors.index.json 
    "model.language_model.layers.59.mlp.experts.213.down_proj.qweight": "model-00038-of-00040.safetensors",
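
The grep above only checks one name. As a rough sketch for scanning the whole checkpoint, the script below reads each shard's safetensors header (an 8-byte little-endian length followed by a JSON object keyed by tensor name) and reports names that occur in more than one shard, plus names whose shard disagrees with model.safetensors.index.json. The function names `read_tensor_names` and `audit_checkpoint` are made up for illustration, and it assumes all shards and the index sit in one directory.

```python
import json
import struct
from collections import defaultdict
from pathlib import Path


def read_tensor_names(shard_path):
    """Return the tensor names listed in a safetensors file header."""
    with open(shard_path, "rb") as f:
        # The first 8 bytes hold the JSON header length as a little-endian u64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def audit_checkpoint(model_dir):
    """Find tensors stored in several shards, or in a shard the index does not name."""
    model_dir = Path(model_dir)
    locations = defaultdict(list)
    for shard in sorted(model_dir.glob("*.safetensors")):
        for name in read_tensor_names(shard):
            locations[name].append(shard.name)

    # The index is optional here; without it only duplicates are reported.
    index_path = model_dir / "model.safetensors.index.json"
    weight_map = {}
    if index_path.exists():
        weight_map = json.loads(index_path.read_text())["weight_map"]

    duplicates = {n: s for n, s in locations.items() if len(s) > 1}
    mismatches = {
        n: (weight_map[n], s)
        for n, s in locations.items()
        if n in weight_map and weight_map[n] not in s
    }
    return duplicates, mismatches


if __name__ == "__main__":
    dups, wrong = audit_checkpoint(".")
    for name, shards in dups.items():
        print("duplicate:", name, shards)
    for name, (indexed, actual) in wrong.items():
        print("index mismatch:", name, "index says", indexed, "found in", actual)
```

Run from inside the model directory, this should flag the experts.213 down_proj.qweight entry in both categories if the shards are as the grep output suggests, and it would also show whether any other tensors are affected.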

vLLM error:

vllm  | (EngineCore pid=596) (RayWorkerWrapper pid=263, ip=192.168.177.12) ERROR 03-27 22:53:13 [ray_utils.py:74] Exception: FilesBufferOnDevice: key model.language_model.layers.59.mlp.experts.213.down_proj.qweight must be unique among files

pfn0

Apr 6

It looks like --load-format fastsafetensors causes vLLM to error out, while --load-format safetensors loads the model and ignores the duplicate weight.
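
Whether ignoring the duplicate is harmless depends on whether the two copies are actually identical, which this thread does not establish. A minimal sketch for checking that, assuming the standard safetensors layout where each header entry carries `dtype`, `shape`, and `data_offsets` relative to the start of the data section; `tensor_info` and `copies_match` are hypothetical helper names:

```python
import hashlib
import json
import struct


def tensor_info(shard_path, tensor_name):
    """Return (dtype, shape, sha256 of the raw bytes) for one tensor in a shard."""
    with open(shard_path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        entry = header[tensor_name]
        begin, end = entry["data_offsets"]
        # Offsets count from the first byte after the JSON header.
        f.seek(8 + header_len + begin)
        digest = hashlib.sha256(f.read(end - begin)).hexdigest()
    return entry["dtype"], entry["shape"], digest


def copies_match(shard_a, shard_b, tensor_name):
    """True if both shards store the tensor with the same dtype, shape, and bytes."""
    return tensor_info(shard_a, tensor_name) == tensor_info(shard_b, tensor_name)


if __name__ == "__main__":
    name = "model.language_model.layers.59.mlp.experts.213.down_proj.qweight"
    print(copies_match("model-00039-of-00040.safetensors",
                       "model-00040-of-00040.safetensors", name))
```

If this prints True, the duplicate is redundant data and either copy gives the same weight. If it prints False, which copy the plain safetensors loader ends up using would matter, and that is not something the output above tells us.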
