model.language_model.layers.59.mlp.experts.213.down_proj.qweight is duplicated in shard 39 and 40

#5
by pfn0 - opened

The index says the weight should be in shard 38, but it actually appears in both shard 39 and shard 40. Gemini says this is not normal. vLLM fails to load the model and also complains about a duplicate weight.

Qwen3.5-397B-A17B-int4-AutoRound$ grep -ao "model.language_model.layers.59.mlp.experts.213.down_proj.qweight" *.safetensors | cut -d: -f1 | sort | uniq -c
      1 model-00039-of-00040.safetensors
      1 model-00040-of-00040.safetensors
Qwen3.5-397B-A17B-int4-AutoRound$ grep model.language_model.layers.59.mlp.experts.213.down_proj.qweight model.safetensors.index.json 
    "model.language_model.layers.59.mlp.experts.213.down_proj.qweight": "model-00038-of-00040.safetensors",
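For anyone who wants to cross-check this without grep (which matches raw bytes anywhere in the file, not just the header), here is a minimal sketch that reads only the JSON header of each shard. It relies on the documented safetensors layout: an 8-byte little-endian length followed by a JSON header of that many bytes. The function names `read_tensor_names` and `find_problems` are made up for illustration, and I am assuming the index file has the usual `weight_map` key.

```python
import json
import struct
import sys
from collections import defaultdict
from pathlib import Path


def read_tensor_names(shard_path):
    """Return the tensor names listed in one safetensors file's JSON header."""
    with open(shard_path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def find_problems(model_dir):
    """Return (duplicates, mismatches) for all *.safetensors files in model_dir."""
    model_dir = Path(model_dir)
    locations = defaultdict(list)  # tensor name -> shard file names containing it
    for shard in sorted(model_dir.glob("*.safetensors")):
        for name in read_tensor_names(shard):
            locations[name].append(shard.name)

    # Tensors that appear in more than one shard.
    duplicates = {n: s for n, s in locations.items() if len(s) > 1}

    # Tensors whose index entry points at a shard that does not contain them.
    mismatches = {}
    index_path = model_dir / "model.safetensors.index.json"
    if index_path.exists():
        weight_map = json.loads(index_path.read_text())["weight_map"]
        for name, expected in weight_map.items():
            found = locations.get(name, [])
            if expected not in found:
                mismatches[name] = (expected, found)
    return duplicates, mismatches


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    dups, bad = find_problems(target)
    for name, shards in dups.items():
        print(f"DUPLICATE {name}: {', '.join(shards)}")
    for name, (expected, found) in bad.items():
        print(f"INDEX MISMATCH {name}: index says {expected}, found in {found}")
```

Saved under a name of your choosing (say `check_shards.py`, hypothetical), it would be run as `python check_shards.py Qwen3.5-397B-A17B-int4-AutoRound`. It only reads the headers, so it should be quick even on 40 large shards.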

vllm error:

vllm  | (EngineCore pid=596) (RayWorkerWrapper pid=263, ip=192.168.177.12) ERROR 03-27 22:53:13 [ray_utils.py:74] Exception: FilesBufferOnDevice: key model.language_model.layers.59.mlp.experts.213.down_proj.qweight must be unique among files

It looks like using load-format fastsafetensors causes vLLM to error out on the duplicate, while load-format safetensors silently ignores the duplicate weight.
