model.language_model.layers.59.mlp.experts.213.down_proj.qweight is duplicated in shards 39 and 40
#5
by pfn0 - opened
The index says the weight should be in shard 38, but it actually shows up in shards 39 and 40. Gemini says this is not normal. vLLM fails to load the model and also complains about a duplicate weight.
Qwen3.5-397B-A17B-int4-AutoRound$ grep -ao "model.language_model.layers.59.mlp.experts.213.down_proj.qweight" *.safetensors | cut -d: -f1 | sort | uniq -c
1 model-00039-of-00040.safetensors
1 model-00040-of-00040.safetensors
Qwen3.5-397B-A17B-int4-AutoRound$ grep model.language_model.layers.59.mlp.experts.213.down_proj.qweight model.safetensors.index.json
"model.language_model.layers.59.mlp.experts.213.down_proj.qweight": "model-00038-of-00040.safetensors",
vllm error:
vllm | (EngineCore pid=596) (RayWorkerWrapper pid=263, ip=192.168.177.12) ERROR 03-27 22:53:13 [ray_utils.py:74] Exception: FilesBufferOnDevice: key model.language_model.layers.59.mlp.experts.213.down_proj.qweight must be unique among files
Update: it looks like --load-format fastsafetensors causes vLLM to error out, while --load-format safetensors loads the model and ignores the duplicate weight.
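
For anyone who wants to check the whole checkpoint rather than a single key, here is a rough sketch of the same check as the grep above, done by parsing the shard headers. It relies on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header) and only reads the headers, not the tensor data. The function names `read_header_keys` and `audit_shards` are made up for this example and are not part of vLLM or the safetensors library.

```python
import json
import struct
from collections import defaultdict
from pathlib import Path


def read_header_keys(path):
    """Return the tensor names listed in a .safetensors header (no tensor data is read)."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return set(header)


def audit_shards(model_dir, index_name="model.safetensors.index.json"):
    """Find keys stored in more than one shard, and keys missing from the shard the index names."""
    model_dir = Path(model_dir)

    # Map each tensor name to every shard whose header lists it.
    locations = defaultdict(list)
    for shard in sorted(model_dir.glob("*.safetensors")):
        for key in read_header_keys(shard):
            locations[key].append(shard.name)

    duplicates = {k: v for k, v in locations.items() if len(v) > 1}

    # Compare against the index, if one is present.
    mismatches = {}
    index_path = model_dir / index_name
    if index_path.exists():
        weight_map = json.loads(index_path.read_text())["weight_map"]
        for key, expected in weight_map.items():
            found = locations.get(key, [])
            if expected not in found:
                mismatches[key] = (expected, found)

    return duplicates, mismatches
```

Calling `audit_shards("Qwen3.5-397B-A17B-int4-AutoRound")` from the parent directory should return two dicts: keys that appear in more than one shard, and keys whose indexed shard does not actually contain them. If the situation is as described above, the experts.213 key would be expected to appear in both.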