vllm says it's not AWQ

#2 by jcowles

It says it's compressed-tensors

compressed-tensors is the file format; AWQ is the quantization algorithm used to produce the weights. It's AWQ.

Specifically, --quantization awq complains that it's not an AWQ-quantized model.

It does load and run without the flag, though.

Yeah, --quantization compressed-tensors would work too. Passing no flag at all lets vLLM auto-detect the file format, so the --quantization flag is misleading as to what it actually does: it selects the checkpoint format, not the quantization algorithm.
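Concretely, the three invocations look like this (the model name is a placeholder, and this assumes a recent vLLM with the vllm serve entry point):

```shell
# Fails: the checkpoint's quantization_config says compressed-tensors, not awq
vllm serve some-org/some-awq-model --quantization awq

# Works: the flag matches the on-disk file format
vllm serve some-org/some-awq-model --quantization compressed-tensors

# Also works: with no flag, vLLM auto-detects the format from the config
vllm serve some-org/some-awq-model
```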
