vllm says it's not AWQ
#2 by jcowles - opened
It says it's compressed-tensors
compressed-tensors is the file format; AWQ is the quantization algorithm. It's AWQ.
Specifically, `--quantization awq` complains that it's not an AWQ-quantized model.
It does load and run without this flag, though.
Yeah, `--quantization compressed-tensors` would work too. Passing no flag lets vLLM auto-detect the file format; the `--quantization` flag is misleading as to what it actually does.
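For reference, the three invocations discussed above can be sketched like this (the model name is a placeholder, not from this thread):

```shell
# Fails: the checkpoint is stored in the compressed-tensors format,
# not as a native AWQ checkpoint, so this flag is rejected
vllm serve some-org/model-awq --quantization awq

# Works: the flag matches the on-disk file format
vllm serve some-org/model-awq --quantization compressed-tensors

# Also works: with no flag, vLLM auto-detects the format
# from the model's config
vllm serve some-org/model-awq
```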