AWQ 4bit

#2
by MatthieuZ - opened

Any plans to release your usual AWQ 4-bit version of this model?

Thanks for asking!

For now I don’t plan to make a separate “classic AWQ 4-bit” release.

One reason is that my recent AWQ-style releases have gradually moved away from data calibration. If we set the calibration strategy aside, what remains is mostly the packing/runtime format, which is essentially WnA16: low-bit (n-bit) weights with 16-bit activations. So format-wise, I’m now moving toward a generic WnA16 / compressed-tensors representation rather than the older AWQ format.
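To make "WnA16 without calibration" concrete, here is a minimal NumPy sketch for n = 4. It is only an illustration under stated assumptions: the function names, the group size of 128, and the symmetric round-to-nearest scheme are hypothetical choices for this example, not the exact recipe used for this repo.

```python
import numpy as np

def quantize_w4_groupwise(w, group_size=128):
    """Symmetric round-to-nearest int4 quantization, one scale per group.

    No calibration data is involved: the scales come from the weights alone.
    (Illustrative scheme; group_size=128 is an assumption.)
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.astype(np.float32).reshape(out_features, -1, group_size)
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    scales = (max_abs / 7.0).astype(np.float16)   # map max |w| to int4 value 7
    scales[scales == 0] = 1.0                     # avoid division by zero
    q = np.rint(groups / scales.astype(np.float32))
    q = np.clip(q, -8, 7).astype(np.int8)         # int4 range stored in int8
    return q.reshape(out_features, in_features), scales.squeeze(-1)

def dequantize_w4_groupwise(q, scales, group_size=128):
    """Rebuild 16-bit weights from int4 codes and per-group scales."""
    out_features, in_features = q.shape
    groups = q.reshape(out_features, -1, group_size).astype(np.float16)
    return (groups * scales[..., None]).reshape(out_features, in_features)

def wna16_linear(x, q, scales, group_size=128):
    """W4A16 matmul: weights are low-bit at rest, activations stay 16-bit."""
    w = dequantize_w4_groupwise(q, scales, group_size)
    return x.astype(np.float16) @ w.T
```

Real kernels fuse the dequantization into the matmul instead of materializing the 16-bit weights, but the arithmetic is the same idea.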

Also, vLLM’s support for compressed-tensors has become much better, and it gives more fine-grained control over which parts are kept in int4 / int8 / other formats. I’ve also seen people build FP4-Int8Mix variants based on this repo, so this format is quite convenient for experimentation.

If you specifically need the old AWQ format for compatibility with a certain backend, you can probably ask Codex / Claude Code to help repack this repo directly into an AWQ format. This is a fairly straightforward repacking task, and it should work without much trouble.
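For a rough idea of what the repacking involves: the quantized values stay the same, and only the way 4-bit codes are laid out inside 32-bit words changes. The sketch below is a hypothetical illustration in NumPy. The helper names and the example nibble orders are assumptions, and the actual layout (nibble order, packing axis, zero-point and scale tensors) has to be checked against the source checkpoint and the target backend.

```python
import numpy as np

def pack_int4(u, order=(0, 1, 2, 3, 4, 5, 6, 7)):
    """Pack unsigned 4-bit codes (0..15) along the last axis into uint32 words.

    order[slot] is the index of the source value stored in nibble `slot`
    (slot 0 = lowest 4 bits). The default is plain sequential packing.
    """
    u = np.asarray(u, dtype=np.uint32)
    assert u.shape[-1] % 8 == 0 and u.max() <= 15
    blocks = u.reshape(*u.shape[:-1], -1, 8)
    packed = np.zeros(blocks.shape[:-1], dtype=np.uint32)
    for slot, src in enumerate(order):
        packed |= blocks[..., src] << np.uint32(4 * slot)
    return packed

def unpack_int4(packed, order=(0, 1, 2, 3, 4, 5, 6, 7)):
    """Inverse of pack_int4 for the same nibble order."""
    packed = np.asarray(packed, dtype=np.uint32)
    blocks = np.empty(packed.shape + (8,), dtype=np.uint32)
    for slot, src in enumerate(order):
        blocks[..., src] = (packed >> np.uint32(4 * slot)) & np.uint32(0xF)
    return blocks.reshape(*packed.shape[:-1], -1)

def repack_int4(packed, src_order, dst_order):
    """Change the nibble layout without touching the quantized values."""
    return pack_int4(unpack_int4(packed, src_order), dst_order)
```

If the source weights are stored as signed symmetric int4 codes in [-8, 7] and the target expects unsigned codes with a zero point, adding 8 to every code and using a zero point of 8 keeps the dequantized values unchanged, so the conversion itself should be lossless.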

Ok thanks for the hint, we'll try to repack.

MatthieuZ changed discussion status to closed
