Paper: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (arXiv:2306.00978)
phi-3-mini-4k-instruct-awq-4bit is a version of the Microsoft Phi 3 mini 4k Instruct model that was quantized using the AWQ method developed by Lin et al. (2023).
Please refer to the Original Phi 3 mini model card for details about the model preparation and training processes.
- `autoawq==0.2.5` – AutoAWQ was used to quantize the Phi-3 model.
- `vllm==0.4.2` – vLLM was used to host models for benchmarking.