According to the LLM Compressor docs, the `save_compressed=True` flag should be present, as shown in their example here: https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_int8

I believe this is also an issue with https://huggingface.co/neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w8a8.

Red Hat AI org

`save_compressed` is set to `True` by default.

Got it, my bad.

shariqmobin changed pull request status to closed
