---
base_model: meta-llama/Llama-3.2-1B-Instruct
language:
- en
library_name: transformers
license: llama3.2
tags:
- llama-3
- llama
- meta
- facebook
- transformers
---

# Quantizing Llama-3.2-1B

Eric Hartford

I am creating several quants of Llama-3.2-1B for testing vLLM's Marlin kernels.

- https://huggingface.co/QuixiAI/Llama-3.2-1B
- https://huggingface.co/QuixiAI/Llama-3.2-1B-FP8-Dynamic
- https://huggingface.co/QuixiAI/Llama-3.2-1B-MXFP4
- https://huggingface.co/QuixiAI/Llama-3.2-1B-NVFP4A16
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-AWQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-GPTQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W8A16-GPTQ

The script I used to produce this quant: [quant.py](quant.py)