---
base_model: meta-llama/Llama-3.2-1B-Instruct
language:
- en
library_name: transformers
license: llama3.2
tags:
- llama-3
- llama
- meta
- facebook
- transformers
---

# Quantizing Llama-3.2-1B

by Eric Hartford
|
I am creating several quants of Llama-3.2-1B to test vLLM's Marlin kernels.
|
- https://huggingface.co/QuixiAI/Llama-3.2-1B
- https://huggingface.co/QuixiAI/Llama-3.2-1B-FP8-Dynamic
- https://huggingface.co/QuixiAI/Llama-3.2-1B-MXFP4
- https://huggingface.co/QuixiAI/Llama-3.2-1B-NVFP4A16
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-AWQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W4A16-GPTQ
- https://huggingface.co/QuixiAI/Llama-3.2-1B-W8A16-GPTQ
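For a rough sense of what the schemes above imply for weight memory, here is a back-of-envelope estimate. The ~1.24B parameter count for Llama-3.2-1B is an assumption, and real checkpoints are somewhat larger because they also store quantization scales, zero-points, and often an unquantized `lm_head`:

```python
# Back-of-envelope weight-memory estimate per quantization scheme.
# PARAMS is an assumed approximate parameter count for Llama-3.2-1B;
# real checkpoint sizes will differ (scales, zero-points, embeddings,
# and layers left in higher precision are not accounted for).
PARAMS = 1.24e9

SCHEMES = {
    "BF16 (unquantized baseline)": 16,
    "FP8-Dynamic": 8,
    "W8A16-GPTQ": 8,
    "W4A16-GPTQ / W4A16-AWQ": 4,
    "MXFP4 / NVFP4A16": 4,
}

def weight_gib(bits_per_weight: float) -> float:
    """Weights-only footprint in GiB for a given bit width."""
    return PARAMS * bits_per_weight / 8 / 2**30

for name, bits in SCHEMES.items():
    print(f"{name:30s} ~{weight_gib(bits):.2f} GiB")
```

The 4-bit quants land around a quarter of the BF16 footprint, which is the main draw of W4A16-style schemes on a model this small.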
|
The script I used to produce these quants: [quant.py](quant.py)
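For the Marlin testing itself, one way to exercise a quant is to serve it with vLLM and request the Marlin kernel explicitly rather than relying on auto-detection. The model choice and flag value below are illustrative (a sketch, not taken from quant.py), and a CUDA GPU is required:

```shell
# Serve one of the GPTQ quants with vLLM's OpenAI-compatible server,
# pinning the GPTQ Marlin kernel via --quantization.
# "gptq_marlin" is one of vLLM's supported quantization method names;
# vLLM would normally auto-detect the method from the checkpoint config.
vllm serve QuixiAI/Llama-3.2-1B-W4A16-GPTQ --quantization gptq_marlin
```

Repeating this across the quants listed above (with the matching method name for each) is a straightforward way to compare the kernels on the same base model.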