---
library_name: gguf
tags:
- llama
- quantized
- gptq
- evopress
model_type: llama
base_model: meta-llama/Llama-3.2-1B-Instruct
---

# Llama-3.2-1B-Instruct GGUF DASLab Quantization

This repository contains advanced quantized versions of Llama 3.2 1B Instruct using **GPTQ quantization** and **GPTQ+EvoPress optimization** from the [DASLab GGUF Toolkit](https://github.com/IST-DASLab/gguf-toolkit).

## Models

- **GPTQ Uniform**: High-quality GPTQ quantization at 2-6 bit precision
- **GPTQ+EvoPress**: Non-uniform per-layer quantization discovered via evolutionary search
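
As a rough illustration of what b-bit uniform quantization does to a weight tensor, here is a minimal sketch (this is plain round-to-nearest, not the toolkit's implementation; GPTQ additionally applies calibration-driven error correction on top of a grid like this):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric round-to-nearest quantization onto a (2**bits)-level grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q * scale

w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0])  # toy weights
q, scale = quantize_uniform(w, bits=4)
w_hat = dequantize(q, scale)                # reconstruction error <= scale / 2
```

Fewer bits mean a coarser grid (larger `scale`), which is why quality drops at 2-3 bits and why smarter bit allocation pays off.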

## Performance

Our GPTQ-based quantization methods achieve **superior quality-compression tradeoffs** compared to standard quantization:

- **Better perplexity** at equivalent bitwidths vs. naive quantization approaches
- **Error-correcting updates** during calibration for improved accuracy
- **Optimized configurations** that allocate bits based on layer sensitivity (EvoPress)
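
The sensitivity-driven allocation idea can be caricatured in a few lines: treat the per-layer bitwidths as a genome, mutate it, and keep candidates that reduce a sensitivity-weighted error proxy while staying within an average-bit budget. This is a toy sketch with made-up sensitivities, not the actual EvoPress algorithm or its fitness function:

```python
import random

random.seed(0)
BITS = [2, 3, 4, 5, 6]
sensitivity = [1.0, 4.0, 0.5, 2.0]        # hypothetical per-layer sensitivities

def proxy_error(cfg):
    # lower bits -> exponentially larger error, scaled by layer sensitivity
    return sum(s * 2.0 ** (-b) for s, b in zip(sensitivity, cfg))

def avg_bits(cfg):
    return sum(cfg) / len(cfg)

def search(budget=4.0, steps=200):
    cfg = [4] * len(sensitivity)          # start uniform, exactly at the budget
    best = proxy_error(cfg)
    for _ in range(steps):
        cand = list(cfg)
        cand[random.randrange(len(cand))] = random.choice(BITS)  # mutate one layer
        if avg_bits(cand) <= budget and proxy_error(cand) < best:
            cfg, best = cand, proxy_error(cand)
    return cfg, best

cfg, err = search()   # sensitive layers end up with more bits, cheap layers with fewer
```

The search can only accept in-budget improvements, so the non-uniform result is never worse than the uniform starting point under this proxy.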

## Usage

Compatible with llama.cpp and all GGUF-supporting inference engines. No special setup required.
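
For example, with llama.cpp's CLI (the GGUF filename below is a placeholder; substitute the quantized file you downloaded from this repo):

```shell
# Generate 64 tokens with llama.cpp (model filename is a placeholder)
./llama-cli -m Llama-3.2-1B-Instruct-gptq.gguf \
  -p "Explain GGUF in one sentence." -n 64
```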

**Full documentation, evaluation results, and toolkit source**: https://github.com/IST-DASLab/gguf-toolkit

---