---
library_name: gguf
tags:
- llama
- quantized
- gptq
- evopress
model_type: llama
base_model: meta-llama/Llama-3.2-1B-Instruct
---

# Llama-3.2-1B-Instruct GGUF DASLab Quantization

This repository contains advanced quantized versions of Llama 3.2 1B Instruct, produced with **GPTQ quantization** and **GPTQ+EvoPress optimization** from the [DASLab GGUF Toolkit](https://github.com/IST-DASLab/gguf-toolkit).

## Models

- **GPTQ Uniform**: high-quality GPTQ quantization at 2-6 bit precision
- **GPTQ+EvoPress**: non-uniform per-layer quantization discovered via evolutionary search

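To illustrate what a non-uniform configuration means in practice, the sketch below computes the effective bits-per-weight of a per-layer bit assignment as a parameter-weighted average. The layer names, parameter counts, and bitwidths are made-up illustrative values, not the actual configuration shipped in this repository:

```python
# Hypothetical non-uniform per-layer quantization config:
# (layer name, parameter count, assigned bitwidth)
layers = [
    ("embed_tokens", 262_144_000, 4),
    ("layers.0.self_attn", 12_582_912, 3),
    ("layers.0.mlp", 25_165_824, 2),
]

total_params = sum(n for _, n, _ in layers)
total_bits = sum(n * b for _, n, b in layers)

avg_bits = total_bits / total_params  # effective bits per weight
size_mib = total_bits / 8 / 2**20     # quantized weight size in MiB

print(f"effective bitwidth: {avg_bits:.2f} bits/weight")
print(f"quantized size:     {size_mib:.1f} MiB")
```

EvoPress searches over assignments like this, spending more bits on sensitive layers and fewer on robust ones, while keeping the average bitwidth (and thus the file size) at the target.
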
## Performance

Our GPTQ-based quantization methods achieve **superior quality-compression tradeoffs** compared to standard quantization:

- **Better perplexity** at equivalent bitwidths vs. naive quantization approaches
- **Error-correcting updates** during calibration for improved accuracy
- **Optimized configurations** that allocate bits based on layer sensitivity (EvoPress)

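Perplexity, the metric behind the first bullet, is the exponentiated average negative log-likelihood over a token sequence; lower means the quantized model predicts the evaluation text more closely. A minimal computation (the per-token probabilities below are made-up illustrative values, not measurements from these models):

```python
import math

# Hypothetical probabilities the model assigned to the true next token
# at each position of a short evaluation sequence.
token_probs = [0.25, 0.10, 0.50, 0.05]

# Perplexity = exp(mean negative log-likelihood).
nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))

print(f"perplexity: {perplexity:.2f}")
```
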
## Usage

Compatible with llama.cpp and all GGUF-supporting inference engines. No special setup is required.

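For example, with llama.cpp's `llama-cli` (the model file name below is a placeholder; substitute the actual `.gguf` file downloaded from this repository):

```shell
# Run inference with llama.cpp; the .gguf filename is a placeholder.
./llama-cli -m Llama-3.2-1B-Instruct-gptq.gguf \
    -p "Explain GPTQ quantization in one sentence." \
    -n 128
```
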
**Full documentation, evaluation results, and toolkit source**: https://github.com/IST-DASLab/gguf-toolkit

---