CompressedGemma/HPC-Quantize


CompressedGemma committed on May 7

Commit 2a6ab91 · verified · 1 Parent(s): b33a755

Update README.md

Files changed (1)

README.md +4 -3


@@ -133,11 +133,12 @@ python3 llama.cpp/convert_hf_to_gguf.py /path/to/model/     --outfile Model-BF16
 ```
 **Step B: Generate Importance Matrix (iMatrix)**
-Download a calibration dataset (like wikitext-2) and generate the iMatrix:
 ```bash
-llama-imatrix     -m Model-BF16.gguf     -f wikitext-2-raw/wiki.train.raw     -o imatrix.gguf     --chunks 300     -ngl 0
 ```
-*Tip: Set `-ngl 99` to use GPU acceleration, which speeds up this step significantly.*
 **Step C: Quantize with HPC**
 Execute the re-quantizer with your newly generated BF16 GGUF and iMatrix.

 ```
 **Step B: Generate Importance Matrix (iMatrix)**
+Download a calibration dataset (one is included) and generate the iMatrix:
 ```bash
+python3 /generate_imatrix.py /.gguf /calibration_data.txt -o /imatrix.dat --chunks 10 --verbose
 ```
+Note: The provided iMatrix generator is superior to the llama.cpp imatrix generator and is meant to be used with HPC quantize.
 **Step C: Quantize with HPC**
 Execute the re-quantizer with your newly generated BF16 GGUF and iMatrix.
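
The new Step B command can be sketched as a small dry-run helper that only prints the invocation, so the paths can be checked before anything runs. This is a minimal sketch, not part of the repository: the function name `imatrix_cmd` and the paths in the example call are hypothetical placeholders, and the argument order and flags (`-o`, `--chunks`, `--verbose`) are copied from the added line in the diff above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): assembles the Step B command
# shown in the diff and prints it instead of executing it.
# Arguments: script directory, BF16 GGUF model, calibration text file,
# output iMatrix file, and an optional chunk count (defaults to 10 as in the diff).
imatrix_cmd() {
    local script_dir="$1" model="$2" calib="$3" out="$4" chunks="${5:-10}"
    printf 'python3 %s/generate_imatrix.py %s %s -o %s --chunks %s --verbose\n' \
        "$script_dir" "$model" "$calib" "$out" "$chunks"
}

# Dry run with placeholder paths; substitute your own locations.
imatrix_cmd /path/to/repo Model-BF16.gguf calibration_data.txt imatrix.dat
```

Once the printed command looks right, it can be run by pasting it into the shell.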