CompressedGemma committed on
Commit 2a6ab91 · verified · 1 Parent(s): b33a755

Update README.md

Files changed (1)
  1. README.md +4 -3
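The `+4 -3` file summary should agree with the hunk header `@@ -133,11 +133,12 @@`. Here, 11 old lines minus 3 deletions plus 4 additions gives the 12 new lines. The sketch below is a minimal Python illustration of that check. The helper names `parse_hunk_header` and `counts_consistent` are hypothetical, and the check assumes a single hunk per file, which is true for this commit.

```python
import re

# Matches a unified-diff hunk header such as "@@ -133,11 +133,12 @@ ..."
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count); an omitted count means 1."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


def counts_consistent(header: str, additions: int, deletions: int) -> bool:
    """A hunk is consistent when old_count - deletions + additions == new_count."""
    _, old_count, _, new_count = parse_hunk_header(header)
    return old_count - deletions + additions == new_count


header = "@@ -133,11 +133,12 @@ python3 llama.cpp/convert_hf_to_gguf.py /path/to/model/ --outfile Model-BF16"
print(parse_hunk_header(header))        # (133, 11, 133, 12)
print(counts_consistent(header, 4, 3))  # True, since 11 - 3 + 4 == 12
```

Files with several hunks would need the check applied hunk by hunk, with the per-hunk `+`/`-` lines counted separately.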
README.md CHANGED
````diff
@@ -133,11 +133,12 @@ python3 llama.cpp/convert_hf_to_gguf.py /path/to/model/ --outfile Model-BF16
 ```
 
 **Step B: Generate Importance Matrix (iMatrix)**
-Download a calibration dataset (like wikitext-2) and generate the iMatrix:
+Download a calibration dataset (One is included) and generate the iMatrix:
 ```bash
-llama-imatrix -m Model-BF16.gguf -f wikitext-2-raw/wiki.train.raw -o imatrix.gguf --chunks 300 -ngl 0
+python3 /generate_imatrix.py /.gguf /calibration_data.txt -o /imatrix.dat --chunks 10 --verbose
 ```
-*Tip: Set `-ngl 99` to use GPU acceleration, which speeds up this step significantly.*
+Note: The provided imatrix generator is superior to llama imatrix generator and is meant to be used with HPC quantize.
+
 
 **Step C: Quantize with HPC**
 Execute the re-quantizer with your newly generated BF16 GGUF and iMatrix.
````
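Step B builds an importance matrix from calibration text. The diff does not describe how the bundled `generate_imatrix.py` computes it. The sketch below therefore shows only the commonly described idea behind llama.cpp's `llama-imatrix`: for each weight matrix, average the squared input activation of every column over the calibration tokens, and stop after a fixed number of chunks (loosely analogous to `--chunks`). The function `accumulate_imatrix` and its toy inputs are hypothetical. Real tools capture activations during a model forward pass and write a binary file, and both steps are omitted here.

```python
from typing import Iterable, Sequence


def accumulate_imatrix(
    chunks: Iterable[Sequence[Sequence[float]]], max_chunks: int
) -> list[float]:
    """Mean squared activation per input column over at most `max_chunks` chunks.

    Each chunk is a list of per-token activation rows feeding ONE weight matrix.
    `max_chunks` is a hypothetical stand-in for a `--chunks` style limit.
    """
    sums: list[float] = []
    n_tokens = 0
    for i, chunk in enumerate(chunks):
        if i >= max_chunks:
            break  # stop after the requested number of calibration chunks
        for row in chunk:
            if not sums:
                sums = [0.0] * len(row)  # size the accumulator from the first row
            elif len(row) != len(sums):
                raise ValueError("all activation rows must have the same width")
            for j, x in enumerate(row):
                sums[j] += x * x  # accumulate squared activation per column
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no calibration tokens processed")
    return [s / n_tokens for s in sums]


# Toy activations: three chunks of 2-wide rows (illustrative data only).
calib = [
    [[1.0, 2.0], [3.0, 0.0]],
    [[1.0, 2.0]],
    [[100.0, 100.0]],
]
print(accumulate_imatrix(calib, max_chunks=2))  # approximately [3.667, 2.667]
```

Columns with larger average squared activations get more weight when the quantizer chooses how to round, which is why the calibration text and the chunk count affect the result.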