Update README.md
README.md CHANGED
@@ -204,14 +204,15 @@ When these globally-informed tokens are fed through the HPC forward pass for imp
 
 Standard `llama.cpp` imatrix calibration at Q2_K typically requires hundreds of chunks (500K+ tokens) to avoid catastrophic degradation. The HPC pipeline achieves superior results with **one chunk** because the tokenizer has already done the work of compressing the entire document's structure into that chunk.
 
+
 ```bash
 # Generate HPC importance matrix
 python3 LLM/generate_imatrix.py \
     model.gguf calibration_data.txt \
-    -o imatrix.dat --chunks
+    -o imatrix.dat --chunks 5 --verbose
 ```
 
-
+5 Chunks is the 'sweet spot' for retaining most model intelligence I've found.
 
 
 **Step C: Quantize with HPC**