| CALIBRATION DATA INFORMATION |
| ============================= |
|
|
| This model was quantized using importance matrix (imatrix) generation. |
| The imatrix captures which weights in the model are most important for |
| maintaining output quality during extreme compression (2-bit quantization). |
|
|
| WHAT IS CALIBRATION? |
| -------------------- |
| Calibration is the process of running sample inputs through the model to |
| measure which weights, within each tensor (weight matrix), contribute |
| most to the output. These measurements form an "importance matrix" that |
| guides the quantizer to preserve precision where it matters most. |
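|
| One common way to score importance, used by llama.cpp's imatrix tool, is |
| to accumulate the squared input activations that each weight column sees |
| during calibration. A minimal pure-Python sketch of that idea (function |
| and variable names are hypothetical): |

```python
def accumulate_importance(activations, imatrix=None):
    """Accumulate per-column importance scores from input activations.

    activations: list of rows (one per calibration token), each row
    holding the inputs that feed a weight matrix.  Column j's score is
    the running sum of x_j**2 over all tokens, so columns that carry
    large activations are kept at higher precision when quantizing.
    """
    n_cols = len(activations[0])
    contrib = [sum(row[j] ** 2 for row in activations) for j in range(n_cols)]
    if imatrix is None:
        return contrib
    return [prev + new for prev, new in zip(imatrix, contrib)]

# Two calibration chunks processed independently, scores summed:
chunk1 = [[1, 0], [2, 1]]   # column 0 carries the large activations
chunk2 = [[1, 1], [1, 0]]
im = accumulate_importance(chunk2, accumulate_importance(chunk1))
print(im)  # [7, 2]
```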
|
|
| CALIBRATION DATA CHARACTERISTICS |
| -------------------------------- |
| Good calibration data should be: |
|
|
| 1. REPRESENTATIVE |
| - Matches the domain the model will operate in |
| - Similar vocabulary and complexity to expected inputs |
| - Reflects actual use case scenarios |
|
|
| 2. DIVERSE |
| - Multiple topics, subjects, and writing styles |
| - Mix of common and rare tokens |
| - Varied sentence structures and lengths |
|
|
| 3. SUFFICIENT |
| - 100-500 text chunks of typical document length |
| - More chunks = better quality (diminishing returns beyond ~500) |
| - Each chunk processed independently |
|
|
| 4. NATURAL |
| - Real-world text (not synthetic or random) |
| - Domain-appropriate (code for code models, medical for medical models) |
| - Representative token distribution |
|
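| The "each chunk processed independently" point above can be sketched as a |
| simple splitter (whitespace tokens stand in for a real tokenizer, and the |
| chunk size of 512 is an illustrative choice): |

```python
def split_into_chunks(text, chunk_tokens=512):
    """Split calibration text into fixed-size token chunks.

    Whitespace words stand in for real tokenizer output; each chunk is
    then fed through the model independently during imatrix generation.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_tokens])
            for i in range(0, len(tokens), chunk_tokens)]

corpus = "word " * 1200
chunks = split_into_chunks(corpus, chunk_tokens=512)
print(len(chunks))  # 1200 tokens -> 3 chunks (512 + 512 + 176)
```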
|
| CALIBRATION PROCESS PARAMETERS |
| ------------------------------ |
| Typical settings for this quantization: |
|
|
| Chunks Processed: 200-500 (production quality) |
| Chunk Size: Typical document/paragraph length |
| GPU Acceleration: Enabled (99 layers offloaded) |
| Thread Count: Auto-detected based on CPU |
|
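| With llama.cpp, the settings above map onto flags of the llama-imatrix |
| tool (-m, -f, -o, -ngl, --chunks). A sketch that only assembles the |
| command line without running it (file paths are hypothetical): |

```python
import shlex

def imatrix_command(model, calib_file, output, gpu_layers=99, chunks=500):
    """Assemble a llama.cpp `llama-imatrix` invocation mirroring the
    parameters above (paths hypothetical; the command is not executed)."""
    args = [
        "llama-imatrix",
        "-m", model,               # source model in GGUF format
        "-f", calib_file,          # calibration text, chunked internally
        "-o", output,              # resulting importance matrix file
        "-ngl", str(gpu_layers),   # layers offloaded to the GPU
        "--chunks", str(chunks),   # number of chunks to process
    ]
    return shlex.join(args)

print(imatrix_command("model-f16.gguf", "calibration.txt", "imatrix.dat"))
```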
|
| QUALITY IMPACT |
| -------------- |
| The importance matrix generated from high-quality calibration data enables: |
|
|
| - 3-8% perplexity increase (vs 10-20% without imatrix) |
| - Preservation of critical weights |
| - Intelligent bit allocation per tensor |
| - ~16x compression vs FP32 (8x vs an FP16 source) with minimal quality loss |
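|
| The figures above can be sanity-checked with quick arithmetic (the |
| baseline perplexity of 6.0 is an invented illustration): |

```python
baseline_ppl = 6.0  # hypothetical FP16 baseline perplexity

# Midpoints of the ranges quoted above: ~5% perplexity increase with
# an imatrix, ~15% without one.
with_imatrix = round(baseline_ppl * 1.05, 2)
without_imatrix = round(baseline_ppl * 1.15, 2)

# Compression ratio for 32-bit floats quantized to ~2 bits per weight.
ratio = 32 / 2

print(with_imatrix, without_imatrix, ratio)  # 6.3 6.9 16.0
```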
|
|
| CALIBRATION DATA SOURCES |
| ------------------------ |
| Common sources for high-quality calibration data: |
|
|
| - Wikitext-2-raw (general language models) |
| - Domain-specific corpora (medical, legal, code) |
| - The Pile subset (diverse web text) |
| - Custom curated datasets matching expected use |
|
|
| VERIFICATION |
| ------------ |
| Quantized models are tested for: |
| ✓ Perplexity measurement vs baseline |
| ✓ Sample inference quality |
| ✓ Token prediction accuracy |
| ✓ Model file integrity |
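|
| Perplexity, the first check above, is the exponential of the mean |
| negative log-likelihood per token. A minimal sketch: |

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Four tokens each predicted with probability 0.25:
# mean NLL = ln(4), so perplexity is exactly 4.
lp = [math.log(0.25)] * 4
print(round(perplexity(lp), 6))  # 4.0
```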
|
|
| NOTES |
| ----- |
| - Calibration is performed once per source model |
| - The same imatrix can be reused across different target quant formats |
| - Domain-specific calibration yields better results |
| - GPU acceleration significantly speeds up generation |
|
|
| For questions about the calibration methodology used for this model, |
| please open a discussion on the model's Hugging Face page. |
|
|