---
title: IQ2_M - GeekedOut Quantizer
tags:
- gguf
- iq2-m
- quantization
- geeked-out
license: other
---
# IQ2_M - GeekedOut Quantizer

GeekedOut Quantizer is a specialized 2-bit quantization tool that implements the IQ2_M (Intelligent Quants) scheme for efficient model compression. This repository showcases IQ2_M-quantized models that reach extreme low-bit precision while preserving critical model capabilities through intelligent weight allocation.
## About GeekedOut Quantizer

GeekedOut Quantizer is an advanced quantization framework designed to:

- Achieve 2-bit compression using the IQ2_M scheme
- Maintain high-quality inference performance
- Support GGUF format for local deployment
- Optimize memory efficiency through mixed-precision techniques
## The IQ2_M Intelligence Concept

GeekedOut Quantizer models are designed with intelligence as their primary capability: **intelligence** is preserved in critical parameters while less important weights are packed into minimal bit formats. The scheme combines:

- Mixed precision - different weights receive varying bit allocations based on their sensitivity and importance (the allocation is visible in the GGUF metadata; see the example after this list)
- Block-wise quantization with optimized scaling factors applied across weight blocks
- 2-bit compression achieving extreme low-bit precision while preserving critical model capabilities
- Smart allocation where critical parameters are kept at higher precision while less important weights receive minimal bit formats
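To check the mixed-precision layout in an actual IQ2_M file, you can dump its per-tensor metadata. The sketch below assumes llama.cpp's `gguf` Python package is installed (it ships a `gguf-dump` command) and uses a placeholder file name:

```bash
# Install llama.cpp's GGUF tooling, which provides the gguf-dump command
pip install gguf

# Dump per-tensor metadata; in an IQ2_M file most tensors show the IQ2_M
# type while sensitive tensors (e.g. embeddings, output) keep a
# higher-precision type. "model-IQ2_M.gguf" is a placeholder file name.
gguf-dump model-IQ2_M.gguf
```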
## The Quantization Process

GeekedOut uses the A:\Geeked.Out software to create intelligent models through four steps (a typical command-line workflow is sketched after this list):

1. **Intelligent calibration** - imatrix-based calibration for optimal quantization quality
2. **Mixed-precision allocation** - critical parameters receive higher precision while less important weights receive minimal bit formats
3. **Block-wise optimization** - optimized scaling factors applied across weight blocks
4. **Smart allocation** - intelligence is preserved through deliberate weight distribution
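As a concrete reference point, this is what an imatrix-based IQ2_M conversion looks like with stock llama.cpp tools. It is a sketch rather than the exact GeekedOut pipeline, and the model and calibration file names are placeholders:

```bash
# 1. Build an importance matrix from a calibration corpus
#    (model-f16.gguf and calibration.txt are placeholder names)
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize to IQ2_M, letting the importance matrix steer which
#    weights keep higher precision
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ2_M.gguf IQ2_M
```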
## IQ2_M Quantization Features

The **IQ2_M** (Intelligent Quants) quantization scheme features:

- Quantized models that retain conversational capability while achieving significant size reduction
- Compatibility with llama.cpp, LM Studio, Jan, and other local inference frameworks (see the server example after this list)
- imatrix-based calibration for optimal quantization quality
- Development by GeekedOut, focused on intelligent quantization methods
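One way to reach the model from any of these frontends is llama.cpp's bundled HTTP server, which exposes an OpenAI-compatible API. A minimal sketch, with a placeholder shard name:

```bash
# Serve the model locally over an OpenAI-compatible HTTP API
# (the .gguf file name is a placeholder for the first shard)
./llama-server -m model-IQ2_M-00001-of-00002.gguf --port 8080
```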
## Supported Use Cases

GeekedOut Quantizer models are designed for:

- Conversational AI applications where intelligence is preserved through IQ2_M quantization (see the chat example after this list)
- Local inference with llama.cpp, LM Studio, Jan, and similar tools
- Memory-efficient deployment scenarios
- Practical everyday use cases requiring reduced memory footprint
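For the conversational case, llama.cpp's CLI includes an interactive conversation mode. A minimal sketch, again with a placeholder shard name:

```bash
# Start an interactive chat session with the quantized model
# (the .gguf file name is a placeholder for the first shard)
./llama-cli -m model-IQ2_M-00001-of-00002.gguf -cnv
```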
## Usage Instructions

Load IQ2_M quantized models locally using llama.cpp or a compatible inference framework. The GGUF files are split into two parts for efficient storage (00001-of-00002 and 00002-of-00002).
**Example:**

```bash
# Download and run the IQ2_M quantized model straight from Hugging Face
# using llama.cpp's CLI
llama-cli -hf LGxNDs/IQ2_M-2Bit-Quantization-By-Geeked-Out-Ai
```
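When the split files are downloaded manually instead, only the first shard needs to be passed; llama.cpp detects and loads the remaining shard from the same directory. A sketch with placeholder file names:

```bash
# Point -m at the first shard; the 00002-of-00002 shard is picked up
# automatically from the same directory (file names are placeholders)
llama-cli -m IQ2_M-00001-of-00002.gguf -p "Hello"
```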
## Technical Notes

- IQ2_M quantization maintains conversational capability while achieving significant size reduction; in llama.cpp, IQ2_M works out to roughly 2.7 bits per weight on average
- Quantization was performed by GeekedOut using the A:\Geeked.Out software