LGxNDs committed on
Commit ffa8c49 · verified · 1 Parent(s): 9086a30

Create README.md

Files changed (1)
  1. README.md +9 -36
README.md CHANGED
@@ -1,9 +1,7 @@
  ---
 
- comments: |
+ title: Qwen3.6 IQ2_M 2-Bit Quantized Model
 
- This model card describes the Qwen3.6 IQ2_M 2-bit quantized model repository.
-
  tags:
 
  - gguf
@@ -14,6 +12,8 @@ tags:
 
  - conversational
 
+ license: other
+
  ---
 
 
@@ -22,8 +22,6 @@ tags:
 
 
 
- ## Overview
-
  This repository contains a **Qwen3.6** large language model series that has been quantized to **IQ2_M (Intelligent Quants) 2-bit precision** using the GGUF format. The model retains high-quality performance while significantly reducing memory footprint through advanced mixed-precision quantization techniques.
 
 
@@ -52,13 +50,13 @@ This repository contains a **Qwen3.6** large language model series that has been
 
  The **IQ2_M** quantization scheme is part of the Intelligent Quants (IQ) family developed for efficient model inference:
 
- - **Mixed Precision**: Different weights receive varying bit allocations based on their sensitivity and importance
+ - Mixed precision - different weights receive varying bit allocations based on their sensitivity and importance
  -
- - **Block-wise Quantization**: Optimized scaling factors applied across weight blocks
+ - Block-wise quantization with optimized scaling factors applied across weight blocks
  -
- - **2-Bit Compression**: Achieves extreme low-bit precision while preserving critical model capabilities
+ - 2-bit compression achieving extreme low-bit precision while preserving critical model capabilities
  -
- - **Smart Allocation**: Critical parameters preserved in higher precision, less important weights packed into minimal bit formats
+ - Smart allocation where critical parameters are preserved in higher precision while less important weights are packed into minimal bit formats
  -
 
 
@@ -78,42 +76,17 @@ This quantized model is designed for:
 
  ## Usage Instructions
 
- To load this model locally:
-
- ```bash
-
- # Using llama.cpp
-
- llama-server --model Qwen3.6-GeekedOutAi-35B-A3B-BF16-IQ2_M.gguf
-
-
-
- # Using LM Studio or compatible tools
-
- Load the GGUF files from this repository
-
- ```
+ To load this model locally using llama.cpp or compatible inference frameworks. The GGUF files are split into two parts for efficient storage (00001-of-00002 and 00002-of-00002).
 
 
 
  ## Technical Notes
 
- - The model is split into two parts (00001-of-00002 and 00002-of-00002) for efficient storage
- -
  - IQ2_M quantization maintains conversational capability while achieving significant size reduction
  -
- - Compatible with llama.cpp and local inference frameworks
+ - Compatible with llama.cpp, LM Studio, Jan, and other local inference frameworks
  -
  - Uses imatrix-based calibration for optimal quantization quality
- -
-
-
- ## License & Attribution
-
- This model card documents a community-contributed quantized version of Qwen3.6 optimized for practical deployment scenarios.
-
-
-
 
 
 
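
Note on the revised usage line: it names llama.cpp and the two-part split but no longer shows a command. The sketch below is one way to check that both parts are present and to build a `llama-server` call that points at the first part, which llama.cpp uses to locate the remaining parts. It assumes the shards follow the `gguf-split` naming pattern (`<prefix>-00001-of-00002.gguf`). The file names in the example combine the prefix from the removed command with that suffix and are hypothetical, as are the helper names `first_shard` and `server_command`; check the repository listing for the real names.

```python
import re
from pathlib import Path

# Assumed gguf-split style suffix, e.g. "-00001-of-00002.gguf".
SHARD_RE = re.compile(r"^(?P<prefix>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def first_shard(filenames):
    """Return the name of shard 1 after checking that every shard is present."""
    shards = {}
    for name in filenames:
        match = SHARD_RE.match(Path(name).name)
        if match:
            key = (match["prefix"], int(match["total"]))
            shards.setdefault(key, {})[int(match["index"])] = name
    if len(shards) != 1:
        raise ValueError(f"expected exactly one sharded model, found {len(shards)}")
    (prefix, total), found = next(iter(shards.items()))
    missing = [i for i in range(1, total + 1) if i not in found]
    if missing:
        raise FileNotFoundError(f"{prefix}: missing shard(s) {missing} of {total}")
    return found[1]


def server_command(filenames):
    """Build a llama-server invocation that points at the first shard."""
    return ["llama-server", "--model", first_shard(filenames)]


if __name__ == "__main__":
    # Hypothetical names: prefix from the removed command, suffix from the README text.
    prefix = "Qwen3.6-GeekedOutAi-35B-A3B-BF16-IQ2_M"
    files = [f"{prefix}-00002-of-00002.gguf", f"{prefix}-00001-of-00002.gguf"]
    print(" ".join(server_command(files)))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting; the presence check fails early if only one of the two parts was downloaded.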