LGxNDs committed on
Commit e91407c · verified
1 Parent(s): ffa8c49

Update README.md

Files changed (1)
  1. README.md +109 -18
README.md CHANGED
@@ -1,54 +1,110 @@
1
  ---
2
 
3
- title: Qwen3.6 IQ2_M 2-Bit Quantized Model
 
 
 
 
 
 
 
 
4
 
5
  tags:
6
 
 
 
 
 
7
  - gguf
8
 
 
 
 
 
9
  - iq2-m
10
 
11
- - qwen
12
 
13
- - conversational
 
 
 
 
 
 
 
 
 
 
 
 
 
14
 
15
  license: other
16
 
 
 
 
 
17
  ---
18
 
19
 
20
 
21
- # Qwen3.6 - IQ2_M 2-Bit Quantized Model
22
 
23
 
24
 
25
- This repository contains a **Qwen3.6** large language model series that has been quantized to **IQ2_M (Intelligent Quants) 2-bit precision** using the GGUF format. The model retains high-quality performance while significantly reducing memory footprint through advanced mixed-precision quantization techniques.
 
 
 
 
26
 
27
 
 
28
 
29
- ## Model Details
30
 
31
- | Property | Value |
32
 
33
- |----------|-------|
34
 
35
- | Architecture | Qwen35Moe |
36
 
37
- | Context Length | 262,144 tokens |
38
 
39
- | Quantization Scheme | IQ2_M (2-bit) |
40
 
41
- | Format | GGUF (split across 2 files) |
42
 
43
- | Total Parameters | ~35B (with MoE routing) |
44
 
45
- | File Size | ~8.3 GB + ~3.4 GB |
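
The figures in the removed Model Details table allow a rough sanity check on the "2-bit" label. The sketch below estimates the average stored bits per parameter from the two approximate shard sizes and the approximate parameter count. It assumes decimal gigabytes (10^9 bytes) and ignores GGUF metadata overhead; the function name `bits_per_weight` is illustrative and not part of any library.

```python
def bits_per_weight(file_sizes_gb, n_params_billion):
    """Average stored bits per parameter across a set of GGUF shards.

    Assumes decimal gigabytes (1 GB = 1e9 bytes) and treats the whole
    file as weight data, so the result is a slight overestimate.
    """
    total_bits = sum(file_sizes_gb) * 1e9 * 8
    return total_bits / (n_params_billion * 1e9)


# Approximate values taken from the Model Details table.
bpw = bits_per_weight([8.3, 3.4], 35)
print(f"{bpw:.2f} bits per weight")
```

With these rounded inputs the estimate lands a little above 2.5 bits per weight rather than exactly 2, which is what a mixed-precision scheme would be expected to produce: some tensors are stored at more than 2 bits.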
46
 
47
 
48
 
49
- ## IQ2_M Quantization
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
 
51
- The **IQ2_M** quantization scheme is part of the Intelligent Quants (IQ) family developed for efficient model inference:
52
 
53
  - Mixed precision - different weights receive varying bit allocations based on their sensitivity and importance
54
  -
@@ -60,9 +116,21 @@ The **IQ2_M** quantization scheme is part of the Intelligent Quants (IQ) family
60
  -
61
 
62
 
 
 
 
 
63
  ## Supported Use Cases
64
 
65
- This quantized model is designed for:
 
 
 
 
 
 
 
 
66
 
67
  - Conversational AI applications
68
  -
@@ -74,19 +142,42 @@ This quantized model is designed for:
74
  -
75
 
76
 
 
 
 
 
77
  ## Usage Instructions
78
 
79
- Load this model locally with llama.cpp or a compatible inference framework. The GGUF files are split into two parts for efficient storage (00001-of-00002 and 00002-of-00002).
 
 
 
 
 
 
 
 
80
 
81
 
82
 
83
  ## Technical Notes
84
 
 
 
 
 
85
  - IQ2_M quantization maintains conversational capability while achieving significant size reduction
86
  -
87
  - Compatible with llama.cpp, LM Studio, Jan, and other local inference frameworks
88
  -
89
  - Uses imatrix-based calibration for optimal quantization quality
 
 
 
 
 
 
 
90
 
91
 
92
 
 
1
  ---
2
 
3
+
4
+
5
+
6
+
7
+ title: IQ2_M - GeekedOut Quantizer
8
+
9
+
10
+
11
+
12
 
13
  tags:
14
 
15
+
16
+
17
+
18
+
19
  - gguf
20
 
21
+
22
+
23
+
24
+
25
  - iq2-m
26
 
 
27
 
28
+
29
+
30
+
31
+ - quantization
32
+
33
+
34
+
35
+
36
+
37
+ - geeked-out
38
+
39
+
40
+
41
+
42
 
43
  license: other
44
 
45
+
46
+
47
+
48
+
49
  ---
50
 
51
 
52
 
 
53
 
54
 
55
 
56
+
57
+ # IQ2_M - GeekedOut Quantizer
58
+
59
+
60
+
61
 
62
 
63
+ GeekedOut Quantizer is a specialized 2-bit quantization tool that implements the IQ2_M (Intelligent Quants) scheme for efficient model compression. This repository showcases IQ2_M quantized models, which use extremely low-bit precision while preserving critical model capabilities through intelligent weight allocation.
64
 
 
65
 
 
66
 
 
67
 
 
68
 
 
69
 
 
70
 
71
+ ## About GeekedOut Quantizer
72
 
 
73
 
 
74
 
75
 
76
 
77
+ GeekedOut Quantizer is an advanced quantization framework designed to:
78
+
79
+
80
+
81
+
82
+
83
+ - Achieve 2-bit compression using the IQ2_M scheme
84
+ -
85
+ - Maintain high-quality inference performance
86
+ -
87
+ - Support GGUF format for local deployment
88
+ -
89
+ - Optimize memory efficiency through mixed-precision techniques
90
+ -
91
+
92
+
93
+
94
+
95
+
96
+
97
+ ## IQ2_M Quantization Features
98
+
99
+
100
+
101
+
102
+
103
+ The **IQ2_M** (Intelligent Quants) quantization scheme features:
104
+
105
+
106
+
107
 
 
108
 
109
  - Mixed precision - different weights receive varying bit allocations based on their sensitivity and importance
110
  -
 
116
  -
117
 
118
 
119
+
120
+
121
+
122
+
123
  ## Supported Use Cases
124
 
125
+
126
+
127
+
128
+
129
+ GeekedOut Quantizer models are designed for:
130
+
131
+
132
+
133
+
134
 
135
  - Conversational AI applications
136
  -
 
142
  -
143
 
144
 
145
+
146
+
147
+
148
+
149
  ## Usage Instructions
150
 
151
+
152
+
153
+
154
+
155
+ Load IQ2_M quantized models locally with llama.cpp or a compatible inference framework. The GGUF files are split into two parts for efficient storage (00001-of-00002 and 00002-of-00002).
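
Before loading a split model, it can help to confirm that both parts were downloaded. The sketch below builds the expected shard file names from the `00001-of-00002` suffix pattern mentioned in the Usage Instructions and reports any that are missing. The README does not give the actual file names, so the prefix `model` is a placeholder, and `shard_names` and `missing_shards` are hypothetical helpers.

```python
from pathlib import Path


def shard_names(prefix, total):
    """Expected file names for a GGUF model split into `total` parts."""
    return [f"{prefix}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


def missing_shards(directory, prefix, total):
    """Return the expected shard names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in shard_names(prefix, total)
            if not (directory / name).is_file()]


# "model" is a placeholder prefix; substitute the real file name stem.
print(shard_names("model", 2))
print(missing_shards(".", "model", 2))
```

When all parts sit in the same directory, llama.cpp is generally pointed at the first part (for example with its `-m` option) and picks up the remaining parts itself, so no manual merge should be needed.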
156
+
157
+
158
+
159
+
160
 
161
 
162
 
163
  ## Technical Notes
164
 
165
+
166
+
167
+
168
+
169
  - IQ2_M quantization maintains conversational capability while achieving significant size reduction
170
  -
171
  - Compatible with llama.cpp, LM Studio, Jan, and other local inference frameworks
172
  -
173
  - Uses imatrix-based calibration for optimal quantization quality
174
+ -
175
+ - Developed by GeekedOut - focused on intelligent quantization methods
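
The idea behind importance-based calibration can be illustrated with a toy example: when rounding a block of weights to a handful of levels, the quantization scale is chosen to minimise the squared error weighted by a per-weight importance score. This is a simplified sketch for intuition only. It is not the IQ2_M algorithm or llama.cpp's implementation, and the level grid, the scale search, and the name `quantize_block` are all assumptions made for illustration.

```python
def quantize_block(weights, importance, levels=(-1.5, -0.5, 0.5, 1.5), n_scales=64):
    """Toy importance-weighted quantizer for one block of weights.

    Tries `n_scales` candidate scales, rounds every weight to the nearest
    scaled level, and keeps the scale with the lowest importance-weighted
    squared error. Returns (error, scale, quantized_values).
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    top = max(levels)
    best = None
    for k in range(1, n_scales + 1):
        scale = (max_abs / top) * k / n_scales
        # Round each weight to the nearest representable value at this scale.
        quant = [scale * min(levels, key=lambda lv: abs(w - scale * lv))
                 for w in weights]
        # Errors on important weights count for more.
        err = sum(i * (w - q) ** 2 for w, q, i in zip(weights, quant, importance))
        if best is None or err < best[0]:
            best = (err, scale, quant)
    return best


# The third weight is marked as ten times more important than the others.
err, scale, quant = quantize_block([0.1, -0.4, 0.9, 0.05], [1.0, 1.0, 10.0, 1.0])
print(scale, quant)
```

Raising the importance of a weight pulls the chosen scale toward values that represent that weight well, at the cost of larger errors on the less important ones.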
176
+
177
+
178
+
179
+
180
+
181
 
182
 
183