LGxNDs commited on
Commit
9086a30
·
verified ·
1 Parent(s): 6b42c79

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +125 -0
README.md CHANGED
@@ -0,0 +1,125 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+
3
+ comments: |
4
+
5
+ This model card describes the Qwen3.6 IQ2_M 2-bit quantized model repository.
6
+
7
+ tags:
8
+
9
+ - gguf
10
+
11
+ - iq2-m
12
+
13
+ - qwen
14
+
15
+ - conversational
16
+
17
+ ---
18
+
19
+
20
+
21
+ # Qwen3.6 - IQ2_M 2-Bit Quantized Model
22
+
23
+
24
+
25
+ ## Overview
26
+
27
+ This repository contains a **Qwen3.6** large language model series that has been quantized to **IQ2_M (Intelligent Quants) 2-bit precision** using the GGUF format. The model retains high-quality performance while significantly reducing memory footprint through advanced mixed-precision quantization techniques.
28
+
29
+
30
+
31
+ ## Model Details
32
+
33
+ | Property | Value |
34
+
35
+ |----------|-------|
36
+
37
+ | Architecture | Qwen35Moe |
38
+
39
+ | Context Length | 262,144 tokens |
40
+
41
+ | Quantization Scheme | IQ2_M (2-bit) |
42
+
43
+ | Format | GGUF (split across 2 files) |
44
+
45
+ | Total Parameters | ~35B (with MoE routing) |
46
+
47
+ | File Size | ~8.3 GB + ~3.4 GB |
48
+
49
+
50
+
51
+ ## IQ2_M Quantization
52
+
53
+ The **IQ2_M** quantization scheme is part of the Intelligent Quants (IQ) family developed for efficient model inference:
54
+
55
+ - **Mixed Precision**: Different weights receive varying bit allocations based on their sensitivity and importance
56
+ -
57
+ - **Block-wise Quantization**: Optimized scaling factors applied across weight blocks
58
+ -
59
+ - **2-Bit Compression**: Achieves extreme low-bit precision while preserving critical model capabilities
60
+ -
61
+ - **Smart Allocation**: Critical parameters preserved in higher precision, less important weights packed into minimal bit formats
62
+ -
63
+
64
+
65
+ ## Supported Use Cases
66
+
67
+ This quantized model is designed for:
68
+
69
+ - Conversational AI applications
70
+ -
71
+ - Local inference with llama.cpp, LM Studio, Jan, and similar tools
72
+ -
73
+ - Memory-efficient deployment scenarios
74
+ -
75
+ - Practical everyday use cases requiring reduced memory footprint
76
+ -
77
+
78
+
79
+ ## Usage Instructions
80
+
81
+ To load this model locally:
82
+
83
+ ```bash
84
+
85
+ # Using llama.cpp
86
+
87
+ llama-server --model Qwen3.6-GeekedOutAi-35B-A3B-BF16-IQ2_M.gguf
88
+
89
+
90
+
91
+ # Using LM Studio or compatible tools
92
+
93
+ Load the GGUF files from this repository
94
+
95
+ ```
96
+
97
+
98
+
99
+ ## Technical Notes
100
+
101
+ - The model is split into two parts (00001-of-00002 and 00002-of-00002) for efficient storage
102
+ -
103
+ - IQ2_M quantization maintains conversational capability while achieving significant size reduction
104
+ -
105
+ - Compatible with llama.cpp and local inference frameworks
106
+ -
107
+ - Uses imatrix-based calibration for optimal quantization quality
108
+ -
109
+
110
+
111
+ ## License & Attribution
112
+
113
+ This model card documents a community-contributed quantized version of Qwen3.6 optimized for practical deployment scenarios.
114
+
115
+
116
+
117
+
118
+
119
+
120
+
121
+
122
+
123
+
124
+
125
+