zenpeach committed on
Commit
6eff764
·
verified ·
1 Parent(s): 0643d73

Update readme

Files changed (1)
  1. README.md +92 -3
README.md CHANGED
---
license: apache-2.0
---
# Qwen3 GGUF (Q4_K_M Quantized)

This repository hosts GGUF-format quantized versions of Qwen3 models at multiple parameter sizes. All models are quantized at Q4_K_M, chosen as a practical balance of inference speed, memory usage, and output quality.

These files are intended for use with SciTools’ Understand and Onboard, as well as other tools and runtimes that support the GGUF format (for example, llama.cpp-based applications).
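
Since these files target GGUF-compatible runtimes, a quick sanity check after downloading is to inspect the file header: GGUF files begin with the 4-byte ASCII magic `GGUF`, followed by a little-endian `uint32` format version. A minimal sketch using only the Python standard library (the file name in the usage note is hypothetical):

```python
import struct

def check_gguf_header(path):
    """Return (is_gguf, version) from the first 8 bytes of a file.

    GGUF files start with the 4-byte ASCII magic b"GGUF", followed by a
    little-endian uint32 format version.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False, None
    (version,) = struct.unpack("<I", header[4:8])
    return True, version
```

For example, `check_gguf_header("Qwen3-8B-Q4_K_M.gguf")` would return `(True, <version>)` for an intact download, making it easy to catch truncated or mislabeled files before loading them into a runtime.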

---

## Model Details

- Base models: Qwen3 (various parameter sizes)
- Format: GGUF
- Quantization: Q4_K_M
- Intended use: Local inference, code understanding, general-purpose chat
- Languages: Multilingual (as supported by Qwen3)

### Available Variants

This repository includes multiple Qwen3 parameter sizes, each quantized independently using the same Q4_K_M scheme. Refer to the file names for exact parameter counts.
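
Tools that let users pick a variant often need the parameter count, which is conventionally encoded in the file name. A minimal sketch, assuming the common `<name>-<size>B-<quant>.gguf` naming convention (the exact names in this repository may differ; check the file listing):

```python
import re

def parse_variant(filename):
    """Extract (parameter_size_in_billions, quant_type) from a GGUF file name.

    Assumes the common "<name>-<size>B-<quant>.gguf" convention,
    e.g. "Qwen3-8B-Q4_K_M.gguf" (hypothetical example name).
    """
    m = re.search(r"-(\d+(?:\.\d+)?)B-([A-Za-z0-9_]+)\.gguf$", filename)
    if not m:
        return None
    return float(m.group(1)), m.group(2)
```

This returns, for instance, `(8.0, "Q4_K_M")` for `"Qwen3-8B-Q4_K_M.gguf"`, and `None` for names that don't follow the convention.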

---

## Quantization Process

- All models are quantized using the Q4_K_M quantization method.
- Quantization was performed directly by the Qwen team where available.
- In a small number of cases, quantization was performed by Unsloth.
- No further modifications, rebalancing, or fine-tuning were applied.
- The quantization parameters and defaults were not altered from the original sources.

The goal is to provide faithful, reproducible GGUF variants that behave as closely as possible to their upstream counterparts within the constraints of 4-bit quantization.
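
For planning memory, a rough on-disk size estimate follows directly from the parameter count and bits per weight. The ~4.8 bits-per-weight figure below is an assumed average for Q4_K_M, not an exact value; real files also carry metadata and keep some tensors at higher precision, so actual sizes vary:

```python
def estimate_gguf_size_gb(n_params_billion, bits_per_weight=4.8):
    """Back-of-the-envelope on-disk size for a quantized model.

    bits_per_weight ~4.8 is an assumed average for Q4_K_M; real files
    include metadata and mixed-precision tensors, so treat the result
    as an estimate only.
    """
    total_bits = n_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # decimal gigabytes
```

By this estimate, an 8B-parameter model lands near 4.8 GB on disk, versus roughly 16 GB for the same weights at 16-bit precision, which is the main practical payoff of 4-bit quantization.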

---

## What We Did Not Do

To be explicit:

- No additional fine-tuning
- No instruction rebalancing
- No safety, alignment, or prompt modifications
- No merging or model surgery

If a model behaves a certain way, that behavior comes from Qwen3 combined with Q4_K_M quantization, not from any downstream changes made here.

---

## Intended Use

These models are suitable for:

- SciTools Understand and SciTools Onboard
- Local AI workflows
- Code comprehension and exploration
- Interactive chat and analysis
- Integration into developer tools that support GGUF

They are not intended for:

- Safety-critical or regulated decision-making
- Use cases requiring guaranteed factual accuracy
- Production deployment without independent evaluation

---

## Limitations

- As 4-bit quantized models, some degradation in reasoning depth and numerical precision is expected compared with full-precision checkpoints.
- Output quality varies by parameter size and task.
- Like all large language models, Qwen3 may produce hallucinations or incorrect information.

Evaluate carefully for your specific workload.

---

## License & Attribution

- Original models: Qwen / Alibaba Cloud
- Quantization: Qwen and Unsloth
- Format: GGUF (llama.cpp ecosystem)

Please refer to the original Qwen3 license and usage terms. This repository redistributes quantized artifacts only and does not change the underlying licensing conditions.

---

## Acknowledgements

Thanks to the Qwen team for releasing the Qwen3 models, and to Unsloth for high-quality, reproducible quantization tooling that enables efficient local inference across a wide range of tools.