---
license: apache-2.0
---

# Qwen3 GGUF (Q4_K_M Quantized)

This repository hosts GGUF-format quantized versions of Qwen3 models at multiple parameter sizes.

All models use Q4_K_M quantization, chosen as a practical balance of inference performance, memory usage, and output quality.
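To make the memory side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes an average of about 4.85 bits per weight for Q4_K_M, which is a commonly quoted approximation and not a figure taken from this repository. Actual file sizes depend on the tensor mix, and runtime memory also includes the KV cache and working buffers. The function name and the nominal parameter counts are illustrative only.

```python
# Rough estimate of weight storage for a quantized model.
# ASSUMPTION: Q4_K_M averages about 4.85 bits per weight. This is an
# approximation; real files mix several tensor types.
ASSUMED_BITS_PER_WEIGHT = 4.85


def estimate_weight_gib(n_params: float,
                        bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Return the approximate size of the weights in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal parameter counts, for illustration only.
    for label, n_params in [("4B", 4e9), ("8B", 8e9), ("14B", 14e9)]:
        quantized = estimate_weight_gib(n_params)
        half_precision = estimate_weight_gib(n_params, bits_per_weight=16)
        print(f"{label}: ~{quantized:.1f} GiB at Q4_K_M, "
              f"~{half_precision:.1f} GiB at 16-bit")
```

Treat the result as a lower bound on what a runtime needs, since context length and batch size add to it.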

These files are intended for use with SciTools’ Understand and Onboard, as well as other tools and runtimes that support the GGUF format (for example, llama.cpp-based applications).

---

## Model Details

- Base models: Qwen3 (various parameter sizes)
- Format: GGUF
- Quantization: Q4_K_M
- Intended use: Local inference, code understanding, general-purpose chat
- Languages: Multilingual (as supported by Qwen3)
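As a quick sanity check that a downloaded file really is GGUF, its fixed-size header can be read with the standard library. This sketch assumes the little-endian layout used by GGUF version 2 and later: a 4-byte magic `GGUF`, a `uint32` version, a `uint64` tensor count, and a `uint64` metadata key/value count. The names `GGUFHeader` and `read_gguf_header` are made up for this example.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed-size GGUF header; raise ValueError if it is not GGUF."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 bytes
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # ASSUMPTION: little-endian file, GGUF v2+ field widths.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)
```

Calling `read_gguf_header("path/to/model.gguf")` (a placeholder path) returns the version and counts without loading any tensor data.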

### Available Variants

This repository includes multiple Qwen3 parameter sizes, each quantized independently with the same Q4_K_M scheme. Refer to the file names for exact parameter counts.
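If the file names follow a pattern such as `Qwen3-8B-Q4_K_M.gguf`, the size label can be extracted programmatically. The pattern below is an assumption about the naming scheme, not something this README specifies, so adjust the regular expression to match the actual files. The helper name `parse_variant` is hypothetical.

```python
import re
from typing import Optional, Tuple

# ASSUMED naming scheme: "<family>-<size>B-<quant>.gguf",
# for example "Qwen3-8B-Q4_K_M.gguf".
_NAME_RE = re.compile(
    r"-(?P<size>\d+(?:\.\d+)?)B-(?P<quant>Q\d\w*)\.gguf$",
    re.IGNORECASE,
)


def parse_variant(file_name: str) -> Optional[Tuple[float, str]]:
    """Return (size in billions, quantization label), or None if no match."""
    match = _NAME_RE.search(file_name)
    if match is None:
        return None
    return float(match.group("size")), match.group("quant").upper()
```

Names that do not fit the assumed pattern return `None` instead of a guess, so unexpected files are easy to spot.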

---

## Quantization Process

- All models are quantized using the Q4_K_M quantization method.
- Where available, quantization was performed directly by the Qwen team.
- In a small number of cases, quantization was performed by Unsloth.
- No further modifications, rebalancing, or fine-tuning were applied.
- The quantization parameters and defaults were not altered from the original sources.
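One way to check that a local file matches an upstream artifact byte for byte is to compare SHA-256 digests. The sketch below uses only the standard library. The path and the expected digest are placeholders: the digest would come from a checksum published upstream, if one is available, and a match is only expected when the file was redistributed without any change.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte models need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_upstream(path: str, expected_hex: str) -> bool:
    """Compare a local file's digest with an expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A mismatch does not say which side changed, only that the two files differ.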

The goal is to provide faithful, reproducible GGUF variants that behave as closely as possible to their upstream counterparts within the constraints of 4-bit quantization.

---

## What We Did Not Do

To be explicit:

- No additional fine-tuning
- No instruction rebalancing
- No safety, alignment, or prompt modifications
- No merging or model surgery

If a model behaves a certain way, that behavior comes from Qwen3 combined with Q4_K_M quantization, not from any downstream changes here.

---

## Intended Use

These models are suitable for:

- SciTools Understand and SciTools Onboard
- Local AI workflows
- Code comprehension and exploration
- Interactive chat and analysis
- Integration into developer tools that support GGUF

They are not intended for:

- Safety-critical or regulated decision-making
- Use cases requiring guaranteed factual accuracy
- Production deployment without independent evaluation

---

## Limitations

- Because these are 4-bit quantized models, expect some degradation in reasoning depth and numerical precision compared to full-precision checkpoints.
- Output quality varies by parameter size and task.
- Like all large language models, Qwen3 may produce hallucinations or incorrect information.

Evaluate carefully for your specific workload.

---

## License & Attribution

- Original models: Qwen / Alibaba Cloud
- Quantization: Qwen and Unsloth
- Format: GGUF (llama.cpp ecosystem)

Please refer to the original Qwen3 license and usage terms. This repository redistributes quantized artifacts only and does not change the underlying licensing conditions.

---

## Acknowledgements

Thanks to the Qwen team for releasing Qwen3 models and to Unsloth for high-quality, reproducible quantization tooling that enables efficient local inference across a wide range of tools.