Zoed commited on
Commit
8deec80
·
verified ·
1 Parent(s): f96f130

Upload folder using huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +90 -3
README.md CHANGED
@@ -1,3 +1,90 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
4
+ base_model_relation: quantized
5
+ language:
6
+ - en
7
+ tags:
8
+ - qwen3
9
+ - qwen3-coder
10
+ - code
11
+ - gguf
12
+ - quantized
13
+ - q4_k_m
14
+ pipeline_tag: text-generation
15
+ ---
16
+
17
+ # Qwen3-Coder-30B-A3B-Instruct · Q4_K_M GGUF
18
+
19
+ This is a **Q4_K_M GGUF quantization** of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct), produced from the f16 base.
20
+
21
+ | Property | Value |
22
+ |---|---|
23
+ | Base model | Qwen/Qwen3-Coder-30B-A3B-Instruct |
24
+ | Quantization | Q4_K_M |
25
+ | Format | GGUF |
26
+ | Parameters | 30B (MoE, ~3B active) |
27
+
28
+ ## About the base model
29
+
30
+ Qwen3-Coder-30B-A3B-Instruct is a Mixture-of-Experts (MoE) code-focused instruction model developed by [Qwen Team, Alibaba Cloud](https://qwenlm.github.io/). It features 30B total parameters with ~3B active parameters per token.
31
+
32
+ For full details, see the [original model page](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct).
33
+
34
+ ## Usage
35
+
36
+ ### llama.cpp
37
+
38
+ ```bash
39
+ llama-cli \
40
+ -m Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf \
41
+ --chat-template qwen3 \
42
+ -p "Write a Python function that sorts a list of dictionaries by a given key." \
43
+ -n 512
44
+ ```
45
+
46
+ ### llama-server
47
+
48
+ ```bash
49
+ llama-server \
50
+ -m Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf \
51
+ --chat-template qwen3 \
52
+ --port 8080
53
+ ```
54
+
55
+ ### Ollama (via Modelfile)
56
+
57
+ ```
58
+ FROM ./Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf
59
+ PARAMETER num_ctx 32768
60
+ TEMPLATE "{{ ... }}" # use Qwen3 chat template
61
+ ```
62
+
63
+ ## Quantization details
64
+
65
+ | File | Quant | Size (approx.) |
66
+ |---|---|---|
67
+ | `Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf` | Q4_K_M | ~17 GB |
68
+
69
+ **Q4_K_M** uses 4-bit quantization with K-quant method on most layers, providing a good balance between size and quality.
70
+
71
+ ## License
72
+
73
+ This quantized model is derived from [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) and is released under the same [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).
74
+
75
+ Per Qwen's terms, appropriate credit is given to the original authors:
76
+
77
+ > Qwen3-Coder-30B-A3B-Instruct is developed by Qwen Team, Alibaba Cloud.
78
+ > Original model: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
79
+
80
+ ## Citation
81
+
82
+ ```bibtex
83
+ @misc{qwen3coder,
84
+ title = {Qwen3-Coder},
85
+ author = {Qwen Team},
86
+ year = {2025},
87
+ organization = {Alibaba Cloud},
88
+ url = {https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct}
89
+ }
90
+ ```