Userb1az committed on
Commit 4450bee · verified · 1 Parent(s): 5782c6a

Update README.md

Files changed (1)
  1. README.md +157 -12
README.md CHANGED
@@ -1,21 +1,166 @@
  ---
- quantized_by: bartowski
  pipeline_tag: text-generation
- base_model_relation: quantized
- base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
- ## 💫 Community Model> Qwen3 Coder 30B A3B Instruct by Qwen
-
- *👾 [LM Studio](https://lmstudio.ai) Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on [Discord](https://discord.gg/aPQfnNkxGC)*.
-
- **Model creator:** [Qwen](https://huggingface.co/Qwen)<br>
- **Original model**: [Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)<br>
- **GGUF quantization:** provided by [bartowski](https://huggingface.co/bartowski) based on `llama.cpp` release [b6014](https://github.com/ggerganov/llama.cpp/releases/tag/b6014)<br>
-
- ## Special thanks
-
- 🙏 Special thanks to [Georgi Gerganov](https://github.com/ggerganov) and the whole team working on [llama.cpp](https://github.com/ggerganov/llama.cpp/) for making all of this possible.
-
- ## Disclaimers
-
- LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.
  ---
+ library_name: transformers
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE
  pipeline_tag: text-generation
  ---
+
+ # Qwen3-Coder-30B-A3B-Instruct
+ <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
+ <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+
+ ## Highlights
+
+ **Qwen3-Coder** is available in multiple sizes. Today, we're excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
+
+ - **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks.
+ - **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding (see the configuration sketch after this list).
+ - **Agentic Coding** support for most platforms, such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.
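+
+ The 1M-token extension via YaRN is opt-in rather than enabled by default. As a rough sketch only (the scaling factor and field values below are assumptions, not the official recipe; check the Qwen documentation for the recommended settings), the rope scaling can be overridden when loading the model with `transformers`:
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ # Hypothetical YaRN override: a factor of 4.0 would stretch the native
+ # 262,144-token window toward ~1M tokens. Verify these values against
+ # the official Qwen documentation before relying on them.
+ model = AutoModelForCausalLM.from_pretrained(
+     "Qwen/Qwen3-Coder-30B-A3B-Instruct",
+     torch_dtype="auto",
+     device_map="auto",
+     rope_scaling={
+         "rope_type": "yarn",
+         "factor": 4.0,
+         "original_max_position_embeddings": 262144,
+     },
+ )
+ ```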
+
+ ![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)
+
+ ## Model Overview
+
+ **Qwen3-Coder-30B-A3B-Instruct** has the following features:
+ - Type: Causal Language Models
+ - Training Stage: Pretraining & Post-training
+ - Number of Parameters: 30.5B in total and 3.3B activated
+ - Number of Layers: 48
+ - Number of Attention Heads (GQA): 32 for Q and 4 for KV
+ - Number of Experts: 128
+ - Number of Activated Experts: 8
+ - Context Length: **262,144 natively**
+
+ **NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Accordingly, specifying `enable_thinking=False` is no longer required.**
+
+ For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).
+
+
+ ## Quickstart
+
+ We advise you to use the latest version of `transformers`.
+
+ With `transformers<4.51.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen3_moe'
+ ```
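+
+ A quick sanity check of the installed version (a small sketch, not part of the official instructions):
+
+ ```python
+ from packaging import version
+ import transformers
+
+ # The qwen3_moe architecture requires transformers >= 4.51.0.
+ assert version.parse(transformers.__version__) >= version.parse("4.51.0"), (
+     f"transformers {transformers.__version__} is too old; run `pip install -U transformers`"
+ )
+ ```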
+
+ The following code snippet illustrates how to use the model to generate content from a given input.
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
+
+ # load the tokenizer and the model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ # prepare the model input
+ prompt = "Write a quick sort algorithm."
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # conduct text completion
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=65536
+ )
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+
+ content = tokenizer.decode(output_ids, skip_special_tokens=True)
+
+ print("content:", content)
+ ```
+
+ **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
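+
+ As a minimal illustration of that note, reusing `tokenizer`, `model`, and `text` from the Quickstart snippet above, the prompt can be truncated and the generation budget lowered so that the whole sequence stays within a smaller window (a sketch, not an official recipe):
+
+ ```python
+ # Keep the total sequence within a smaller window such as 32,768 tokens.
+ max_context = 32768
+ max_new = 4096  # smaller generation budget than the 65,536 used above
+
+ model_inputs = tokenizer(
+     [text],
+     return_tensors="pt",
+     truncation=True,
+     max_length=max_context - max_new,  # leave room for the generated tokens
+ ).to(model.device)
+
+ generated_ids = model.generate(**model_inputs, max_new_tokens=max_new)
+ ```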
+
+ For local use, applications such as Ollama, LM Studio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
+
+ ## Agentic Coding
+
+ Qwen3-Coder excels at tool calling.
+
+ You can define or use any tools as in the following example.
+ ```python
+ from openai import OpenAI
+
+ # Your tool implementation
+ def square_the_number(input_num: float) -> float:
+     return input_num ** 2
+
+ # Define the tool schema exposed to the model
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "square_the_number",
+             "description": "Output the square of the number.",
+             "parameters": {
+                 "type": "object",
+                 "required": ["input_num"],
+                 "properties": {
+                     "input_num": {
+                         "type": "number",
+                         "description": "input_num is a number that will be squared"
+                     }
+                 },
+             },
+         },
+     }
+ ]
+
+ # Define the LLM client
+ client = OpenAI(
+     # Use a custom endpoint compatible with the OpenAI API
+     base_url="http://localhost:8000/v1",  # api_base
+     api_key="EMPTY",
+ )
+
+ messages = [{"role": "user", "content": "square the number 1024"}]
+
+ completion = client.chat.completions.create(
+     messages=messages,
+     model="Qwen3-Coder-30B-A3B-Instruct",
+     max_tokens=65536,
+     tools=tools,
+ )
+
+ print(completion.choices[0])
+ ```
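+
+ The snippet above only prints the model's first response. As a sketch of the next step, assuming the serving endpoint returns OpenAI-style `tool_calls` for Qwen's function-call format, the requested tool can be executed locally and its result sent back for a final answer:
+
+ ```python
+ import json
+
+ message = completion.choices[0].message
+
+ # If the model requested the tool, run it locally and send the result back.
+ if message.tool_calls:
+     call = message.tool_calls[0]
+     args = json.loads(call.function.arguments)
+     result = square_the_number(args["input_num"])
+
+     # Append the assistant turn (with its tool call) and the tool result.
+     messages.append(message.model_dump(exclude_none=True))
+     messages.append({
+         "role": "tool",
+         "tool_call_id": call.id,
+         "content": str(result),
+     })
+
+     final = client.chat.completions.create(
+         messages=messages,
+         model="Qwen3-Coder-30B-A3B-Instruct",
+         tools=tools,
+     )
+     print(final.choices[0].message.content)
+ ```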
+
+ ## Best Practices
+
+ To achieve optimal performance, we recommend the following settings:
+
+ 1. **Sampling Parameters**:
+    - We suggest using `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05` (see the sketch after this list).
+
+ 2. **Adequate Output Length**: We recommend an output length of 65,536 tokens for most queries, which is adequate for instruct models.
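+
+ As an illustration, with the `transformers` setup from the Quickstart these recommendations map directly onto `generate()` arguments (a sketch, not required code):
+
+ ```python
+ # Reuse `model` and `model_inputs` from the Quickstart snippet above.
+ generated_ids = model.generate(
+     **model_inputs,
+     do_sample=True,           # enable sampling so the parameters below take effect
+     temperature=0.7,
+     top_p=0.8,
+     top_k=20,
+     repetition_penalty=1.05,
+     max_new_tokens=65536,     # generous output budget, per point 2 above
+ )
+ ```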
+
+
+ ### Citation
+
+ If you find our work helpful, feel free to cite us.
+
+ ```bibtex
+ @misc{qwen3technicalreport,
+       title={Qwen3 Technical Report},
+       author={Qwen Team},
+       year={2025},
+       eprint={2505.09388},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2505.09388},
+ }
+ ```