aivedha committed on
Commit 732520b · verified · 1 Parent(s): 656f5b3

Update README.md

Files changed (1):
  1. README.md +159 -99

README.md CHANGED
@@ -1,187 +1,215 @@
  ---
  library_name: transformers
  license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next/blob/main/LICENSE
  pipeline_tag: text-generation
  ---

- # Qwen3-Coder-Next

- ## Highlights

- Today, we're announcing **Qwen3-Coder-Next**, an open-weight language model designed specifically for coding agents and local development. It features the following key enhancements:

- - **Super Efficient with Significant Performance**: With only 3B activated parameters (80B total parameters), it achieves performance comparable to models with 10–20x more active parameters, making it highly cost-effective for agent deployment.
- - **Advanced Agentic Capabilities**: Through an elaborate training recipe, it excels at long-horizon reasoning, complex tool usage, and recovery from execution failures, ensuring robust performance in dynamic coding tasks.
- - **Versatile Integration with Real-World IDE**: Its 256k context length, combined with adaptability to various scaffold templates, enables seamless integration with different CLI/IDE platforms (e.g., Claude Code, Qwen Code, Qoder, Kilo, Trae, Cline, etc.), supporting diverse development environments.

- ![image/jpeg](https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen3-Coder-Next/benchmarks.png)

- ![image/jpeg](https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen3-Coder-Next/swebench_pro.png)

  ## Model Overview

- **Qwen3-Coder-Next** has the following features:
- - Type: Causal Language Models
- - Training Stage: Pretraining & Post-training
- - Number of Parameters: 80B in total and 3B activated
- - Number of Parameters (Non-Embedding): 79B
- - Hidden Dimension: 2048
- - Number of Layers: 48
- - Hybrid Layout: 12 \* (3 \* (Gated DeltaNet -> MoE) -> 1 \* (Gated Attention -> MoE))
- - Gated Attention:
- - Number of Attention Heads: 16 for Q and 2 for KV
- - Head Dimension: 256
- - Rotary Position Embedding Dimension: 64
- - Gated DeltaNet:
- - Number of Linear Attention Heads: 32 for V and 16 for QK
- - Head Dimension: 128
- - Mixture of Experts:
- - Number of Experts: 512
- - Number of Activated Experts: 10
- - Number of Shared Experts: 1
- - Expert Intermediate Dimension: 512
- - Context Length: 262,144 natively
-
- **NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
-
- For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwen.ai/blog?id=qwen3-coder-next), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).

  ## Quickstart

- We advise you to use the latest version of `transformers`.

- The following contains a code snippet illustrating how to use the model generate content based on given inputs.
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- model_name = "Qwen/Qwen3-Coder-Next"

- # load the tokenizer and the model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
- model_name,
- torch_dtype="auto",
- device_map="auto"
  )

- # prepare the model input
  prompt = "Write a quick sort algorithm."
  messages = [
- {"role": "user", "content": prompt}
  ]
  text = tokenizer.apply_chat_template(
- messages,
- tokenize=False,
- add_generation_prompt=True,
  )
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

- # conduct text completion
  generated_ids = model.generate(
  **model_inputs,
  max_new_tokens=65536
  )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

  content = tokenizer.decode(output_ids, skip_special_tokens=True)
-
- print("content:", content)
  ```

- **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**

- For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.

  ## Deployment

- For deployment, you can use the latest `sglang` or `vllm` to create an OpenAI-compatible API endpoint.

  ### SGLang

- [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models and vision language models.
- SGLang could be used to launch a server with OpenAI-compatible API service.

- `sglang>=v0.5.8` is required for Qwen3-Coder-Next, which can be installed using:
  ```shell
  pip install 'sglang[all]>=v0.5.8'
  ```
- See [its documentation](https://docs.sglang.ai/get_started/install.html) for more details.

- The following command can be used to create an API endpoint at `http://localhost:30000/v1` with maximum context length 256K tokens using tensor parallel on 4 GPUs.
  ```shell
- python -m sglang.launch_server --model Qwen/Qwen3-Coder-Next --port 30000 --tp-size 2 --tool-call-parser qwen3_coder
  ```

- > [!Note]
- > The default context length is 256K. Consider reducing the context length to a smaller value, e.g., `32768`, if the server fails to start.

  ### vLLM

- [vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
- vLLM could be used to launch a server with OpenAI-compatible API service.

- `vllm>=0.15.0` is required for Qwen3-Coder-Next, which can be installed using:
  ```shell
  pip install 'vllm>=0.15.0'
  ```
- See [its documentation](https://docs.vllm.ai/en/stable/getting_started/installation/index.html) for more details.

- The following command can be used to create an API endpoint at `http://localhost:8000/v1` with maximum context length 256K tokens using tensor parallel on 4 GPUs.
  ```shell
- vllm serve Qwen/Qwen3-Coder-Next --port 8000 --tensor-parallel-size 2 --enable-auto-tool-choice --tool-call-parser qwen3_coder
  ```

- > [!Note]
- > The default context length is 256K. Consider reducing the context length to a smaller value, e.g., `32768`, if the server fails to start.

- ## Agentic Coding

- Qwen3-Coder-Next excels in tool calling capabilities.

- You can simply define or use any tools as following example.
  ```python
- # Your tool implementation
- def square_the_number(num: float) -> dict:
  return num ** 2

- # Define Tools
- tools=[
  {
- "type":"function",
- "function":{
  "name": "square_the_number",
- "description": "output the square of the number.",
  "parameters": {
  "type": "object",
  "required": ["input_num"],
  "properties": {
- 'input_num': {
- 'type': 'number',
- 'description': 'input_num is a number that will be squared'
- }
- },
  }
  }
  }
  ]

  from openai import OpenAI
- # Define LLM
  client = OpenAI(
- # Use a custom endpoint compatible with OpenAI API
- base_url='http://localhost:8000/v1', # api_base
  api_key="EMPTY"
  )
-
- messages = [{'role': 'user', 'content': 'square the number 1024'}]

  completion = client.chat.completions.create(
  messages=messages,
- model="Qwen3-Coder-Next",
  max_tokens=65536,
  tools=tools,
  )

@@ -189,20 +217,52 @@ completion = client.chat.completions.create(
  print(completion.choices[0])
  ```

  ## Best Practices

- To achieve optimal performance, we recommend the following sampling parameters: `temperature=1.0`, `top_p=0.95`, `top_k=40`.

  ## Citation

- If you find our work helpful, feel free to give us a cite.

- ```
- @techreport{qwen_qwen3_coder_next_tech_report,
- title = {Qwen3-Coder-Next Technical Report},
- author = {{Qwen Team}},
- url = {https://github.com/QwenLM/Qwen3-Coder/blob/main/qwen3_coder_next_tech_report.pdf},
- note = {Accessed: 2026-02-03}
- }
  ```

  ---
  library_name: transformers
  license: apache-2.0
+ license_link: https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE
  pipeline_tag: text-generation
+ base_model: aivedha/aicippy-Coder
+ tags:
+ - aicippy
+ - aivedha
+ - aivibe
+ - coding-agent
+ - code-generation
+ - agentic-coding
  ---

+ <p align="center">
+ <img src="https://aivibe.cloud/assets/aivibe-logo.png" alt="AiVibe Logo" width="180"/>
+ </p>

+ <h1 align="center">AiCIPPY-Coder</h1>
+
+ <p align="center">
+ <b>The Agentic Coding Intelligence behind AiCIPPY</b><br/>
+ <i>by AiVedha · AiVibe Software Services Private Limited</i>
+ </p>
+
+ <p align="center">
+ <a href="https://aicippy.com">aicippy.com</a> ·
+ <a href="https://aivedha.ai">aivedha.ai</a> ·
+ <a href="https://aivibe.cloud">aivibe.cloud</a> ·
+ <a href="https://pypi.org/project/aicippy">PyPI</a>
+ </p>

+ ---

+ ## Highlights

+ We are releasing **AiCIPPY-Coder** — the open-weight coding intelligence model powering the AiCIPPY agent platform. Built for real-world agentic software development, this model is the foundation of AiCIPPY's CLI and IDE-integrated coding workflows.

+ - **Efficient Yet Powerful**: With only 3B activated parameters (80B total), AiCIPPY-Coder delivers performance comparable to models with 10–20x more active parameters — making it highly cost-effective for production agent deployment at scale.
+ - **Advanced Agentic Capabilities**: Trained with an elaborate agentic recipe, the model excels at long-horizon reasoning, complex multi-step tool usage, and graceful recovery from execution failures — essential for robust real-world coding tasks.
+ - **Seamless IDE and CLI Integration**: A native 256K context window, combined with full adaptability to diverse scaffold templates, enables plug-and-play integration with CLI agents (including AiCIPPY CLI), VS Code extensions, and platforms such as Cline, Kilo, Trae, and others.
+
+ ---

  ## Model Overview

+ **AiCIPPY-Coder** has the following architecture:
+
+ | Property | Value |
+ |---|---|
+ | Model Type | Causal Language Model |
+ | Training Stage | Pretraining & Post-training |
+ | Total Parameters | 80B |
+ | Activated Parameters | 3B |
+ | Non-Embedding Parameters | 79B |
+ | Hidden Dimension | 2048 |
+ | Number of Layers | 48 |
+ | Context Length | 262,144 tokens (native) |
+ | Thinking Mode | Non-thinking (no `<think>` blocks) |
+
+ **Architecture Details:**
+ - **Hybrid Layout:** 12 × (3 × Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)
+ - **Gated Attention:** 16 heads for Q, 2 for KV, Head Dim 256, RoPE Dim 64
+ - **Gated DeltaNet:** 32 heads for V, 16 for QK, Head Dim 128
+ - **Mixture of Experts:** 512 total experts, 10 activated, 1 shared, Expert Intermediate Dim 512

+ > **Note:** This model operates in non-thinking mode only. The `<think></think>` output blocks are not generated. Setting `enable_thinking=False` is not required.
+
+ ---

  ## Quickstart

+ Ensure you are using the latest version of `transformers` before proceeding.

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

+ model_name = "aivedha/aicippy-Coder"

+ # Load tokenizer and model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(
+ model_name,
+ torch_dtype="auto",
+ device_map="auto"
  )

+ # Prepare input
  prompt = "Write a quick sort algorithm."
  messages = [
+ {"role": "user", "content": prompt}
  ]
  text = tokenizer.apply_chat_template(
+ messages,
+ tokenize=False,
+ add_generation_prompt=True,
  )
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

+ # Generate
  generated_ids = model.generate(
  **model_inputs,
  max_new_tokens=65536
  )
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

  content = tokenizer.decode(output_ids, skip_special_tokens=True)
+ print("AiCIPPY-Coder:", content)
  ```

+ > **Note:** If you encounter out-of-memory (OOM) issues, reduce the context length, for example to `32,768` tokens.
+
+ For local use, AiCIPPY-Coder is compatible with **Ollama**, **LMStudio**, **MLX-LM**, **llama.cpp**, and **KTransformers**.

+ ---

  ## Deployment

+ AiCIPPY-Coder can be served via `sglang` or `vllm` as an OpenAI-compatible API endpoint — the same interface used by the AiCIPPY production platform.

  ### SGLang

+ [SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language and vision language models.

  ```shell
  pip install 'sglang[all]>=v0.5.8'
  ```

+ Launch the server with 256K context using tensor parallelism:
+
  ```shell
+ python -m sglang.launch_server \
+ --model aivedha/aicippy-Coder \
+ --port 30000 \
+ --tp-size 2 \
+ --tool-call-parser aicippy-coder
  ```

+ > **Note:** If the server fails to start, reduce context length with `--context-length 32768`.
+
+ API endpoint available at: `http://localhost:30000/v1`
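Once the server is up, a raw chat-completions request is a quick sanity check. A hedged sketch: the payload below is illustrative, and the commented `curl` line assumes the default port from the launch command above:

```shell
# Build a minimal chat-completions payload and confirm it is well-formed JSON.
BODY='{"model": "aivedha/aicippy-Coder", "messages": [{"role": "user", "content": "Write hello world in Python."}], "max_tokens": 256}'
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"

# With the SGLang server running, POST it to the endpoint:
# curl -s http://localhost:30000/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$BODY"
```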

+ ---

  ### vLLM

+ [vLLM](https://github.com/vllm-project/vllm) is a high-throughput, memory-efficient inference and serving engine for LLMs.

  ```shell
  pip install 'vllm>=0.15.0'
  ```

+ Launch with 256K context:
+
  ```shell
+ vllm serve aivedha/aicippy-Coder \
+ --port 8000 \
+ --tensor-parallel-size 2 \
+ --enable-auto-tool-choice \
+ --tool-call-parser aicippy-coder
  ```

+ > **Note:** Reduce context length to `32768` if startup fails.

+ API endpoint available at: `http://localhost:8000/v1`

+ ---

+ ## Agentic Coding with AiCIPPY-Coder
+
+ AiCIPPY-Coder is purpose-built for tool-calling agentic workflows. Define tools and invoke them directly:

  ```python
+ # Tool implementation
+ def square_the_number(num: float) -> float:
  return num ** 2

+ # Tool definition
+ tools = [
  {
+ "type": "function",
+ "function": {
  "name": "square_the_number",
+ "description": "Returns the square of the given number.",
  "parameters": {
  "type": "object",
  "required": ["input_num"],
  "properties": {
+ "input_num": {
+ "type": "number",
+ "description": "The number to be squared."
+ }
+ }
  }
  }
  }
  ]

  from openai import OpenAI
+
+ # Point to your AiCIPPY-Coder local endpoint
  client = OpenAI(
+ base_url="http://localhost:8000/v1",
  api_key="EMPTY"
  )
+
+ messages = [{"role": "user", "content": "Square the number 1024"}]

  completion = client.chat.completions.create(
  messages=messages,
+ model="aivedha/aicippy-Coder",
  max_tokens=65536,
  tools=tools,
  )

  print(completion.choices[0])
  ```
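The snippet above only issues the request; closing the agentic loop means executing any returned tool calls and sending results back as `role: "tool"` messages. A minimal, server-free sketch of that dispatch step — the tool-call shape follows the OpenAI chat-completions schema, and the parsed call object here is simulated rather than taken from a live response:

```python
import json

def square_the_number(num: float) -> float:
    return num ** 2

# Map tool names from the request's `tools` list to local implementations.
TOOL_REGISTRY = {"square_the_number": square_the_number}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one tool call and wrap its result as a role="tool" message."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    result = fn(args["input_num"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Simulated call, shaped like completion.choices[0].message.tool_calls[0]
call = {
    "id": "call_0",
    "function": {"name": "square_the_number", "arguments": '{"input_num": 1024}'},
}
print(dispatch_tool_call(call))
```

Appending the returned message to `messages` and calling `client.chat.completions.create` again lets the model incorporate the tool result into its final answer.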

+ ---
+
  ## Best Practices

+ For optimal generation quality, use the following sampling parameters:
+
+ | Parameter | Recommended Value |
+ |---|---|
+ | `temperature` | `1.0` |
+ | `top_p` | `0.95` |
+ | `top_k` | `40` |
+
232
+ ---
233
+
234
+ ## About AiCIPPY
235
+
236
+ **AiCIPPY** is AiVibe's production-grade agentic coding platform — available as a CLI tool on PyPI and deployable on AWS Bedrock. It combines multi-LLM orchestration, persistent memory via DynamoDB, WebSocket streaming, and enterprise SSO via AWS Cognito.
237
+
238
+ - **Platform:** [aicippy.com](https://aicippy.com)
239
+ - **CLI:** `pip install aicippy`
240
+ - **Organisation:** AiVibe Software Services Private Limited, Chennai, India
241
+
242
+ ---
243
+
244
+ ## About AiVedha
245
+
246
+ **AiVedha** (aivedha.ai) is AiVibe's AI-powered cybersecurity audit and compliance platform — available on AWS Marketplace (`prod-kulys2bmix2nm`). AiVedha and AiCIPPY together form the core of AiVibe's enterprise AI product portfolio.
247
+
248
+ ---
249
 
250
+ ## License
251
+
252
+ This model is released under the **Apache 2.0 License**. See [LICENSE](https://huggingface.co/aivedha/aicippy-Coder/blob/main/LICENSE) for full terms.
253
+
254
+ The underlying architecture is derived from Qwen3-Coder-Next (Qwen Team, Alibaba Cloud), used in accordance with its Apache 2.0 license terms.
255
+
256
+ ---

  ## Citation

+ If you use AiCIPPY-Coder in your research or products, please cite:

+ ```bibtex
+ @misc{aivibe_aicippy_coder_2026,
+ title = {AiCIPPY-Coder: Agentic Coding Intelligence by AiVedha},
+ author = {{AiVibe Software Services Private Limited}},
+ year = {2026},
+ url = {https://huggingface.co/aivedha/aicippy-Coder}
+ }
  ```