---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next-Base/blob/main/LICENSE
pipeline_tag: text-generation
---

# Qwen3-Coder-Next-Base
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Highlights

Today, we're announcing **Qwen3-Coder-Next-Base**, an open-weight language model designed specifically for coding agents and local development. It features the following key enhancements:

- **Advanced architecture**: It integrates hybrid attention with a highly sparse MoE, enabling high throughput and strong ultra-long-context modeling.

- **Robust data foundation**: Trained on highly diverse, broad-coverage corpora, with native 256K context and support for 370+ languages, it leaves ample headroom for post-training.

- **Agentic coding capability**: With a carefully designed training recipe, it has strong capabilities in tool calling, scaffold/template adaptation, and error detection/recovery, making it a strong backbone for reliable coding agents.

## Model Overview

**Qwen3-Coder-Next-Base** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining
- Number of Parameters: 80B in total and 3B activated
- Number of Parameters (Non-Embedding): 79B
- Hidden Dimension: 2048
- Number of Layers: 48
- Hybrid Layout: 12 \* (3 \* (Gated DeltaNet -> MoE) -> 1 \* (Gated Attention -> MoE)) (see the sketch after this list)
- Gated Attention:
  - Number of Attention Heads: 16 for Q and 2 for KV
  - Head Dimension: 256
  - Rotary Position Embedding Dimension: 64
- Gated DeltaNet:
  - Number of Linear Attention Heads: 32 for V and 16 for QK
  - Head Dimension: 128
- Mixture of Experts:
  - Number of Experts: 512
  - Number of Activated Experts: 10
  - Number of Shared Experts: 1
  - Expert Intermediate Dimension: 512
- Context Length: 262,144 natively
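
The hybrid layout above can be read as 12 identical groups of 4 layers each. The following is a minimal sketch (our own illustration of the stated layout, not official code) of how the 48-layer schedule expands:

```python
# Sketch of the stated hybrid layout: 12 groups, each consisting of
# 3 Gated DeltaNet blocks followed by 1 Gated Attention block,
# with every block paired with an MoE feed-forward layer.
NUM_GROUPS = 12          # 12 groups * 4 layers = 48 layers
DELTANET_PER_GROUP = 3   # linear-attention (Gated DeltaNet) layers per group
ATTENTION_PER_GROUP = 1  # full (Gated Attention) layers per group

layout = []
for _ in range(NUM_GROUPS):
    layout += ["gated_deltanet -> moe"] * DELTANET_PER_GROUP
    layout += ["gated_attention -> moe"] * ATTENTION_PER_GROUP

assert len(layout) == 48
# Full attention lands on every 4th layer: indices 3, 7, 11, ..., 47.
print([i for i, block in enumerate(layout) if block.startswith("gated_attention")])
```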

**NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Additionally, specifying `enable_thinking=False` is no longer required.**

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder-next/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).
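
Since this checkpoint is a base (pretrained-only) model, plain text completion is the natural way to try it. The following is a minimal sketch, assuming the repository ID `Qwen/Qwen3-Coder-Next-Base` (inferred from the license link above) and standard `transformers` loading:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-Next-Base"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # shard the 80B-parameter MoE across available devices
)

# Base model: complete raw text rather than applying a chat template.
prompt = "# Write a Python function that checks whether a number is prime\ndef is_prime(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```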

## Best Practices

To achieve optimal performance, we recommend the following sampling parameters: `temperature=1.0`, `top_p=0.95`, `top_k=40`.
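
As a sketch of how these recommendations map onto a `generate` call (continuing from the loading example above; the prompt is illustrative):

```python
# Recommended sampling parameters; do_sample=True is required for
# temperature/top_p/top_k to take effect in transformers.
inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
    top_k=40,
)
# Print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```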

## Citation

If you find our work helpful, feel free to cite it.

```bibtex
@misc{qwen3codernexttechnicalreport,
      title={Qwen3-Coder-Next Technical Report},
      author={Qwen Team},
      year={2026},
      eprint={},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/},
}
```