# Qwen3-Coder-30B-A3B-Instruct-RTPurbo

## Model Overview

- **Model Optimizations:**
  - **Sliding Window Attention:** 85% of attention heads
  - **Full Attention:** 15% of attention heads
- **Version:** 1.0

<img src="./headwise.png" alt="HeadWise attention overview">
RTPurbo uses hybrid HeadWise Attention to compress the Qwen3-Coder model. Specifically, it divides the attention heads into two groups according to attention type:

1. **Retrieval Heads**: These heads perform **Full Attention** over the entire sequence (or a large chunk of it), allowing them to capture rich, long-range dependencies and act as a powerful information-retrieval component.
2. **Non-Retrieval Heads**: These heads use **Sink + Sliding Window Attention (Sink SWA)**, processing tokens in a sliding-window or fixed-cache manner. They are highly efficient and well suited to very long sequences while still maintaining local context. A mask-construction sketch follows the list below.
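To make the split concrete, the sketch below builds per-head attention masks: retrieval heads get the full causal mask, while the remaining heads get a sliding-window mask plus a few initial "sink" tokens. This is only an illustration of the idea, not the model's internal implementation; the head indices, window size, and sink count are made-up values.

```python
# Illustrative sketch of hybrid HeadWise attention masks (not the model's own code).
# True = the query position may attend to the key position.
import torch

def headwise_masks(seq_len, num_heads, retrieval_heads, window=1024, num_sink=4):
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    causal = j <= i                           # full causal attention
    sink_swa = (causal & (i - j < window)) | (causal & (j < num_sink))  # local window + sink tokens

    masks = sink_swa.unsqueeze(0).repeat(num_heads, 1, 1)
    masks[list(retrieval_heads)] = causal     # retrieval heads keep full attention
    return masks

# Example: 2 of 16 heads (~15%, matching the split above) act as retrieval heads.
masks = headwise_masks(seq_len=32, num_heads=16, retrieval_heads=[3, 11], window=8, num_sink=2)
print(masks.shape)  # torch.Size([16, 32, 32])
```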
The following code can be used for inference. HeadWise attention is triggered when the sequence length exceeds 16,384 tokens.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

model_name = "RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```
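As a quick sanity check, you can compare the tokenized prompt length against the 16,384-token threshold mentioned above. The constant name below is chosen here for illustration and is not part of the model's API; the mode switch itself happens inside the model code.

```python
# Hypothetical check reusing model_inputs from the snippet above.
HEADWISE_THRESHOLD = 16_384  # threshold stated above; name is illustrative
seq_len = model_inputs.input_ids.shape[-1]
if seq_len > HEADWISE_THRESHOLD:
    print(f"{seq_len} tokens: the hybrid HeadWise attention path is expected to be active.")
else:
    print(f"{seq_len} tokens: below the threshold, standard attention is used.")
```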
## Evaluation

The model was evaluated on LongBench using [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness), with [Qwen3-Coder-30B-A3B-Instruct](https://www.modelscope.cn/models/Qwen/Qwen3-Coder-30B-A3B-Instruct) under full attention as the baseline.
<table style="border-collapse:collapse; border-top:2px solid #000; border-bottom:2px solid #000;">
  <thead>
    <tr style="border-bottom:2px solid #000;">
      <th align="center" style="padding:8px 14px;">LongBench</th>
      <th align="center" style="padding:8px 14px;">lcc</th>
      <th align="center" style="padding:8px 14px;">repo-p</th>
      <th align="center" style="padding:8px 14px;">samsum</th>
      <th align="center" style="padding:8px 14px;">trec</th>
      <th align="center" style="padding:8px 14px;">lsht</th>
      <th align="center" style="padding:8px 14px;">2wikim</th>
      <th align="center" style="padding:8px 14px;">hotpot</th>
      <th align="center" style="padding:8px 14px;">multi-en</th>
      <th align="center" style="padding:8px 14px;">multi-zh</th>
      <th align="center" style="padding:8px 14px;">musique</th>
      <th align="center" style="padding:8px 14px;">qasper</th>
      <th align="center" style="padding:8px 14px;">vcsum</th>
      <th align="center" style="padding:8px 14px;">qmsum</th>
      <th align="center" style="padding:8px 14px;">PR-en</th>
      <th align="center" style="padding:8px 14px;">PR-zh</th>
      <th align="center" style="padding:8px 14px;">Avg. (%)</th>
    </tr>
    <tr style="border-bottom:2px solid #000;">
      <th align="center" colspan="17" style="padding:10px 14px;">Qwen3-Coder-30B-A3B</th>
    </tr>
  </thead>
  <tbody>
    <tr style="border-bottom:2px solid #000;">
      <td align="center" style="padding:8px 14px;"><b>Full Attn</b></td>
      <td align="center" style="padding:8px 14px;">34.34</td>
      <td align="center" style="padding:8px 14px;">27.14</td>
      <td align="center" style="padding:8px 14px;">45.80</td>
      <td align="center" style="padding:8px 14px;">81.00</td>
      <td align="center" style="padding:8px 14px;">47.50</td>
      <td align="center" style="padding:8px 14px;">42.08</td>
      <td align="center" style="padding:8px 14px;">57.64</td>
      <td align="center" style="padding:8px 14px;">52.89</td>
      <td align="center" style="padding:8px 14px;">65.99</td>
      <td align="center" style="padding:8px 14px;">38.30</td>
      <td align="center" style="padding:8px 14px;">39.25</td>
      <td align="center" style="padding:8px 14px;">13.55</td>
      <td align="center" style="padding:8px 14px;">23.77</td>
      <td align="center" style="padding:8px 14px;">99.00</td>
      <td align="center" style="padding:8px 14px;">99.75</td>
      <td align="center" style="padding:8px 14px;">51.20</td>
    </tr>
    <tr style="border-bottom:2px solid #000;">
      <td align="center" style="padding:8px 14px;"><b>RTPurbo</b></td>
      <td align="center" style="padding:8px 14px;">35.96</td>
      <td align="center" style="padding:8px 14px;">35.21</td>
      <td align="center" style="padding:8px 14px;">46.49</td>
      <td align="center" style="padding:8px 14px;">81.00</td>
      <td align="center" style="padding:8px 14px;">49.00</td>
      <td align="center" style="padding:8px 14px;">47.39</td>
      <td align="center" style="padding:8px 14px;">55.44</td>
      <td align="center" style="padding:8px 14px;">52.93</td>
      <td align="center" style="padding:8px 14px;">65.23</td>
      <td align="center" style="padding:8px 14px;">35.58</td>
      <td align="center" style="padding:8px 14px;">39.78</td>
      <td align="center" style="padding:8px 14px;">13.80</td>
      <td align="center" style="padding:8px 14px;">23.68</td>
      <td align="center" style="padding:8px 14px;">99.00</td>
      <td align="center" style="padding:8px 14px;">99.75</td>
      <td align="center" style="padding:8px 14px;">52.02</td>
    </tr>
  </tbody>
</table>
## Media Coverage

Our work has been featured by **Minds in AI (机器之心)**. See the [article](https://mp.weixin.qq.com/s/wFAJ6oG1CsKBJiCBE45BsQ) for more details.