# Qwen3-Coder-30B-A3B-Instruct-RTPurbo
## Model Overview
- **Model Optimizations:**
  - **Sliding Window Attention:** 85%
  - **Full Attention:** 15%
- **Version:** 1.0
RTPurbo uses hybrid HeadWise Attention to compress the Qwen3-Coder model. Specifically, it divides the attention heads into two groups according to attention type:
1. **Retrieval Heads:** These heads perform **Full Attention** over the entire sequence (or a large chunk of it), allowing them to capture rich, long-range dependencies and act as a powerful information-retrieval component.
2. **Non-Retrieval Heads:** These heads use **Sink SWA Attention** (attention sinks plus a sliding window), processing tokens with a sliding-window or fixed-size cache. They are highly efficient and well suited to very long sequences while still maintaining local context.
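The two attention patterns above can be illustrated with boolean masks. The following is a minimal sketch, not the model's actual implementation: the function name `headwise_masks` and the `window`/`n_sink` values are illustrative assumptions, chosen only to show how a full causal mask differs from a sink-plus-sliding-window mask.

```python
import torch

def headwise_masks(seq_len, window=1024, n_sink=4):
    """Build boolean attention masks (True = may attend) for the two head types.

    Illustrative sketch only; window size and sink count are placeholders,
    not RTPurbo's real configuration.
    """
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]  # standard causal constraint

    # Retrieval heads: full causal attention over the whole sequence.
    full_mask = causal

    # Non-retrieval heads: causal attention restricted to a sliding window,
    # plus a few initial "sink" tokens that every position may still attend to.
    in_window = (pos[:, None] - pos[None, :]) < window
    sink = pos[None, :] < n_sink
    swa_mask = causal & (in_window | sink)
    return full_mask, swa_mask
```

In this sketch, a retrieval head at the last position can see every earlier token, while a non-retrieval head sees only the most recent `window` tokens plus the first `n_sink` sink tokens, which is what keeps its KV cache bounded on long sequences.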
The following code can be used for inference. HeadWise attention is triggered only when the sequence length exceeds 16,384 tokens (SeqLen > 16,384).
```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Prepare the model input
prompt = "Write a quick sort algorithm."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the completion and strip the prompt tokens from the output
generated_ids = model.generate(**model_inputs, max_new_tokens=128)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
## Evaluation
This model was evaluated with the [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) benchmark harness, using [Qwen3-Coder-30B-A3B-Instruct](https://www.modelscope.cn/models/Qwen/Qwen3-Coder-30B-A3B-Instruct) as the full-attention baseline.
| LongBench | lcc | repo-p | samsum | trec | lsht | 2wikim | hotpot | multi-en | multi-zh | musique | qasper | vcsum | qmsum | PR-en | PR-zh | Avg. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **Qwen3-Coder-30B-A3B** | | | | | | | | | | | | | | | | |
| Full Attn | 34.34 | 27.14 | 45.80 | 81.00 | 47.50 | 42.08 | 57.64 | 52.89 | 65.99 | 38.30 | 39.25 | 13.55 | 23.77 | 99.00 | 99.75 | 51.20 |
| RTPurbo | 35.96 | 35.21 | 46.49 | 81.00 | 49.00 | 47.39 | 55.44 | 52.93 | 65.23 | 35.58 | 39.78 | 13.80 | 23.68 | 99.00 | 99.75 | 52.02 |