---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- chat
- microsoft
- nextcoder
- selekt
datasets:
- microsoft/NextCoderDataset
---

# NextCoder-32B
<p align="center">
<a href="https://github.com/microsoft/NextCoder">GitHub</a>   |   <a href="https://arxiv.org/abs/2503.03656">Arxiv</a>
</p>

> Published in ICML'2025

## Introduction

NextCoder is a series of code-editing large language models built on the Qwen2.5-Coder Instruct variants and trained with the novel Selective Knowledge Transfer finetuning methodology introduced in the paper. The NextCoder family comes in three sizes (7, 14, and 32 billion parameters) to meet the needs of different developers.
The key improvements are:
- Significant improvements in **code editing**: NextCoder-32B performs on par with GPT-4o on complex benchmarks such as Aider-Polyglot, a 44% improvement over its base model.
- No loss of generalizability, thanks to our new finetuning method **SeleKT**.
- **Long-context support** up to 32K tokens.

**This repo contains the NextCoder-32B model**, which has the following features:
- Type: Causal Language Model
- Training Stage: Post-training with SeleKT
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV

For more details, please refer to our [blog](), [GitHub](https://github.com/microsoft/NextCoder), and [Arxiv](https://arxiv.org/abs/2503.03656).
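
The architecture details listed above can also be read directly from the published model configuration. The snippet below is a minimal sketch (assuming access to the Hugging Face Hub); it loads only the config, not the weights, and the values noted in the comments are the ones listed above.

```python
from transformers import AutoConfig

# Load only the model configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("microsoft/NextCoder-32B")

print(config.model_type)              # expected: "qwen2"
print(config.num_hidden_layers)       # expected: 64
print(config.num_attention_heads)     # expected: 40 (query heads)
print(config.num_key_value_heads)     # expected: 8 (GQA key/value heads)
print(config.max_position_embeddings) # maximum supported context length
```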
## Requirements

NextCoder is based on the Qwen2.5 base models, whose code has been integrated into the latest Hugging Face `transformers`; we advise you to use the latest version of `transformers`.

With `transformers<4.37.0`, you will encounter the following error:
```
KeyError: 'qwen2'
```
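
As a quick guard against that error, you can check the installed `transformers` version before loading the model. This is a minimal sketch, not part of the official instructions:

```python
import transformers
from packaging import version  # installed as a transformers dependency

# Qwen2-based checkpoints require transformers >= 4.37.0.
if version.parse(transformers.__version__) < version.parse("4.37.0"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old for NextCoder; "
        "please upgrade, e.g. `pip install -U transformers`."
    )
```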
## Quickstart

Here is a code snippet with `apply_chat_template` that shows how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/NextCoder-32B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = """
Fix the following function that divides two numbers to handle all the edge cases:

def divide(a, b)
  returm a/b
"""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
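
NextCoder-32B needs a substantial amount of GPU memory in bf16. If that is a constraint, one option is to load it with 4-bit quantization via `bitsandbytes`. The snippet below is a minimal sketch, assuming `bitsandbytes` is installed; it is not part of the official instructions, and quantization may slightly affect output quality.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "microsoft/NextCoder-32B"

# 4-bit NF4 quantization to reduce GPU memory usage (requires `pip install bitsandbytes`).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Generation then works exactly as in the quickstart above; only the model-loading step changes.
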
## Evaluation and Performance

| Models | HUMANEVALEDIT | CANITEDIT | AIDER | POLYGLOT |
|--------|---------------|-----------|-------|----------|
| QwenCoder-2.5-3B | 73.2 | 37.1 | 36.8 | - |
| QwenCoder-2.5-3B-LoRA | 64.6 | 36.2 | 35.8 | - |
| QwenCoder-2.5-3B-SFT | 76.2 | 32.4 | 30.1 | - |
| **NextCoder-3B** | 75.6 | 42.4 | 37.6 | - |
| QwenCoder-2.5-14B | 87.8 | 58.1 | 66.9 | 9.3 |
| QwenCoder-2.5-14B-LoRA | 78.0 | 50.9 | 66.2 | 5.3 |
| QwenCoder-2.5-14B-SFT | 79.9 | 42.4 | 36.8 | 3.1 |
| **NextCoder-14B** | 89.8 | 60.2 | 72.2 | 12.2 |
| QwenCoder-2.5-32B | **90.2** | 61.0 | 72.9 | 16.4 |
| QwenCoder-2.5-32B-LoRA | 82.3 | 52.4 | 60.2 | 6.7 |
| QwenCoder-2.5-32B-SFT | 81.7 | 49.5 | 66.9 | 8.4 |
| **NextCoder-32B** | 88.9 | **62.4** | **74.7** | **23.6** |

*Comparison of base QwenCoder-2.5 models of different sizes and their SeleKT-enhanced versions across code-editing benchmarks.*

**Detailed evaluation results are reported in this [📑 paper](https://arxiv.org/abs/2503.03656).**

## Citation

If you find our work helpful, please consider citing it.

```
// todo
```