---
license: apache-2.0
datasets:
- zkzhang88/OCEData
base_model:
- Qwen/Qwen2.5-Coder-7B
---
# OpenCodeEdit Series Models Quick Start Guide (OpenCodeEdit-Qwen2.5-7B)
**For details, please refer to our [paper on arXiv](https://arxiv.org/abs/2509.25203).**
We recommend using the latest version of `transformers`.
Requirements:
```
transformers
torchvision
torchaudio
tensorboard
```
## Model Overview
**OpenCodeEdit-Qwen2.5-7B** has the following features:
- Type: Causal Language Models
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 7.61B
- Number of Parameters (Non-Embedding): 6.53B
- Number of Layers: 28
- Number of Attention Heads (GQA): 28 for Q and 4 for KV
- Context Length: Full 131,072 tokens
The model expects prompts in the template below; please construct your inputs accordingly.
Prompt Template:
```
System Prompt:
You are a code editor. You will be provided the original code snippet and an instruction that specifies the changes you need to make. You will produce the changed code, based on the original code and the instruction given. Only produce the code, do not include any additional prose.
User Prompt:
## Code Before:
{pre_edit_code}
## Instruction:
{instruction}
## Code After:
```
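As a quick sketch, the user prompt can be assembled from the template above with a plain f-string; the code snippet and instruction used here are only illustrative placeholders:

```python
# Minimal sketch: filling the user-prompt template with illustrative values.
pre_edit_code = "def add(a, b):\n    return a + b"
instruction = "Rename the function to `sum_two`."

user_prompt = f"""## Code Before:
{pre_edit_code}
## Instruction:
{instruction}
## Code After:
"""
print(user_prompt)
```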
The following code snippet illustrates how to use the model to generate content from given inputs.
```python
import re
from transformers import AutoModelForCausalLM, AutoTokenizer

def extract_first_python_block(text: str) -> str:
    """Return the contents of the first ```python fenced block in `text`."""
    pattern = r"```python\s*(.*?)```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return ""

# Other models in the series: "zkzhang88/OpenCodeEdit-Qwen3-8B", "zkzhang88/OpenCodeEdit-DSC-6.7B"
model_name = "zkzhang88/OpenCodeEdit-Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

pre_edit_code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
"""

SYSTEM_PROMPT = "You are a code editor. You will be provided the original code snippet and an instruction that specifies the changes you need to make. You will produce the changed code, based on the original code and the instruction given. Only produce the code, do not include any additional prose."

instruction = "Optimize the calculation method for the Fibonacci sequence by reducing recursive calls and employing dynamic programming to enhance efficiency."

formatted_input = f"""
## Code Before:
{pre_edit_code}
## Instruction:
{instruction}
## Code After:
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": formatted_input},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens and decode only the newly generated continuation.
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print(extract_first_python_block(content))
```
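The `extract_first_python_block` helper above can be checked in isolation; the sample reply string below is only a hypothetical illustration of what the model might return:

```python
import re

def extract_first_python_block(text: str) -> str:
    # Same helper as in the quick-start snippet above.
    pattern = r"```python\s*(.*?)```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return ""

# Hypothetical model reply wrapping the edited code in a fenced block.
reply = "Here is the edited code:\n```python\nprint('hello')\n```\n"
print(extract_first_python_block(reply))  # → print('hello')
```

If the reply contains no fenced Python block, the helper returns an empty string, so callers may want to fall back to the raw reply in that case.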
## Citation
If you find our work helpful, please consider citing our paper.
```
@misc{zhang2025generatinghighqualitydatasetscode,
title={Generating High-Quality Datasets for Code Editing via Open-Source Language Models},
author={Zekai Zhang and Mingwei Liu and Zhenxi Chen and Linxi Liang and Yuxuan Chen and Guangsheng Ou and Yanlin Wang and Dan Li and Xin Peng and Zibin Zheng},
year={2025},
eprint={2509.25203},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2509.25203},
}
```