---
language:
- en
license: mit
library_name: transformers
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
tags:
- code
- code-editing
- merge
- fastedit
- qwen2
pipeline_tag: text-generation
---
# FastEdit 1.7B
A fine-tuned **Qwen2.5-Coder-1.5B-Instruct** model for merging code edit snippets into source files. Given an original code chunk (~35 lines) and a compact edit snippet with context markers, the model produces the merged result.
This model is designed to be used with the [FastEdit](https://github.com/parcadei/fastedit) toolkit, which handles AST scoping, deterministic edits, and post-processing. **Using the model directly requires the exact prompt format described below.**
## Model variants
All variants are in this repo under subfolders:
| Subfolder | Format | Size | Use case |
|-----------|--------|------|----------|
| `bf16/` | BF16 safetensors | 3.2 GB | Fine-tuning, reference, GPU serving via vLLM/TGI |
| `mlx-8bit/` | MLX 8-bit | 1.7 GB | Apple Silicon (recommended for local use) |
| `gguf/` | GGUF Q8_0 | 1.7 GB | llama.cpp, LM Studio, Ollama |
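For local use on Apple Silicon, a minimal loading sketch with `mlx-lm` (an illustration, not a pinned recipe; it assumes the `mlx-8bit` subfolder can be fetched with `huggingface_hub` and loaded from disk):
```python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# Download only the MLX 8-bit variant, then load it from the local subfolder.
root = snapshot_download("continuous-lab/FastEdit", allow_patterns=["mlx-8bit/*"])
model, tokenizer = load(f"{root}/mlx-8bit")

# `messages` must be the exact two-message chat described under
# "Prompt format" below; a placeholder stands in here.
messages = [
    {"role": "system", "content": "You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think"},
    {"role": "user", "content": "<built per the prompt format below>"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```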
## Prompt format
The model expects a specific 2-message chat format. **Using a different prompt will produce poor results.**
### System message
```
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think
```
The `/no_think` suffix disables Qwen's thinking mode — without it, the model may emit thousands of reasoning tokens before producing output.
### User message
```
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.
<code>{original_code}</code>
<update>{update_snippet}</update>
Provide the complete updated code.
```
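A small helper that fills this template verbatim (`build_messages` is just an illustrative name, not part of the toolkit):
```python
SYSTEM_MESSAGE = (
    "You are a coding assistant that helps merge code updates, "
    "ensuring every modification is fully integrated. /no_think"
)

USER_TEMPLATE = """Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.
<code>{original_code}</code>
<update>{update_snippet}</update>
Provide the complete updated code."""

def build_messages(original_code: str, update_snippet: str) -> list[dict]:
    """Return the exact two-message chat the model was trained on."""
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": USER_TEMPLATE.format(
            original_code=original_code, update_snippet=update_snippet)},
    ]
```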
### Expected output
The model outputs the merged code wrapped in `<updated-code>` tags:
```
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```
### Complete example
**Original code** (what tree-sitter extracts for the target function):
```python
def process(data):
    result = transform(data)
    return result
```
**Edit snippet** (what the user/agent writes):
```python
def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}
```
**Model output:**
```python
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```
The model understands `# ... existing code ...` markers (and language-specific variants like `// ... existing code ...`) as instructions to preserve the original lines in that region.
## How it fits into FastEdit
In production, the model is the **fallback** — not the primary path:
1. **AST scoping** — tree-sitter finds the target function by name (~35 lines), so the model never sees the whole file
2. **Deterministic text-match** — 74% of edits are resolved by matching context lines and splicing in new lines (0 tokens, <1ms; a toy sketch follows below)
3. **Model merge** — the remaining 26% of edits (structural changes like wrapping in try/catch, full rewrites) go to this model
The model only ever processes ~35-line chunks. It was trained on function-scoped edits, not whole files. Feeding it large inputs will degrade quality.
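For intuition, here is a toy version of the deterministic path. FastEdit's real matcher is more involved; `splice_edit` and its bail-out behavior are illustrative only:
```python
MARKER = "... existing code ..."

def splice_edit(original: str, snippet: str) -> str | None:
    """Toy context-line splice: expand '... existing code ...' markers with the
    original lines they stand for. Returns None when the surrounding context
    can't be matched exactly, which is the cue to fall back to the model."""
    orig = original.splitlines()
    out: list[str] = []
    i = 0            # cursor into the original lines
    pending = False  # an unresolved marker precedes the current line
    for line in snippet.splitlines():
        if MARKER in line:
            pending = True
            continue
        if pending:
            try:
                j = orig.index(line, i)   # next context line in the original
            except ValueError:
                return None               # context not found -> model fallback
            out.extend(orig[i:j])         # copy the lines the marker stood for
            i = j
            pending = False
        out.append(line)
        if i < len(orig) and orig[i] == line:
            i += 1                        # keep the cursor aligned on context lines
    if pending:                           # a trailing marker keeps the remainder
        out.extend(orig[i:])
    return "\n".join(out)
```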
## Using without FastEdit
If you want to use the model directly (without the toolkit), you need to:
1. **Scope the input yourself** — extract only the target function/class, not the whole file
2. **Use the exact prompt format** above — different prompts will produce different (worse) results
3. **Parse the output** — extract text between `<updated-code>` and `</updated-code>` tags
4. **Handle edge cases** — the model may emit `<think>` blocks (strip them), use variant tag names (`<update-code>`, `<updated_code>`), or truncate output on long functions; see the parsing sketch after the example below
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
# BF16 (GPU / fine-tuning)
model = AutoModelForCausalLM.from_pretrained("continuous-lab/FastEdit", subfolder="bf16", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("continuous-lab/FastEdit", subfolder="bf16")
messages = [
{"role": "system", "content": "You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think"},
{"role": "user", "content": """Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.
<code>def process(data):
    result = transform(data)
    return result</code>
<update>def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}</update>
Provide the complete updated code."""}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
result = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
# Parse: extract text between <updated-code> and </updated-code>
```
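The final comment above leaves parsing to the caller. Here is a sketch covering steps 3 and 4 (the tag variants are the ones listed above; the regex approach is one reasonable option, not a canonical API):
```python
import re

def extract_merged_code(raw: str) -> str | None:
    """Extract the merged code from a raw completion, handling stray <think>
    blocks, known tag-name variants, and truncated (unclosed) output."""
    raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Canonical tag plus the known variants, closed properly.
    m = re.search(r"<(updated-code|update-code|updated_code)>(.*?)</\1>", raw, re.DOTALL)
    if m:
        return m.group(2).strip("\n")
    # Unclosed tag: output was likely truncated; callers may prefer to treat
    # this as a failed merge rather than trust the partial code.
    m = re.search(r"<(?:updated-code|update-code|updated_code)>(.*)", raw, re.DOTALL)
    return m.group(1).strip("\n") if m else None
```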
## Training
- **Base model**: Qwen2.5-Coder-1.5B-Instruct
- **Task**: Code edit merging across 13 languages
## Evaluation
Tested on 22 structurally distinct edit patterns (73 cases) across 13 languages:
| Path | Accuracy | Avg tokens | Avg latency |
|------|----------|------------|-------------|
| Deterministic (74% of edits) | 100% | 0 | <1ms |
| Model (26% of edits) | 92% | ~40 | ~500ms |
| **Combined** | **~98%** | **~10** | **~130ms** |
Per-language model accuracy (156-example benchmark):
| Language | Accuracy |
|----------|----------|
| Python, Java, Kotlin, C, PHP | 92% |
| JavaScript, TypeScript, Rust, Swift | 85% |
| Go, C++, Ruby | 77% |
## Limitations
- Performance degrades on inputs longer than ~100 lines.
- Does not handle whole-file edits well — use the FastEdit toolkit's AST scoping.
- The edit snippet must use `# ... existing code ...` markers (or language-equivalent) for context preservation. Without markers, the model treats the entire snippet as a replacement.
- Languages not in the training set may work but are untested.
## License
MIT