---
language:
  - en
license: mit
library_name: transformers
base_model: Qwen/Qwen2.5-Coder-1.5B-Instruct
tags:
  - code
  - code-editing
  - merge
  - fastedit
  - qwen2
pipeline_tag: text-generation
---

# FastEdit 1.7B

A fine-tuned **Qwen2.5-Coder-1.5B-Instruct** for merging code edit snippets into source files. Given an original code chunk (~35 lines) and a compact edit snippet with context markers, the model produces the merged result.

This model is designed to be used with the [FastEdit](https://github.com/parcadei/fastedit) toolkit, which handles AST scoping, deterministic edits, and post-processing. **Using the model directly requires the exact prompt format described below.**

## Model variants

All variants are in this repo under subfolders:

| Subfolder | Format | Size | Use case |
|-----------|--------|------|----------|
| `bf16/` | BF16 safetensors | 3.2 GB | Fine-tuning, reference, GPU serving via vLLM/TGI |
| `mlx-8bit/` | MLX 8-bit | 1.7 GB | Apple Silicon (recommended for local use) |
| `gguf/` | GGUF Q8_0 | 1.7 GB | llama.cpp, LM Studio, Ollama |

## Prompt format

The model expects a specific 2-message chat format. **Using a different prompt will produce poor results.**

### System message

```
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think
```

The `/no_think` suffix disables Qwen's thinking mode — without it, the model may emit thousands of reasoning tokens before producing output.

### User message

```
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>{original_code}</code>

<update>{update_snippet}</update>

Provide the complete updated code.
```
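When you call the model programmatically, keeping the prompt template in one place makes it harder to drift from this format. The helper below is a minimal sketch of that idea. `build_messages`, `SYSTEM_PROMPT`, and `USER_TEMPLATE` are illustrative names, not part of the FastEdit toolkit's API.

```python
SYSTEM_PROMPT = (
    "You are a coding assistant that helps merge code updates, "
    "ensuring every modification is fully integrated. /no_think"
)

# Mirrors the user message above line for line, including the blank lines.
USER_TEMPLATE = (
    "Merge all changes from the <update> snippet into the <code> below.\n"
    "- Preserve the code's structure, order, comments, and indentation exactly.\n"
    "- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.\n"
    "- Do not include any additional text, explanations, placeholders, ellipses, or code fences.\n"
    "\n"
    "<code>{original_code}</code>\n"
    "\n"
    "<update>{update_snippet}</update>\n"
    "\n"
    "Provide the complete updated code."
)


def build_messages(original_code: str, update_snippet: str) -> list[dict[str, str]]:
    """Return the 2-message chat (system + user) expected by the model."""
    # str.format only interprets braces in the template, so braces inside the
    # substituted code (dicts, C-style blocks) are passed through untouched.
    user = USER_TEMPLATE.format(original_code=original_code, update_snippet=update_snippet)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]
```

The result can be passed directly to `tokenizer.apply_chat_template`.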

### Expected output

The model outputs the merged code wrapped in `<updated-code>` tags:

```
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```

### Complete example

**Original code** (what tree-sitter extracts for the target function):

```python
def process(data):
    result = transform(data)
    return result
```

**Edit snippet** (what the user/agent writes):

```python
def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}
```

**Model output:**

```python
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```

The model understands `# ... existing code ...` markers (and language-specific variants like `// ... existing code ...`) as instructions to preserve the original lines in that region.
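A snippet with no marker is treated as a full replacement (see Limitations), so you may want to check for markers before sending a snippet. The following is a minimal sketch. It assumes markers occupy a whole line and use either `#` or `//` comment syntax, which are the two forms named above. Other comment styles are not covered, and `has_existing_code_marker` is a hypothetical helper.

```python
import re

# A whole line consisting of a "... existing code ..." marker written as a
# "#" or "//" comment. Other comment syntaxes are not matched by this sketch.
MARKER_RE = re.compile(r"^\s*(?:#|//)\s*\.\.\. existing code \.\.\.\s*$")


def has_existing_code_marker(snippet: str) -> bool:
    """Return True if any line of the snippet is an existing-code marker."""
    return any(MARKER_RE.match(line) for line in snippet.splitlines())
```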

## How it fits into FastEdit

In production, the model is the **fallback** — not the primary path:

1. **AST scoping** — tree-sitter finds the target function by name (~35 lines), so the model never sees the whole file
2. **Deterministic text-match** — 74% of edits are resolved by matching context lines and splicing in new lines (0 tokens, <1ms)
3. **Model merge** — the remaining 26% of edits (structural changes like wrapping in try/catch, full rewrites) go to this model
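You can picture this routing as a try-then-fall-back function. The sketch below is a deliberately simplified stand-in for the deterministic path, not FastEdit's actual algorithm. It treats the snippet's first and last non-marker lines as anchors, requires them to locate a unique region of the original, and replaces that region with the snippet. `try_text_match`, `merge`, and the `model_merge` callable are hypothetical names.

```python
import re
from typing import Callable, Optional

# A whole-line "... existing code ..." marker in "#" or "//" comment syntax.
MARKER_RE = re.compile(r"^\s*(?:#|//)\s*\.\.\. existing code \.\.\.\s*$")


def try_text_match(original: str, snippet: str) -> Optional[str]:
    """Splice `snippet` into `original` using its first and last lines as anchors.

    Handles only snippets shaped like: [marker] anchor, new lines..., anchor [marker].
    Returns None for anything else so the caller can fall back to the model.
    """
    orig = original.splitlines()
    body = snippet.splitlines()
    # Leading/trailing markers just mean "keep the surrounding code as-is".
    while body and MARKER_RE.match(body[0]):
        body.pop(0)
    while body and MARKER_RE.match(body[-1]):
        body.pop()
    # An interior marker implies a structural edit (e.g. wrapping in try/except).
    if len(body) < 2 or any(MARKER_RE.match(line) for line in body):
        return None
    first, last = body[0], body[-1]
    # The opening anchor must pin down exactly one location in the original.
    if orig.count(first) != 1:
        return None
    start = orig.index(first)
    # The closing anchor must appear somewhere after the opening anchor.
    if last not in orig[start + 1:]:
        return None
    end = orig.index(last, start + 1)
    # Replace the anchored region (inclusive) with the snippet body.
    return "\n".join(orig[:start] + body + orig[end + 1:])


def merge(original: str, snippet: str, model_merge: Callable[[str, str], str]) -> str:
    """Try the zero-token path first; call the model only when it declines."""
    merged = try_text_match(original, snippet)
    return merged if merged is not None else model_merge(original, snippet)
```

With the try/except snippet from the complete example, the interior marker makes `try_text_match` return `None`, so `merge` hands the edit to `model_merge`. This matches the kind of structural change that the list above assigns to the model.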

The model only ever processes ~35-line chunks. It was trained on function-scoped edits, not whole files. Feeding it large inputs will degrade quality.
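FastEdit uses tree-sitter for scoping. For Python sources only, the standard-library `ast` module can approximate the same step, as the sketch below shows. This is a swapped-in technique, not the toolkit's implementation. The helper slices the original lines instead of using `ast.get_source_segment`, so indentation and decorators are preserved exactly. `extract_function`, `fits_budget`, and `MAX_LINES` are hypothetical names. The 100-line threshold comes from the Limitations section.

```python
import ast
from typing import Optional

MAX_LINES = 100  # assumption: quality is reported to degrade beyond ~100 lines


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the first def/async def named `name`, with its original indentation."""
    tree = ast.parse(source)
    lines = source.splitlines()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            # Start at the earliest decorator so it stays attached to the function.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            # lineno/end_lineno are 1-based and inclusive (Python 3.8+).
            return "\n".join(lines[start - 1 : node.end_lineno])
    return None


def fits_budget(chunk: str, max_lines: int = MAX_LINES) -> bool:
    """Check that a scoped chunk is small enough to send to the model."""
    return len(chunk.splitlines()) <= max_lines
```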

## Using without FastEdit

If you want to use the model directly (without the toolkit), you need to:

1. **Scope the input yourself** — extract only the target function/class, not the whole file
2. **Use the exact prompt format** above — different prompts will produce different (worse) results
3. **Parse the output** — extract text between `<updated-code>` and `</updated-code>` tags
4. **Handle edge cases** — the model may emit `<think>` blocks (strip them), use variant tag names (`<update-code>`, `<updated_code>`), or truncate output on long functions

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# BF16 (GPU / fine-tuning)
model = AutoModelForCausalLM.from_pretrained("continuous-lab/FastEdit", subfolder="bf16", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("continuous-lab/FastEdit", subfolder="bf16")

messages = [
    {"role": "system", "content": "You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think"},
    {"role": "user", "content": """Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>def process(data):
    result = transform(data)
    return result</code>

<update>def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}</update>

Provide the complete updated code."""}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
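# Greedy decoding: use do_sample=False rather than temperature=0, which raises an
# error if the checkpoint's generation config enables sampling.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)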
result = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
# Parse: extract text between <updated-code> and </updated-code>
```
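The final comment in the example above leaves parsing to the caller. Below is a minimal sketch of steps 3 and 4 from the list above. It strips closed `<think>` blocks and accepts the canonical tag as well as the `<update-code>` and `<updated_code>` variants. It returns `None` when no closing tag is found, which covers truncated output. `parse_output` is a hypothetical helper, and treating a missing closing tag as a failed merge is an assumption.

```python
import re
from typing import Optional

# Closed reasoning blocks, in case the model emits them despite /no_think.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
# Canonical tag plus the variant spellings listed above, in any combination.
TAG_RE = re.compile(
    r"<(?:updated-code|update-code|updated_code)>(.*?)</(?:updated-code|update-code|updated_code)>",
    re.DOTALL,
)


def parse_output(raw: str) -> Optional[str]:
    """Extract the merged code, or return None if the output is unusable."""
    text = THINK_RE.sub("", raw)
    match = TAG_RE.search(text)
    if match is None:
        return None  # no closing tag, e.g. generation hit max_new_tokens
    # Strip only the newlines next to the tags so first-line indentation survives.
    return match.group(1).strip("\n")
```

Usage: `merged = parse_output(result)`, then fall back or retry when it is `None`.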

## Training

- **Base model**: Qwen2.5-Coder-1.5B-Instruct
- **Task**: Code edit merging across 13 languages

## Evaluation

Tested on 22 structurally distinct edit patterns (73 cases) across 13 languages:

| Path | Accuracy | Avg tokens | Avg latency |
|------|----------|------------|-------------|
| Deterministic (74% of edits) | 100% | 0 | <1ms |
| Model (26% of edits) | 92% | ~40 | ~500ms |
| **Combined** | **~98%** | **~10** | **~130ms** |
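The **Combined** row appears to be a weighted average of the two paths, using each path's share of edits (74% deterministic, 26% model) as the weight:

```math
\begin{aligned}
\text{accuracy} &\approx 0.74 \times 100\% + 0.26 \times 92\% = 74\% + 23.92\% \approx 98\% \\
\text{tokens} &\approx 0.74 \times 0 + 0.26 \times 40 = 10.4 \approx 10 \\
\text{latency} &\approx 0.74 \times (\text{under } 1\,\text{ms}) + 0.26 \times 500\,\text{ms} \approx 130\,\text{ms}
\end{aligned}
```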

Per-language model accuracy (156-example benchmark):

| Language | Accuracy |
|----------|----------|
| Python, Java, Kotlin, C, PHP | 92% |
| JavaScript, TypeScript, Rust, Swift | 85% |
| Go, C++, Ruby | 77% |

## Limitations

- Performance degrades on inputs longer than ~100 lines.
- Does not handle whole-file edits well — use the FastEdit toolkit's AST scoping.
- The edit snippet must use `# ... existing code ...` markers (or language-equivalent) for context preservation. Without markers, the model treats the entire snippet as a replacement.
- Languages not in the training set may work but are untested.

## License

MIT