# FastEdit 1.7B
A fine-tuned Qwen2.5-Coder-1.5B-Instruct for merging code edit snippets into source files. Given an original code chunk (~35 lines) and a compact edit snippet with context markers, the model produces the merged result.
This model is designed to be used with the FastEdit toolkit, which handles AST scoping, deterministic edits, and post-processing. Using the model directly requires the exact prompt format described below.
## Model variants

All variants are in this repo under subfolders:

| Subfolder | Format | Size | Use case |
|---|---|---|---|
| `bf16/` | BF16 safetensors | 3.2 GB | Fine-tuning, reference, GPU serving via vLLM/TGI |
| `mlx-8bit/` | MLX 8-bit | 1.7 GB | Apple Silicon (recommended for local use) |
| `gguf/` | GGUF Q8_0 | 1.7 GB | llama.cpp, LM Studio, Ollama |
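If you only need one variant locally, you can filter the download by subfolder pattern. This is standard `huggingface_hub` usage; the choice of variant here is just an example:

```python
# Fetch only the MLX 8-bit subfolder (swap the pattern for "bf16/*" or "gguf/*").
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="continuous-lab/FastEdit",
    allow_patterns=["mlx-8bit/*"],
)
print(path)  # local snapshot directory containing mlx-8bit/
```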
## Prompt format

The model expects a specific two-message chat format. Using a different prompt will produce poor results.
### System message

```
You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think
```

The `/no_think` suffix disables Qwen's thinking mode; without it, the model may emit thousands of reasoning tokens before producing output.
User message
Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.
<code>{original_code}</code>
<update>{update_snippet}</update>
Provide the complete updated code.
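If you assemble the prompt programmatically, a small helper keeps the template exact. `build_user_message` is an illustrative name, not part of the toolkit, and the blank lines between blocks are an assumption about whitespace:

```python
# Illustrative helper that fills the user-message template above.
def build_user_message(original_code: str, update_snippet: str) -> str:
    return (
        "Merge all changes from the <update> snippet into the <code> below.\n"
        "- Preserve the code's structure, order, comments, and indentation exactly.\n"
        "- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.\n"
        "- Do not include any additional text, explanations, placeholders, ellipses, or code fences.\n\n"
        f"<code>{original_code}</code>\n\n"
        f"<update>{update_snippet}</update>\n\n"
        "Provide the complete updated code."
    )
```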
### Expected output

The model outputs the merged code wrapped in `<updated-code>` tags:

```
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```
## Complete example

Original code (what tree-sitter extracts for the target function):

```python
def process(data):
    result = transform(data)
    return result
```

Edit snippet (what the user/agent writes):

```python
def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}
```

Model output:

```
<updated-code>
def process(data):
    try:
        result = transform(data)
        return result
    except Error as e:
        return {"error": str(e)}
</updated-code>
```
The model understands `# ... existing code ...` markers (and language-specific variants like `// ... existing code ...`) as instructions to preserve the original lines in that region.
## How it fits into FastEdit

In production, the model is the fallback, not the primary path:

- **AST scoping**: tree-sitter finds the target function by name (~35 lines), so the model never sees the whole file
- **Deterministic text-match**: 74% of edits are resolved by matching context lines and splicing in new lines (0 tokens, <1 ms)
- **Model merge**: the remaining 26% of edits (structural changes like wrapping in try/catch, full rewrites) go to this model
The model only ever processes ~35-line chunks. It was trained on function-scoped edits, not whole files. Feeding it large inputs will degrade quality.
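For intuition, here is a minimal sketch of the deterministic path for the two simplest snippet shapes (pure append and pure prepend). The names and scope are assumptions, not FastEdit's actual implementation, which resolves many more edit shapes:

```python
# Illustrative sketch of a deterministic marker splice (0 tokens, no model call).
MARKERS = {"# ... existing code ...", "// ... existing code ..."}

def try_simple_splice(original: str, snippet: str) -> str | None:
    """Return the merged chunk for a pure append/prepend snippet, else None."""
    lines = snippet.splitlines()
    marked = [i for i, line in enumerate(lines) if line.strip() in MARKERS]
    if len(marked) != 1:
        return None                      # zero or multiple markers: use the model
    i = marked[0]
    if i == 0:                           # marker first: keep original, append the rest
        return original.rstrip("\n") + "\n" + "\n".join(lines[1:]) + "\n"
    if i == len(lines) - 1:              # marker last: prepend the rest, keep original
        return "\n".join(lines[:-1]) + "\n" + original
    return None                          # anything structural falls through to the model
```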
## Using without FastEdit

If you want to use the model directly (without the toolkit), you need to:

- **Scope the input yourself**: extract only the target function/class, not the whole file
- **Use the exact prompt format above**: different prompts will produce different (worse) results
- **Parse the output**: extract the text between `<updated-code>` and `</updated-code>` tags
- **Handle edge cases**: the model may emit `<think>` blocks (strip them), use variant tag names (`<update-code>`, `<updated_code>`), or truncate output on long functions
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# BF16 (GPU / fine-tuning)
model = AutoModelForCausalLM.from_pretrained(
    "continuous-lab/FastEdit", subfolder="bf16", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("continuous-lab/FastEdit", subfolder="bf16")

messages = [
    {"role": "system", "content": "You are a coding assistant that helps merge code updates, ensuring every modification is fully integrated. /no_think"},
    {"role": "user", "content": """Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>def process(data):
    result = transform(data)
    return result</code>

<update>def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}</update>

Provide the complete updated code."""},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)  # greedy decoding
result = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Parse: extract text between <updated-code> and </updated-code>
```
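A tolerant parser covering the edge cases listed above might look like the following. The regexes and fallback behavior are assumptions about reasonable handling, not toolkit code:

```python
import re

def parse_merge_output(raw: str) -> str | None:
    """Extract merged code, tolerating <think> blocks and tag-name variants."""
    # Strip any <think>...</think> reasoning block first.
    raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Accept <updated-code>, <update-code>, <updated_code>, etc.
    m = re.search(r"<(updated?[-_]code)>(.*?)</\1>", raw, flags=re.DOTALL)
    if m:
        return m.group(2).strip("\n")
    # Truncated output: opening tag present but closing tag missing.
    m = re.search(r"<updated?[-_]code>(.*)", raw, flags=re.DOTALL)
    return m.group(1).strip("\n") if m else None
```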
## Training
- Base model: Qwen2.5-Coder-1.5B-Instruct
- Task: Code edit merging across 13 languages
## Evaluation
Tested on 22 structurally distinct edit patterns (73 cases) across 13 languages:
| Path | Accuracy | Avg tokens | Avg latency |
|---|---|---|---|
| Deterministic (74% of edits) | 100% | 0 | <1ms |
| Model (26% of edits) | 92% | ~40 | ~500ms |
| Combined | ~98% | ~10 | ~130ms |
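The combined row is consistent with weighting the two paths: 0.74 × 100% + 0.26 × 92% ≈ 97.9% accuracy, 0.26 × 40 ≈ 10 tokens on average, and 0.26 × 500 ms ≈ 130 ms average latency.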
Per-language model accuracy (156-example benchmark):
| Language | Accuracy |
|---|---|
| Python, Java, Kotlin, C, PHP | 92% |
| JavaScript, TypeScript, Rust, Swift | 85% |
| Go, C++, Ruby | 77% |
## Limitations

- Performance degrades on inputs longer than ~100 lines.
- Does not handle whole-file edits well; use the FastEdit toolkit's AST scoping.
- The edit snippet must use `# ... existing code ...` markers (or the language equivalent) for context preservation. Without markers, the model treats the entire snippet as a replacement.
- Languages not in the training set may work but are untested.
## License
MIT
## Serving with SGLang

Install from pip and start the server:

```bash
pip install sglang

python3 -m sglang.launch_server \
  --model-path "continuous-lab/FastEdit" \
  --host 0.0.0.0 \
  --port 30000
```

Or use the Docker image:

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "continuous-lab/FastEdit" \
  --host 0.0.0.0 \
  --port 30000
```

Call the server using curl (OpenAI-compatible API):

```bash
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "continuous-lab/FastEdit",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
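The curl example above is only a generic smoke test. To exercise the model's actual task through the same OpenAI-compatible endpoint, a request could look like this; the client setup and dummy `api_key` are assumptions:

```python
from openai import OpenAI

# SGLang exposes an OpenAI-compatible API; the key is unused locally.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

SYSTEM = ("You are a coding assistant that helps merge code updates, "
          "ensuring every modification is fully integrated. /no_think")

user = """Merge all changes from the <update> snippet into the <code> below.
- Preserve the code's structure, order, comments, and indentation exactly.
- Output only the updated code, enclosed within <updated-code> and </updated-code> tags.
- Do not include any additional text, explanations, placeholders, ellipses, or code fences.

<code>def process(data):
    result = transform(data)
    return result</code>

<update>def process(data):
    try:
        # ... existing code ...
    except Error as e:
        return {"error": str(e)}</update>

Provide the complete updated code."""

resp = client.chat.completions.create(
    model="continuous-lab/FastEdit",
    messages=[{"role": "system", "content": SYSTEM},
              {"role": "user", "content": user}],
    temperature=0,
    max_tokens=512,
)
print(resp.choices[0].message.content)  # <updated-code>...</updated-code>
```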