# sweep-next-edit-v2-7B

A 7B-parameter model that predicts the next edit a developer will make. Given the current file, recent diffs, and the cursor position, the model predicts which code block the developer will change next and how.
## Usage

```bash
pip install transformers torch accelerate
python inference.py
```

See `inference.py` for a complete working example:

```python
from inference import build_prompt, generate, FileChunk, DIFF_FORMAT

prompt, code_block, block_start, relative_cursor = build_prompt(
    file_path="example.py",
    file_contents=edited_contents,
    cursor_position=cursor_position,
    recent_changes=recent_changes,
    retrieval_chunks=[FileChunk("utils.py", "def helper(): ...")],
    changes_above_cursor=False,
)
completion = generate(model, tokenizer, prompt, device="cuda")
```
## Prompt format

The model uses `<|file_sep|>` delimiters and a `<|cursor|>` marker:
```
<|file_sep|>{file_path}
{file_contents}
{retrieval_chunks}
{recent_changes_as_diffs}
<|file_sep|>original/{file_path}:{start}:{end}
{code_block_before_last_edit}
<|file_sep|>current/{file_path}:{start}:{end}
{code_block_with_cursor_marker}
<|file_sep|>updated/{file_path}:{start}:{end}
{prefill}
```
The model completes the `updated/` section with the predicted new code block.
- `{file_path}` section: ~300 lines of file context around the cursor
- retrieval chunks: cross-file context from related files
- recent changes: diffs of recent edits in `original:`/`updated:` format
- `original/`: the code block around the cursor before the last edit
- `current/`: the same block with `<|cursor|>` inserted at the cursor position
- `updated/`: model output, the predicted edited code block
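As an illustration of how these sections fit together, here is a minimal, hypothetical sketch of assembling the prompt for a single file with no retrieval chunks or recent changes. The real assembly is `build_prompt` in `inference.py`; `assemble_prompt` below is illustrative only, not part of the released code.

```python
# Sketch of prompt assembly following the format above.
FILE_SEP = "<|file_sep|>"
CURSOR = "<|cursor|>"

def assemble_prompt(file_path, file_contents, original_block, current_block,
                    start, end, prefill=""):
    """Concatenate the file context and the original/current/updated sections."""
    return (
        f"{FILE_SEP}{file_path}\n{file_contents}\n"
        f"{FILE_SEP}original/{file_path}:{start}:{end}\n{original_block}\n"
        f"{FILE_SEP}current/{file_path}:{start}:{end}\n{current_block}\n"
        f"{FILE_SEP}updated/{file_path}:{start}:{end}\n{prefill}"
    )

prompt = assemble_prompt(
    "example.py",
    "def add(a, b):\n    return a + b",
    original_block="def add(a, b):\n    return a + b",
    current_block="def add(a, b):\n    return a + " + CURSOR + "b",
    start=1,
    end=2,
)
```

The `updated/` section is left open (here, an empty prefill) so the model's completion becomes the predicted new code block.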
## Prefill strategy

The `updated/` section is seeded with a prefill to constrain generation:
- Default (`changes_above_cursor=False`): prefill everything up to the cursor line. The model only generates from the cursor line onward.
- After insertion (`changes_above_cursor=True`): prefill only the first line plus trailing blank lines. This gives the model freedom to rewrite the lines between the insertion point and the cursor.
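A minimal sketch of the two modes, assuming the code block arrives as a string and the cursor line index is relative to the block. `make_prefill` is illustrative, not the released implementation, and it omits the trailing-blank-lines handling for brevity.

```python
def make_prefill(block: str, cursor_line: int, changes_above_cursor: bool) -> str:
    """Return the prefill that seeds the updated/ section."""
    lines = block.split("\n")
    if not changes_above_cursor:
        # Default: lines before the cursor line are fixed; the model
        # generates from the cursor line onward.
        return "".join(line + "\n" for line in lines[:cursor_line])
    # After an insertion above the cursor: fix only the first line, so the
    # model may rewrite everything between the insertion point and cursor.
    return lines[0] + "\n"

block = "def f(x):\n    y = x + 1\n    return y"
make_prefill(block, cursor_line=2, changes_above_cursor=False)
# → "def f(x):\n    y = x + 1\n"
make_prefill(block, cursor_line=2, changes_above_cursor=True)
# → "def f(x):\n"
```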
## Recent changes format
```
<|file_sep|>{file_path}:{start_line}:{end_line}
original:
{old_code}
updated:
{new_code}
```
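For example, rendering one recent edit in this format might look like the following sketch (`format_recent_change` is a hypothetical helper, not part of the released code):

```python
def format_recent_change(file_path, start_line, end_line, old_code, new_code):
    """Render one recent edit in the original:/updated: diff format."""
    return (
        f"<|file_sep|>{file_path}:{start_line}:{end_line}\n"
        f"original:\n{old_code}\n"
        f"updated:\n{new_code}\n"
    )

diff = format_recent_change(
    "utils.py", 3, 4,
    "def helper():\n    pass",
    "def helper():\n    return 42",
)
```

Each such block is concatenated into the `{recent_changes_as_diffs}` slot of the prompt.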
## Details
Fine-tuned from Qwen2.5-Coder-7B on developer editing traces using SFT, then GRPO, then DPO.
| Attribute | Value |
| --- | --- |
| Base model | Qwen2.5-Coder-7B |
| Fine-tuning | SFT → GRPO → DPO |
| Parameters | 7B |
| Precision | bfloat16 |
| Context length | 32,768 tokens |
| Architecture | Qwen2 (28 layers, hidden dim 3584) |
| Stop tokens | `<|endoftext|>`, `<|file_sep|>` |
| Max output tokens | 1024 |
| Decoding | Greedy (temperature=0) |