# fim_deepseek-coder-6.7b-code-autoCompletion-finetuned
A Fill-in-the-Middle (FIM) fine-tuned version of deepseek-ai/deepseek-coder-6.7b-base, trained with QLoRA to complete missing code segments given both a prefix and a suffix — exactly how modern IDE autocomplete works.
Kaggle notebook: autocompleteion
## Model description
Standard language models generate code left-to-right. This model is trained with the Fill-in-the-Middle objective, which teaches it to reason about both sides of a cursor position and generate the code that belongs in between.
Given a prefix (code before the cursor) and a suffix (code after the cursor), the model generates a contextually accurate middle segment — from a single line to a full function body.
The base model, `deepseek-coder-6.7b-base`, was pre-trained with FIM natively, making it the ideal starting point. Its tokenizer includes native FIM special tokens (`<|fim▁begin|>`, `<|fim▁end|>`, `<|fim▁hole|>`), which this fine-tune fully exploits.
## FIM format

This model uses DeepSeek-Coder's native FIM token format:

| Token | String |
|---|---|
| `FIM_PREFIX` | `<\|fim▁begin\|>` |
| `FIM_SUFFIX` | `<\|fim▁end\|>` |
| `FIM_MIDDLE` | `<\|fim▁hole\|>` |
| `EOS` | `<\|EOT\|>` |

A prompt is structured as:

```
<|fim▁begin|>{prefix}<|fim▁end|>{suffix}<|fim▁hole|>
```

The model then generates the middle segment and stops at `<|EOT|>`.
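For illustration, a prompt for a one-line hole can be assembled like this (a minimal sketch; the `prefix` and `suffix` strings are invented for the example):

```python
# FIM special-token strings from the table above.
FIM_PREFIX = "<|fim▁begin|>"
FIM_SUFFIX = "<|fim▁end|>"
FIM_MIDDLE = "<|fim▁hole|>"

# Hypothetical cursor position: the body of `add` is missing.
prefix = "def add(a: int, b: int) -> int:\n    "
suffix = "\n\nprint(add(2, 3))\n"

prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
# The model would generate the middle (e.g. "return a + b") and stop at <|EOT|>.
```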
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "AbdoSaad24/fim_deepseek-coder-6.7b-code-autoCompletion-finetuned"

FIM_PREFIX = "<|fim▁begin|>"
FIM_SUFFIX = "<|fim▁end|>"
FIM_MIDDLE = "<|fim▁hole|>"
EOS_TOKEN = "<|EOT|>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)


def fim_complete(prefix: str, suffix: str = "", max_new_tokens: int = 150,
                 temperature: float = 0.2) -> str:
    """Generate the code segment that fills the gap between prefix and suffix."""
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else 1.0,
            top_p=0.95,
            eos_token_id=tokenizer.convert_tokens_to_ids(EOS_TOKEN),
            pad_token_id=tokenizer.eos_token_id,
        )
    generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True)
```
### Example: complete a binary search

```python
prefix = (
    "def binary_search(arr: list, target: int) -> int:\n"
    "    \"\"\"Return index of target in sorted arr, or -1 if not found.\"\"\"\n"
    "    left, right = 0, len(arr) - 1\n"
    "    while left <= right:\n"
    "        mid = (left + right) // 2\n"
)
suffix = (
    "        elif arr[mid] < target:\n"
    "            left = mid + 1\n"
    "        else:\n"
    "            right = mid - 1\n"
    "    return -1\n"
)
print(fim_complete(prefix, suffix))
# →         if arr[mid] == target:
# →             return mid
```
### Example: complete error handling

```python
prefix = (
    "def read_json_file(filepath: str) -> dict:\n"
    "    \"\"\"Read and parse a JSON file safely.\"\"\"\n"
    "    try:\n"
    "        with open(filepath, 'r', encoding='utf-8') as f:\n"
)
suffix = (
    "    except FileNotFoundError:\n"
    "        raise FileNotFoundError(f\"File not found: {filepath}\")\n"
)
print(fim_complete(prefix, suffix))
# →             return json.load(f)
```
## Training details

### Base model
deepseek-ai/deepseek-coder-6.7b-base — chosen specifically because it was pre-trained with FIM objectives and already understands FIM special tokens natively. Using the instruct variant would have degraded FIM performance.
### Dataset

FIM training examples were generated from two Python code sources:

| Source | Snippets extracted | FIM examples generated |
|---|---|---|
| `sahil2801/CodeAlpaca-20k` | ~15,000 | ~30,000 |
| `iamtarun/python_code_instructions_18k_alpaca` | ~11,400 | ~11,400 |
| **Total (capped)** | — | 10,000 |
Each raw code snippet was split into prefix / middle / suffix at a random cut point (20–80% of file length), snapped to the nearest newline; this was repeated `N_AUGMENTS=2` times per snippet to create variety. Only examples whose middle section was at least 10 characters long were kept.
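The split-and-augment step described above can be sketched as follows. This is a reconstruction from the description, not the actual preprocessing script; the function names and the exact snapping rule are assumptions:

```python
import random

MIN_MIDDLE_CHARS = 10
N_AUGMENTS = 2


def split_fim(code: str, rng: random.Random):
    """Cut the snippet at a random point in the 20-80% band, snapped to the
    nearest line start; the middle runs to the next newline at least
    MIN_MIDDLE_CHARS away. (Illustrative reconstruction only.)"""
    cut = rng.randint(int(0.2 * len(code)), int(0.8 * len(code)))
    start = code.rfind("\n", 0, cut) + 1            # snap cut to a line start
    end = code.find("\n", min(cut + MIN_MIDDLE_CHARS, len(code) - 1))
    if end == -1:
        end = len(code)
    middle = code[start:end]
    if len(middle) < MIN_MIDDLE_CHARS:
        return None                                 # too-short middles are dropped
    return code[:start], middle, code[end:]


def make_examples(code: str, seed: int = 0):
    """Produce up to N_AUGMENTS FIM examples per snippet."""
    rng = random.Random(seed)
    examples = [split_fim(code, rng) for _ in range(N_AUGMENTS)]
    return [ex for ex in examples if ex is not None]
```

Each returned triple reassembles into the original snippet, so training targets stay faithful to real code.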
The final 10,000 examples were split 99/1 into train (9,900) and validation (100).
### Fine-tuning method: QLoRA via LLaMA-Factory
Training used the pre-training (pt) stage — not the SFT stage — because FIM is a raw completion objective with no instruction template.
| Hyperparameter | Value |
|---|---|
| Framework | LLaMA-Factory 0.9.5 |
| Fine-tuning type | LoRA (QLoRA 4-bit NF4) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 + double quantization |
| Context length (cutoff_len) | 1024 tokens |
| Batch size per device | 1 |
| Gradient accumulation steps | 8 (effective batch size = 8) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Epochs | 3 |
| Optimizer | AdamW (torch) |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Mixed precision | FP16 |
| Hardware | 2× NVIDIA Tesla T4 (Kaggle) |
| Experiment tracking | Weights & Biases (fim-autocomplete-deepseek-6.7b) |
After training, LoRA adapters were merged into the base model weights using LLaMA-Factory's export pipeline and pushed as a single standalone model.
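Conceptually, merging folds the low-rank update into the frozen weights (W_merged = W + (alpha / r) · B A), after which the adapter matrices are discarded and a single standalone checkpoint remains. A toy numeric illustration with tiny shapes (the real run used r=64, alpha=128 and LLaMA-Factory's export pipeline, not this code):

```python
# Toy LoRA merge: add the scaled low-rank update B @ A into the base weight.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

r, alpha = 1, 2                        # toy rank and alpha
scale = alpha / r                      # LoRA scaling factor

W = [[1.0, 0.0], [0.0, 1.0]]           # frozen base weight (2x2)
B = [[0.5], [1.0]]                     # up-projection adapter factor (out x r)
A = [[0.1, 0.2]]                       # down-projection adapter factor (r x in)

delta = matmul(B, A)                   # low-rank update B @ A
W_merged = [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]
# W_merged now behaves like the adapted layer with no adapter attached.
```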
## Intended use
This model is designed for Python code autocompletion tasks where both prefix and suffix context is available:
- IDE plugins that complete mid-function code
- Jupyter / notebook inline suggestions
- Coding assistants with cursor-aware context
- Educational tools that help complete partially written algorithms
## Out-of-scope use
- Languages other than Python (performance will degrade significantly)
- Instruction following or chat (use an instruct model instead)
- Production use without human review of generated code
## Limitations
- Optimised for Python; other languages are not supported
- Context window is limited to 1024 tokens — very long files may lose coherence
- Generated code should always be reviewed before execution
- The model may generate plausible-looking but incorrect completions for complex algorithmic logic
- Training data was capped at 10,000 examples; broader coverage may improve quality
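One common mitigation for the 1024-token limit noted above is to trim the prefix and suffix to a shared budget before building the prompt, keeping the tail of the prefix and the head of the suffix since cursor-adjacent context matters most. A minimal sketch operating on pre-tokenized ID lists (an illustrative helper, not part of this model's released code):

```python
def fit_context(prefix_ids, suffix_ids, budget=1024, reserved=150):
    """Trim tokenized prefix/suffix so prompt + generation fit the window.

    `reserved` leaves room for the generated middle (and special tokens).
    The shorter side donates its unused half-budget to the longer one.
    """
    avail = budget - reserved                     # tokens left for context
    half = avail // 2
    p_keep = min(len(prefix_ids), max(half, avail - len(suffix_ids)))
    s_keep = min(len(suffix_ids), avail - p_keep)
    # Keep prefix tail and suffix head: the context nearest the cursor.
    return prefix_ids[len(prefix_ids) - p_keep:], suffix_ids[:s_keep]
```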
## Citation

If you use this model, please cite the original DeepSeek-Coder work:

```bibtex
@misc{guo2024deepseekcoderlargelanguagemodel,
  title={DeepSeek-Coder: When the Large Language Model Meets Programming},
  author={Daya Guo et al.},
  year={2024},
  eprint={2401.14196},
  archivePrefix={arXiv}
}
```
Fine-tuned by AbdoSaad24 · Kaggle notebook: autocompleteion