fim_deepseek-coder-6.7b-code-autoCompletion-finetuned

A Fill-in-the-Middle (FIM) fine-tuned version of deepseek-ai/deepseek-coder-6.7b-base, trained with QLoRA to complete missing code segments given both a prefix and a suffix — exactly how modern IDE autocomplete works.

Kaggle notebook: autocompleteion


Model description

Standard language models generate code left-to-right. This model is trained with the Fill-in-the-Middle objective, which teaches it to reason about both sides of a cursor position and generate the code that belongs in between.

Given a prefix (code before the cursor) and a suffix (code after the cursor), the model generates a contextually accurate middle segment — from a single line to a full function body.

The base model, deepseek-coder-6.7b-base, was pre-trained with FIM natively, making it the ideal starting point. Its tokenizer includes native FIM special tokens (<|fim▁begin|>, <|fim▁end|>, <|fim▁hole|>) which this fine-tune fully exploits.


FIM format

This model uses DeepSeek-Coder's native FIM token format:

Token        String
FIM_PREFIX   <|fim▁begin|>
FIM_SUFFIX   <|fim▁end|>
FIM_MIDDLE   <|fim▁hole|>
EOS          <|EOT|>

A prompt is structured as:

<|fim▁begin|>{prefix}<|fim▁end|>{suffix}<|fim▁hole|>

The model then generates the middle segment and stops at <|EOT|>.
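
For example, with the cursor inside a two-line function body, the assembled prompt looks like this (the completion shown is indicative):

<|fim▁begin|>def square(x):
    <|fim▁end|>
    return y<|fim▁hole|>

A completion such as y = x * x followed by <|EOT|> fills the gap.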


Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "AbdoSaad24/fim_deepseek-coder-6.7b-code-autoCompletion-finetuned"

FIM_PREFIX = "<|fim▁begin|>"
FIM_SUFFIX = "<|fim▁end|>"
FIM_MIDDLE = "<|fim▁hole|>"
EOS_TOKEN  = "<|EOT|>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

def fim_complete(prefix: str, suffix: str = "", max_new_tokens: int = 150, temperature: float = 0.2) -> str:
    """Generate the code segment that fills the gap between prefix and suffix."""
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else 1.0,
            top_p=0.95,
            eos_token_id=tokenizer.convert_tokens_to_ids(EOS_TOKEN),
            pad_token_id=tokenizer.eos_token_id,
        )

    generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True)

Example: complete a binary search

prefix = (
    "def binary_search(arr: list, target: int) -> int:\n"
    "    \"\"\"Return index of target in sorted arr, or -1 if not found.\"\"\"\n"
    "    left, right = 0, len(arr) - 1\n"
    "    while left <= right:\n"
    "        mid = (left + right) // 2\n"
)
suffix = (
    "        elif arr[mid] < target:\n"
    "            left = mid + 1\n"
    "        else:\n"
    "            right = mid - 1\n"
    "    return -1\n"
)

print(fim_complete(prefix, suffix))
# → if arr[mid] == target:
# →     return mid

Example: complete error handling

prefix = (
    "def read_json_file(filepath: str) -> dict:\n"
    "    \"\"\"Read and parse a JSON file safely.\"\"\"\n"
    "    try:\n"
    "        with open(filepath, 'r', encoding='utf-8') as f:\n"
)
suffix = (
    "    except FileNotFoundError:\n"
    "        raise FileNotFoundError(f\"File not found: {filepath}\")\n"
)

print(fim_complete(prefix, suffix))
# → return json.load(f)
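
Example: complete with no suffix

Because suffix defaults to an empty string, the same helper also works for plain left-to-right completion (the output shown is indicative, not a recorded run):

prefix = (
    "def celsius_to_fahrenheit(c: float) -> float:\n"
    "    "
)

print(fim_complete(prefix))
# → return c * 9 / 5 + 32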

Training details

Base model

deepseek-ai/deepseek-coder-6.7b-base was chosen specifically because it was pre-trained with a FIM objective and already understands the FIM special tokens natively. Starting from the instruct variant would likely have hurt FIM performance, since instruction tuning pulls the model away from raw completion.

Dataset

FIM training examples were generated from two Python code sources:

Source                                         Snippets extracted   FIM examples generated
sahil2801/CodeAlpaca-20k                       ~15,000              ~30,000
iamtarun/python_code_instructions_18k_alpaca   ~11,400              ~11,400
Total (capped)                                                      10,000

Each raw code snippet was split into prefix / middle / suffix at random cut points (falling in the 20–80% band of the file length), snapped to the nearest newline; this was repeated N_AUGMENTS=2 times per snippet to create variety. Only examples whose middle section was at least 10 characters long were kept.
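
A minimal sketch of that splitting step (function and constant names are illustrative; the released notebook may differ in detail):

import random

N_AUGMENTS = 2          # FIM examples generated per snippet
MIN_MIDDLE_CHARS = 10   # drop examples with a trivially small middle

def snap_to_newline(code: str, pos: int) -> int:
    """Move a character offset forward to just past the next newline."""
    nl = code.find("\n", pos)
    return len(code) if nl == -1 else nl + 1

def make_fim_examples(code: str) -> list:
    """Cut one snippet into prefix/middle/suffix FIM examples."""
    examples = []
    lo, hi = int(len(code) * 0.2), int(len(code) * 0.8)
    if hi - lo < 2:
        return examples  # snippet too short to cut safely
    for _ in range(N_AUGMENTS):
        start, end = sorted(random.sample(range(lo, hi), 2))
        start, end = snap_to_newline(code, start), snap_to_newline(code, end)
        middle = code[start:end]
        if len(middle) >= MIN_MIDDLE_CHARS:
            examples.append(
                {"prefix": code[:start], "middle": middle, "suffix": code[end:]}
            )
    return examples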

The final 10,000 examples were split 99/1 into train (9,900) and validation (100).

Fine-tuning method: QLoRA via LLaMA-Factory

Training used the pre-training (pt) stage — not the SFT stage — because FIM is a raw completion objective with no instruction template.
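
Concretely, each example is serialized into one raw string that the pt stage trains on verbatim; a minimal sketch, assuming the prefix/middle/suffix dicts produced above:

FIM_PREFIX, FIM_SUFFIX = "<|fim▁begin|>", "<|fim▁end|>"
FIM_MIDDLE, EOS_TOKEN = "<|fim▁hole|>", "<|EOT|>"

def to_training_text(ex: dict) -> str:
    # The pt stage computes the LM loss over the whole sequence; the
    # model learns to emit the middle (then <|EOT|>) after seeing the
    # prefix and suffix.
    return (
        f"{FIM_PREFIX}{ex['prefix']}{FIM_SUFFIX}{ex['suffix']}"
        f"{FIM_MIDDLE}{ex['middle']}{EOS_TOKEN}"
    )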

Hyperparameter                Value
Framework                     LLaMA-Factory 0.9.5
Fine-tuning type              LoRA (QLoRA 4-bit NF4)
LoRA rank                     64
LoRA alpha                    128
LoRA dropout                  0.05
LoRA target modules           q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization                  4-bit NF4 + double quantization
Context length (cutoff_len)   1024 tokens
Batch size per device         1
Gradient accumulation steps   8 (effective batch size = 8)
Learning rate                 1e-4
LR scheduler                  Cosine
Warmup ratio                  0.05
Epochs                        3
Optimizer                     AdamW (torch)
Weight decay                  0.01
Max grad norm                 1.0
Mixed precision               FP16
Hardware                      2× NVIDIA Tesla T4 (Kaggle)
Experiment tracking           Weights & Biases (fim-autocomplete-deepseek-6.7b)
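
For readers reproducing the run outside LLaMA-Factory, roughly the same QLoRA setup can be expressed with peft and bitsandbytes; a sketch of the configuration, not the exact training code:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)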

After training, LoRA adapters were merged into the base model weights using LLaMA-Factory's export pipeline and pushed as a single standalone model.
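
The export step used was LLaMA-Factory's, but an equivalent merge with peft looks like this (the adapter path is a placeholder):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in fp16, attach the adapter, and fold its
# weights into the base layers.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-base", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "lora-adapter/").merge_and_unload()
merged.save_pretrained("fim_deepseek-coder-6.7b-merged")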


Intended use

This model is designed for Python code autocompletion tasks where both prefix and suffix context is available:

  • IDE plugins that complete mid-function code
  • Jupyter / notebook inline suggestions
  • Coding assistants with cursor-aware context
  • Educational tools that help complete partially written algorithms

Out-of-scope use

  • Languages other than Python (performance will degrade significantly)
  • Instruction following or chat (use an instruct model instead)
  • Production use without human review of generated code

Limitations

  • Optimised for Python; other languages are not supported
  • Context window is limited to 1024 tokens; very long files may lose coherence and should be trimmed (see the sketch after this list)
  • Generated code should always be reviewed before execution
  • The model may generate plausible-looking but incorrect completions for complex algorithmic logic
  • Training data was capped at 10,000 examples; broader coverage may improve quality
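
One way to stay inside the 1024-token window is to keep the context nearest the cursor, trimming the start of the prefix and the end of the suffix. A minimal sketch using the tokenizer from the Usage section; the budget numbers and the 2/3 split are illustrative choices, not part of the released setup:

def trim_to_window(prefix: str, suffix: str, max_prompt_tokens: int = 768):
    """Keep the context closest to the cursor so the FIM prompt fits.

    Reserves roughly 2/3 of the budget for the prefix and 1/3 for the
    suffix, leaving headroom below 1024 tokens for generation.
    """
    prefix_budget = (max_prompt_tokens * 2) // 3
    suffix_budget = max_prompt_tokens - prefix_budget

    prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
    suffix_ids = tokenizer(suffix, add_special_tokens=False)["input_ids"]

    # Drop the oldest prefix tokens and the most distant suffix tokens.
    prefix = tokenizer.decode(prefix_ids[-prefix_budget:])
    suffix = tokenizer.decode(suffix_ids[:suffix_budget])
    return prefix, suffix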

Citation

If you use this model, please cite the original DeepSeek-Coder work:

@misc{guo2024deepseekcoderlargelanguagemodel,
  title={DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence},
  author={Daya Guo and others},
  year={2024},
  eprint={2401.14196},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

Fine-tuned by AbdoSaad24 · Kaggle notebook: autocompleteion
