fim_deepseek-coder-6.7b-code-autoCompletion-finetuned

A Fill-in-the-Middle (FIM) fine-tuned version of deepseek-ai/deepseek-coder-6.7b-base, trained with QLoRA to complete missing code segments given both a prefix and a suffix — exactly how modern IDE autocomplete works.

Kaggle notebook: autocompleteion


Model description

Standard language models generate code left-to-right. This model is trained with the Fill-in-the-Middle objective, which teaches it to reason about both sides of a cursor position and generate the code that belongs in between.

Given a prefix (code before the cursor) and a suffix (code after the cursor), the model generates a contextually accurate middle segment — from a single line to a full function body.

The base model, deepseek-coder-6.7b-base, was pre-trained with FIM natively, making it the ideal starting point. Its tokenizer includes native FIM special tokens (<|fim▁begin|>, <|fim▁end|>, <|fim▁hole|>) which this fine-tune fully exploits.


FIM format

This model uses DeepSeek-Coder's native FIM token format:

Token        String
FIM_PREFIX   <|fim▁begin|>
FIM_SUFFIX   <|fim▁end|>
FIM_MIDDLE   <|fim▁hole|>
EOS          <|EOT|>

A prompt is structured as:

<|fim▁begin|>{prefix}<|fim▁end|>{suffix}<|fim▁hole|>

The model then generates the middle segment and stops at <|EOT|>.
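
For example, with the cursor inside a two-line function body, the assembled prompt looks like this (the completion shown is indicative):

<|fim▁begin|>def square(x):
    <|fim▁end|>
    return y<|fim▁hole|>

A completion such as y = x * x followed by <|EOT|> fills the gap.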


Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "AbdoSaad24/fim_deepseek-coder-6.7b-code-autoCompletion-finetuned"

FIM_PREFIX = "<|fim▁begin|>"
FIM_SUFFIX = "<|fim▁end|>"
FIM_MIDDLE = "<|fim▁hole|>"
EOS_TOKEN  = "<|EOT|>"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

def fim_complete(prefix: str, suffix: str = "", max_new_tokens: int = 150, temperature: float = 0.2) -> str:
    """Generate the code segment that fills the gap between prefix and suffix."""
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=temperature > 0,
            temperature=temperature if temperature > 0 else 1.0,
            top_p=0.95,
            eos_token_id=tokenizer.convert_tokens_to_ids(EOS_TOKEN),
            pad_token_id=tokenizer.eos_token_id,
        )

    generated_ids = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated_ids, skip_special_tokens=True)

Example: complete a binary search

prefix = (
    "def binary_search(arr: list, target: int) -> int:\n"
    "    \"\"\"Return index of target in sorted arr, or -1 if not found.\"\"\"\n"
    "    left, right = 0, len(arr) - 1\n"
    "    while left <= right:\n"
    "        mid = (left + right) // 2\n"
)
suffix = (
    "        elif arr[mid] < target:\n"
    "            left = mid + 1\n"
    "        else:\n"
    "            right = mid - 1\n"
    "    return -1\n"
)

print(fim_complete(prefix, suffix))
# → if arr[mid] == target:
# →     return mid

Example: complete error handling

prefix = (
    "def read_json_file(filepath: str) -> dict:\n"
    "    \"\"\"Read and parse a JSON file safely.\"\"\"\n"
    "    try:\n"
    "        with open(filepath, 'r', encoding='utf-8') as f:\n"
)
suffix = (
    "    except FileNotFoundError:\n"
    "        raise FileNotFoundError(f\"File not found: {filepath}\")\n"
)

print(fim_complete(prefix, suffix))
# → return json.load(f)
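
Example: complete with no suffix

Because suffix defaults to an empty string, the same helper also works for plain left-to-right completion (the output shown is indicative, not a recorded run):

prefix = (
    "def celsius_to_fahrenheit(c: float) -> float:\n"
    "    "
)

print(fim_complete(prefix))
# → return c * 9 / 5 + 32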

Training details

Base model

deepseek-ai/deepseek-coder-6.7b-base was chosen specifically because it was pre-trained with a FIM objective and already understands the FIM special tokens natively. Starting from the instruct variant would likely have hurt FIM performance, since instruction tuning pulls the model away from raw completion.

Dataset

FIM training examples were generated from two Python code sources:

Source                                         Snippets extracted   FIM examples generated
sahil2801/CodeAlpaca-20k                       ~15,000              ~30,000
iamtarun/python_code_instructions_18k_alpaca   ~11,400              ~11,400
Total (capped)                                                      10,000

Each raw code snippet was split into prefix / middle / suffix at random cut points (falling in the 20–80% band of the file length), snapped to the nearest newline; this was repeated N_AUGMENTS=2 times per snippet to create variety. Only examples whose middle section was at least 10 characters long were kept.
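
A minimal sketch of that splitting step (function and constant names are illustrative; the released notebook may differ in detail):

import random

N_AUGMENTS = 2          # FIM examples generated per snippet
MIN_MIDDLE_CHARS = 10   # drop examples with a trivially small middle

def snap_to_newline(code: str, pos: int) -> int:
    """Move a character offset forward to just past the next newline."""
    nl = code.find("\n", pos)
    return len(code) if nl == -1 else nl + 1

def make_fim_examples(code: str) -> list:
    """Cut one snippet into prefix/middle/suffix FIM examples."""
    examples = []
    lo, hi = int(len(code) * 0.2), int(len(code) * 0.8)
    if hi - lo < 2:
        return examples  # snippet too short to cut safely
    for _ in range(N_AUGMENTS):
        start, end = sorted(random.sample(range(lo, hi), 2))
        start, end = snap_to_newline(code, start), snap_to_newline(code, end)
        middle = code[start:end]
        if len(middle) >= MIN_MIDDLE_CHARS:
            examples.append(
                {"prefix": code[:start], "middle": middle, "suffix": code[end:]}
            )
    return examples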

The final 10,000 examples were split 99/1 into train (9,900) and validation (100).

Fine-tuning method: QLoRA via LLaMA-Factory

Training used the pre-training (pt) stage — not the SFT stage — because FIM is a raw completion objective with no instruction template.
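
Concretely, each example is serialized into one raw string that the pt stage trains on verbatim; a minimal sketch, assuming the prefix/middle/suffix dicts produced above:

FIM_PREFIX, FIM_SUFFIX = "<|fim▁begin|>", "<|fim▁end|>"
FIM_MIDDLE, EOS_TOKEN = "<|fim▁hole|>", "<|EOT|>"

def to_training_text(ex: dict) -> str:
    # The pt stage computes the LM loss over the whole sequence; the
    # model learns to emit the middle (then <|EOT|>) after seeing the
    # prefix and suffix.
    return (
        f"{FIM_PREFIX}{ex['prefix']}{FIM_SUFFIX}{ex['suffix']}"
        f"{FIM_MIDDLE}{ex['middle']}{EOS_TOKEN}"
    )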

Hyperparameter                Value
Framework                     LLaMA-Factory 0.9.5
Fine-tuning type              LoRA (QLoRA 4-bit NF4)
LoRA rank                     64
LoRA alpha                    128
LoRA dropout                  0.05
LoRA target modules           q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization                  4-bit NF4 + double quantization
Context length (cutoff_len)   1024 tokens
Batch size per device         1
Gradient accumulation steps   8 (effective batch size = 8)
Learning rate                 1e-4
LR scheduler                  Cosine
Warmup ratio                  0.05
Epochs                        3
Optimizer                     AdamW (torch)
Weight decay                  0.01
Max grad norm                 1.0
Mixed precision               FP16
Hardware                      2× NVIDIA Tesla T4 (Kaggle)
Experiment tracking           Weights & Biases (fim-autocomplete-deepseek-6.7b)
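
For readers reproducing the run outside LLaMA-Factory, roughly the same QLoRA setup can be expressed with peft and bitsandbytes; a sketch of the configuration, not the exact training code:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization
    bnb_4bit_use_double_quant=True,        # double quantization
    bnb_4bit_compute_dtype=torch.float16,  # FP16 compute
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)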

After training, LoRA adapters were merged into the base model weights using LLaMA-Factory's export pipeline and pushed as a single standalone model.
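
The export step used was LLaMA-Factory's, but an equivalent merge with peft looks like this (the adapter path is a placeholder):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in fp16, attach the adapter, and fold its
# weights into the base layers.
base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-base", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "lora-adapter/").merge_and_unload()
merged.save_pretrained("fim_deepseek-coder-6.7b-merged")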


Intended use

This model is designed for Python code autocompletion tasks where both prefix and suffix context is available:

  • IDE plugins that complete mid-function code
  • Jupyter / notebook inline suggestions
  • Coding assistants with cursor-aware context
  • Educational tools that help complete partially written algorithms

Out-of-scope use

  • Languages other than Python (performance will degrade significantly)
  • Instruction following or chat (use an instruct model instead)
  • Production use without human review of generated code

Limitations

  • Optimised for Python; other languages are not supported
  • Context window is limited to 1024 tokens; very long files may lose coherence and should be trimmed (see the sketch after this list)
  • Generated code should always be reviewed before execution
  • The model may generate plausible-looking but incorrect completions for complex algorithmic logic
  • Training data was capped at 10,000 examples; broader coverage may improve quality
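
One way to stay inside the 1024-token window is to keep the context nearest the cursor, trimming the start of the prefix and the end of the suffix. A minimal sketch using the tokenizer from the Usage section; the budget numbers and the 2/3 split are illustrative choices, not part of the released setup:

def trim_to_window(prefix: str, suffix: str, max_prompt_tokens: int = 768):
    """Keep the context closest to the cursor so the FIM prompt fits.

    Reserves roughly 2/3 of the budget for the prefix and 1/3 for the
    suffix, leaving headroom below 1024 tokens for generation.
    """
    prefix_budget = (max_prompt_tokens * 2) // 3
    suffix_budget = max_prompt_tokens - prefix_budget

    prefix_ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
    suffix_ids = tokenizer(suffix, add_special_tokens=False)["input_ids"]

    # Drop the oldest prefix tokens and the most distant suffix tokens.
    prefix = tokenizer.decode(prefix_ids[-prefix_budget:])
    suffix = tokenizer.decode(suffix_ids[:suffix_budget])
    return prefix, suffix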

Citation

If you use this model, please cite the original DeepSeek-Coder work:

@misc{guo2024deepseekcoderlargelanguagemodel,
  title={DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence},
  author={Daya Guo and others},
  year={2024},
  eprint={2401.14196},
  archivePrefix={arXiv},
  primaryClass={cs.SE}
}

Fine-tuned by AbdoSaad24 · Kaggle notebook: autocompleteion
