deepseek-coder-6.7b-code-gen-finetuned
A supervised fine-tuned (SFT) version of deepseek-ai/deepseek-coder-6.7b-instruct trained with QLoRA on a curated blend of high-quality code instruction datasets. The model is optimised for Python code generation — given a natural language instruction, it produces clean, correct, executable code.
Kaggle notebook: code-refining
Model description
This model improves upon the already capable deepseek-coder-6.7b-instruct base by fine-tuning on 10,000 carefully filtered instruction-output pairs drawn from three complementary code datasets. Training used the SFT (supervised fine-tuning) stage with the deepseekcoder chat template, making it a drop-in replacement for the base instruct model with improved instruction-following on coding tasks.
Performance was tracked using the HumanEval benchmark (Pass@1) — the proportion of 164 programming problems where the model's first generated solution passes all hidden test cases.
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "AbdoSaad24/deepseek-coder-6.7b-code-gen-finetuned"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

def generate_code(instruction: str, max_new_tokens: int = 512) -> str:
    """Generate Python code from a natural language instruction."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a Python coding assistant. "
                "Complete the given function. "
                "Return ONLY the complete function code with no explanation, "
                "no markdown, no extra text."
            ),
        },
        {
            "role": "user",
            "content": f"Complete this Python function:\n\n{instruction}",
        },
    ]
    formatted = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducibility
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)
Example: function completion
prompt = """
from typing import List
def has_close_elements(numbers: List[float], threshold: float) -> bool:
\"\"\" Check if in given list of numbers, are any two numbers closer to each
other than given threshold.
\"\"\"
"""
print(generate_code(prompt))
Example: instruction-driven generation
instruction = "Write a Python function that checks whether a string is a palindrome, ignoring case and spaces."
print(generate_code(instruction))
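If GPU memory is tight, the merged weights can also be loaded with 4-bit NF4 quantization via bitsandbytes, mirroring the quantization scheme used during training. This is a minimal sketch, assuming transformers and bitsandbytes are installed; it is not part of the original notebook:

```python
# Optional: load the merged model in 4-bit NF4 to fit smaller GPUs.
# Assumes the bitsandbytes package is installed; output quality may differ
# slightly from the float16 loading path shown above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # same NF4 scheme used during QLoRA training
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "AbdoSaad24/deepseek-coder-6.7b-code-gen-finetuned",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```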
Evaluation
The model was evaluated on the HumanEval benchmark (164 programming problems), which tests functional correctness by executing generated code against hidden test cases.
| Metric | Value |
|---|---|
| Benchmark | HumanEval |
| Evaluation strategy | Pass@1 (greedy decoding, do_sample=False) |
| Problems evaluated | 20-problem subset (during training run) |
Full 164-problem Pass@1 evaluation is set up in the notebook; this card will be updated with the final score after the complete run.
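For reference, a complete 164-problem run can be scored with OpenAI's human-eval harness. The sketch below is illustrative rather than the exact notebook code; it assumes the human-eval package is installed and reuses the generate_code helper from the Usage section:

```python
# Illustrative HumanEval Pass@1 run using the openai/human-eval harness.
# Assumes: pip install human-eval, and generate_code() from the Usage section.
from human_eval.data import read_problems, write_jsonl

problems = read_problems()  # 164 problems keyed by task_id

samples = [
    {"task_id": task_id, "completion": generate_code(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)

# Then score functional correctness (executes generated code against the tests):
#   evaluate_functional_correctness samples.jsonl
# which reports pass@1 over all 164 problems.
```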
Training details
Base model
deepseek-ai/deepseek-coder-6.7b-instruct — the instruction-tuned variant of DeepSeek-Coder, chosen for its strong Python baseline and native support for the deepseekcoder chat template.
Dataset
Three code instruction datasets were combined, filtered, shuffled, and capped at 10,000 examples:
| Dataset | Description |
|---|---|
| m-a-p/CodeFeedback-Filtered-Instruction | High-quality code instruction-response pairs with feedback filtering |
| nickrosh/Evol-Instruct-Code-80k-v1 | 80k evolved coding instructions (WizardCoder-style) |
| sahil2801/CodeAlpaca-20k | 20k code instruction-output pairs in Alpaca format |
All datasets were mapped to a unified Alpaca format (instruction, input, output) and filtered to remove examples with outputs shorter than 50 characters. The combined pool was shuffled with seed=42, capped at 10,000 examples, and split 99/1 into train (9,900) and validation (100).
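The preparation steps above correspond roughly to the following sketch using the Hugging Face datasets library. The per-dataset column names passed to to_alpaca are assumptions; check each dataset card before running, as the notebook's exact mapping may differ:

```python
# Simplified sketch of the dataset blending described above (not the exact
# notebook code). Source column names are assumptions; verify them first.
from datasets import load_dataset, concatenate_datasets

def to_alpaca(ds, instruction_col, output_col, input_col=None):
    """Rename/derive columns into the unified Alpaca schema."""
    def convert(ex):
        return {
            "instruction": ex[instruction_col],
            "input": ex[input_col] if input_col else "",
            "output": ex[output_col],
        }
    return ds.map(convert, remove_columns=ds.column_names)

parts = [
    to_alpaca(load_dataset("m-a-p/CodeFeedback-Filtered-Instruction", split="train"),
              "query", "answer"),                                  # assumed columns
    to_alpaca(load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train"),
              "instruction", "output"),
    to_alpaca(load_dataset("sahil2801/CodeAlpaca-20k", split="train"),
              "instruction", "output", input_col="input"),
]

pool = concatenate_datasets(parts)
pool = pool.filter(lambda ex: len(ex["output"]) >= 50)   # drop outputs < 50 chars
pool = pool.shuffle(seed=42).select(range(10_000))       # cap at 10,000 examples
split = pool.train_test_split(test_size=0.01, seed=42)   # 9,900 train / 100 val
train_ds, eval_ds = split["train"], split["test"]
```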
Fine-tuning method: QLoRA SFT via LLaMA-Factory
Training used the SFT stage with the deepseekcoder chat template, meaning examples are formatted as instruction-response pairs using DeepSeek-Coder's native conversational format.
| Hyperparameter | Value |
|---|---|
| Framework | LLaMA-Factory 0.9.5 |
| Stage | SFT (supervised fine-tuning) |
| Fine-tuning type | LoRA (QLoRA 4-bit NF4) |
| Chat template | deepseekcoder |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 + double quantization |
| Context length (cutoff_len) | 1024 tokens |
| Batch size per device | 1 |
| Gradient accumulation steps | 16 (effective batch size = 16) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.05 |
| Epochs | 3 |
| Optimizer | AdamW (torch) |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Mixed precision | FP16 |
| Eval strategy | Every 50 steps |
| Hardware | NVIDIA Tesla T4 × 2 (Kaggle) |
| Experiment tracking | Weights & Biases (Generation) |
After training, LoRA adapters were merged into the base model weights using LLaMA-Factory's export pipeline (llamafactory-cli export) and pushed as a single standalone model.
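For readers who want to reproduce the run, the table translates roughly into a LLaMA-Factory config like the sketch below, written from a notebook cell. It is a reconstruction, not the exact configuration used; code_blend_10k is a placeholder dataset name that must be registered in data/dataset_info.json, and evaluation/logging options are omitted for brevity:

```python
# Reconstructed LLaMA-Factory SFT config matching the hyperparameter table
# above (illustrative only). NF4 and double quantization are the LLaMA-Factory
# defaults when quantization_bit is set to 4.
import yaml

train_config = {
    "stage": "sft",
    "do_train": True,
    "model_name_or_path": "deepseek-ai/deepseek-coder-6.7b-instruct",
    "template": "deepseekcoder",
    "dataset": "code_blend_10k",          # placeholder name in dataset_info.json
    "cutoff_len": 1024,
    "finetuning_type": "lora",
    "lora_rank": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "lora_target": "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj",
    "quantization_bit": 4,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "learning_rate": 2.0e-4,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.05,
    "num_train_epochs": 3.0,
    "weight_decay": 0.01,
    "max_grad_norm": 1.0,
    "fp16": True,
    "report_to": "wandb",
    "output_dir": "outputs/deepseek-coder-sft",
}

with open("train_sft.yaml", "w") as f:
    yaml.safe_dump(train_config, f)

# In the notebook:
#   !llamafactory-cli train train_sft.yaml
# and after training, merge the LoRA adapters into standalone weights:
#   !llamafactory-cli export merge_config.yaml
```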
Intended use
This model is designed for Python code generation from natural language instructions:
- Completing partially written functions from their docstrings or signatures
- Generating utility functions from plain-English descriptions
- Coding assistants and IDE integrations
- Educational tools for learning Python patterns
- Automated code scaffolding in development workflows
Out-of-scope use
- Languages other than Python (training data is Python-heavy; other languages may produce lower quality output)
- Security-critical code generation without expert review
- Generating code for harmful or malicious purposes
Limitations
- Context window is limited to 1024 tokens, so very long functions or multi-file contexts may be truncated (see the length-check sketch after this list)
- Training data was capped at 10,000 examples; broader or domain-specific coverage may improve performance on specialised tasks
- Generated code should always be reviewed and tested before use in production
- The model may produce plausible-looking but incorrect implementations for complex algorithmic problems
- Performance on non-Python languages is not guaranteed
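A simple guard against the first limitation is to count prompt tokens before generating. This sketch reuses the tokenizer loaded in the Usage section; the 1024 budget matches the fine-tuning cutoff_len:

```python
# Guard against prompts that exceed the 1024-token training context.
# Reuses `tokenizer` from the Usage section.
MAX_CONTEXT = 1024  # fine-tuning cutoff_len; longer prompts risk truncation

def fits_context(prompt: str, max_new_tokens: int = 512) -> bool:
    """Check that prompt plus generation budget stays within the trained context."""
    n_prompt_tokens = len(tokenizer(prompt)["input_ids"])
    return n_prompt_tokens + max_new_tokens <= MAX_CONTEXT

instruction = "Write a Python function that merges two sorted lists."  # any prompt
if not fits_context(instruction):
    print("Prompt may be truncated: shorten it or lower max_new_tokens.")
```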
Citation
If you use this model, please cite the original DeepSeek-Coder work:
@misc{guo2024deepseekcoderlargelanguagemodel,
      title={DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence},
      author={Daya Guo and others},
      year={2024},
      eprint={2401.14196},
      archivePrefix={arXiv}
}
Fine-tuned by AbdoSaad24 · Kaggle notebook: code-refining