deepseek-coder-6.7b-code-gen-finetuned

A supervised fine-tuned (SFT) version of deepseek-ai/deepseek-coder-6.7b-instruct trained with QLoRA on a curated blend of high-quality code instruction datasets. The model is optimised for Python code generation — given a natural language instruction, it produces clean, correct, executable code.

Kaggle notebook: code-refining


Model description

This model improves upon the already capable deepseek-coder-6.7b-instruct base by fine-tuning on 10,000 carefully filtered instruction-output pairs drawn from three complementary code datasets. Training used the SFT (supervised fine-tuning) stage with the deepseekcoder chat template, making it a drop-in replacement for the base instruct model with improved instruction-following on coding tasks.

Performance was tracked using the HumanEval benchmark (Pass@1) — the proportion of 164 programming problems where the model's first generated solution passes all hidden test cases.


Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "AbdoSaad24/deepseek-coder-6.7b-code-gen-finetuned"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

def generate_code(instruction: str, max_new_tokens: int = 512) -> str:
    """Generate Python code from a natural language instruction."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a Python coding assistant. "
                "Complete the given function. "
                "Return ONLY the complete function code with no explanation, "
                "no markdown, no extra text."
            )
        },
        {
            "role": "user",
            "content": f"Complete this Python function:\n\n{instruction}"
        }
    ]

    formatted = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducibility
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.eos_token_id,
        )

    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

Example: function completion

prompt = """
from typing import List

def has_close_elements(numbers: List[float], threshold: float) -> bool:
    \"\"\" Check if in given list of numbers, are any two numbers closer to each
    other than given threshold.
    \"\"\"
"""

print(generate_code(prompt))

Example: instruction-driven generation

instruction = "Write a Python function that checks whether a string is a palindrome, ignoring case and spaces."
print(generate_code(instruction))

Evaluation

The model was evaluated on the HumanEval benchmark (164 programming problems), which tests functional correctness by executing generated code against hidden test cases.

Setting Value
Benchmark HumanEval
Evaluation strategy Pass@1 (greedy decoding, do_sample=False)
Problems evaluated 20-problem subset (during training run)

The full 164-problem Pass@1 evaluation is set up in the notebook; this card will be updated with the final score once the complete run has been executed.
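
As a rough sketch (not the notebook's exact code), the full 164-problem run could be wired up with OpenAI's human-eval reference harness, reusing the generate_code helper from the Usage section; the samples.jsonl filename is arbitrary:

from human_eval.data import read_problems, write_jsonl  # pip install human-eval

problems = read_problems()  # dict: task_id -> {"prompt", "entry_point", "test", ...}

samples = []
for task_id, problem in problems.items():
    # One greedy completion per task is enough for Pass@1.
    # generate_code returns the whole function, which simply redefines the
    # prompt's stub when the harness concatenates prompt + completion.
    samples.append({"task_id": task_id, "completion": generate_code(problem["prompt"])})

write_jsonl("samples.jsonl", samples)
# Score with the package's CLI: evaluate_functional_correctness samples.jsonl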


Training details

Base model

deepseek-ai/deepseek-coder-6.7b-instruct — the instruction-tuned variant of DeepSeek-Coder, chosen for its strong Python baseline and native support for the deepseekcoder chat template.

Dataset

Three code instruction datasets were combined, filtered, shuffled, and capped at 10,000 examples:

Dataset Description
m-a-p/CodeFeedback-Filtered-Instruction High-quality code instruction-response pairs with feedback filtering
nickrosh/Evol-Instruct-Code-80k-v1 80k evolved coding instructions (WizardCoder-style)
sahil2801/CodeAlpaca-20k 20k code instruction-output pairs in Alpaca format

All datasets were mapped to a unified Alpaca format (instruction, input, output) and filtered to remove examples with outputs shorter than 50 characters. The combined pool was shuffled with seed=42, capped at 10,000 examples, and split 99/1 into train (9,900) and validation (100).
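
A minimal sketch of that preparation with the Hugging Face datasets library is shown below; the per-source column names (e.g. query/answer for CodeFeedback) are assumptions about the source schemas, not code taken from the notebook:

from datasets import load_dataset, concatenate_datasets

# (dataset name, instruction column, output column) — column names are assumed
sources = [
    ("m-a-p/CodeFeedback-Filtered-Instruction", "query", "answer"),
    ("nickrosh/Evol-Instruct-Code-80k-v1", "instruction", "output"),
    ("sahil2801/CodeAlpaca-20k", "instruction", "output"),
]

parts = []
for name, instr_col, out_col in sources:
    ds = load_dataset(name, split="train")
    has_input = "input" in ds.column_names
    # Map every source into the unified Alpaca schema (instruction, input, output)
    ds = ds.map(
        lambda ex: {
            "instruction": ex[instr_col],
            "input": ex["input"] if has_input else "",
            "output": ex[out_col],
        },
        remove_columns=ds.column_names,
    )
    parts.append(ds)

combined = concatenate_datasets(parts)
combined = combined.filter(lambda ex: len(ex["output"]) >= 50)  # drop very short outputs
combined = combined.shuffle(seed=42).select(range(10_000))      # cap at 10,000 examples
split = combined.train_test_split(test_size=0.01, seed=42)      # 9,900 train / 100 validation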

Fine-tuning method: QLoRA SFT via LLaMA-Factory

Training used the SFT stage with the deepseekcoder chat template, meaning examples are formatted as instruction-response pairs using DeepSeek-Coder's native conversational format.
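
For illustration only, this is roughly how one unified record looks once rendered through the deepseekcoder template, reusing the tokenizer loaded in the Usage section (apply_chat_template is used here as an approximation of LLaMA-Factory's internal formatting):

example = {
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse_string(s: str) -> str:\n    return s[::-1]",
}

user_turn = example["instruction"] + ("\n" + example["input"] if example["input"] else "")
messages = [
    {"role": "user", "content": user_turn},
    {"role": "assistant", "content": example["output"]},
]
# During SFT, only the assistant (output) tokens contribute to the loss.
print(tokenizer.apply_chat_template(messages, tokenize=False))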

Hyperparameter Value
Framework LLaMA-Factory 0.9.5
Stage SFT (supervised fine-tuning)
Fine-tuning type LoRA (QLoRA 4-bit NF4)
Chat template deepseekcoder
LoRA rank 32
LoRA alpha 64
LoRA dropout 0.05
LoRA target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization 4-bit NF4 + double quantization
Context length (cutoff_len) 1024 tokens
Batch size per device 1
Gradient accumulation steps 16 (effective batch size = 16)
Learning rate 2e-4
LR scheduler Cosine
Warmup ratio 0.05
Epochs 3
Optimizer AdamW (torch)
Weight decay 0.01
Max grad norm 1.0
Mixed precision FP16
Eval strategy Every 50 steps
Hardware NVIDIA Tesla T4 × 2 (Kaggle)
Experiment tracking Weights & Biases (Generation)

After training, LoRA adapters were merged into the base model weights using LLaMA-Factory's export pipeline (llamafactory-cli export) and pushed as a single standalone model.
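
The merge itself was done with llamafactory-cli export; an equivalent peft-based sketch (the adapter path below is hypothetical) would look like:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "deepseek-ai/deepseek-coder-6.7b-instruct"
ADAPTER = "path/to/sft-lora-checkpoint"  # hypothetical path to the trained LoRA adapter

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16, trust_remote_code=True)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()  # fold LoRA deltas into the base weights

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
merged.save_pretrained("deepseek-coder-6.7b-code-gen-finetuned")
tok.save_pretrained("deepseek-coder-6.7b-code-gen-finetuned")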


Intended use

This model is designed for Python code generation from natural language instructions:

  • Completing partially written functions from their docstrings or signatures
  • Generating utility functions from plain-English descriptions
  • Coding assistants and IDE integrations
  • Educational tools for learning Python patterns
  • Automated code scaffolding in development workflows

Out-of-scope use

  • Languages other than Python (the training data is Python-heavy, so output quality in other languages may be lower)
  • Security-critical code generation without expert review
  • Generating code for harmful or malicious purposes

Limitations

  • Context window is limited to 1024 tokens — very long functions or multi-file contexts may be truncated
  • Training data was capped at 10,000 examples; broader or domain-specific coverage may improve performance on specialised tasks
  • Generated code should always be reviewed and tested before use in production
  • The model may produce plausible-looking but incorrect implementations for complex algorithmic problems
  • Performance on non-Python languages is not guaranteed

Citation

If you use this model, please cite the original DeepSeek-Coder work:

@misc{guo2024deepseekcoderlargelanguagemodel,
  title={DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence},
  author={Guo, Daya and others},
  year={2024},
  eprint={2401.14196},
  archivePrefix={arXiv}
}

Fine-tuned by AbdoSaad24 · Kaggle notebook: code-refining
