# SecuCoder

SecuCoder is a fine-tuned version of Llama 3.1 8B Instruct trained to generate secure Python code and remediate security vulnerabilities. It is part of a research pipeline that combines supervised fine-tuning (SFT), structured prompting, and retrieval-augmented generation (RAG) to reduce the number of vulnerabilities in automatically generated Python code.


## Model Description

| Field | Details |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Fine-tuning method | QLoRA (NF4 4-bit) + LoRA adapters |
| Training dataset | SecuCoder Messages Corpus |
| Training examples | 5,708 (train) + 317 (validation) |
| Epochs | 2 |
| Format | Merged safetensors (bfloat16) |
| Language | English |
| Domain | Python secure coding |

## Intended Use

SecuCoder is designed for:

- **Vulnerability remediation** — given a Python snippet with a security flaw, produce a corrected version.
- **Secure code generation** — generate Python code from a natural-language specification while avoiding common weaknesses.
- **Vulnerability classification** — identify whether a Python snippet is secure or vulnerable.

The model has been evaluated against the untuned Llama 3.1 8B Instruct baseline using static analysis tools (Bandit and Semgrep) and shows a measurable improvement in security scores (see Evaluation below).

This model is intended for research and educational purposes. It should not be used as the sole security review mechanism in production systems.
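As a concrete example of the remediation task: a CWE-79 (reflected XSS) flaw of the kind shown in the Usage example below is fixed by escaping user-controlled input before embedding it in HTML. A minimal stdlib sketch — the Flask request/response plumbing is omitted, and the function name is illustrative:

```python
import html


def render_greeting(name: str) -> str:
    # CWE-79 mitigation: HTML-escape user-controlled input before
    # embedding it in a response body.
    return "Your name is " + html.escape(name)


print(render_greeting("<script>alert(1)</script>"))
# → Your name is &lt;script&gt;alert(1)&lt;/script&gt;
```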


## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ivitopow/secucoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code. Prefer safe, practical, and production-ready solutions."
    },
    {
        "role": "user",
        "content": "Fix the security vulnerability in this Python code.\n\n```python\nname = request.args.get('name')\nresp = make_response(\"Your name is \" + name)\n```\n\nCWE: CWE-079"
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

### Usage with Ollama

A quantized GGUF version (Q4_K_M, ~4.6 GB) is available at ivitopow/secucoder-GGUF:

```shell
ollama create secucoder -f Modelfile
ollama run secucoder
```
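A minimal Modelfile for the GGUF build might look like the following sketch — the GGUF filename and system prompt here are illustrative assumptions, not taken from the repository:

```
FROM ./secucoder.Q4_K_M.gguf

PARAMETER temperature 0.1
PARAMETER top_p 0.9

SYSTEM "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code."
```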

## Training Details

### Method

The model was trained using QLoRA (Quantized Low-Rank Adaptation): the base model is loaded in 4-bit NF4 precision via BitsAndBytes, and low-rank adapters are attached to all projection layers. After training, the adapters are merged back into the base model and saved as standard safetensors.
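A back-of-the-envelope sketch of why this is cheap to train: each adapted weight matrix W of shape d_in × d_out gains two low-rank factors totaling r·(d_in + d_out) trainable parameters. The dimensions below are the published Llama 3.1 8B architecture values (4096 hidden size, 32 layers, 8 KV heads giving 1024-dim k/v projections, 14336-dim MLP), assumed here rather than read from the checkpoint:

```python
# LoRA rank from the configuration table below.
R = 16
HIDDEN, KV, MLP, LAYERS = 4096, 1024, 14336, 32

# (d_in, d_out) for each target module in one decoder layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, KV), "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP), "up_proj": (HIDDEN, MLP), "down_proj": (MLP, HIDDEN),
}

# Each adapted matrix gains factors A (d_in x r) and B (r x d_out).
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * LAYERS

# 8.03e9 is the approximate base parameter count of Llama 3.1 8B.
print(f"{total:,} trainable LoRA parameters (~{total / 8.03e9:.2%} of the base model)")
# → 41,943,040 trainable LoRA parameters (~0.52% of the base model)
```

So only about half a percent of the weights are trained, which is what makes single-GPU fine-tuning of an 8B model practical.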

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 3% warmup |
| Optimizer | paged_adamw_8bit |
| Gradient checkpointing | Enabled |
| Precision | bfloat16 compute, NF4 storage |
| Sequence length | 2048 tokens |
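The cosine-with-warmup schedule can be sketched as follows — a minimal reimplementation of the standard schedule under the hyperparameters above, not the trainer's actual code:

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 2e-4, warmup_frac: float = 0.03) -> float:
    """Linear warmup to peak_lr over warmup_frac of training, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps          # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))  # cosine decay


print(cosine_lr(0, 1000), cosine_lr(30, 1000), cosine_lr(1000, 1000))
# → 0.0 at step 0, 2e-4 at the end of warmup, ~0.0 at the final step
```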

### Training Data

Trained on the SecuCoder Messages Corpus, a dataset of 6,342 Python security examples in chat format covering:

- **Vulnerability fix** (`fix`) — 4,037 examples across 20+ CWE categories
- **Security conversations** (`conversation`) — 2,210 multi-turn examples
- **Vulnerability classification** (`classify`) — 52 examples
- **Secure code generation** (`prompt_to_code`) — 43 examples
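Each record is stored in chat (messages) format suitable for `apply_chat_template`. A hypothetical `fix`-task record is sketched below — the `task` field name and all contents are illustrative, not taken from the corpus:

```python
example = {
    "task": "fix",  # assumed metadata field: fix / conversation / classify / prompt_to_code
    "messages": [
        {"role": "system",
         "content": "You are a secure Python assistant. Help identify, explain, "
                    "and fix security issues in Python code."},
        {"role": "user",
         "content": "Fix the security vulnerability in this Python code.\n\n"
                    'query = "SELECT * FROM users WHERE id = " + user_id\n\n'
                    "CWE: CWE-089"},
        {"role": "assistant",
         "content": "Use a parameterized query so user input is never interpolated "
                    'into SQL:\n\ncursor.execute("SELECT * FROM users WHERE id = ?", '
                    "(user_id,))"},
    ],
}

roles = [m["role"] for m in example["messages"]]
print(roles)  # → ['system', 'user', 'assistant']
```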

## Evaluation

SecuCoder was evaluated as part of a 5-variant ablation study, where each variant adds one technique on top of the previous one:

| Variant | Technique | Overall score |
|---|---|---|
| llama31_8b | Baseline (no fine-tuning) | 60.34 |
| secucoder_v1 | + SFT (LoRA, FP16) | 60.43 |
| secucoder_v1-q4 | + Q4_K_M quantization | 61.46 |
| secucoder_v1-q4_prompting | + Structured security prompt | 64.46 |
| secucoder_v1-q4_prompting_rag | + RAG (OWASP, CWE, Python docs) | 77.11 |

The overall score is the mean `sample_score` over non-truncated samples (higher is better, max 100). The full SecuCoder system (`secucoder_v1-q4_prompting_rag`) achieves a 27.8% relative improvement over the untuned baseline.

### Evaluation Methodology

Generated code was scanned with Bandit and Semgrep, and the per-severity finding counts were combined into a weighted penalty:

```
penalty = bandit_high × 2.0 + bandit_medium × 1.25 + bandit_low × 0.75
        + semgrep_error × 5.0 + semgrep_warning × 3.0 + semgrep_info × 1.0

sample_score = 100 / (1 + 8 × penalty_per_loc)
```

where `penalty_per_loc` is the penalty normalized by the sample's lines of code. Samples with invalid syntax score 0. Truncated samples are excluded from the overall score.
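The scoring rule can be reimplemented in a few lines from the formulas above — the dict-based interface here is illustrative, not the harness's actual code:

```python
def sample_score(bandit: dict, semgrep: dict, loc: int) -> float:
    """Score one generated sample from static-analysis finding counts.

    bandit/semgrep map severity names to finding counts; loc is lines of code.
    """
    penalty = (bandit.get("high", 0) * 2.0
               + bandit.get("medium", 0) * 1.25
               + bandit.get("low", 0) * 0.75
               + semgrep.get("error", 0) * 5.0
               + semgrep.get("warning", 0) * 3.0
               + semgrep.get("info", 0) * 1.0)
    penalty_per_loc = penalty / max(loc, 1)
    return 100.0 / (1 + 8 * penalty_per_loc)


print(sample_score({}, {}, 40))              # clean sample → 100.0
print(sample_score({"medium": 1}, {}, 50))   # one medium finding in 50 LOC → ~83.3
```

Note how the hyperbolic form penalizes the first findings steeply but never goes below 0, so dense-but-minor findings and a single semgrep error land in the same rough range.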


## Related Resources

| Resource | Link |
|---|---|
| Training dataset | ivitopow/secucoder |
| GGUF / Ollama version | ivitopow/secucoder-GGUF |
| Base model | meta-llama/Llama-3.1-8B-Instruct |

## Limitations

- The model only covers Python. It has not been evaluated on other languages.
- Static analysis tools (Bandit, Semgrep) do not detect all vulnerability types. Logic flaws or runtime-dependent issues may not be caught.
- The model was fine-tuned on a specific set of CWE categories. It may underperform on vulnerability types not well represented in the training data.
- As with all generative models, outputs should be reviewed by a developer before use in production.

## License

This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

It is built on top of Llama 3.1, which is subject to Meta's Llama 3.1 Community License. Please review both licenses before use.


## Citation

```bibtex
@misc{secucoder2025,
  title     = {SecuCoder: Fine-tuning Llama 3.1 8B for Secure Python Code Generation},
  author    = {SecuCoder Project},
  year      = {2025},
  url       = {https://huggingface.co/ivitopow/secucoder},
  note      = {CC-BY-NC-SA-4.0}
}
```