---

license: cc-by-nc-sa-4.0
language:
  - en
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
  - code
  - security
  - python
  - lora
  - qlora
  - fine-tuned
  - cybersecurity
  - secure-coding
  - vulnerability
  - cwe
  - peft
  - transformers
task_categories:
  - text-generation
task_ids:
  - language-modeling
---


# SecuCoder

SecuCoder is a fine-tuned version of [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) trained to generate secure Python code and remediate security vulnerabilities. It is part of a research pipeline that combines supervised fine-tuning (SFT), structured prompting, and retrieval-augmented generation (RAG) to reduce the number of vulnerabilities in automatically generated Python code.

---

## Model Description

| Field | Details |
|---|---|
| **Base model** | `meta-llama/Llama-3.1-8B-Instruct` |
| **Fine-tuning method** | QLoRA (NF4 4-bit) + LoRA adapters |
| **Training dataset** | [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder) |
| **Training examples** | 5,708 (train) + 317 (validation) |
| **Epochs** | 2 |
| **Format** | Merged safetensors (bfloat16) |
| **Language** | English |
| **Domain** | Python secure coding |

---

## Intended Use

SecuCoder is designed for:

- **Vulnerability remediation** — given a Python snippet with a security flaw, produce a corrected version.
- **Secure code generation** — generate Python code from a natural language specification, avoiding common weaknesses.
- **Vulnerability classification** — identify whether a Python snippet is secure or vulnerable.

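The model has been evaluated against the untuned Llama 3.1 8B Instruct baseline using static analysis tools (Bandit and Semgrep). The full pipeline, which adds structured prompting and RAG on top of the fine-tuned model, raises the overall score from 60.34 to 77.11 (see Evaluation).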

> This model is intended for research and educational purposes. It should not be used as the sole security review mechanism in production systems.

---

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ivitopow/secucoder"

# Load the merged bfloat16 weights; device_map="auto" places them on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code. Prefer safe, practical, and production-ready solutions."
    },
    {
        "role": "user",
        "content": "Fix the security vulnerability in this Python code.\n\n```python\nname = request.args.get('name')\nresp = make_response(\"Your name is \" + name)\n```\n\nCWE: CWE-079"
    }
]

# Render the chat template and tokenize; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Low-temperature sampling keeps the generated fix close to deterministic.
output = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
print(response)
```

### Usage with Ollama

A quantized GGUF version (Q4_K_M, ~4.6 GB) is available at [`ivitopow/secucoder-GGUF`](https://huggingface.co/ivitopow/secucoder-GGUF):

```bash
ollama create secucoder -f Modelfile
ollama run secucoder
```
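Once the model is registered with Ollama, it can also be queried programmatically. The following is a minimal sketch, assuming Ollama is serving on its default local port (`11434`) and that the model was created under the name `secucoder` as above. The helper names `build_fix_request` and `ask_ollama` are hypothetical and not part of this repository; the prompt layout mirrors the Transformers example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default local Ollama endpoint
MODEL_NAME = "secucoder"  # the name given to `ollama create`

SYSTEM_PROMPT = (
    "You are a secure Python assistant. Help identify, explain, and fix "
    "security issues in Python code. Prefer safe, practical, and "
    "production-ready solutions."
)


def build_fix_request(code: str, cwe: str) -> dict:
    """Build a non-streaming chat request asking the model to fix `code`."""
    fence = "`" * 3  # Markdown code fence, built here to keep this example readable
    user_prompt = (
        "Fix the security vulnerability in this Python code.\n\n"
        f"{fence}python\n{code}\n{fence}\n\n"
        f"CWE: {cwe}"
    )
    return {
        "model": MODEL_NAME,
        "stream": False,
        "options": {"temperature": 0.1, "top_p": 0.9},
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    }


def ask_ollama(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the assistant's reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


if __name__ == "__main__":
    snippet = (
        "name = request.args.get('name')\n"
        "resp = make_response(\"Your name is \" + name)"
    )
    print(ask_ollama(build_fix_request(snippet, "CWE-079")))
```

Setting `stream` to `False` makes Ollama return a single JSON object, which keeps the client code short; a streaming client would need to read the response line by line instead.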

---

## Training Details

### Method

The model was trained using **QLoRA** (Quantized Low-Rank Adaptation): the base model is loaded in 4-bit NF4 precision via BitsAndBytes, and low-rank adapters are attached to all projection layers. After training, the adapters are merged back into the base model and saved as standard safetensors.

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
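
The settings above might be expressed with `peft` and `transformers` roughly as follows. This is a sketch under stated assumptions, not the project's actual training script: `task_type="CAUSAL_LM"` is assumed, no other quantization options (such as double quantization) are set because the card does not mention them, and `build_qlora_configs` is a hypothetical helper name.

```python
# LoRA settings taken from the table above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}


def build_qlora_configs():
    """Return (quantization_config, lora_config) for a QLoRA run.

    The heavy imports live inside the function so that this module can be
    imported on machines without torch, transformers, or peft installed.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    # 4-bit NF4 storage with bfloat16 compute, as described under "Method".
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    lora_config = LoraConfig(**LORA_KWARGS)
    return quantization_config, lora_config
```

In a typical setup, the quantization config is passed to `AutoModelForCausalLM.from_pretrained(..., quantization_config=...)` and the LoRA config to `peft.get_peft_model`.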

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 3% warmup |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
| Precision | bfloat16 compute, NF4 storage |
| Sequence length | 2048 tokens |

### Training Data

Trained on the [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder), a dataset of 6,342 Python security examples in chat format covering:

- **Vulnerability fix** (`fix`) — 4,037 examples across 20+ CWE categories
- **Security conversations** (`conversation`) — 2,210 multi-turn examples
- **Vulnerability classification** (`classify`) — 52 examples
- **Secure code generation** (`prompt_to_code`) — 43 examples

---

## Evaluation

SecuCoder was evaluated as part of a 5-variant ablation study. Each variant adds one technique over the previous one:

| Variant | Technique | Overall Score |
|---|---|---|
| `llama31_8b` | Baseline (no fine-tuning) | 60.34 |
| `secucoder_v1` | + SFT (LoRA, FP16) | 60.43 |
| `secucoder_v1-q4` | + Q4_K_M quantization | 61.46 |
| `secucoder_v1-q4_prompting` | + Structured security prompt | 64.46 |
| `secucoder_v1-q4_prompting_rag` | + RAG (OWASP, CWE, Python docs) | **77.11** |

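Overall score = mean `sample_score` over non-truncated samples (higher is better, max 100). The full SecuCoder system (`secucoder_v1-q4_prompting_rag`) scores 16.77 points above the untuned baseline, a **27.8% relative improvement**. Fine-tuning alone moves the score by only 0.09 points; most of the gain comes from structured prompting (+3.00) and RAG (+12.65).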
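
The relative gains can be recomputed from the table. The snippet below is a small illustration using only the scores listed above; `relative_gain` is a hypothetical helper name.

```python
# Overall scores copied from the ablation table.
SCORES = {
    "llama31_8b": 60.34,
    "secucoder_v1": 60.43,
    "secucoder_v1-q4": 61.46,
    "secucoder_v1-q4_prompting": 64.46,
    "secucoder_v1-q4_prompting_rag": 77.11,
}


def relative_gain(score: float, baseline: float) -> float:
    """Percentage improvement of `score` over `baseline`."""
    return 100.0 * (score - baseline) / baseline


baseline = SCORES["llama31_8b"]
for variant, score in SCORES.items():
    points = score - baseline
    print(f"{variant}: {points:+.2f} points, {relative_gain(score, baseline):+.1f}%")
```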

### Evaluation Methodology

Generated code was scanned with **Bandit** and **Semgrep** using weighted severity scores:

```
penalty = Σ(bandit_high × 2.0 + bandit_medium × 1.25 + bandit_low × 0.75)
        + Σ(semgrep_error × 5.0 + semgrep_warning × 3.0 + semgrep_info × 1.0)

sample_score = 100 / (1 + 8 × penalty_per_loc)
```

Samples with invalid syntax score 0. Truncated samples are excluded from the overall score.
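
The scoring rules can be sketched in Python as follows. This is an illustration, not the project's evaluation code: it assumes `penalty_per_loc` is the penalty divided by the sample's number of lines of code (the card does not define it explicitly), and the function and field names are hypothetical.

```python
from statistics import mean

# Severity weights from the formula above.
BANDIT_WEIGHTS = {"high": 2.0, "medium": 1.25, "low": 0.75}
SEMGREP_WEIGHTS = {"error": 5.0, "warning": 3.0, "info": 1.0}


def penalty(bandit_counts: dict, semgrep_counts: dict) -> float:
    """Weighted sum of findings, keyed by severity level."""
    bandit = sum(BANDIT_WEIGHTS[sev] * n for sev, n in bandit_counts.items())
    semgrep = sum(SEMGREP_WEIGHTS[sev] * n for sev, n in semgrep_counts.items())
    return bandit + semgrep


def sample_score(bandit_counts: dict, semgrep_counts: dict,
                 loc: int, valid_syntax: bool = True) -> float:
    """Score one generated sample on a 0-100 scale."""
    if not valid_syntax:
        return 0.0  # samples with invalid syntax score 0
    # Assumption: penalty_per_loc = penalty / lines of code.
    penalty_per_loc = penalty(bandit_counts, semgrep_counts) / max(loc, 1)
    return 100.0 / (1.0 + 8.0 * penalty_per_loc)


def overall_score(samples: list) -> float:
    """Mean sample score, excluding truncated samples."""
    return mean(s["score"] for s in samples if not s["truncated"])


# One high-severity Bandit finding in a 16-line sample:
# penalty = 2.0, penalty_per_loc = 0.125, score = 100 / (1 + 1) = 50.0
print(sample_score({"high": 1}, {}, loc=16))
```

Because the penalty is normalized by length under this assumption, the same finding costs a short snippet more than a long one.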

---

## Related Resources

| Resource | Link |
|---|---|
| Training dataset | [ivitopow/secucoder](https://huggingface.co/datasets/ivitopow/secucoder) |
| GGUF / Ollama version | [ivitopow/secucoder-GGUF](https://huggingface.co/ivitopow/secucoder-GGUF) |
| Base model | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |

---

## Limitations

- The model only covers **Python**. It has not been evaluated on other languages.
- Static analysis tools (Bandit, Semgrep) do not detect all vulnerability types. Logic flaws or runtime-dependent issues may not be caught.
- The model was fine-tuned on a specific set of CWE categories. It may underperform on vulnerability types not well represented in the training data.
- As with all generative models, outputs should be reviewed by a developer before use in production.

---

## License

This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.

It is built on top of Llama 3.1, which is subject to [Meta's Llama 3 Community License](https://llama.meta.com/llama3/license/). Please review both licenses before use.

---

## Citation

```bibtex
@misc{secucoder2025,
  title     = {SecuCoder: Fine-tuning Llama 3.1 8B for Secure Python Code Generation},
  author    = {SecuCoder Project},
  year      = {2025},
  url       = {https://huggingface.co/ivitopow/secucoder},
  note      = {CC-BY-NC-SA-4.0}
}
```