---
license: cc-by-nc-sa-4.0
language:
- en
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- code
- security
- python
- lora
- qlora
- fine-tuned
- cybersecurity
- secure-coding
- vulnerability
- cwe
- peft
- transformers
task_categories:
- text-generation
task_ids:
- language-modeling
---
# SecuCoder
SecuCoder is a fine-tuned version of [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) trained to generate secure Python code and remediate security vulnerabilities. It is part of a research pipeline that combines supervised fine-tuning (SFT), structured prompting, and retrieval-augmented generation (RAG) to reduce the number of vulnerabilities in automatically generated Python code.
---
## Model Description
| Field | Details |
|---|---|
| **Base model** | `meta-llama/Llama-3.1-8B-Instruct` |
| **Fine-tuning method** | QLoRA (NF4 4-bit) + LoRA adapters |
| **Training dataset** | [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder) |
| **Training examples** | 5,708 (train) + 317 (validation) |
| **Epochs** | 2 |
| **Format** | Merged safetensors (bfloat16) |
| **Language** | English |
| **Domain** | Python secure coding |
---
## Intended Use
SecuCoder is designed for:
- **Vulnerability remediation** — given a Python snippet with a security flaw, produce a corrected version.
- **Secure code generation** — generate Python code from a natural language specification, avoiding common weaknesses.
- **Vulnerability classification** — identify whether a Python snippet is secure or vulnerable.
The model has been evaluated against the untuned Llama 3.1 8B Instruct baseline using static analysis tools (Bandit + Semgrep) and shows meaningful improvement in security metrics.
> This model is intended for research and educational purposes. It should not be used as the sole security review mechanism in production systems.
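
For illustration, a typical CWE-79 (cross-site scripting) remediation of the kind the model is asked to produce replaces raw string concatenation of user input with output escaping. The sketch below uses the stdlib `html.escape` and a hypothetical `greet` helper; it shows the expected shape of a fix, not actual model output:

```python
from html import escape

def greet(name: str) -> str:
    # CWE-79 (XSS): never interpolate untrusted input into a response unescaped
    return "Your name is " + escape(name)

print(greet("<script>alert(1)</script>"))
# → Your name is &lt;script&gt;alert(1)&lt;/script&gt;
```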
---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ivitopow/secucoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code. Prefer safe, practical, and production-ready solutions."
    },
    {
        "role": "user",
        "content": "Fix the security vulnerability in this Python code.\n\n```python\nname = request.args.get('name')\nresp = make_response(\"Your name is \" + name)\n```\n\nCWE: CWE-079"
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Usage with Ollama
A quantized GGUF version (Q4_K_M, ~4.6 GB) is available at [`ivitopow/secucoder-GGUF`](https://huggingface.co/ivitopow/secucoder-GGUF):
```bash
ollama create secucoder -f Modelfile
ollama run secucoder
```
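The GGUF repository provides the Modelfile referenced above; a minimal one might look like the following (the GGUF file name and system prompt here are illustrative — check the repository for the actual file):

```text
FROM ./secucoder.Q4_K_M.gguf
PARAMETER temperature 0.1
SYSTEM You are a secure Python assistant. Help identify, explain, and fix security issues in Python code.
```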
---
## Training Details
### Method
The model was trained using **QLoRA** (Quantized Low-Rank Adaptation): the base model is loaded in 4-bit NF4 precision via BitsAndBytes, and low-rank adapters are attached to all projection layers. After training, the adapters are merged back into the base model and saved as standard safetensors.
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
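
The table above maps directly onto the standard `peft` and `bitsandbytes` configuration objects. This is a minimal sketch of that setup, not the authors' exact training script:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# NF4 4-bit storage with bfloat16 compute (QLoRA), as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all projection layers, with the values from the table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```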
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 3% warmup |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
| Precision | bfloat16 compute, NF4 storage |
| Sequence length | 2048 tokens |
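
Expressed with the Hugging Face `Trainer` API, the hyperparameters above would look roughly like this (a configuration sketch — the output path is illustrative and the actual training script may differ):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="secucoder-qlora",   # illustrative path
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
)
# The 2048-token sequence length is typically set on the tokenizer or
# the trainer (e.g. an SFT trainer), not on TrainingArguments itself.
```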
### Training Data
Trained on the [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder), a dataset of 6,342 Python security examples in chat format covering:
- **Vulnerability fix** (`fix`) — 4,037 examples across 20+ CWE categories
- **Security conversations** (`conversation`) — 2,210 multi-turn examples
- **Vulnerability classification** (`classify`) — 52 examples
- **Secure code generation** (`prompt_to_code`) — 43 examples
---
## Evaluation
SecuCoder was evaluated as part of a 5-variant ablation study, where each variant adds one technique on top of the previous one:
| Variant | Technique | Overall Score |
|---|---|---|
| `llama31_8b` | Baseline (no fine-tuning) | 60.34 |
| `secucoder_v1` | + SFT (LoRA, FP16) | 60.43 |
| `secucoder_v1-q4` | + Q4_K_M quantization | 61.46 |
| `secucoder_v1-q4_prompting` | + Structured security prompt | 64.46 |
| `secucoder_v1-q4_prompting_rag` | + RAG (OWASP, CWE, Python docs) | **77.11** |
Overall score = mean `sample_score` over non-truncated samples (higher is better, max 100). The full SecuCoder system (`secucoder_v1-q4_prompting_rag`) achieves a **+27.8% improvement** over the untuned baseline.
### Evaluation Methodology
Generated code was scanned with **Bandit** and **Semgrep** using weighted severity scores:
```
penalty = Σ(bandit_high × 2.0 + bandit_medium × 1.25 + bandit_low × 0.75)
+ Σ(semgrep_error × 5.0 + semgrep_warning × 3.0 + semgrep_info × 1.0)
sample_score = 100 / (1 + 8 × penalty_per_loc)
```
Here `penalty_per_loc` is the total penalty divided by the lines of code in the sample. Samples with invalid syntax score 0, and truncated samples are excluded from the overall score.
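
The scoring formula can be reproduced in a few lines of Python. The dict-based signature below is an assumption for illustration; the weights and the `100 / (1 + 8 × penalty_per_loc)` mapping come from the formula above:

```python
def sample_score(bandit: dict, semgrep: dict, loc: int) -> float:
    """Weighted-severity score: 100 = clean, approaching 0 as findings accumulate."""
    penalty = (
        bandit.get("high", 0) * 2.0
        + bandit.get("medium", 0) * 1.25
        + bandit.get("low", 0) * 0.75
        + semgrep.get("error", 0) * 5.0
        + semgrep.get("warning", 0) * 3.0
        + semgrep.get("info", 0) * 1.0
    )
    penalty_per_loc = penalty / max(loc, 1)
    return 100 / (1 + 8 * penalty_per_loc)

print(sample_score({}, {}, 50))             # clean sample → 100.0
print(sample_score({"medium": 1}, {}, 50))  # one medium Bandit finding → ≈83.3
```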
---
## Related Resources
| Resource | Link |
|---|---|
| Training dataset | [ivitopow/secucoder](https://huggingface.co/datasets/ivitopow/secucoder) |
| GGUF / Ollama version | [ivitopow/secucoder-GGUF](https://huggingface.co/ivitopow/secucoder-GGUF) |
| Base model | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
---
## Limitations
- The model only covers **Python**. It has not been evaluated on other languages.
- Static analysis tools (Bandit, Semgrep) do not detect all vulnerability types. Logic flaws or runtime-dependent issues may not be caught.
- The model was fine-tuned on a specific set of CWE categories. It may underperform on vulnerability types not well represented in the training data.
- As with all generative models, outputs should be reviewed by a developer before use in production.
---
## License
This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
It is built on top of Llama 3.1, which is subject to [Meta's Llama 3.1 Community License](https://llama.meta.com/llama3/license/). Please review both licenses before use.
---
## Citation
```bibtex
@misc{secucoder2025,
  title  = {SecuCoder: Fine-tuning Llama 3.1 8B for Secure Python Code Generation},
  author = {SecuCoder Project},
  year   = {2025},
  url    = {https://huggingface.co/ivitopow/secucoder},
  note   = {CC-BY-NC-SA-4.0}
}
```