---
license: cc-by-nc-sa-4.0
language:
- en
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- code
- security
- python
- lora
- qlora
- fine-tuned
- cybersecurity
- secure-coding
- vulnerability
- cwe
- peft
- transformers
task_categories:
- text-generation
task_ids:
- language-modeling
---
# SecuCoder
SecuCoder is a fine-tuned version of [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) trained to generate secure Python code and remediate security vulnerabilities. It is part of a research pipeline that combines supervised fine-tuning (SFT), structured prompting, and retrieval-augmented generation (RAG) to reduce the number of vulnerabilities in automatically generated Python code.
---
## Model Description
| Field | Details |
|---|---|
| **Base model** | `meta-llama/Llama-3.1-8B-Instruct` |
| **Fine-tuning method** | QLoRA (NF4 4-bit) + LoRA adapters |
| **Training dataset** | [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder) |
| **Training examples** | 5,708 (train) + 317 (validation) |
| **Epochs** | 2 |
| **Format** | Merged safetensors (bfloat16) |
| **Language** | English |
| **Domain** | Python secure coding |
---
## Intended Use
SecuCoder is designed for:
- **Vulnerability remediation** — given a Python snippet with a security flaw, produce a corrected version.
- **Secure code generation** — generate Python code from a natural language specification, avoiding common weaknesses.
- **Vulnerability classification** — identify whether a Python snippet is secure or vulnerable.
The model has been evaluated against the untuned Llama 3.1 8B Instruct baseline using static analysis tools (Bandit + Semgrep) and shows meaningful improvement in security metrics.
> This model is intended for research and educational purposes. It should not be used as the sole security review mechanism in production systems.
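
For illustration, a typical CWE-79 (cross-site scripting) remediation of the kind the model is asked to produce replaces raw string concatenation of user input with output escaping. The sketch below uses the stdlib `html.escape` and a hypothetical `greet` helper; it shows the expected shape of a fix, not actual model output:

```python
from html import escape

def greet(name: str) -> str:
    # CWE-79 (XSS): never interpolate untrusted input into a response unescaped
    return "Your name is " + escape(name)

print(greet("<script>alert(1)</script>"))
# → Your name is &lt;script&gt;alert(1)&lt;/script&gt;
```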
---
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ivitopow/secucoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code. Prefer safe, practical, and production-ready solutions."
    },
    {
        "role": "user",
        "content": "Fix the security vulnerability in this Python code.\n\n```python\nname = request.args.get('name')\nresp = make_response(\"Your name is \" + name)\n```\n\nCWE: CWE-079"
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
### Usage with Ollama
A quantized GGUF version (Q4_K_M, ~4.6 GB) is available at [`ivitopow/secucoder-GGUF`](https://huggingface.co/ivitopow/secucoder-GGUF):
```bash
ollama create secucoder -f Modelfile
ollama run secucoder
```
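The GGUF repository provides the Modelfile referenced above; a minimal one might look like the following (the GGUF file name and system prompt here are illustrative — check the repository for the actual file):

```text
FROM ./secucoder.Q4_K_M.gguf
PARAMETER temperature 0.1
SYSTEM You are a secure Python assistant. Help identify, explain, and fix security issues in Python code.
```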
---
## Training Details
### Method
The model was trained using **QLoRA** (Quantized Low-Rank Adaptation): the base model is loaded in 4-bit NF4 precision via BitsAndBytes, and low-rank adapters are attached to all projection layers. After training, the adapters are merged back into the base model and saved as standard safetensors.
### LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
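
The table above maps directly onto the standard `peft` and `bitsandbytes` configuration objects. This is a minimal sketch of that setup, not the authors' exact training script:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# NF4 4-bit storage with bfloat16 compute (QLoRA), as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on all projection layers, with the values from the table
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```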
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 3% warmup |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
| Precision | bfloat16 compute, NF4 storage |
| Sequence length | 2048 tokens |
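
Expressed with the Hugging Face `Trainer` API, the hyperparameters above would look roughly like this (a configuration sketch — the output path is illustrative and the actual training script may differ):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="secucoder-qlora",   # illustrative path
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    bf16=True,
)
# The 2048-token sequence length is typically set on the tokenizer or
# the trainer (e.g. an SFT trainer), not on TrainingArguments itself.
```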
### Training Data
Trained on the [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder), a dataset of 6,342 Python security examples in chat format covering:
- **Vulnerability fix** (`fix`) — 4,037 examples across 20+ CWE categories
- **Security conversations** (`conversation`) — 2,210 multi-turn examples
- **Vulnerability classification** (`classify`) — 52 examples
- **Secure code generation** (`prompt_to_code`) — 43 examples
---
## Evaluation
SecuCoder was evaluated as part of a 5-variant ablation study, where each variant adds one technique on top of the previous one:
| Variant | Technique | Overall Score |
|---|---|---|
| `llama31_8b` | Baseline (no fine-tuning) | 60.34 |
| `secucoder_v1` | + SFT (LoRA, FP16) | 60.43 |
| `secucoder_v1-q4` | + Q4_K_M quantization | 61.46 |
| `secucoder_v1-q4_prompting` | + Structured security prompt | 64.46 |
| `secucoder_v1-q4_prompting_rag` | + RAG (OWASP, CWE, Python docs) | **77.11** |
Overall score = mean `sample_score` over non-truncated samples (higher is better, max 100). The full SecuCoder system (`secucoder_v1-q4_prompting_rag`) achieves a **+27.8% improvement** over the untuned baseline.
### Evaluation Methodology
Generated code was scanned with **Bandit** and **Semgrep** using weighted severity scores:
```
penalty = Σ(bandit_high × 2.0 + bandit_medium × 1.25 + bandit_low × 0.75)
+ Σ(semgrep_error × 5.0 + semgrep_warning × 3.0 + semgrep_info × 1.0)
sample_score = 100 / (1 + 8 × penalty_per_loc)
```
Here `penalty_per_loc` is the total penalty divided by the lines of code in the sample. Samples with invalid syntax score 0, and truncated samples are excluded from the overall score.
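
The scoring formula can be reproduced in a few lines of Python. The dict-based signature below is an assumption for illustration; the weights and the `100 / (1 + 8 × penalty_per_loc)` mapping come from the formula above:

```python
def sample_score(bandit: dict, semgrep: dict, loc: int) -> float:
    """Weighted-severity score: 100 = clean, approaching 0 as findings accumulate."""
    penalty = (
        bandit.get("high", 0) * 2.0
        + bandit.get("medium", 0) * 1.25
        + bandit.get("low", 0) * 0.75
        + semgrep.get("error", 0) * 5.0
        + semgrep.get("warning", 0) * 3.0
        + semgrep.get("info", 0) * 1.0
    )
    penalty_per_loc = penalty / max(loc, 1)
    return 100 / (1 + 8 * penalty_per_loc)

print(sample_score({}, {}, 50))             # clean sample → 100.0
print(sample_score({"medium": 1}, {}, 50))  # one medium Bandit finding → ≈83.3
```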
---
## Related Resources
| Resource | Link |
|---|---|
| Training dataset | [ivitopow/secucoder](https://huggingface.co/datasets/ivitopow/secucoder) |
| GGUF / Ollama version | [ivitopow/secucoder-GGUF](https://huggingface.co/ivitopow/secucoder-GGUF) |
| Base model | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |
---
## Limitations
- The model only covers **Python**. It has not been evaluated on other languages.
- Static analysis tools (Bandit, Semgrep) do not detect all vulnerability types. Logic flaws or runtime-dependent issues may not be caught.
- The model was fine-tuned on a specific set of CWE categories. It may underperform on vulnerability types not well represented in the training data.
- As with all generative models, outputs should be reviewed by a developer before use in production.
---
## License
This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
It is built on top of Llama 3.1, which is subject to [Meta's Llama 3.1 Community License](https://llama.meta.com/llama3/license/). Please review both licenses before use.
---
## Citation
```bibtex
@misc{secucoder2025,
  title  = {SecuCoder: Fine-tuning Llama 3.1 8B for Secure Python Code Generation},
  author = {SecuCoder Project},
  year   = {2025},
  url    = {https://huggingface.co/ivitopow/secucoder},
  note   = {CC-BY-NC-SA-4.0}
}
```