---
license: cc-by-nc-sa-4.0
language:
- en
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
- code
- security
- python
- lora
- qlora
- fine-tuned
- cybersecurity
- secure-coding
- vulnerability
- cwe
- peft
- transformers
task_categories:
- text-generation
task_ids:
- language-modeling
---

# SecuCoder

SecuCoder is a fine-tuned version of [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) trained to generate secure Python code and remediate security vulnerabilities. It is part of a research pipeline that combines supervised fine-tuning (SFT), structured prompting, and retrieval-augmented generation (RAG) to reduce the number of vulnerabilities in automatically generated Python code.

---

## Model Description

| Field | Details |
|---|---|
| **Base model** | `meta-llama/Llama-3.1-8B-Instruct` |
| **Fine-tuning method** | QLoRA (NF4 4-bit) + LoRA adapters |
| **Training dataset** | [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder) |
| **Training examples** | 5,708 (train) + 317 (validation) |
| **Epochs** | 2 |
| **Format** | Merged safetensors (bfloat16) |
| **Language** | English |
| **Domain** | Python secure coding |

---

## Intended Use

SecuCoder is designed for:

- **Vulnerability remediation** — given a Python snippet with a security flaw, produce a corrected version.
- **Secure code generation** — generate Python code from a natural language specification, avoiding common weaknesses.
- **Vulnerability classification** — identify whether a Python snippet is secure or vulnerable.

The model has been evaluated against the untuned Llama 3.1 8B Instruct baseline using static analysis tools (Bandit + Semgrep) and shows meaningful improvement in security metrics.

> This model is intended for research and educational purposes. It should not be used as the sole security review mechanism in production systems.
---

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ivitopow/secucoder"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a secure Python assistant. Help identify, explain, and fix security issues in Python code. Prefer safe, practical, and production-ready solutions."
    },
    {
        "role": "user",
        "content": "Fix the security vulnerability in this Python code.\n\n```python\nname = request.args.get('name')\nresp = make_response(\"Your name is \" + name)\n```\n\nCWE: CWE-079"
    }
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True,
)

response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

### Usage with Ollama

A quantized GGUF version (Q4_K_M, ~4.6 GB) is available at [`ivitopow/secucoder-GGUF`](https://huggingface.co/ivitopow/secucoder-GGUF):

```bash
ollama create secucoder -f Modelfile
ollama run secucoder
```

---

## Training Details

### Method

The model was trained using **QLoRA** (Quantized Low-Rank Adaptation): the base model is loaded in 4-bit NF4 precision via BitsAndBytes, and low-rank adapters are attached to all projection layers. After training, the adapters are merged back into the base model and saved as standard safetensors.
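The QLoRA setup described above can be sketched with the standard `transformers` + `peft` + `bitsandbytes` APIs. This is a non-authoritative sketch based on this card's stated configuration, not the project's actual training script; dataset loading and the SFT loop itself are omitted.

```python
# Sketch of the QLoRA setup: NF4 4-bit base weights, bfloat16 compute,
# LoRA adapters on all projection layers, merged after training.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # NF4 4-bit storage
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 compute
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

# ... supervised fine-tuning on the chat-formatted corpus ...

# After training, merge the adapters into the base weights and
# save standard safetensors:
# merged = model.merge_and_unload()
# merged.save_pretrained("secucoder-merged", safe_serialization=True)
```

Merging via `merge_and_unload()` folds the low-rank updates into the original weight matrices, so the published checkpoint loads like any ordinary model with no PEFT dependency at inference time.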
### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (`r`) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 2 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 3% warmup |
| Optimizer | `paged_adamw_8bit` |
| Gradient checkpointing | Enabled |
| Precision | bfloat16 compute, NF4 storage |
| Sequence length | 2048 tokens |

### Training Data

Trained on the [SecuCoder Messages Corpus](https://huggingface.co/datasets/ivitopow/secucoder), a dataset of 6,342 Python security examples in chat format covering:

- **Vulnerability fix** (`fix`) — 4,037 examples across 20+ CWE categories
- **Security conversations** (`conversation`) — 2,210 multi-turn examples
- **Vulnerability classification** (`classify`) — 52 examples
- **Secure code generation** (`prompt_to_code`) — 43 examples

---

## Evaluation

SecuCoder was evaluated as part of a 5-variant ablation study. Each variant adds one technique over the previous one:

| Variant | Technique | Overall Score |
|---|---|---|
| `llama31_8b` | Baseline (no fine-tuning) | 60.34 |
| `secucoder_v1` | + SFT (LoRA, FP16) | 60.43 |
| `secucoder_v1-q4` | + Q4_K_M quantization | 61.46 |
| `secucoder_v1-q4_prompting` | + Structured security prompt | 64.46 |
| `secucoder_v1-q4_prompting_rag` | + RAG (OWASP, CWE, Python docs) | **77.11** |

Overall score = mean `sample_score` over non-truncated samples (higher is better, max 100).

The full SecuCoder system (`secucoder_v1-q4_prompting_rag`) achieves a **+27.8% improvement** over the untuned baseline.
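The weighted-severity scoring used throughout the evaluation can be sketched in plain Python. This is an illustrative reimplementation of the formula given in the Evaluation Methodology section, not the project's actual scoring code; interpreting `penalty_per_loc` as the penalty divided by the sample's line count is an assumption.

```python
# Sketch of the per-sample security score: Bandit and Semgrep finding
# counts are weighted by severity, normalized per line of code, and
# squashed into a 0-100 range.

def sample_score(bandit: dict, semgrep: dict, loc: int) -> float:
    """Return the 0-100 security score for one generated sample."""
    penalty = (
        bandit.get("high", 0) * 2.0
        + bandit.get("medium", 0) * 1.25
        + bandit.get("low", 0) * 0.75
        + semgrep.get("error", 0) * 5.0
        + semgrep.get("warning", 0) * 3.0
        + semgrep.get("info", 0) * 1.0
    )
    # Assumption: "per_loc" means penalty divided by lines of code.
    penalty_per_loc = penalty / max(loc, 1)
    return 100.0 / (1.0 + 8.0 * penalty_per_loc)

# A clean 50-line sample scores 100; a single high-severity Bandit
# finding in those 50 lines drops it to 100 / (1 + 8 * 0.04) ≈ 75.8.
print(sample_score({}, {}, 50))
print(sample_score({"high": 1}, {}, 50))
```

Because the penalty is normalized by length, one finding in a short snippet hurts the score far more than the same finding in a long file.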
### Evaluation Methodology

Generated code was scanned with **Bandit** and **Semgrep** using weighted severity scores:

```
penalty = Σ(bandit_high × 2.0 + bandit_medium × 1.25 + bandit_low × 0.75)
        + Σ(semgrep_error × 5.0 + semgrep_warning × 3.0 + semgrep_info × 1.0)

penalty_per_loc = penalty / lines_of_code

sample_score = 100 / (1 + 8 × penalty_per_loc)
```

Samples with invalid syntax score 0. Truncated samples are excluded from the overall score.

---

## Related Resources

| Resource | Link |
|---|---|
| Training dataset | [ivitopow/secucoder](https://huggingface.co/datasets/ivitopow/secucoder) |
| GGUF / Ollama version | [ivitopow/secucoder-GGUF](https://huggingface.co/ivitopow/secucoder-GGUF) |
| Base model | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) |

---

## Limitations

- The model only covers **Python**. It has not been evaluated on other languages.
- Static analysis tools (Bandit, Semgrep) do not detect all vulnerability types. Logic flaws or runtime-dependent issues may not be caught.
- The model was fine-tuned on a specific set of CWE categories. It may underperform on vulnerability types not well represented in the training data.
- As with all generative models, outputs should be reviewed by a developer before use in production.

---

## License

This model is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. It is built on top of Llama 3.1, which is subject to [Meta's Llama 3.1 Community License](https://llama.meta.com/llama3/license/). Please review both licenses before use.

---

## Citation

```bibtex
@misc{secucoder2025,
  title  = {SecuCoder: Fine-tuning Llama 3.1 8B for Secure Python Code Generation},
  author = {SecuCoder Project},
  year   = {2025},
  url    = {https://huggingface.co/ivitopow/secucoder},
  note   = {CC-BY-NC-SA-4.0}
}
```