---
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- en
tags:
- code
- python
- cybersecurity
- vulnerability-detection
- vulnerability-repair
- secure-code-generation
datasets:
- abkmystery/PySecPatch-72K
---

# PySecPatch-7B

PySecPatch-7B is a defensive Python security adapter for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is designed for vulnerability triage, CWE classification, security explanation, and candidate secure-code generation. It should be used with human review and automated verification.

**Author:** Ahmed Bin Khalid, Independent Researcher ([ORCID 0000-0002-0616-2604](https://orcid.org/0000-0002-0616-2604))

This repository contains the PEFT adapter, tokenizer files, training metadata, and chat template. The Qwen base weights are not redistributed.

## Evaluation

### Family-disjoint holdout

The final adapter and unmodified base were evaluated on the same 3,200 records.

| Metric | Base | PySecPatch |
|---|---:|---:|
| Classification accuracy | 4.81% | 90.72% |
| Classification F1 | 9.18% | 93.40% |
| Strict JSON | 91.53% | 99.44% |
| Clean-negative preservation | 0.00% | 100.00% |
| Security-control pass | 1.63% | 88.08% |
| Normalized exact repair | 0.17% | 83.33% |
| Parseable fixed code | 6.42% | 99.21% |

Paired classification yielded 2,770 adapter-only correct predictions and 21 base-only correct predictions across 3,200 records (`log10(p) = -787.25`, exact two-sided McNemar).

### External and repository evaluation

On the pinned SALLM scored subset, PySecPatch achieved 26.77% functional pass, 31.88% security-test pass, and 9.58% secure-functional pass. Four of 100 prompts lacked upstream fixtures.

Repository-format holdout performance was 38.00% patch application and 34.38% security-control pass. On a frozen 24-case repository suite, vulnerability detection and clean preservation were perfect, but none of 12 vulnerable patches passed every acceptance gate.

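
The paired classification test reported for the family-disjoint holdout can be re-derived from the two discordant counts alone. The following is a minimal sketch, assuming the standard exact two-sided McNemar definition (a binomial test with success probability 0.5 on the discordant pairs, doubling the smaller tail); the function name `log10_mcnemar_exact` is hypothetical and is not taken from the project's evaluation scripts.

```python
from math import exp, lgamma, log


def log10_mcnemar_exact(b: int, c: int) -> float:
    """Return log10 of the exact two-sided McNemar p-value.

    b and c are the two discordant counts (here: adapter-only correct
    and base-only correct predictions).
    """
    n = b + c
    k = min(b, c)
    # Natural log of each binomial term C(n, i) * 0.5**n for i = 0..k,
    # kept in log space because the probabilities underflow a float.
    log_terms = [
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) - n * log(2)
        for i in range(k + 1)
    ]
    # Log-sum-exp over the tail terms.
    peak = max(log_terms)
    log_tail = peak + log(sum(exp(t - peak) for t in log_terms))
    # Two-sided p-value: double the smaller tail, capped at 1.
    log_p = min(0.0, log(2) + log_tail)
    return log_p / log(10)


if __name__ == "__main__":
    # Discordant counts quoted in the holdout section of this card.
    print(round(log10_mcnemar_exact(2770, 21), 2))
```

With the counts 2,770 and 21, this should land within rounding of the reported `log10(p) = -787.25`. The calculation stays in log space because the p-value itself is far below the smallest representable double.
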
These results do not support autonomous repair or state-of-the-art claims.

## Training

The adapter was trained in two consecutive QLoRA stages. Stage A used 8,400 train records from a 12,000-record corpus. Stage B continued from that adapter using 48,000 train records from a separate 60,000-record corpus. Validation, test, and holdout splits never entered optimization.

Both corpora contain generated Python examples and are released under Apache-2.0. The combined dataset repository contains 72,000 records; the first-stage corpus spans 35 CWEs and the second-stage corpus spans 43 CWEs.

## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter = "abkmystery/PySecPatch-7B"

# The tokenizer files and chat template ship with the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter)

# Load the base weights first, then attach the PEFT adapter on top.
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```

Use the system instruction:

```text
You are PySecPatch, a defensive Python secure coding model. Identify vulnerabilities, explain risk, and produce minimal safe patches. Return strict JSON only.
```

The expected response keys are `is_vulnerable`, `cwe`, `vuln_type`, `vulnerable_lines`, `explanation`, `fixed_code`, `patch_summary`, and `safe_test`.

## Intended Use

- Defensive analysis of user-supplied Python code.
- Candidate finding classification and CWE identification.
- Security explanations and review assistance.
- Candidate snippet repairs subject to tests and human review.
- Research on security specialization and generalization.

## Limitations

- External secure-functional generation is substantially weaker than controlled holdout performance.
- Repository-level unified diffs often fail to apply or pass verification.
- Line localization is moderate (`F1 = 0.4407`).
- The training corpora are generated rather than mined from real repositories.
- The model can miss vulnerabilities and can produce plausible but incomplete repairs.

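
Because repairs can be plausible but incomplete, a caller may want a cheap structural gate before a response reaches tests and human review. The following is a minimal sketch, assuming the model returns a single JSON object with the response keys listed under Usage and that `fixed_code` is a string of Python source. The helper name `check_response` and those value types are assumptions and are not part of the released artifacts. Passing this gate says nothing about whether the patch is actually secure.

```python
import ast
import json
from typing import List, Optional, Tuple

# Response keys listed in the Usage section of this card.
EXPECTED_KEYS = {
    "is_vulnerable", "cwe", "vuln_type", "vulnerable_lines",
    "explanation", "fixed_code", "patch_summary", "safe_test",
}


def check_response(raw: str) -> Tuple[Optional[dict], List[str]]:
    """Parse a model response and list structural problems found in it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level JSON value is not an object"]

    problems = []
    missing = sorted(EXPECTED_KEYS - data.keys())
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    # Assumption: fixed_code, when present and non-empty, is Python source.
    fixed = data.get("fixed_code")
    if isinstance(fixed, str) and fixed.strip():
        try:
            ast.parse(fixed)
        except (SyntaxError, ValueError):
            problems.append("fixed_code does not parse as Python")
    return data, problems
```

An empty problem list only means the response is well-formed; candidate patches still need the tests and human review described under Intended Use.
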
Do not use PySecPatch for autonomous deployment, unauthorized scanning, exploit development, or offensive automation.

## Reproducibility

Adapter model SHA-256:

```text
4c2b5c7c0d2982b99de9c319e998274fc12f3aae5bf8d2c3b5db58c5864dc65b
```

Full evaluation reports, raw predictions, frozen hashes, and scripts are linked from the GitHub and archival releases.

Training dataset: [`10.5281/zenodo.21016753`](https://doi.org/10.5281/zenodo.21016753).

## Citation

```bibtex
@software{khalid2026pysecpatch,
  author = {Bin Khalid, Ahmed},
  title = {PySecPatch: Defensive Python Vulnerability Triage and Repair Research Artifacts},
  year = {2026},
  version = {0.1.1},
  url = {https://github.com/abkmystery/PySecPatch}
}
```

Released under the Apache License 2.0. See `LICENSE`.

Current software archive: [`10.5281/zenodo.21015885`](https://doi.org/10.5281/zenodo.21015885). All versions: [`10.5281/zenodo.21015503`](https://doi.org/10.5281/zenodo.21015503).
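
A downloaded adapter file can be checked against the SHA-256 listed under Reproducibility before it is loaded. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and which file the published hash covers is an assumption here (`adapter_model.safetensors` is the usual PEFT file name, but this card does not name the file).

```python
import hashlib
from pathlib import Path

# Hash published in the Reproducibility section of this card.
EXPECTED_SHA256 = "4c2b5c7c0d2982b99de9c319e998274fc12f3aae5bf8d2c3b5db58c5864dc65b"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path) -> bool:
    """Compare a local file against the published adapter hash."""
    return sha256_of(path) == EXPECTED_SHA256


# Example (the file name is an assumption, see above):
# matches_published_hash("adapter_model.safetensors")
```
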