Text Generation
PEFT
Safetensors
English
code
python
cybersecurity
vulnerability-detection
vulnerability-repair
secure-code-generation
conversational
Instructions to use abkmystery/PySecPatch-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use abkmystery/PySecPatch-7B with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-7B-Instruct") model = PeftModel.from_pretrained(base_model, "abkmystery/PySecPatch-7B") - Notebooks
- Google Colab
- Kaggle
| base_model: Qwen/Qwen2.5-Coder-7B-Instruct | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| license: apache-2.0 | |
| language: | |
| - en | |
| tags: | |
| - code | |
| - python | |
| - cybersecurity | |
| - vulnerability-detection | |
| - vulnerability-repair | |
| - secure-code-generation | |
| datasets: | |
| - abkmystery/PySecPatch-72K | |
| # PySecPatch-7B | |
| PySecPatch-7B is a defensive Python security adapter for `Qwen/Qwen2.5-Coder-7B-Instruct`. It is designed for vulnerability triage, CWE classification, security explanation, and candidate secure-code generation. It should be used with human review and automated verification. | |
| **Author:** Ahmed Bin Khalid, Independent Researcher ([ORCID 0000-0002-0616-2604](https://orcid.org/0000-0002-0616-2604)) | |
| This repository contains the PEFT adapter, tokenizer files, training metadata, and chat template. The Qwen base weights are not redistributed. | |
| ## Evaluation | |
| ### Family-disjoint holdout | |
| The final adapter and unmodified base were evaluated on the same 3,200 records. | |
| | Metric | Base | PySecPatch | | |
| |---|---:|---:| | |
| | Classification accuracy | 4.81% | 90.72% | | |
| | Classification F1 | 9.18% | 93.40% | | |
| | Strict JSON | 91.53% | 99.44% | | |
| | Clean-negative preservation | 0.00% | 100.00% | | |
| | Security-control pass | 1.63% | 88.08% | | |
| | Normalized exact repair | 0.17% | 83.33% | | |
| | Parseable fixed code | 6.42% | 99.21% | | |
| Paired classification yielded 2,770 adapter-only correct predictions and 21 base-only correct predictions across 3,200 records (`log10(p) = -787.25`, exact two-sided McNemar). | |
| ### External and repository evaluation | |
| On the pinned SALLM scored subset, PySecPatch achieved 26.77% functional pass, 31.88% security-test pass, and 9.58% secure-functional pass. Four of 100 prompts lacked upstream fixtures. | |
| Repository-format holdout performance was 38.00% patch application and 34.38% security-control pass. On a frozen 24-case repository suite, vulnerability detection and clean preservation were perfect, but none of 12 vulnerable patches passed every acceptance gate. These results do not support autonomous repair or state-of-the-art claims. | |
| ## Training | |
| The adapter was trained in two consecutive QLoRA stages. Stage A used 8,400 train records from a 12,000-record corpus. Stage B continued from that adapter using 48,000 train records from a separate 60,000-record corpus. Validation, test, and holdout splits never entered optimization. | |
| Both corpora contain generated Python examples and are released under Apache-2.0. The combined dataset repository contains 72,000 records spanning 43 CWEs in the second stage and 35 CWEs in the first stage. | |
| ## Usage | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| base = "Qwen/Qwen2.5-Coder-7B-Instruct" | |
| adapter = "abkmystery/PySecPatch-7B" | |
| tokenizer = AutoTokenizer.from_pretrained(adapter) | |
| model = AutoModelForCausalLM.from_pretrained(base, device_map="auto") | |
| model = PeftModel.from_pretrained(model, adapter) | |
| ``` | |
| Use the system instruction: | |
| ```text | |
| You are PySecPatch, a defensive Python secure coding model. Identify vulnerabilities, explain risk, and produce minimal safe patches. Return strict JSON only. | |
| ``` | |
| The expected response keys are `is_vulnerable`, `cwe`, `vuln_type`, `vulnerable_lines`, `explanation`, `fixed_code`, `patch_summary`, and `safe_test`. | |
| ## Intended Use | |
| - Defensive analysis of user-supplied Python code. | |
| - Candidate finding classification and CWE identification. | |
| - Security explanations and review assistance. | |
| - Candidate snippet repairs subject to tests and human review. | |
| - Research on security specialization and generalization. | |
| ## Limitations | |
| - External secure-functional generation is substantially weaker than controlled holdout performance. | |
| - Repository-level unified diffs often fail to apply or pass verification. | |
| - Line localization is moderate (`F1 = 0.4407`). | |
| - The training corpora are generated rather than mined from real repositories. | |
| - The model can miss vulnerabilities and can produce plausible but incomplete repairs. | |
| Do not use PySecPatch for autonomous deployment, unauthorized scanning, exploit development, or offensive automation. | |
| ## Reproducibility | |
| Adapter model SHA-256: | |
| ```text | |
| 4c2b5c7c0d2982b99de9c319e998274fc12f3aae5bf8d2c3b5db58c5864dc65b | |
| ``` | |
| Full evaluation reports, raw predictions, frozen hashes, and scripts are linked from the GitHub and archival releases. | |
| Training dataset: [`10.5281/zenodo.21016753`](https://doi.org/10.5281/zenodo.21016753). | |
| ## Citation | |
| ```bibtex | |
| @software{khalid2026pysecpatch, | |
| author = {Bin Khalid, Ahmed}, | |
| title = {PySecPatch: Defensive Python Vulnerability Triage and Repair Research Artifacts}, | |
| year = {2026}, | |
| version = {0.1.1}, | |
| url = {https://github.com/abkmystery/PySecPatch} | |
| } | |
| ``` | |
| Released under the Apache License 2.0. See `LICENSE`. | |
| Current software archive: [`10.5281/zenodo.21015885`](https://doi.org/10.5281/zenodo.21015885). All versions: [`10.5281/zenodo.21015503`](https://doi.org/10.5281/zenodo.21015503). | |