PyGuard V4 — Python Vulnerability Detector

Model Description

PyGuard V4 is a fine-tuned microsoft/codebert-base model that detects security vulnerabilities in Python code. It builds on VUDENC (Wartschinski et al., 2022), replacing that system's Word2Vec embeddings and LSTM classifier with CodeBERT.

Performance vs VUDENC

Metric	VUDENC (LSTM)	PyGuard V4 (CodeBERT)	Improvement (percentage points)
Precision	82-96%	100.00%	+4 to +18
Recall	78-87%	100.00%	+13 to +22
F1 Score	80-90%	100.00%	+10 to +20
Accuracy	N/A	100.00%	—
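
For reference, precision, recall, and F1 all follow from the confusion-matrix counts on the held-out test split. The sketch below shows the standard definitions only; the function name `precision_recall_f1` and the counts in the usage lines are hypothetical and are not the model's actual evaluation output.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary-classification metrics for the 'vulnerable' class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
```

Under these definitions, a score of 100% on all three metrics corresponds to zero false positives and zero false negatives on the evaluated split.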

Training Dataset

Source: VUDENC Dataset by Wartschinski et al. 2022
DOI: 10.5281/zenodo.3559841
Paper: Information and Software Technology, vol. 144, 2022
Total samples: 2,457 (1,228 vulnerable + 1,229 safe)
Split: 80% train, 10% val, 10% test
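
An 80/10/10 split like the one listed above could be produced along the following lines. This is a minimal sketch, assuming a seeded random shuffle with split sizes rounded down; the card does not say which seed was used or whether the original split was stratified by label, so `split_dataset` and `seed=42` are illustrative choices.

```python
import random


def split_dataset(samples: list, seed: int = 42) -> tuple[list, list, list]:
    """Shuffle and split into 80% train, 10% validation, 10% test."""
    shuffled = samples[:]                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10                 # 80%, rounded down
    n_val = n // 10                       # 10%, rounded down
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]     # remainder goes to test
    return train, val, test


# Placeholder integer IDs standing in for the 2,457 code samples.
train, val, test = split_dataset(list(range(2457)))
print(len(train), len(val), len(test))
```

With the rounding used in this sketch, 2,457 samples divide into 1,965 training, 245 validation, and 247 test samples; the actual split sizes may differ by a sample or two.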

Vulnerabilities Detected (7 CWEs)

CWE-89: SQL Injection
CWE-78: Command Injection
CWE-79: Cross-Site Scripting (XSS)
CWE-352: Cross-Site Request Forgery (CSRF)
CWE-94: Code Injection (Remote Code Execution)
CWE-22: Path Disclosure/Traversal
CWE-601: Open Redirect

Architecture

Base model: microsoft/codebert-base
Classification head: Linear(768, 2) with Dropout(0.3)
Pooling: Mean pooling on last hidden state
Max sequence length: 256 tokens
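
To make the pooling and classification steps concrete, here is a dependency-free sketch of mean pooling followed by a linear layer. It assumes padding tokens are excluded from the mean through an attention mask, which the card does not state. In the real model these are tensor operations over 768-dimensional CodeBERT outputs, and dropout is inactive at inference time, so it is omitted. The function names, the toy 3-dimensional vectors, and the weights are all illustrative.

```python
def mean_pool(hidden_states: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors where attention_mask == 1 (padding is skipped)."""
    dim = len(hidden_states[0])
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def linear(x: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """Linear layer: one output logit per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy example: 3 tokens (the last one is padding), hidden size 3 instead of 768.
hidden = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(hidden, mask)          # the padding vector is ignored

# Two weight rows, one per output class; made-up values for this sketch.
weight = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
bias = [0.0, 0.5]
logits = linear(pooled, weight, bias)
predicted_class = logits.index(max(logits))
print(pooled, logits, predicted_class)
```

Compared with using only the first-token embedding, mean pooling lets every non-padding token within the 256-token window contribute equally to the vector passed to the classification head.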

Citation

@article{wartschinski2022vudenc,
  title={VUDENC: Vulnerability Detection with Deep Learning
         on a Natural Codebase for Python},
  author={Wartschinski, Laura and Noller, Yannic and
          Vogel, Thomas and Kehrer, Timo and Grunske, Lars},
  journal={Information and Software Technology},
  volume={144},
  year={2022}
}
