PyGuard V4 - Python Vulnerability Detector
Model Description
PyGuard V4 is a fine-tuned Microsoft CodeBERT model for detecting security vulnerabilities in Python code. It builds on VUDENC (Wartschinski et al., 2022), replacing VUDENC's Word2Vec embeddings and LSTM classifier with a fine-tuned CodeBERT encoder.
Performance vs VUDENC
| Metric | VUDENC (LSTM) | PyGuard V2 (CodeBERT) | Improvement (percentage points) |
|---|---|---|---|
| Precision | 82-96% | 100.00% | +4 to +18 |
| Recall | 78-87% | 100.00% | +13 to +22 |
| F1 Score | 80-90% | 100.00% | +10 to +20 |
| Accuracy | N/A | 100.00% | N/A |
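The metrics in the table follow the standard binary-classification definitions. A minimal sketch of how they are computed from true and predicted labels, assuming "vulnerable" (label `1`) is the positive class (the card does not state this), and reading the improvement column as a percentage-point difference. The names `binary_metrics` and `improvement_pp` are illustrative, not part of PyGuard.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> Metrics:
    """Precision, recall, F1 and accuracy; `positive` is assumed to mark vulnerable code."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f1, accuracy)


def improvement_pp(baseline_pct: float, new_pct: float) -> float:
    """Percentage-point gain, e.g. 82% -> 100% is +18 points."""
    return new_pct - baseline_pct


print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
print(improvement_pp(82, 100), improvement_pp(96, 100))
```

Note that 100% precision and 100% recall together mean zero false positives and zero false negatives on the evaluated split.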
Training Dataset
- Source: VUDENC Dataset by Wartschinski et al. 2022
- Dataset DOI: 10.5281/zenodo.3559841
- Paper: Information and Software Technology, vol. 144, 2022
- Total samples: 2,457 (1,228 vulnerable + 1,229 safe)
- Split: 80% train, 10% val, 10% test
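The card does not say how the 80/10/10 split was produced. As a sketch under the assumption of a seeded split stratified by label (so each subset keeps the roughly 50/50 balance), a hypothetical helper `stratified_split` could look like this; with 1,228 vulnerable and 1,229 safe samples it yields 1,965 / 244 / 248 samples.

```python
import random


def stratified_split(samples, labels, seed=42, train_pct=80, val_pct=10):
    """Split (sample, label) pairs into train/val/test per class.

    Hypothetical helper: stratification and the seed are assumptions,
    not the documented PyGuard procedure.
    """
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append((sample, label))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = len(group) * train_pct // 100  # integer math avoids float rounding
        n_val = len(group) * val_pct // 100
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # the remainder goes to test
    return train, val, test


labels = [1] * 1228 + [0] * 1229  # 1 = vulnerable, 0 = safe (assumed encoding)
train, val, test = stratified_split(list(range(len(labels))), labels)
print(len(train), len(val), len(test))
```

Splitting each class separately, then assigning the rounding remainder to test, keeps the three sizes summing to the full 2,457 samples.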
Vulnerabilities Detected (7 CWEs)
- CWE-89: SQL Injection
- CWE-78: Command Injection
- CWE-79: Cross-Site Scripting (XSS)
- CWE-352: Cross-Site Request Forgery (CSRF)
- CWE-94: Code Injection (Remote Code Execution)
- CWE-22: Path Disclosure/Traversal
- CWE-601: Open Redirect
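To illustrate the kind of vulnerable/safe contrast the classifier is trained to separate, here is a CWE-89 pair written for this card (it is not drawn from the VUDENC dataset; the function names are hypothetical). The first function builds SQL by string interpolation; the second uses a parameterized query via Python's standard `sqlite3` module.

```python
import sqlite3


def find_user_vulnerable(conn: sqlite3.Connection, name: str):
    # CWE-89: user input is spliced into the SQL text, so quotes in `name`
    # can change the structure of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_vulnerable(conn, payload))  # every row is returned
print(find_user_safe(conn, payload))        # []
```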
Architecture
- Base model: microsoft/codebert-base
- Classification head: Linear(768, 2) with Dropout(0.3); two output classes (vulnerable vs. safe)
- Pooling: Mean pooling on last hidden state
- Max sequence length: 256 tokens
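A framework-free sketch of the head described above: mean pooling over the last hidden state (ignoring padding tokens), then a linear layer and softmax. It is illustrative only; the label order (index 1 = vulnerable) and the function names are assumptions, and the real model uses 768-dimensional vectors rather than the tiny ones in the example.

```python
import math


def mean_pool(hidden_states: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors from the last hidden state, skipping padding (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [t / max(count, 1) for t in total]


def linear(x, weight, bias):
    """y = W x + b; in PyGuard, W would have 2 rows of 768 columns."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden_states, attention_mask, weight, bias):
    """Class probabilities; index 1 = vulnerable is an assumed label order."""
    # Dropout(0.3) is the identity at inference time, so it is omitted here.
    pooled = mean_pool(hidden_states, attention_mask)
    return softmax(linear(pooled, weight, bias))


h = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # third token is padding
mask = [1, 1, 0]
print(mean_pool(h, mask))  # [2.0, 3.0]
print(classify(h, mask, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

With the actual encoder, `hidden_states` would correspond to `outputs.last_hidden_state` from `transformers.AutoModel.from_pretrained("microsoft/codebert-base")`, with inputs tokenized using `truncation=True, max_length=256`; in PyTorch the same pooling is commonly written `(h * mask.unsqueeze(-1)).sum(1) / mask.sum(1, keepdim=True)`. How PyGuard's own checkpoint is packaged is not described here.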
Citation
@article{wartschinski2022vudenc,
title={VUDENC: Vulnerability Detection with Deep Learning
on a Natural Codebase for Python},
author={Wartschinski, Laura and Noller, Yannic and
Vogel, Thomas and Kehrer, Timo and Grunske, Lars},
journal={Information and Software Technology},
volume={144},
year={2022}
}