Update README with v2 features and metrics
README.md
CHANGED
|
@@ -18,13 +18,31 @@ tags:
|
|
| 18 |
short_description: AI-powered code vulnerability detection with OWASP mapping
|
| 19 |
---
|
| 20 |
|
| 21 |
-
# Code Security Risk Analyzer
|
| 22 |
|
| 23 |
AI-powered multi-label vulnerability detection across **30 CWE categories** mapped to **OWASP Top 10 2021**. Supports Python, JavaScript, Java, C, C++, PHP, and Go.
|
| 24 |
|
| 25 |
## Features
|
| 26 |
-
- **Detection Model:** [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier)
|
| 27 |
-
- **Fix Generator:** [CodeT5+](https://huggingface.co/ayshajavd/codet5p-vuln-fixer)
|
| 28 |
- **Structured Reports:** CWE ID, OWASP category, severity score, exploit likelihood, plain English explanation
|
| 29 |
- **Attack Chain Analysis:** Multi-vulnerability chaining analysis
|
| 30 |
- **REST API:** JSON endpoint for integration into CI/CD pipelines
|
|
@@ -35,10 +53,18 @@ AI-powered multi-label vulnerability detection across **30 CWE categories** mapp
|
|
| 35 |
from gradio_client import Client
|
| 36 |
|
| 37 |
client = Client("ayshajavd/code-security-analyzer")
|
| 38 |
-
|
| 39 |
```
|
| 40 |
|
| 41 |
## Models & Dataset
|
| 42 |
-
- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier)
|
| 43 |
-
- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer)
|
| 44 |
-
- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset)
|
|
| 18 |
short_description: AI-powered code vulnerability detection with OWASP mapping
|
| 19 |
---
|
| 20 |
|
| 21 |
+
# Code Security Risk Analyzer v2
|
| 22 |
|
| 23 |
AI-powered multi-label vulnerability detection across **30 CWE categories** mapped to **OWASP Top 10 2021**. Supports Python, JavaScript, Java, C, C++, PHP, and Go.
|
| 24 |
|
| 25 |
+
## v2 Improvements
|
| 26 |
+
- **Per-class threshold optimization:** each CWE has its own optimal detection threshold (instead of a global 0.3)
|
| 27 |
+
- **Temperature-calibrated probabilities:** confidence scores are meaningful (a score of 0.8 corresponds to roughly an 80% true positive rate)
|
| 28 |
+
- **CWE-aware fix generation:** the fixer model knows *what* vulnerability to fix
|
| 29 |
+
- **3.7x larger fixer model:** CodeT5+ 220M (was flan-t5-small 60M)
|
| 30 |
+
- **Asymmetric Loss training:** handles the class imbalance of 90% safe samples
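
As a rough illustration of the first two items above, the sketch below divides logits by a scalar temperature before the sigmoid and then compares each label's probability against its own threshold. The function names, the threshold values, and the logits are all hypothetical; the actual thresholds and temperature live with the released classifier and are not reproduced here.

```python
import math

def calibrated_probs(logits, temperature):
    """Sigmoid of each logit after dividing by a scalar temperature."""
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]

def predict_labels(logits, thresholds, temperature=1.0):
    """Return the labels whose calibrated probability meets that label's threshold.

    `thresholds` maps label -> threshold; `logits` follows the same order.
    """
    labels = list(thresholds)
    probs = calibrated_probs(logits, temperature)
    return [label for label, p in zip(labels, probs) if p >= thresholds[label]]

# Hypothetical values for illustration only (not the model's real thresholds).
THRESHOLDS = {"CWE-79": 0.45, "CWE-89": 0.30, "CWE-22": 0.60}
logits = [0.4, -0.5, 0.2]

print(predict_labels(logits, THRESHOLDS, temperature=1.5))
```

With a single global threshold of 0.3, all three labels in this toy input would fire; the per-class thresholds suppress the one whose score falls below its own cutoff.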
|
| 31 |
+
|
| 32 |
+
## Model Performance
|
| 33 |
+
|
| 34 |
+
| Model | Metric | Score |
|
| 35 |
+
|-------|--------|-------|
|
| 36 |
+
| **Classifier** (GraphCodeBERT 125M) | Macro F1 | **0.476** (+311% vs baseline) |
|
| 37 |
+
| | Weighted F1 | **0.945** |
|
| 38 |
+
| | Safe Detection F1 | **0.982** |
|
| 39 |
+
| **Fixer** (CodeT5+ 220M) | BLEU | **81.0** |
|
| 40 |
+
| | ROUGE-L | **0.788** |
|
| 41 |
+
| | Eval Loss | **0.175** (3.1x better than v1) |
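
The gap between Macro F1 and Weighted F1 in the table is what one would expect when a single class dominates. The toy calculation below, using made-up per-class counts that are not taken from this project's evaluation, shows how the two averages diverge: macro F1 treats every class equally, while weighted F1 weights each class by its support.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when the class has no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical (tp, fp, fn) counts per class, for illustration only.
counts = {
    "safe": (900, 10, 10),
    "CWE-79": (5, 5, 5),
    "CWE-89": (2, 6, 6),
}

scores = {name: f1(*c) for name, c in counts.items()}
support = {name: tp + fn for name, (tp, fp, fn) in counts.items()}

# Macro: plain mean over classes. Weighted: mean weighted by true-instance count.
macro = sum(scores.values()) / len(scores)
weighted = sum(scores[n] * support[n] for n in scores) / sum(support.values())

print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```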
|
| 42 |
+
|
| 43 |
## Features
|
| 44 |
+
- **Detection Model:** [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier), 125M params, two-phase training with Asymmetric Loss (ASL)
|
| 45 |
+
- **Fix Generator:** [CodeT5+ 220M](https://huggingface.co/ayshajavd/codet5p-vuln-fixer), CWE-aware input format, beam search generation
|
| 46 |
- **Structured Reports:** CWE ID, OWASP category, severity score, exploit likelihood, plain English explanation
|
| 47 |
- **Attack Chain Analysis:** Multi-vulnerability chaining analysis
|
| 48 |
- **REST API:** JSON endpoint for integration into CI/CD pipelines
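
For readers unfamiliar with the Asymmetric Loss mentioned for the detection model, here is a minimal single-label sketch of the published ASL formulation (Ridnik et al.). The hyperparameter defaults are the ones commonly quoted from that paper and are an assumption here; the values used to train this classifier are not stated in this README.

```python
import math

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """ASL for one label: p is the predicted probability, y is 0 or 1.

    Positives use a focal-style term with gamma_pos. Negatives first shift the
    probability down by `margin`, so very easy negatives contribute zero loss,
    then apply a stronger focusing exponent gamma_neg.
    """
    if y == 1:
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    p_m = max(p - margin, 0.0)  # shifted probability for negatives
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

# An easy negative is ignored, a confident false alarm is penalized heavily.
print(asymmetric_loss(0.03, 0), asymmetric_loss(0.9, 0), asymmetric_loss(0.5, 1))
```

Down-weighting the many easy negatives is what lets the rare vulnerable labels influence training despite the large share of safe samples.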
|
|
|
|
| 53 |
from gradio_client import Client
|
| 54 |
|
| 55 |
client = Client("ayshajavd/code-security-analyzer")
|
| 56 |
+
|
| 57 |
+
# Get markdown report
|
| 58 |
+
report = client.predict(code="your code here", api_name="/analyze")
|
| 59 |
+
|
| 60 |
+
# Get structured JSON report
|
| 61 |
+
json_report = client.predict(code="your code here", api_name="/get_json_report")
|
| 62 |
```
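
For CI/CD use, the same call can be wrapped so that a pipeline step analyzes a file on disk. This is a sketch only: `analyze_file` is a hypothetical helper, `app.py` is a placeholder path, and the shape of the returned report is whatever the Space sends back (it is not documented here).

```python
from pathlib import Path

def analyze_file(client, path, api_name="/get_json_report"):
    """Read one source file and send its text to the given endpoint.

    `client` is any object with a gradio_client-style predict() method.
    """
    code = Path(path).read_text(encoding="utf-8")
    return client.predict(code=code, api_name=api_name)

# Usage (needs network access and the gradio_client package):
# from gradio_client import Client
# client = Client("ayshajavd/code-security-analyzer")
# print(analyze_file(client, "app.py"))
```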
|
| 63 |
|
| 64 |
## Models & Dataset
|
| 65 |
+
- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier): Multi-label CWE detection
|
| 66 |
+
- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer): Vulnerability fix generation
|
| 67 |
+
- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset): 175K labeled samples
|
| 68 |
+
|
| 69 |
+
## Training Notebooks
|
| 70 |
+
All training code: [vuln-classifier-training-notebooks](https://huggingface.co/ayshajavd/vuln-classifier-training-notebooks)
|