---
title: Code Security Risk Analyzer
emoji: 🔒
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- security
- vulnerability-detection
- owasp
- cwe
- code-analysis
- static-analysis
short_description: AI-powered code vulnerability detection with OWASP mapping
---

# 🔒 Code Security Risk Analyzer v2

AI-powered multi-label vulnerability detection across **30 CWE categories** mapped to **OWASP Top 10 2021**. Supports Python, JavaScript, Java, C, C++, PHP, and Go.

## v2 Improvements
- **Per-class threshold optimization:** each CWE has its own tuned detection threshold instead of a single global 0.3
- **Temperature-calibrated probabilities:** confidence scores are meaningful (a score of 0.8 means roughly 80% of such predictions are correct)
- **CWE-aware fix generation:** the fixer model is told *which* vulnerability to fix
- **3.7x larger fixer model:** CodeT5+ 220M (was flan-t5-small, 60M)
- **Asymmetric Loss training:** handles the class imbalance (about 90% of samples are safe)
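
As a rough illustration of how temperature calibration and per-class thresholds could combine at inference time, the sketch below divides each logit by a temperature before a sigmoid, then compares each class probability against that class's own threshold. The sigmoid output layer, the function names, and all numeric values (logits, thresholds, temperature) are assumptions made for the example; they are not taken from the published classifier.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_probs(logits, temperature):
    """Temperature scaling: divide each logit by T before the sigmoid."""
    return [sigmoid(z / temperature) for z in logits]


def predict_labels(logits, thresholds, temperature=1.0):
    """Return indices of classes whose calibrated probability meets its threshold."""
    probs = calibrated_probs(logits, temperature)
    return [i for i, (p, t) in enumerate(zip(probs, thresholds)) if p >= t]


# Hypothetical values for three classes (illustration only).
logits = [2.0, -0.5, 0.4]
per_class_thresholds = [0.5, 0.2, 0.7]
global_thresholds = [0.3, 0.3, 0.3]

per_class = predict_labels(logits, per_class_thresholds, temperature=1.5)
global_ = predict_labels(logits, global_thresholds, temperature=1.5)
print(per_class, global_)
```

With these toy numbers the per-class thresholds keep classes 0 and 1, while a global 0.3 cutoff would also flag class 2. That shows how a single threshold can over- or under-report individual CWEs.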

## Model Performance

| Model | Metric | Score |
|-------|--------|-------|
| **Classifier** (GraphCodeBERT 125M) | Macro F1 | **0.476** (+311% vs baseline) |
| | Weighted F1 | **0.945** |
| | Safe Detection F1 | **0.982** |
| **Fixer** (CodeT5+ 220M) | BLEU | **81.0** |
| | ROUGE-L | **0.788** |
| | Eval Loss | **0.175** (3.1x lower than v1) |
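
Macro F1 averages the per-class scores equally, while weighted F1 weights each class by its support, so the two can diverge sharply when one class dominates. The sketch below shows that effect with invented confusion counts for one frequent class and one rare class. The counts are toy data, not the evaluation results reported above.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per label."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in per_class]
    supports = [tp + fn for tp, _, fn in per_class]  # true instances per class
    macro = sum(scores) / len(scores)
    weighted = sum(s * n for s, n in zip(scores, supports)) / sum(supports)
    return macro, weighted


# Toy counts: a frequent "safe" class and a rare vulnerability class.
toy_counts = [(900, 10, 10), (2, 8, 8)]
macro, weighted = macro_and_weighted_f1(toy_counts)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

In this toy case the rare class pulls macro F1 down to about 0.59, while weighted F1 stays near 0.98.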

## Features
- **Detection Model:** [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier), 125M params, two-phase training with Asymmetric Loss (ASL)
- **Fix Generator:** [CodeT5+ 220M](https://huggingface.co/ayshajavd/codet5p-vuln-fixer), CWE-aware input format, beam search generation
- **Structured Reports:** CWE ID, OWASP category, severity score, exploit likelihood, plain English explanation
- **Attack Chain Analysis:** Multi-vulnerability chaining analysis
- **REST API:** JSON endpoint for integration into CI/CD pipelines
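
For readers unfamiliar with the Asymmetric Loss (ASL) mentioned for the detection model, here is a minimal pure-Python sketch of the general formulation: positives use a focal-style term with exponent `gamma_pos`, and negatives are shifted down by a probability margin and then down-weighted with exponent `gamma_neg`. The hyperparameter values are commonly cited defaults and are assumptions here; the values used to train this classifier are not stated in this README.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label ASL terms for predicted probabilities and 0/1 targets.

    gamma_pos, gamma_neg, and margin are assumed values for illustration.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            # Positive label: focal-style weighting on (1 - p).
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so that
            # very easy negatives contribute exactly zero loss.
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


easy_negative = asymmetric_loss([0.03], [0])   # below the margin
hard_negative = asymmetric_loss([0.90], [0])   # confidently wrong
print(easy_negative, hard_negative)
```

Easy negatives (the bulk of a mostly-safe dataset) contribute little or nothing, so the gradient is dominated by positives and hard negatives.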

## API Usage

```python
from gradio_client import Client

client = Client("ayshajavd/code-security-analyzer")

# Get markdown report
report = client.predict(code="your code here", api_name="/analyze")

# Get structured JSON report
json_report = client.predict(code="your code here", api_name="/get_json_report")
```
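
For CI/CD use, one option is to gate a build on the parsed JSON report. The sketch below is illustrative only: the report schema is not documented here, so the `findings` and `severity` keys, the 0 to 10 severity scale, and the `should_fail_build` helper are all hypothetical and would need adapting to the actual output of `/get_json_report`.

```python
def should_fail_build(report: dict, max_severity: float = 7.0) -> bool:
    """Return True if any finding meets or exceeds the severity cutoff.

    Assumes a hypothetical schema: {"findings": [{"severity": float, ...}]}.
    """
    findings = report.get("findings", [])
    return any(f.get("severity", 0.0) >= max_severity for f in findings)


# Hand-written sample in the assumed shape (not real analyzer output).
sample_report = {
    "findings": [
        {"cwe": "CWE-89", "severity": 9.1},
        {"cwe": "CWE-798", "severity": 5.0},
    ]
}

if should_fail_build(sample_report):
    print("Blocking merge: high-severity finding detected")
```

In a pipeline, the script would exit with a non-zero status instead of printing, so the CI job fails.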

## Models & Dataset
- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier): Multi-label CWE detection
- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer): Vulnerability fix generation
- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset): 175K labeled samples

## Training Notebooks
All training code: [vuln-classifier-training-notebooks](https://huggingface.co/ayshajavd/vuln-classifier-training-notebooks)