---
title: Code Security Risk Analyzer
emoji: 🔒
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- security
- vulnerability-detection
- owasp
- cwe
- code-analysis
- static-analysis
short_description: AI-powered code vulnerability detection with OWASP mapping
---
# 🔒 Code Security Risk Analyzer v2
AI-powered multi-label vulnerability detection across **30 CWE categories** mapped to **OWASP Top 10 2021**. Supports Python, JavaScript, Java, C, C++, PHP, and Go.
## v2 Improvements
- **Per-class threshold optimization**: each CWE has its own tuned detection threshold instead of a global 0.3
- **Temperature-calibrated probabilities**: confidence scores are meaningful (at a score of 0.8, roughly 80% of flagged findings are expected to be true positives)
- **CWE-aware fix generation**: the fixer model is told *which* vulnerability to fix
- **3.7x larger fixer model**: CodeT5+ 220M (previously flan-t5-small, 60M)
- **Asymmetric Loss training**: handles the class imbalance of a dataset in which about 90% of samples are safe
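The first two items can be illustrated with a minimal sketch. It assumes the classifier emits one logit per CWE and applies a sigmoid to each; the labels, logits, thresholds, and temperature below are hypothetical values chosen for illustration, not the ones shipped with the model.

```python
import math


def calibrated_probs(logits, temperature):
    """Sigmoid over temperature-scaled logits (one independent logit per CWE)."""
    return [1.0 / (1.0 + math.exp(-z / temperature)) for z in logits]


def detect(logits, labels, thresholds, temperature=1.0, default_threshold=0.5):
    """Return {label: probability} for every CWE whose calibrated
    probability reaches that CWE's own threshold."""
    probs = calibrated_probs(logits, temperature)
    return {
        label: round(p, 3)
        for label, p in zip(labels, probs)
        if p >= thresholds.get(label, default_threshold)
    }


# Hypothetical inputs: none of these numbers come from the real model.
labels = ["CWE-79", "CWE-89", "CWE-22"]
logits = [2.0, -0.5, 0.3]
thresholds = {"CWE-79": 0.6, "CWE-89": 0.25, "CWE-22": 0.7}

per_class = detect(logits, labels, thresholds, temperature=1.5)
global_cut = detect(logits, labels, {}, temperature=1.5, default_threshold=0.5)
print(sorted(per_class), sorted(global_cut))
```

With these made-up numbers, the per-class thresholds flag a different set of CWEs than a single global cut-off would, which is the effect the per-class tuning is meant to exploit.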
## Model Performance
| Model | Metric | Score |
|-------|--------|-------|
| **Classifier** (GraphCodeBERT 125M) | Macro F1 | **0.476** (+311% vs baseline) |
| | Weighted F1 | **0.945** |
| | Safe Detection F1 | **0.982** |
| **Fixer** (CodeT5+ 220M) | BLEU | **81.0** |
| | ROUGE-L | **0.788** |
| | Eval Loss | **0.175** (3.1x better than v1) |
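The gap between Macro F1 and Weighted F1 in the table follows from how the two averages are defined: macro F1 gives every class equal weight, while weighted F1 weights each class by its support, so a large, well-detected safe class dominates it. The sketch below shows the arithmetic on invented per-class counts; the counts are purely illustrative and are not taken from the evaluation set.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_and_weighted_f1(counts):
    """counts maps class name -> (tp, fp, fn); support is tp + fn."""
    scores = {name: f1(*c) for name, c in counts.items()}
    supports = {name: tp + fn for name, (tp, fp, fn) in counts.items()}
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[n] * supports[n] for n in scores) / sum(supports.values())
    return macro, weighted


# Invented counts: one dominant "safe" class and two rare CWE classes.
counts = {
    "safe": (880, 30, 20),
    "CWE-89": (30, 20, 30),
    "CWE-79": (10, 20, 30),
}

macro, weighted = macro_and_weighted_f1(counts)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

Macro F1 is therefore the more informative number for rare CWE categories, and weighted F1 mostly reflects performance on the majority class.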
## Features
- **Detection Model:** [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier), 125M params, two-phase training with ASL loss
- **Fix Generator:** [CodeT5+ 220M](https://huggingface.co/ayshajavd/codet5p-vuln-fixer), CWE-aware input format, beam search generation
- **Structured Reports:** CWE ID, OWASP category, severity score, exploit likelihood, plain English explanation
- **Attack Chain Analysis:** Shows how multiple detected vulnerabilities could be chained together
- **REST API:** JSON endpoint for integration into CI/CD pipelines
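To make the CWE-to-OWASP mapping in the reports concrete, here is a small sketch that groups CWE IDs by OWASP Top 10 2021 category. The table is a hand-picked subset based on the published OWASP 2021 mappings; it is an assumption for illustration and not the analyzer's actual 30-category table, and `group_by_owasp` is a hypothetical helper.

```python
# Illustrative subset of the OWASP Top 10 2021 CWE mappings.
# This is not the analyzer's own lookup table.
OWASP_2021 = {
    "CWE-22": "A01:2021 Broken Access Control",
    "CWE-327": "A02:2021 Cryptographic Failures",
    "CWE-79": "A03:2021 Injection",
    "CWE-89": "A03:2021 Injection",
    "CWE-798": "A07:2021 Identification and Authentication Failures",
    "CWE-502": "A08:2021 Software and Data Integrity Failures",
    "CWE-918": "A10:2021 Server-Side Request Forgery (SSRF)",
}


def group_by_owasp(cwe_ids):
    """Group detected CWE IDs under their OWASP category.

    IDs missing from the table are collected under "Unmapped".
    """
    grouped = {}
    for cwe in cwe_ids:
        category = OWASP_2021.get(cwe, "Unmapped")
        grouped.setdefault(category, []).append(cwe)
    return grouped


grouped = group_by_owasp(["CWE-89", "CWE-79", "CWE-22", "CWE-1234"])
for category, cwes in grouped.items():
    print(category, cwes)
```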
## API Usage
```python
from gradio_client import Client
client = Client("ayshajavd/code-security-analyzer")
# Get markdown report
report = client.predict(code="your code here", api_name="/analyze")
# Get structured JSON report
json_report = client.predict(code="your code here", api_name="/get_json_report")
```
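For CI/CD use, the same call can be wrapped to scan several files in one run. This is a minimal sketch that relies only on the `predict` call shown above; `scan_files` and `main` are hypothetical helpers, and the responses are passed through untouched because the exact shape of the JSON report is not documented here.

```python
from pathlib import Path


def scan_files(client, paths, api_name="/get_json_report"):
    """Send each file's contents to the Space and collect the raw responses.

    `client` is any object with a gradio_client-style
    predict(code=..., api_name=...) method.
    """
    reports = {}
    for path in paths:
        code = Path(path).read_text(encoding="utf-8")
        reports[str(path)] = client.predict(code=code, api_name=api_name)
    return reports


def main(argv):
    """Example entry point: scan the files named on the command line."""
    # Imported lazily so scan_files can be used without gradio_client installed.
    from gradio_client import Client

    client = Client("ayshajavd/code-security-analyzer")
    for name, report in scan_files(client, argv).items():
        print(name)
        print(report)
```

A pipeline step could call `main(sys.argv[1:])` on the changed files and then apply its own pass/fail policy to the returned reports.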
## Models & Dataset
- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier): Multi-label CWE detection
- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer): Vulnerability fix generation
- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset): 175K labeled samples
## Training Notebooks
All training code: [vuln-classifier-training-notebooks](https://huggingface.co/ayshajavd/vuln-classifier-training-notebooks)