---
title: Code Security Risk Analyzer
emoji: 🔒
colorFrom: red
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
- security
- vulnerability-detection
- owasp
- cwe
- code-analysis
- static-analysis
short_description: AI-powered code vulnerability detection with OWASP mapping
---

# 🔒 Code Security Risk Analyzer v2

AI-powered multi-label vulnerability detection across **30 CWE categories** mapped to **OWASP Top 10 2021**. Supports Python, JavaScript, Java, C, C++, PHP, and Go.
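Mapping a detected CWE to its OWASP Top 10 2021 category is essentially a table lookup. Below is a minimal sketch using a small, hand-picked subset of CWEs with their OWASP 2021 categories. It does not reproduce the analyzer's actual 30-CWE table, and `CWE_TO_OWASP_2021`, `owasp_category`, and `group_by_owasp` are hypothetical names.

```python
# Illustrative subset only (hypothetical table); not the analyzer's own 30-CWE mapping.
CWE_TO_OWASP_2021 = {
    "CWE-22": "A01:2021-Broken Access Control",    # Path traversal
    "CWE-352": "A01:2021-Broken Access Control",   # Cross-site request forgery
    "CWE-327": "A02:2021-Cryptographic Failures",  # Broken or risky crypto algorithm
    "CWE-78": "A03:2021-Injection",                # OS command injection
    "CWE-79": "A03:2021-Injection",                # Cross-site scripting
    "CWE-89": "A03:2021-Injection",                # SQL injection
    "CWE-611": "A05:2021-Security Misconfiguration",             # XML external entities
    "CWE-502": "A08:2021-Software and Data Integrity Failures",   # Unsafe deserialization
    "CWE-918": "A10:2021-Server-Side Request Forgery (SSRF)",    # SSRF
}


def owasp_category(cwe_id: str) -> str:
    """Look up the OWASP 2021 category for a CWE ID, normalizing case and whitespace."""
    return CWE_TO_OWASP_2021.get(cwe_id.strip().upper(), "Unmapped")


def group_by_owasp(cwe_ids):
    """Group detected CWE IDs under their OWASP categories, preserving input order."""
    groups = {}
    for cwe in cwe_ids:
        groups.setdefault(owasp_category(cwe), []).append(cwe)
    return groups
```

Several CWEs share one OWASP category (SQL injection, XSS, and command injection all fall under A03:2021-Injection), so a report grouped by OWASP category can be shorter than the raw list of CWE findings.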

## v2 Improvements
- **Per-class threshold optimization:** each CWE has its own tuned detection threshold instead of a single global threshold of 0.3.
- **Temperature-calibrated probabilities:** confidence scores are calibrated, so a score of 0.8 means roughly 80% of predictions at that confidence are correct.
- **CWE-aware fix generation:** the fixer model is told *which* vulnerability it is fixing.
- **3.7x larger fixer model:** CodeT5+ 220M, replacing flan-t5-small 60M.
- **Asymmetric Loss training:** handles the class imbalance caused by 90% of samples being labeled safe.
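Temperature calibration and per-class thresholds are both post-processing steps on the classifier's logits. Below is a minimal sketch, assuming one sigmoid output per CWE. `fit_temperature` grid-searches a single scalar T that minimizes binary negative log-likelihood on held-out (logit, label) pairs; for a multi-label model you would flatten every sample/class pair into that list. `best_threshold` picks one class's F1-maximizing threshold, and `detect` applies both. The function names, the search grids, and the use of grid search (rather than a gradient-based fit) are illustrative assumptions, not the Space's actual code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_temperature(logits, labels, grid=None):
    """Grid-search one scalar T minimizing binary NLL on validation (logit, label) pairs."""
    grid = grid or [0.25 * k for k in range(2, 21)]  # candidate T values 0.5 .. 5.0
    eps = 1e-12  # guards against log(0)

    def nll(t):
        total = 0.0
        for z, y in zip(logits, labels):
            p = sigmoid(z / t)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total

    return min(grid, key=nll)


def f1_at(probs, truths, t):
    """F1 for one class when predicting positive at probability >= t."""
    tp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 1)
    fp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 0)
    fn = sum(1 for p, y in zip(probs, truths) if p < t and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, truths):
    """Per-class threshold: the grid value (0.05 .. 0.95) that maximizes this class's F1."""
    grid = [i / 100 for i in range(5, 96, 5)]
    return max(grid, key=lambda t: f1_at(probs, truths, t))


def detect(logits, labels, temperature, thresholds):
    """Calibrate each logit with the shared temperature, then apply that class's own threshold."""
    hits = []
    for label, z, t in zip(labels, logits, thresholds):
        p = sigmoid(z / temperature)
        if p >= t:
            hits.append((label, round(p, 3)))
    return hits
```

The thresholds should be tuned on the calibrated probabilities. Temperature scaling preserves the ranking of logits but moves where each logit lands on the probability scale, so a threshold picked before calibration would no longer sit at the same operating point afterwards.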

## Model Performance

| Model | Metric | Score |
|-------|--------|-------|
| **Classifier** (GraphCodeBERT 125M) | Macro F1 | **0.476** (+311% vs baseline) |
| | Weighted F1 | **0.945** |
| | Safe Detection F1 | **0.982** |
| **Fixer** (CodeT5+ 220M) | BLEU | **81.0** |
| | ROUGE-L | **0.788** |
| | Eval Loss | **0.175** (3.1x lower than v1) |
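The gap between macro F1 (0.476) and weighted F1 (0.945) most likely comes from how per-class scores are averaged. Macro F1 gives every class equal weight. Weighted F1 scales each class by its support, so a dominant safe class (Safe Detection F1 0.982) pulls the weighted average toward its own score. The per-class numbers below are hypothetical and only illustrate the arithmetic. They are not taken from the model card.

```python
def macro_and_weighted_f1(per_class):
    """per_class maps a class name to (f1, support), where support = number of true instances."""
    total_support = sum(support for _, support in per_class.values())
    # Macro: unweighted mean, so each rare CWE counts as much as the safe class
    macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
    # Weighted: each class's F1 scaled by its share of the total support
    weighted = sum(f1 * support for f1, support in per_class.values()) / total_support
    return macro, weighted


# Hypothetical per-class scores, skewed like the dataset (mostly safe samples)
example = {
    "safe": (0.98, 900),
    "CWE-89": (0.40, 60),
    "CWE-79": (0.10, 40),
}
macro, weighted = macro_and_weighted_f1(example)
print(f"macro={macro:.3f} weighted={weighted:.3f}")  # macro=0.493 weighted=0.910
```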

## Features
- **Detection Model:** [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier), 125M params, two-phase training with Asymmetric Loss (ASL)
- **Fix Generator:** [CodeT5+ 220M](https://huggingface.co/ayshajavd/codet5p-vuln-fixer), CWE-aware input format, beam search generation
- **Structured Reports:** CWE ID, OWASP category, severity score, exploit likelihood, and a plain-English explanation
- **Attack Chain Analysis:** Assesses how multiple detected vulnerabilities could be chained into a single attack
- **REST API:** JSON endpoint for integration into CI/CD pipelines
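Asymmetric Loss (ASL, Ridnik et al., 2021) treats positive and negative labels differently in multi-label training. Negatives get a stronger focusing exponent and a probability margin that zeroes out the loss on very easy negatives, which keeps the many absent-CWE labels from dominating the gradient. Below is a per-sample sketch in plain Python. The hyperparameters are settings commonly used with ASL (`gamma_pos=0`, `gamma_neg=4`, `clip=0.05`), not necessarily the values used to train this classifier, and the two-phase schedule is not shown.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Asymmetric Loss for one multi-label sample, summed over its labels.

    logits:  raw model outputs, one per CWE label
    targets: 0/1 ground truth, one per CWE label
    """
    eps = 1e-8  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        if y == 1:
            # Positive term: focal-style weight (1 - p)^gamma_pos; gamma_pos=0 gives plain BCE
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: negatives with p < clip contribute exactly zero loss
            p_m = max(p - clip, 0.0)
            # Negative term: stronger focusing (gamma_neg) down-weights easy negatives
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total
```

With `gamma_pos=0` the positive term reduces to ordinary binary cross-entropy, so the asymmetry lives entirely in how negatives are down-weighted and clipped.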

## API Usage

```python
from gradio_client import Client

client = Client("ayshajavd/code-security-analyzer")

# Get markdown report
report = client.predict(code="your code here", api_name="/analyze")

# Get structured JSON report
json_report = client.predict(code="your code here", api_name="/get_json_report")
```
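For CI/CD, the JSON endpoint can gate a build. The sketch below fails the job when any finding meets a severity cutoff. The report schema is an assumption: it expects a top-level `vulnerabilities` list whose entries carry a numeric `severity_score` (and optionally a `cwe` field), so check the real output of `/get_json_report` and adjust the keys. `blocking_findings`, `main`, and the 7.0 cutoff are hypothetical.

```python
import json


def blocking_findings(report, min_score=7.0):
    """Return findings whose severity_score is at least min_score.

    ASSUMED schema: {"vulnerabilities": [{"cwe": ..., "severity_score": ...}, ...]}.
    Accepts either a dict or a JSON string, since the client may return either.
    """
    if isinstance(report, str):
        report = json.loads(report)
    return [
        finding
        for finding in report.get("vulnerabilities", [])
        if float(finding.get("severity_score", 0)) >= min_score
    ]


def main(path, min_score=7.0):
    """Analyze one file via the Space and return a process exit code (1 = block the build)."""
    from gradio_client import Client  # imported here so blocking_findings has no extra dependency

    with open(path, encoding="utf-8") as fh:
        source = fh.read()
    client = Client("ayshajavd/code-security-analyzer")
    report = client.predict(code=source, api_name="/get_json_report")
    findings = blocking_findings(report, min_score)
    for finding in findings:
        print(f"{finding.get('cwe', '?')}: severity {finding.get('severity_score')}")
    return 1 if findings else 0
```

In a pipeline step this would be invoked as `sys.exit(main("path/to/file.py"))`, so a non-zero exit code fails the job.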

## Models & Dataset
- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier) — Multi-label CWE detection
- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer) — Vulnerability fix generation
- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset) — 175K labeled samples

## Training Notebooks
All training code: [vuln-classifier-training-notebooks](https://huggingface.co/ayshajavd/vuln-classifier-training-notebooks)