---
language:
- code
license: apache-2.0
tags:
- code
- security
- vulnerability-detection
- codebert
datasets:
- code_x_glue_cc_defect_detection
pipeline_tag: text-classification
widget:
- text: |
    import java.sql.*;
    public class Example {
      public void query(String input) {
        String sql = "SELECT * FROM users WHERE name = '" + input + "'";
      }
    }
---
# CodeBERT fine-tuned for Java Vulnerability Detection
CodeBERT model fine-tuned for detecting security vulnerabilities in Java code.
## Model Description
This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of secure/insecure Java code.
## Intended Uses
- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("mangsense/codebert_java")
model = AutoModelForSequenceClassification.from_pretrained("mangsense/codebert_java")

# Classify a Java snippet
code = "String sql = \"SELECT * FROM users WHERE name = '\" + input + \"'\";"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

prediction = logits.argmax(dim=-1).item()  # 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
print(prediction)
```
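To report a confidence score alongside the predicted label, apply a softmax to the logits. A minimal sketch, using hypothetical logit values in place of a real model call:

```python
import torch

# Hypothetical logits for one input, shaped like the model's output head:
# [batch, 2] over [LABEL_0 (Safe), LABEL_1 (Vulnerable)]
logits = torch.tensor([[0.2, 1.5]])

probs = torch.softmax(logits, dim=-1)        # normalize logits to probabilities
label_id = int(probs.argmax(dim=-1).item())  # index of the most likely class
confidence = probs[0, label_id].item()       # probability of that class
print(label_id, confidence)
```

The same post-processing works on the `logits` tensor returned by the snippet above.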
## Training Data
Trained on CodeXGLUE Defect Detection dataset.
## Limitations
- Focused on Java code only
- May not detect all types of vulnerabilities
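A further practical limitation: CodeBERT inherits RoBERTa's 512-token input limit, so long files are silently truncated by the tokenizer. One hedged workaround is to split a file into overlapping line windows and classify each window separately; the helper below is illustrative (the function name and window sizes are not part of this model):

```python
def line_windows(source: str, window: int = 40, stride: int = 20):
    """Split source code into overlapping windows of `window` lines,
    advancing by `stride` lines, so every line appears in some window."""
    lines = source.splitlines()
    if len(lines) <= window:
        return ["\n".join(lines)]
    return [
        "\n".join(lines[i:i + window])
        for i in range(0, len(lines) - window + stride, stride)
    ]

# Example: a 100-line file yields four overlapping 40-line chunks
sample = "\n".join(f"int x{i} = {i};" for i in range(100))
chunks = line_windows(sample)
```

Each chunk can then be fed to the tokenizer individually; flagging the file if any window is classified as vulnerable is one simple aggregation strategy.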