---
language:
- code
license: apache-2.0
tags:
- code
- security
- vulnerability-detection
- codebert
datasets:
- code_x_glue_cc_defect_detection
pipeline_tag: text-classification
widget:
- text: |
    import java.sql.*;
    public class Example {
        public void query(String input) {
            String sql = "SELECT * FROM users WHERE name = '" + input + "'";
        }
    }
---

# CodeBERT fine-tuned for Java Vulnerability Detection

A CodeBERT model fine-tuned to detect security vulnerabilities in Java code.

## Model Description

This model is fine-tuned from [microsoft/codebert-base](https://huggingface.co/microsoft/codebert-base) for binary classification of secure/insecure Java code.

## Intended Uses

- Detect security vulnerabilities in Java source code
- Binary classification: Safe (LABEL_0) vs Vulnerable (LABEL_1)
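The classification head returns two raw logits per input, in the label order listed above. The sketch below shows one way to turn them into a readable label with a probability. It assumes index 0 is `LABEL_0` (Safe) and index 1 is `LABEL_1` (Vulnerable). `logits_to_label` is a hypothetical helper, not part of `transformers`:

```python
import math

# Assumed to match the model's id2label order: 0 -> LABEL_0, 1 -> LABEL_1
LABEL_NAMES = ("Safe", "Vulnerable")


def logits_to_label(logits):
    """Map a pair of raw logits to (label name, softmax probability).

    Hypothetical helper for illustration; expects [logit_safe, logit_vulnerable].
    """
    if len(logits) != len(LABEL_NAMES):
        raise ValueError("expected exactly two logits")
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # On a tie, the first (Safe) index wins
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_NAMES[idx], probs[idx]


print(logits_to_label([-1.2, 2.3]))
```

With a real model, the input would be `outputs.logits[0].tolist()` from the snippet in "How to Use".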

## How to Use
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "mangsense/codebert_java"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

code = "your code here"
# CodeBERT accepts at most 512 tokens; longer inputs are truncated
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)

# Optional: passing labels also returns the cross-entropy loss (1 = Vulnerable)
labels = torch.tensor([1])  # batch size 1
with torch.no_grad():
    outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits

# 0 = Safe (LABEL_0), 1 = Vulnerable (LABEL_1)
print(torch.argmax(logits, dim=-1).item())
```
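The snippet above truncates long inputs, and CodeBERT's limit is 512 tokens, so the model never sees the rest of a long Java file. This card does not document a workaround. One possible approach is to split the token IDs into overlapping windows, score each window separately, and flag the file if any window is predicted vulnerable. The sketch below makes several assumptions: a window size of 510 (leaving room for the `<s>` and `</s>` special tokens), the stride of 255, the "any window" aggregation rule, and the helper names `sliding_windows` and `file_is_vulnerable`, which are hypothetical:

```python
def sliding_windows(token_ids, window=510, stride=255):
    """Split a list of token IDs into overlapping windows.

    Hypothetical helper: each window has at most `window` IDs, consecutive
    windows start `stride` IDs apart, and the last window reaches the end.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        raise ValueError("stride larger than window would skip tokens")
    if len(token_ids) <= window:
        return [list(token_ids)]
    chunks = []
    start = 0
    while True:
        end = start + window
        chunks.append(list(token_ids[start:end]))
        if end >= len(token_ids):
            break
        start += stride
    return chunks


def file_is_vulnerable(window_predictions):
    """Aggregate per-window labels (0 = Safe, 1 = Vulnerable): any 1 flags the file."""
    return any(p == 1 for p in window_predictions)


chunks = sliding_windows(list(range(1200)))
print(len(chunks), [len(c) for c in chunks])
```

In practice the IDs could come from `tokenizer(code, add_special_tokens=False)["input_ids"]`. Each window would then be wrapped with `tokenizer.build_inputs_with_special_tokens(chunk)` before scoring. Fast tokenizers can also produce overlapping windows directly via `return_overflowing_tokens=True` together with their own `stride` argument, which there denotes the overlap between windows rather than the step.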

## Training Data

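Trained on the CodeXGLUE Defect Detection dataset (`code_x_glue_cc_defect_detection`), which is derived from Devign and consists of C functions from the FFmpeg and QEMU projects, each labeled as defective or not.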
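On the Hugging Face Hub, each example in `code_x_glue_cc_defect_detection` stores the source in a `func` string and the label in a boolean `target` flag. The sketch below shows how such records could be mapped onto this model's label scheme. It assumes `target=True` corresponds to `LABEL_1` (Vulnerable); the card does not describe the preprocessing actually used. `to_example` and `to_examples` are hypothetical helpers, and the records are made-up toy data:

```python
from typing import Dict, List, Tuple


def to_example(record: Dict) -> Tuple[str, int]:
    """Convert one defect-detection record into (source text, class id).

    Assumption: target=True (defective) -> 1 (LABEL_1, Vulnerable),
    target=False -> 0 (LABEL_0, Safe).
    """
    return record["func"], int(bool(record["target"]))


def to_examples(records: List[Dict]) -> Tuple[List[str], List[int]]:
    """Split a batch of records into parallel lists of texts and labels."""
    texts, labels = [], []
    for record in records:
        text, label = to_example(record)
        texts.append(text)
        labels.append(label)
    return texts, labels


# Made-up records in the dataset's field layout
sample = [
    {"func": "int add(int a, int b) { return a + b; }", "target": False},
    {"func": "void copy(char *dst, char *src) { strcpy(dst, src); }", "target": True},
]
texts, labels = to_examples(sample)
print(labels)
```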

## Limitations

- Focused on Java code only
- May not detect all types of vulnerabilities