---
language:
- code
library_name: transformers
pipeline_tag: text-classification
tags:
- code-review
- bug-detection
- codebert
- python
- security
- static-analysis
datasets:
- code_search_net
base_model: microsoft/codebert-base
metrics:
- f1
- accuracy
---
# 🔍 CodeSheriff Bug Classifier
A fine-tuned **CodeBERT** model that classifies Python code snippets into five classes: Clean plus four bug categories. Built as the classification engine inside [CodeSheriff](https://github.com/jayansh21/CodeSheriff), an AI system that automatically reviews GitHub pull requests.

**Base model:** `microsoft/codebert-base` · **Task:** 5-class sequence classification · **Language:** Python

---
## Labels
| ID | Label | Example |
|----|-------|---------|
| 0 | Clean | Well-formed code, no issues |
| 1 | Null Reference Risk | `result.fetchone().name` without a None check |
| 2 | Type Mismatch | `"Error: " + error_code` where `error_code` is an int |
| 3 | Security Vulnerability | `"SELECT * FROM users WHERE id = " + user_id` |
| 4 | Logic Flaw | `for i in range(len(items) + 1)` |
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "jayansh21/codesheriff-bug-classifier"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)

# Class IDs as listed in the Labels table
LABELS = {
    0: "Clean",
    1: "Null Reference Risk",
    2: "Type Mismatch",
    3: "Security Vulnerability",
    4: "Logic Flaw",
}

# Snippet to classify: a SQL query built by string concatenation
code = """
def get_user(uid):
    query = "SELECT * FROM users WHERE id=" + uid
    return db.execute(query)
"""

inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
pred = logits.argmax(dim=-1).item()
confidence = probs[0][pred].item()

print(f"{LABELS[pred]} ({confidence:.1%})")
# Security Vulnerability (99.3%)
```
---
## Training
**Dataset:** [CodeSearchNet](https://huggingface.co/datasets/code_search_net) Python split with heuristic labeling, augmented with seed templates for underrepresented classes. Final training set: 4,600 balanced samples across all five classes. Stratified 80/10/10 train/val/test split.
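
The tooling used for the stratified split is not described here. As a minimal sketch of the idea, assuming a plain per-class shuffle, the hypothetical helper `stratified_split` below divides sample indices 80/10/10 while keeping each class's share the same in every split. The toy labels are placeholders, not the real dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * ratios[0])
        n_val = int(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data: 5 classes with 100 samples each (illustrative only)
labels = [cls for cls in range(5) for _ in range(100)]
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```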
**Key hyperparameters:**
| Parameter | Value |
|-----------|-------|
| Epochs | 4 |
| Effective batch size | 16 (batch size 8 × 2 gradient accumulation steps) |
| Learning rate | 2e-5 |
| Optimizer | AdamW + linear warmup |
| Max token length | 512 |
| Class weighting | Yes — balanced |
| Hardware | NVIDIA RTX 3050 (4GB) |
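
The table does not say how the "balanced" class weights were computed. One common reading, assumed here, is the inverse-frequency heuristic `n_samples / (n_classes * class_count)`, which matches scikit-learn's `class_weight="balanced"` convention. The sketch below applies it to the class counts from the test-set support column in the Evaluation section, purely as an illustration; the training-set counts may differ. Weights like these could then be passed to a weighted loss such as `torch.nn.CrossEntropyLoss(weight=...)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in sorted(counts.items())}


# Illustrative class counts only (copied from the test-set support column)
support = {0: 450, 1: 120, 2: 75, 3: 75, 4: 120}
labels = [cls for cls, n in support.items() for _ in range(n)]

weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(cls, round(weight, 3))
```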
---
## Evaluation
Test set: 840 samples (stratified).
| Class | Precision | Recall | F1 | Support |
|-------|-----------|--------|----|---------|
| Clean | 0.92 | 0.88 | 0.90 | 450 |
| Null Reference Risk | 0.63 | 0.78 | 0.70 | 120 |
| Type Mismatch | 0.96 | 0.95 | 0.95 | 75 |
| Security Vulnerability | 0.99 | 0.92 | 0.95 | 75 |
| Logic Flaw | 0.96 | 0.97 | 0.97 | 120 |
| **Macro avg** | **0.89** | **0.90** | **0.89** | **840** |

**Confusion matrix** (rows are actual classes, columns are predicted classes):
```
                 Clean  NullRef  TypeMis  SecVuln  Logic
Actual Clean   [   394       52        1        1      2 ]
Actual NullRef [    23       93        1        0      3 ]
Actual TypeMis [     3        1       71        0      0 ]
Actual SecVuln [     4        1        1       69      0 ]
Actual Logic   [     3        0        0        0    117 ]
```
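
The per-class figures in the table can be recomputed from the confusion matrix above. The sketch below is a plain-Python check, not the evaluation script used for this model; the helper name `per_class_metrics` is hypothetical.

```python
CLASSES = ["Clean", "Null Reference Risk", "Type Mismatch",
           "Security Vulnerability", "Logic Flaw"]

# Rows are actual classes, columns are predicted classes
CONFUSION = [
    [394, 52, 1, 1, 2],
    [23, 93, 1, 0, 3],
    [3, 1, 71, 0, 0],
    [4, 1, 1, 69, 0],
    [3, 0, 0, 0, 117],
]


def per_class_metrics(matrix):
    """Return a (precision, recall, f1) tuple for each class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[row][k] for row in range(n))  # column sum
        actual = sum(matrix[k])                              # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


metrics = per_class_metrics(CONFUSION)
for name, (p, r, f1) in zip(CLASSES, metrics):
    print(f"{name:<24} P={p:.2f} R={r:.2f} F1={f1:.2f}")

macro_f1 = sum(m[2] for m in metrics) / len(metrics)
print(f"Macro F1 = {macro_f1:.2f}")
```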
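Logic Flaw (F1 0.97) and Security Vulnerability (F1 0.95) are among the strongest classes; both have clear lexical patterns. Null Reference Risk is the weakest (precision 0.63, F1 0.70) because null-risk code closely resembles clean code structurally. Most errors involving that class are false positives rather than missed bugs: 52 clean samples were flagged as Null Reference Risk, while 27 null-risk samples were missed (23 of them predicted as Clean).

---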
## Limitations
- **Python only** — not trained on other languages
- **Function-level input** — works best on 5–50 line snippets
- **Heuristic labels** — training data was pattern-matched, not expert-annotated
- **Not a SAST replacement** — probabilistic classifier, not a sound static analysis tool
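
Because the model expects function-level input, a whole file usually needs to be split before classification. The sketch below is one possible approach using the standard-library `ast` module, not part of CodeSheriff itself; the helper name `extract_functions` and the example source are hypothetical, and the 5 and 50 line bounds simply mirror the range mentioned above.

```python
import ast


def extract_functions(source, min_lines=5, max_lines=50):
    """Yield (name, snippet) for each function within the line bounds.

    Nested functions are yielded separately as well as inside their parent.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            snippet = ast.get_source_segment(source, node)
            if snippet is None:
                continue
            n_lines = snippet.count("\n") + 1
            if min_lines <= n_lines <= max_lines:
                yield node.name, snippet


# Hypothetical file contents used only to exercise the helper
SOURCE = '''
def tiny():
    return 1

def get_user(db, uid):
    query = "SELECT * FROM users WHERE id=" + uid
    row = db.execute(query)
    if row is None:
        return None
    return row
'''

functions = list(extract_functions(SOURCE))
for name, snippet in functions:
    print(name, snippet.count("\n") + 1)
```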
---
## Links
- GitHub: [jayansh21/CodeSheriff](https://github.com/jayansh21/CodeSheriff)
- Live demo: [huggingface.co/spaces/jayansh21/CodeSheriff](https://huggingface.co/spaces/jayansh21/CodeSheriff)