---
license: apache-2.0
language:
- en
base_model:
- roberta-base
pipeline_tag: text-classification
tags:
- security
- prompt
- cyber-security
- llm-security
- prompt-injection
- sql-injection
library_name: transformers
---

# SQL Injection Detector

A fine-tuned RoBERTa model for detecting SQL injection attacks in prompts before they reach an LLM.

## Overview

This model is part of [PromptWAF](https://github.com/edaerer/promptwaf) — a multi-layered ML-based Web Application Firewall designed to detect and block prompt injection attacks.

The model identifies prompts containing SQL command injection patterns (`'; DROP TABLE`, `OR 1=1`, `UNION SELECT`, etc.) commonly used to manipulate database queries through LLM interfaces.

## Model Details

- **Architecture**: RoBERTa (Base)
- **Task**: Binary Sequence Classification
- **Training Data**: Custom, internally curated SQL injection dataset
- **Labels**: 
  - `0` → Safe/Benign
  - `1` → SQL Injection Attack
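
The label mapping can also be read from the checkpoint's config at runtime instead of being hard-coded. A minimal sketch (the printed names depend on how `id2label` was set when the model was exported; if it only shows generic `LABEL_0`/`LABEL_1`, use the table above):

```python
from transformers import AutoConfig

# Inspect the id -> label mapping shipped with the checkpoint.
config = AutoConfig.from_pretrained("edaerer/promptwaf-sql-injection")
print(config.id2label)    # e.g. {0: 'LABEL_0', 1: 'LABEL_1'}
print(config.num_labels)  # 2 for binary classification
```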

## Usage

### With PromptWAF

```bash
# Automatically used in PromptWAF via .env configuration
SQL_INJECTION_MODEL_DIR=edaerer/promptwaf-sql-injection
```
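
PromptWAF resolves this variable at startup; the exact wiring lives in the repository, but the pattern is roughly the following. This is a sketch, not PromptWAF's actual loader, and the fallback to the Hub ID is an assumption:

```python
import os
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Resolve the checkpoint from the environment (falling back to the Hub ID).
model_dir = os.environ.get("SQL_INJECTION_MODEL_DIR", "edaerer/promptwaf-sql-injection")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
```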

### Standalone

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "edaerer/promptwaf-sql-injection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "'; DROP TABLE users;--"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

probabilities = torch.softmax(outputs.logits, dim=-1)
score = probabilities[0][1].item()  # Malicious score

print(f"SQL Injection Risk: {score:.2%}")
```
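
To screen several prompts at once, batch them through the tokenizer. A self-contained sketch; the example prompts are illustrative, and padding plus truncation at 256 tokens matches the input limit noted under Performance:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "edaerer/promptwaf-sql-injection"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

prompts = [
    "What were last quarter's sales figures?",        # expected benign
    "admin' OR 1=1 --",                               # tautology injection
    "1; UNION SELECT username, password FROM users",  # UNION-based exfiltration
]

# Pad to a common length and truncate at the 256-token input limit.
inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                   truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits

scores = torch.softmax(logits, dim=-1)[:, 1]  # malicious-class probability per prompt
for prompt, score in zip(prompts, scores):
    print(f"{score.item():.2%}  {prompt}")
```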

## Performance

- **Decision threshold**: 0.5 by default (adjustable in PromptWAF)
- **Input length**: 256 tokens maximum; truncate longer prompts before inference
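
Turning the malicious-class probability into an allow/block decision is a single comparison against this threshold. A minimal sketch (the function and constant names are illustrative; only the 0.5 default comes from PromptWAF):

```python
THRESHOLD = 0.5  # PromptWAF default; raise to cut false positives, lower to catch more

def is_sql_injection(score: float, threshold: float = THRESHOLD) -> bool:
    """Map the class-1 probability from the snippets above to a block decision."""
    return score >= threshold

print(is_sql_injection(0.97))  # True  -> block the prompt
print(is_sql_injection(0.12))  # False -> pass it through
```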

## Integration

This model is designed to work seamlessly with:
- **PromptWAF** - The main security orchestrator
- **HuggingFace Transformers** - For inference
- Any standard sequence classification pipeline
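
The last item is easy to demonstrate: the `transformers` pipeline API wraps tokenization, inference, and label mapping in one call (the printed label name depends on the checkpoint's `id2label` config):

```python
from transformers import pipeline

detector = pipeline("text-classification", model="edaerer/promptwaf-sql-injection")
print(detector("'; DROP TABLE users;--"))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}]
```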

## Citation

```bibtex
@software{promptwaf2026,
  author = {Erer, Eda and Odabasi, Talha},
  title  = {PromptWAF: A Multi-Layered ML Defense for LLM Prompt Security},
  year   = {2026},
  url    = {https://github.com/edaerer/promptwaf}
}
```

## License

Apache License 2.0

---

For more information, visit the [PromptWAF GitHub Repository](https://github.com/edaerer/promptwaf).