# WAF-DistilBERT: Web Application Firewall using DistilBERT

## Model Description

WAF-DistilBERT is a fine-tuned DistilBERT model trained to classify web requests as benign or malicious in real time. It serves as the core detection component of a Web Application Firewall (WAF) system.

## Intended Use

This model is designed for:
- Real-time detection of malicious web requests
- Integration into web application security systems
- Identifying common web attacks like SQL injection, XSS, and path traversal
- Enhancing existing security infrastructure
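
To classify a request, it must first be flattened into plain text for the tokenizer. The exact serialization format used to train this model is not documented, so the helper below is a hypothetical sketch of one plausible scheme (method, path, query, and body joined with spaces):

```python
def serialize_request(method: str, path: str, query: str = "", body: str = "") -> str:
    """Flatten an HTTP request into a single text string for the classifier.

    NOTE: the serialization format used during training is an assumption here,
    shown only to illustrate the kind of input the model expects.
    """
    parts = [method, path]
    if query:
        parts.append(query)
    if body:
        parts.append(body)
    return " ".join(parts)

# A suspicious query string becomes one token sequence for DistilBERT
print(serialize_request("GET", "/admin", query="id=1 OR 1=1"))
# → GET /admin id=1 OR 1=1
```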

### Out-of-Scope Use Cases

This model should not be used as:
- The sole security measure for web applications
- A replacement for traditional WAF rule-based systems
- A tool for generating malicious payloads
- A security measure for non-HTTP traffic

## Training Data

The model was trained on the CSIC 2010 HTTP Dataset, which includes:
- Normal HTTP requests
- Various attack patterns including SQL injection, XSS, buffer overflow
- A balanced distribution of benign and malicious requests

### Training Procedure

- Base model: DistilBERT-base-uncased
- Training type: Fine-tuning
- Training hardware: NVIDIA GPU
- Number of epochs: 3
- Batch size: 32
- Learning rate: 2e-5
- Optimizer: AdamW
- Loss function: Binary Cross-Entropy
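
Expressed with the standard `transformers` Trainer API, the hyperparameters above would look roughly like this. This is a sketch, not the published training script: the output directory is an assumption, and AdamW is already the Trainer default.

```python
from transformers import TrainingArguments

# Configuration sketch matching the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="waf-distilbert",      # assumed path, for illustration only
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)
```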

## Performance and Limitations

### Performance Metrics

- Accuracy: >95%
- F1-Score: >0.94
- False Positive Rate: <1%
- Average inference time: <100ms per request
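
These figures can be sanity-checked against a confusion matrix. The helper below shows how accuracy, F1, and false-positive rate relate; the counts are hypothetical numbers chosen to be consistent with the reported metrics, not results from the actual evaluation.

```python
def waf_metrics(tp: int, fp: int, tn: int, fn: int):
    """Derive accuracy, F1-score, and false-positive rate from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)  # fraction of benign requests wrongly blocked
    return accuracy, f1, fpr

# Hypothetical counts on a 2,000-request test set (illustrative only)
accuracy, f1, fpr = waf_metrics(tp=940, fp=8, tn=992, fn=60)
print(f"accuracy={accuracy:.3f} f1={f1:.3f} fpr={fpr:.3f}")
# → accuracy=0.966 f1=0.965 fpr=0.008
```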

### Limitations

- Limited to HTTP request analysis
- May require retraining for organization-specific traffic patterns
- Performance may vary for zero-day attacks
- Best used in conjunction with traditional security measures

## Bias and Risks

### Bias

The model may show bias towards:
- Common attack patterns in the training data
- English-language payloads
- HTTP requests following standard web frameworks

### Risks

- False positives may block legitimate traffic
- False negatives could allow attacks through
- May require regular updates to maintain effectiveness
- Sustained high request volume increases compute cost and may add latency

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("jacpacd/waf-distilbert")
model = AutoModelForSequenceClassification.from_pretrained("jacpacd/waf-distilbert")
model.eval()

# Prepare input: raw request text, truncated to DistilBERT's 512-token limit
request = "GET /admin?id=1 OR 1=1"
inputs = tokenizer(request, return_tensors="pt", truncation=True, max_length=512)

# Make prediction: a single sigmoid logit (the model was trained with BCE loss)
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

confidence = score            # malicious-class probability in [0, 1]
is_malicious = score > 0.5
```
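
Because false positives block legitimate traffic, the fixed 0.5 cutoff above is often made configurable in deployments. The helper below is a hypothetical deployment-side sketch, not part of the published model: raising the threshold trades recall for a lower false-positive rate.

```python
def classify_request(score: float, threshold: float = 0.5) -> str:
    """Map the model's sigmoid score to a WAF decision.

    score: malicious-class probability from the model (0.0-1.0).
    threshold: raising it reduces false positives at the cost of recall.
    This helper is an illustrative sketch, not part of the model itself.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    return "block" if score > threshold else "allow"

# With a stricter threshold, a borderline score is allowed through
print(classify_request(0.62))                  # → block
print(classify_request(0.62, threshold=0.8))   # → allow
```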

## Environmental Impact

- Model Size: ~268MB
- Inference Energy Cost: Low (compared to larger models)
- Training Energy Cost: Moderate

## Technical Specifications

- Model Architecture: DistilBERT
- Language(s): English
- License: MIT
- Input Format: Text (HTTP requests)
- Output Format: Binary classification with confidence score
- Model Size: 268MB
- Number of Parameters: ~65M

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{waf-distilbert,
  author = {jacpacd},
  title = {WAF-DistilBERT: Web Application Firewall using DistilBERT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/jacpacd/waf-distilbert}}
}
```

## Contact

For questions and feedback about the model, please:
- Open an issue on GitHub
- Contact through Hugging Face
- Submit pull requests for improvements