# WAF-DistilBERT: Web Application Firewall using DistilBERT

## Model Description

WAF-DistilBERT is a fine-tuned version of DistilBERT, trained to detect malicious web requests in real time. The model serves as the core component of a Web Application Firewall (WAF) system.

## Intended Use

This model is designed for:
- Real-time detection of malicious web requests
- Integration into web application security systems
- Identifying common web attacks such as SQL injection, XSS, and path traversal
- Enhancing existing security infrastructure
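In an integration like the ones above, the incoming request has to be flattened into a single text string before it can be scored. The sketch below shows one way to do that in a WSGI-style gate; the serialization format and the `score_request` hook are illustrative assumptions, not part of this model's API.

```python
# Sketch: flattening an HTTP request into the plain-text form a classifier
# like this one consumes. serialize_request's format and the score_request
# callback are assumptions for illustration, not the model's contract.

def serialize_request(method, path, query="", body=""):
    """Flatten the parts of an HTTP request into one classifier input string."""
    text = f"{method} {path}"
    if query:
        text += f"?{query}"
    if body:
        text += f" {body}"
    return text

def waf_gate(environ, score_request, threshold=0.5):
    """Minimal WSGI-style gate: score the request, block above threshold."""
    text = serialize_request(
        environ.get("REQUEST_METHOD", "GET"),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    return "block" if score_request(text) > threshold else "allow"
```

With a stub scorer, `waf_gate({"REQUEST_METHOD": "GET", "PATH_INFO": "/admin", "QUERY_STRING": "id=1 OR 1=1"}, lambda t: 0.9)` returns `"block"`.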
### Out-of-Scope Use Cases

This model should not be used as:
- The sole security measure for web applications
- A replacement for traditional rule-based WAF systems
- A tool for generating malicious payloads
- A security measure for non-HTTP traffic

## Training Data

The model was trained on the CSIC 2010 HTTP Dataset, which includes:
- Normal HTTP requests
- Various attack patterns, including SQL injection, XSS, and buffer overflow
- A balanced distribution of benign and malicious requests
### Training Procedure

- Base model: DistilBERT-base-uncased
- Training type: Fine-tuning
- Training hardware: NVIDIA GPU
- Number of epochs: 3
- Batch size: 32
- Learning rate: 2e-5
- Optimizer: AdamW
- Loss function: Binary cross-entropy
## Performance and Limitations

### Performance Metrics

- Accuracy: >95%
- F1-score: >0.94
- False positive rate: <1%
- Average inference time: <100 ms per request
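For reference, these metrics follow the standard confusion-matrix definitions. The helper below is generic bookkeeping, not the evaluation script that produced the numbers above:

```python
# Standard definitions of the metrics reported above, computed from
# confusion-matrix counts (tp, fp, tn, fn). Not the authors' evaluation code.

def waf_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # fraction of benign requests blocked
    return {"accuracy": accuracy, "f1": f1, "fpr": fpr}
```

For example, on 1,000 requests `waf_metrics(470, 5, 495, 30)` gives an accuracy of 0.965 and a false positive rate of 0.01, which is the regime the figures above describe.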
### Limitations

- Limited to HTTP request analysis
- May require retraining for organization-specific traffic patterns
- Performance may degrade on zero-day attacks not represented in the training data
- Best used in conjunction with traditional security measures

## Bias and Risks

### Bias

The model may show bias towards:
- Common attack patterns in the training data
- English-language payloads
- HTTP requests following standard web framework conventions

### Risks

- False positives may block legitimate traffic
- False negatives could allow attacks through
- Regular updates may be needed to maintain effectiveness
- Resource consumption can grow under high load
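One common mitigation for the false-positive/false-negative trade-off above is to replace a single hard cutoff with a three-way decision that routes mid-confidence requests to logging or manual review. A minimal sketch, with band boundaries that are assumptions to be tuned against your own traffic:

```python
# Illustrative mitigation for the false-positive / false-negative risks above:
# route mid-confidence requests to review instead of hard-blocking them.
# The band boundaries (0.3, 0.8) are assumed defaults, not tuned values.

def triage(confidence, allow_below=0.3, block_above=0.8):
    """Map a malicious-confidence score to allow / review / block."""
    if confidence < allow_below:
        return "allow"
    if confidence > block_above:
        return "block"
    return "review"  # log and escalate rather than hard-block
```

Widening the review band lowers both kinds of hard errors at the cost of more manual work, which is one way to operationalize the "regular updates" point above.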
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("jacpacd/waf-distilbert")
model = AutoModelForSequenceClassification.from_pretrained("jacpacd/waf-distilbert")
model.eval()

# Prepare input: the raw request line is passed as plain text
request = "GET /admin?id=1 OR 1=1"
inputs = tokenizer(request, return_tensors="pt", truncation=True, max_length=512)

# Make prediction. The sigmoid assumes the single-logit head that matches
# the binary cross-entropy loss described above; for a checkpoint exported
# with num_labels=2, use softmax over the two logits instead.
with torch.no_grad():
    outputs = model(**inputs)
    prediction = torch.sigmoid(outputs.logits)

confidence = prediction.item()
is_malicious = confidence > 0.5
```
## Environmental Impact

- Model size: ~268 MB
- Inference energy cost: Low (compared to larger models)
- Training energy cost: Moderate

## Technical Specifications

- Model architecture: DistilBERT
- Language(s): English
- License: MIT
- Input format: Text (HTTP requests)
- Output format: Binary classification with confidence score
- Model size: ~268 MB
- Number of parameters: ~65M

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{waf-distilbert,
  author = {jacpacd},
  title = {WAF-DistilBERT: Web Application Firewall using DistilBERT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face model repository},
  howpublished = {\url{https://huggingface.co/jacpacd/waf-distilbert}}
}
```

## Contact

For questions and feedback about the model, please:
- Open an issue on GitHub
- Contact through Hugging Face
- Submit pull requests for improvements