# WAF-DistilBERT: Web Application Firewall using DistilBERT

## Model Description

WAF-DistilBERT is a fine-tuned version of DistilBERT trained to detect malicious web requests in real time. It is intended to serve as the core classification component of a Web Application Firewall (WAF) system.

## Intended Use

This model is designed for:

- Real-time detection of malicious web requests
- Integration into web application security systems
- Identifying common web attacks such as SQL injection, XSS, and path traversal
- Enhancing existing security infrastructure

### Out-of-Scope Use Cases

This model should not be used as:

- The sole security measure for web applications
- A replacement for traditional rule-based WAF systems
- A tool for generating malicious payloads
- A security measure for non-HTTP traffic

## Training Data

The model was trained on the CSIC 2010 HTTP Dataset, which includes:

- Normal HTTP requests
- Various attack patterns, including SQL injection, XSS, and buffer overflow
- A balanced distribution of benign and malicious requests

### Training Procedure

- Base model: DistilBERT-base-uncased
- Training type: Fine-tuning
- Training hardware: NVIDIA GPU
- Number of epochs: 3
- Batch size: 32
- Learning rate: 2e-5
- Optimizer: AdamW
- Loss function: Binary Cross-Entropy

## Performance and Limitations

### Performance Metrics

- Accuracy: >95%
- F1-Score: >0.94
- False Positive Rate: <1%
- Average inference time: <100 ms per request

### Limitations

- Limited to HTTP request analysis
- May require retraining for organization-specific traffic patterns
- Performance may vary on zero-day attacks
- Best used in conjunction with traditional security measures

## Bias and Risks

### Bias

The model may be biased toward:

- Attack patterns common in the training data
- English-language payloads
- HTTP requests that follow standard web-framework conventions

### Risks

- False positives may block legitimate traffic
- False negatives could allow attacks through
- Regular updates may be required to maintain effectiveness
- Resource consumption can grow under high load

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and fine-tuned classifier
tokenizer = AutoTokenizer.from_pretrained("jacpacd/waf-distilbert")
model = AutoModelForSequenceClassification.from_pretrained("jacpacd/waf-distilbert")

# Tokenize a raw HTTP request string (truncated to the model's 512-token limit)
request = "GET /admin?id=1 OR 1=1"
inputs = tokenizer(request, return_tensors="pt", truncation=True, max_length=512)

# Score the request; the sigmoid output is the estimated probability
# that the request is malicious
with torch.no_grad():
    outputs = model(**inputs)
prediction = torch.sigmoid(outputs.logits)
is_malicious = prediction.item() > 0.5
confidence = prediction.item()
```

## Environmental Impact

- Model size: ~268 MB
- Inference energy cost: Low (compared to larger models)
- Training energy cost: Moderate

## Technical Specifications

- Model architecture: DistilBERT
- Language(s): English
- License: MIT
- Input format: Text (raw HTTP requests)
- Output format: Binary classification with a confidence score
- Model size: ~268 MB
- Number of parameters: ~66M

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{waf-distilbert,
  author       = {jacpacd},
  title        = {WAF-DistilBERT: Web Application Firewall using DistilBERT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/jacpacd/waf-distilbert}}
}
```

## Contact

For questions and feedback about the model, please:

- Open an issue on GitHub
- Contact through Hugging Face
- Submit pull requests for improvements
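
## Appendix: Batched Screening Sketch

The Usage section scores one request at a time. For screening traffic in bulk, a small wrapper can batch requests through the tokenizer and expose the decision threshold explicitly. The sketch below is illustrative and not part of any released package: the `RequestScreener` class, its `screen` method, and the threshold default are hypothetical names and values, and the sketch assumes the classification head emits a single sigmoid logit, consistent with the Binary Cross-Entropy loss listed under Training Procedure.

```python
from typing import List, Tuple

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification


class RequestScreener:
    """Hypothetical convenience wrapper for batch screening (not part of this repo)."""

    def __init__(self, model_name: str = "jacpacd/waf-distilbert", threshold: float = 0.5):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.model.eval()
        # Raising the threshold trades false positives for false negatives
        self.threshold = threshold

    @torch.no_grad()
    def screen(self, requests: List[str]) -> List[Tuple[bool, float]]:
        # Pad to the longest request in the batch; truncate to the 512-token limit
        inputs = self.tokenizer(
            requests, return_tensors="pt", truncation=True, max_length=512, padding=True
        )
        logits = self.model(**inputs).logits
        # Assumes a single-logit head (BCE training); squeeze (batch, 1) -> (batch,)
        scores = torch.sigmoid(logits).squeeze(-1)
        return [(s.item() > self.threshold, s.item()) for s in scores]


# Example usage:
# screener = RequestScreener(threshold=0.7)
# verdicts = screener.screen(["GET /index.html", "GET /admin?id=1 OR 1=1"])
```

In a deployment, the threshold would typically be tuned against the false-positive budget noted under Performance Metrics rather than left at 0.5.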