🛡️ SQL Injection & XSS Attack Detection Models
A collection of machine learning models for detecting SQL injection and cross-site scripting (XSS) attacks.
Model Statistics
| Category | Count | Best Accuracy |
|---|---|---|
| Classical ML | 9 | ~99.5% |
| Deep Learning | 5 | ~99.3% |
| Transformers | 1 | ~97.4% |
| Hybrid Models | 0 | ~99.6% |
| GNN Models | 2 | ~98.5% |
Total Models: 17
Quick Start
from huggingface_hub import hf_hub_download
import pickle

# Download the TF-IDF vectorizer
vectorizer_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="features/tfidf_vectorizer.pkl"
)

# Download a model (e.g., XGBoost; unpickling it requires the xgboost package)
model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/XGBoost.pkl"
)

# Load both artifacts (pickle can run arbitrary code, so only load files you trust)
with open(vectorizer_path, 'rb') as f:
    vectorizer = pickle.load(f)
with open(model_path, 'rb') as f:
    model = pickle.load(f)

# Predict: label 1 means an attack was detected
query = "' OR '1'='1"
features = vectorizer.transform([query])
prediction = model.predict(features)
print("Attack detected!" if prediction[0] == 1 else "Safe query")
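The Quick Start scores one string at a time. To score several inputs in one call, a small helper like the one below may be convenient. This is a minimal sketch, not part of the repository: the name `classify_batch` is hypothetical, and it assumes only that the loaded objects expose `transform` and `predict` as in the snippet above and that label 1 means "attack".

```python
from typing import Any, List, Sequence, Tuple


def classify_batch(
    queries: Sequence[str], vectorizer: Any, model: Any
) -> List[Tuple[str, bool]]:
    """Return (query, is_attack) pairs for every input string.

    Hypothetical helper: `vectorizer` and `model` are the objects loaded
    in the Quick Start; label 1 is assumed to mean "attack".
    """
    if not queries:
        return []
    # Vectorize all inputs at once, then predict in a single call.
    features = vectorizer.transform(list(queries))
    predictions = model.predict(features)
    return [(q, int(p) == 1) for q, p in zip(queries, predictions)]
```

With the objects from the Quick Start, `classify_batch(["' OR '1'='1", "hello"], vectorizer, model)` would return one `(query, is_attack)` tuple per input.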
Repository Structure
.
├── models/
│   ├── classical_ml/     # Sklearn models (XGBoost, RandomForest, etc.)
│   ├── deep_learning/    # Keras models (MLP, CNN, LSTM, BiLSTM)
│   ├── transformers/     # Fine-tuned DistilBERT, BERT
│   ├── hybrid/           # Ensemble models
│   └── gnn/              # Graph Neural Networks
├── features/
│   ├── tfidf_vectorizer.pkl
│   ├── word2vec.model
│   └── fasttext.model
└── README.md
🔬 Preprocessing
All models use content-matching preprocessing:
- Number Generalization: 123 → NUM
- Keyword Preservation: SELECT → SQL_SELECT
- Special Character Mapping: ' → SQUOTE, = → EQUALS
- TF-IDF Vectorization: 1000 features, bigrams
Example Preprocessing:
Input: ' OR '1'='1
Output: SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
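The token mapping above can be sketched as a small tokenizer. This is an illustration only, not the repository's actual preprocessing code. The function name `preprocess`, the regular expression, and the keyword set are assumptions; only `NUM`, the `SQL_` prefix, `SQUOTE`, and `EQUALS` come from the description above. The TF-IDF step is not shown.

```python
import re

# Hypothetical mappings: the README only documents SQUOTE, EQUALS, NUM and
# the SQL_ prefix. The keyword set here is an illustrative assumption.
SQL_KEYWORDS = {"SELECT", "OR", "AND", "UNION", "FROM", "WHERE"}
CHAR_TOKENS = {"'": "SQUOTE", "=": "EQUALS"}

# Split into runs of digits, runs of letters/underscores, or single symbols.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]+|\S")


def preprocess(text: str) -> str:
    """Map a raw input string to a space-separated token string."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.isdecimal():
            tokens.append("NUM")                   # number generalization
        elif tok.upper() in SQL_KEYWORDS:
            tokens.append("SQL_" + tok.upper())    # keyword preservation
        elif tok in CHAR_TOKENS:
            tokens.append(CHAR_TOKENS[tok])        # special character mapping
        else:
            tokens.append(tok)                     # anything else passes through
    return " ".join(tokens)


print(preprocess("' OR '1'='1"))
# SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
```

Under these assumptions the sketch reproduces the example output shown above; the real pipeline may tokenize other characters and keywords differently.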
🎯 Performance
Classical ML Models
- XGBoost: 99.52% accuracy, 99.51% F1-score
- Random Forest: 99.48% accuracy
- Logistic Regression: 99.23% accuracy
Deep Learning Models
- BiLSTM: 99.27% accuracy, 99.26% F1-score
- CNN-LSTM: 99.04% accuracy
- LSTM: 99.19% accuracy
Transformer Models
- DistilBERT: 97.36% accuracy (fine-tuned)
💾 Model Loading Examples
Classical ML (Scikit-learn)
import pickle
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/RandomForest.pkl"
)
with open(model_path, 'rb') as f:
    model = pickle.load(f)
Deep Learning (Keras)
from tensorflow import keras
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/deep_learning/BiLSTM.h5"
)
model = keras.models.load_model(model_path)
Transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "DistilBERT"
tokenizer = AutoTokenizer.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)
model = AutoModelForSequenceClassification.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)
Datasets Used
- XSS Dataset: 13,686 samples
- SQL Injection Train: 98,062 samples
- SQL Injection Test: 32,688 samples
- SQL Injection Validation: 32,687 samples
- Modified SQL Dataset: 30,919 samples
Total: 174,353 unique samples after deduplication (the five sets above sum to 208,042 samples before deduplication)
Citation
If you use these models in your research, please cite:
@misc{sqli-xss-models-2026,
  author = {Dr-KeK},
  title = {SQL Injection \& XSS Attack Detection Models},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Dr-KeK/sqli-xss-models}}
}
License
MIT License - Free for academic and commercial use
🤝 Contributing
Issues and pull requests welcome! For major changes, please open an issue first.
⚠️ Disclaimer
These models are for educational and research purposes. Always combine ML detection with other security measures (input validation, parameterized queries, CSP headers).
Built with: Scikit-learn, TensorFlow, PyTorch, Transformers, XGBoost