🛡️ SQL Injection & XSS Attack Detection Models

Comprehensive machine learning model collection for detecting SQL Injection and Cross-Site Scripting (XSS) attacks.

📊 Model Statistics

Category	Count	Best Accuracy
Classical ML	9	~99.5%
Deep Learning	5	~99.3%
Transformers	1	~97.4%
Hybrid Models	0	~99.6%
GNN Models	2	~98.5%

Total Models: 17

🚀 Quick Start

from huggingface_hub import hf_hub_download
import pickle

# Download TF-IDF vectorizer
vectorizer_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="features/tfidf_vectorizer.pkl"
)

# Download a model (e.g., XGBoost)
model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/XGBoost.pkl"
)

# Load and use (pickle.load can execute arbitrary code, so only unpickle files you trust)
with open(vectorizer_path, 'rb') as f:
    vectorizer = pickle.load(f)

with open(model_path, 'rb') as f:
    model = pickle.load(f)

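# Predict. If the vectorizer was fitted on preprocessed text (see Preprocessing),
# apply the same normalization to the query before calling transform.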
query = "' OR '1'='1"
features = vectorizer.transform([query])
prediction = model.predict(features)
print("Attack detected!" if prediction[0] == 1 else "Safe query")
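To score several queries at once, the Quick Start steps can be wrapped in a small helper. This is a minimal sketch, not part of the repository: `classify` is a hypothetical name, and it assumes the loaded objects expose scikit-learn-style `transform` and `predict` methods and that label 1 means "attack", as in the snippet above.

```python
from typing import Any, List, Sequence


def classify(queries: Sequence[str], vectorizer: Any, model: Any) -> List[str]:
    """Label each query as 'attack' or 'safe'.

    Assumptions (not confirmed by the repository): label 1 means attack,
    and `vectorizer` / `model` follow the scikit-learn transform / predict API.
    """
    # Vectorize the whole batch in one call instead of query by query.
    features = vectorizer.transform(list(queries))
    predictions = model.predict(features)
    return ["attack" if int(p) == 1 else "safe" for p in predictions]
```

With the `vectorizer` and `model` loaded in the Quick Start, a call would look like `classify(["' OR '1'='1", "hello world"], vectorizer, model)`.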

📁 Repository Structure

.
├── models/
│   ├── classical_ml/      # Scikit-learn and XGBoost models (RandomForest, etc.)
│   ├── deep_learning/     # Keras models (MLP, CNN, LSTM, BiLSTM, CNN-LSTM)
│   ├── transformers/      # Fine-tuned DistilBERT, BERT
│   ├── hybrid/            # Ensemble models
│   └── gnn/               # Graph Neural Networks
├── features/
│   ├── tfidf_vectorizer.pkl
│   ├── word2vec.model
│   └── fasttext.model
└── README.md

🔬 Preprocessing

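All models share one preprocessing pipeline that replaces numbers, SQL keywords, and special characters with placeholder tokens before vectorization: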

Number Generalization: 123 → NUM
Keyword Preservation: SELECT → SQL_SELECT
Special Character Mapping: ' → SQUOTE, = → EQUALS
TF-IDF Vectorization: 1000 features, bigrams

Example Preprocessing:

Input:  ' OR '1'='1
Output: SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
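The normalization rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the repository's actual preprocessing code: the function name `preprocess`, the keyword set, the character map, and the case-insensitive keyword matching are all assumptions. Only the `SELECT`, `OR`, `'`, and `=` mappings are documented in this README.

```python
import re

# Hypothetical vocabularies. Only SELECT, OR, ' and = appear in the README;
# the remaining keywords are placeholders for illustration.
SQL_KEYWORDS = {"SELECT", "OR", "AND", "UNION", "FROM", "WHERE"}
SPECIAL_CHARS = {"'": "SQUOTE", "=": "EQUALS"}

# Tokens are digit runs, identifier-like words, or any single non-space character.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]\w*|\S")


def preprocess(text: str) -> str:
    """Normalize a raw payload into space-separated placeholder tokens."""
    out = []
    for tok in TOKEN_RE.findall(text):
        if tok.isdigit():
            out.append("NUM")                      # number generalization
        elif tok.upper() in SQL_KEYWORDS:
            out.append("SQL_" + tok.upper())       # keyword preservation (case-insensitive: assumed)
        elif tok in SPECIAL_CHARS:
            out.append(SPECIAL_CHARS[tok])         # special character mapping
        else:
            out.append(tok)                        # anything else passes through unchanged
    return " ".join(out)


print(preprocess("' OR '1'='1"))
# SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
```

Whatever the real implementation looks like, the same function has to run at inference time, because the TF-IDF vocabulary is built from the normalized tokens rather than the raw text.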

🎯 Performance

Classical ML Models

XGBoost: 99.52% accuracy, 99.51% F1-score
Random Forest: 99.48% accuracy
Logistic Regression: 99.23% accuracy

Deep Learning Models

BiLSTM: 99.27% accuracy, 99.26% F1-score
LSTM: 99.19% accuracy
CNN-LSTM: 99.04% accuracy

Transformer Models

DistilBERT: 97.36% accuracy (fine-tuned)
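
For readers comparing the two metrics reported above, here is a minimal sketch of how accuracy and F1 are computed from predictions. The README does not say which F1 variant was used (binary, macro, or weighted), so this assumes binary F1 with the attack class (label 1) as positive. The function name and the toy labels are made up for illustration.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with 1 as the positive (attack) class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe inputs flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Toy labels, not taken from the repository's test set.
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 0, 1])
print(f"accuracy={acc:.2%}, F1={f1:.2%}")
# accuracy=80.00%, F1=80.00%
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives, so the two can diverge when attack and benign samples are imbalanced.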

💾 Model Loading Examples

Classical ML (Scikit-learn)

import pickle
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/RandomForest.pkl"
)
with open(model_path, 'rb') as f:
    model = pickle.load(f)

Deep Learning (Keras)

from tensorflow import keras
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/deep_learning/BiLSTM.h5"
)
model = keras.models.load_model(model_path)

Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "DistilBERT"
tokenizer = AutoTokenizer.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)
model = AutoModelForSequenceClassification.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)

📚 Datasets Used

XSS Dataset: 13,686 samples
SQL Injection Train: 98,062 samples
SQL Injection Test: 32,688 samples
SQL Injection Validation: 32,687 samples
Modified SQL Dataset: 30,919 samples

Total: 174,353 unique samples (after deduplication)
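
As a quick sanity check on these figures, the snippet below sums the listed sizes and compares the result with the deduplicated total. It assumes the per-dataset counts are raw sizes before deduplication and that the total covers all five sources. The README states neither explicitly.

```python
# Sample counts copied from the list above.
dataset_sizes = {
    "XSS Dataset": 13_686,
    "SQL Injection Train": 98_062,
    "SQL Injection Test": 32_688,
    "SQL Injection Validation": 32_687,
    "Modified SQL Dataset": 30_919,
}
UNIQUE_AFTER_DEDUP = 174_353

raw_total = sum(dataset_sizes.values())
removed = raw_total - UNIQUE_AFTER_DEDUP

print(f"raw total: {raw_total:,}")             # raw total: 208,042
print(f"removed as duplicates: {removed:,}")   # removed as duplicates: 33,689
print(f"share removed: {removed / raw_total:.1%}")
```

Under those assumptions, roughly one sample in six was dropped as a duplicate.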

🏆 Citation

If you use these models in your research, please cite:

@misc{sqli-xss-models-2026,
  author = {Dr-KeK},
  title = {SQL Injection \& XSS Attack Detection Models},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Dr-KeK/sqli-xss-models}}
}

📄 License

MIT License - Free for academic and commercial use

🤝 Contributing

Issues and pull requests welcome! For major changes, please open an issue first.

⚠️ Disclaimer

These models are for educational and research purposes. Always combine ML detection with other security measures (input validation, parameterized queries, CSP headers).

Built with: Scikit-learn, TensorFlow, PyTorch, Transformers, XGBoost
