πŸ›‘οΈ SQL Injection & XSS Attack Detection Models

Comprehensive machine learning model collection for detecting SQL Injection and Cross-Site Scripting (XSS) attacks.

📊 Model Statistics

Category        Count   Best Accuracy
Classical ML    9       ~99.5%
Deep Learning   5       ~99.3%
Transformers    1       ~97.4%
Hybrid Models   0       ~99.6%
GNN Models      2       ~98.5%

Total Models: 17

🚀 Quick Start

from huggingface_hub import hf_hub_download
import pickle

# Download TF-IDF vectorizer
vectorizer_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="features/tfidf_vectorizer.pkl"
)

# Download a model (e.g., XGBoost)
model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/XGBoost.pkl"
)

# Load and use
with open(vectorizer_path, 'rb') as f:
    vectorizer = pickle.load(f)

with open(model_path, 'rb') as f:
    model = pickle.load(f)

# Predict
query = "' OR '1'='1"
features = vectorizer.transform([query])
prediction = model.predict(features)
print("Attack detected!" if prediction[0] == 1 else "Safe query")
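To score several inputs in one call, the Quick Start steps can be wrapped in a small helper. This is a minimal sketch and not part of the repository: `classify_queries` is a hypothetical name, and it assumes only what the snippet above shows, namely that the vectorizer exposes `transform`, the model exposes `predict`, and label `1` marks an attack.

```python
from typing import Any, List, Sequence


def classify_queries(queries: Sequence[str], vectorizer: Any, model: Any) -> List[bool]:
    """Return True for every input the model labels as an attack (label 1).

    `vectorizer` and `model` are the objects loaded in the Quick Start;
    any pair exposing `transform` and `predict` works the same way.
    """
    if not queries:
        return []
    # Vectorize the whole batch at once instead of one query at a time.
    features = vectorizer.transform(list(queries))
    predictions = model.predict(features)
    # Assumption: label 1 means "attack", as in the Quick Start print statement.
    return [int(label) == 1 for label in predictions]
```

With the objects loaded above, `classify_queries(["' OR '1'='1", "hello"], vectorizer, model)` returns one boolean per input, in the same order.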

πŸ“ Repository Structure

.
├── models/
│   ├── classical_ml/      # Sklearn models (XGBoost, RandomForest, etc.)
│   ├── deep_learning/     # Keras models (MLP, CNN, LSTM, BiLSTM)
│   ├── transformers/      # Fine-tuned DistilBERT, BERT
│   ├── hybrid/            # Ensemble models
│   └── gnn/               # Graph Neural Networks
├── features/
│   ├── tfidf_vectorizer.pkl
│   ├── word2vec.model
│   └── fasttext.model
└── README.md

🔬 Preprocessing

All models use content-matching preprocessing:

  1. Number Generalization: 123 → NUM
  2. Keyword Preservation: SELECT → SQL_SELECT
  3. Special Character Mapping: ' → SQUOTE, = → EQUALS
  4. TF-IDF Vectorization: 1000 features, bigrams

Example Preprocessing:

Input:  ' OR '1'='1
Output: SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
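
The first three steps can be sketched in a few lines of Python. This is an illustrative reconstruction, not the repository's actual preprocessing code: the function name `preprocess`, the tokenizer regex, and the keyword and character tables are assumptions, filled in only far enough to reproduce the example above. Only the `SELECT`, `OR`, `'` and `=` mappings come from this README.

```python
import re

# SELECT and OR appear in this README; the remaining keywords are assumed.
SQL_KEYWORDS = {"SELECT", "OR", "AND", "UNION", "FROM", "WHERE"}

# Only these two mappings are documented; a real table is likely larger.
SPECIAL_CHARS = {"'": "SQUOTE", "=": "EQUALS"}

# Assumed tokenizer: runs of digits, runs of letters/underscores,
# or any single non-whitespace character.
TOKEN_RE = re.compile(r"\d+|[A-Za-z_]+|\S")


def preprocess(text: str) -> str:
    """Map numbers, SQL keywords and special characters to placeholder tokens."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.isdigit():
            tokens.append("NUM")                   # 1. number generalization
        elif tok.upper() in SQL_KEYWORDS:
            tokens.append("SQL_" + tok.upper())    # 2. keyword preservation
        elif tok in SPECIAL_CHARS:
            tokens.append(SPECIAL_CHARS[tok])      # 3. special character mapping
        else:
            tokens.append(tok)                     # anything else passes through
    return " ".join(tokens)


print(preprocess("' OR '1'='1"))
# SQUOTE SQL_OR SQUOTE NUM SQUOTE EQUALS SQUOTE NUM
```

The resulting string is what a TF-IDF vectorizer would then consume in step 4.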

🎯 Performance

Classical ML Models

  • XGBoost: 99.52% accuracy, 99.51% F1-score
  • Random Forest: 99.48% accuracy
  • Logistic Regression: 99.23% accuracy

Deep Learning Models

  • BiLSTM: 99.27% accuracy, 99.26% F1-score
  • CNN-LSTM: 99.04% accuracy
  • LSTM: 99.19% accuracy

Transformer Models

  • DistilBERT: 97.36% accuracy (fine-tuned)
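
For readers comparing the accuracy and F1 figures in this section, the sketch below shows how the two metrics are derived from confusion-matrix counts in the binary case. The counts are made up for illustration and are not results from these models. This README does not say whether the reported F1 is binary, macro or weighted, so the binary form used here is an assumption.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 for the positive (attack) class.

    2*TP / (2*TP + FP + FN) is algebraically the harmonic mean of
    precision and recall.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for an imbalanced test set (not from this repository).
tp, fp, tn, fn = 90, 10, 890, 10
print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

With these counts accuracy is 0.980 while F1 is 0.900, which is why both numbers are worth reporting when attack and benign samples are imbalanced.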

💾 Model Loading Examples

Classical ML (Scikit-learn)

import pickle
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/classical_ml/RandomForest.pkl"
)
with open(model_path, 'rb') as f:
    model = pickle.load(f)

Deep Learning (Keras)

from tensorflow import keras
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Dr-KeK/sqli-xss-models",
    filename="models/deep_learning/BiLSTM.h5"
)
model = keras.models.load_model(model_path)

Transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "DistilBERT"
tokenizer = AutoTokenizer.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)
model = AutoModelForSequenceClassification.from_pretrained(
    "Dr-KeK/sqli-xss-models",
    subfolder=f"models/transformers/{model_name}"
)

📚 Datasets Used

  • XSS Dataset: 13,686 samples
  • SQL Injection Train: 98,062 samples
  • SQL Injection Test: 32,688 samples
  • SQL Injection Validation: 32,687 samples
  • Modified SQL Dataset: 30,919 samples

Total: 174,353 unique samples (after deduplication)
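
As a quick sanity check on these figures, the listed dataset sizes can be summed and compared with the deduplicated total. The sketch assumes the total is computed over exactly the five datasets listed here, which this README does not state explicitly; the dictionary keys are just labels.

```python
# Sizes as listed above; keys are descriptive labels, not file names.
dataset_sizes = {
    "xss": 13_686,
    "sqli_train": 98_062,
    "sqli_test": 32_688,
    "sqli_validation": 32_687,
    "modified_sql": 30_919,
}

raw_total = sum(dataset_sizes.values())
unique_total = 174_353                 # reported total after deduplication
removed = raw_total - unique_total     # implied number of duplicates dropped

print(f"raw total:          {raw_total:,}")
print(f"unique total:       {unique_total:,}")
print(f"duplicates removed: {removed:,} ({removed / raw_total:.1%})")
```

Under that assumption the five sets sum to 208,042 samples, so 33,689 of them (roughly 16%) were dropped as duplicates.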

πŸ† Citation

If you use these models in your research, please cite:

@misc{sqli-xss-models-2026,
  author = {Dr-KeK},
  title = {SQL Injection \& XSS Attack Detection Models},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Dr-KeK/sqli-xss-models}}
}

📄 License

MIT License - Free for academic and commercial use

🤝 Contributing

Issues and pull requests welcome! For major changes, please open an issue first.

⚠️ Disclaimer

These models are for educational and research purposes. Always combine ML detection with other security measures (input validation, parameterized queries, CSP headers).


Built with: Scikit-learn, TensorFlow, PyTorch, Transformers, XGBoost
