---
license: mit
language: en
tags:
- cybersecurity
- malicious-url-detection
- url-classification
- machine-learning
- phishing-detection
pipeline_tag: text-classification
---

# Malicious URL Detection Models

This directory contains trained machine learning models for detecting malicious URLs. The models are trained to classify URLs into four categories:

- **benign**
- **defacement**
- **malware**
- **phishing**

## Model Performance Summary

The following table summarizes the accuracy of each model on the test dataset:

| Model | Accuracy |
|-------|----------|
| **Extra Trees Classifier** | **97%** |
| **Random Forest** | **97%** |
| **Decision Tree** | **96%** |
| **MLP Classifier** | **96%** |
| **XGBoost** | **96%** |
| **Gradient Boosting Classifier** | **94%** |
| **Logistic Regression** | **87%** |
| **SGD Classifier** | **87%** |
| **Adaboost** | **85%** |
| **Gaussian Naive Bayes** | **80%** |

## Detailed Performance Reports

### Adaboost
- **Accuracy:** 0.85
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
```

### Decision Tree
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239
```

### Extra Trees Classifier
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### Gaussian Naive Bayes
- **Accuracy:** 0.80
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239
```

### Gradient Boosting Classifier
- **Accuracy:** 0.94
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239
```

### Logistic Regression
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239
```

### MLP Classifier
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239
```

### Random Forest
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### SGD Classifier
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239
```

### XGBoost
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239
```

## Usage
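
To load a model in Python, use either `joblib` or `pickle`. In both examples, `X_test` stands for a feature matrix prepared the same way as the data the model was trained on.

### Using joblib

```python
import joblib

# Load the serialized model from disk
model = joblib.load('models/random_forest.pkl')

# Predict one class label per row of X_test
# (X_test is not defined here; supply your own feature matrix)
prediction = model.predict(X_test)
```

### Using pickle

```python
import pickle

# Load the serialized model from disk (binary mode is required)
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# Predict one class label per row of X_test
# (X_test is not defined here; supply your own feature matrix)
prediction = model.predict(X_test)
```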
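
Once predictions are available, it can be useful to check them against the per-class figures in the reports above. The following is a minimal sketch of that check using only the standard library. The helper name `per_class_report` and the toy labels are hypothetical and not part of this repository, and the sketch assumes you have ground-truth labels in the same encoding the model predicts (the source does not state whether labels are strings or integers).

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen.

    Hypothetical helper for illustration; it mirrors the per-class columns
    of the reports above but is not the code that produced them.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    support = Counter(y_true)      # true samples per class
    pred_count = Counter(y_pred)   # predicted samples per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for label in sorted(set(support) | set(pred_count)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


# Toy labels (assumed, not real data) that only show the call shape.
y_true = ["benign", "benign", "phishing", "malware", "phishing", "defacement"]
y_pred = ["benign", "phishing", "phishing", "malware", "benign", "defacement"]

for label, (p, r, f1, n) in per_class_report(y_true, y_pred).items():
    print(f"{label:<12}{p:>8.2f}{r:>8.2f}{f1:>8.2f}{n:>8}")
```

In practice, `y_pred` would be the `prediction` array returned by `model.predict(X_test)` and `y_true` the matching labels. A large gap between the recall you measure for a class such as phishing and the reported value may indicate that your features were not prepared the way the training data was.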