---
license: mit
language: en
tags:
- cybersecurity
- malicious-url-detection
- url-classification
- machine-learning
- phishing-detection
pipeline_tag: text-classification
---
# Malicious URL Detection Models

This directory contains trained machine learning models for detecting malicious URLs. Each model classifies a URL into one of four categories:
- **benign**
- **defacement**
- **malware**
- **phishing**

## Model Performance Summary

The following table summarizes each model's accuracy on the test dataset (130,239 URLs), sorted from highest to lowest:

| Model | Accuracy |
|-------|----------|
| **Extra Trees Classifier** | **97%** |
| **Random Forest** | **97%** |
| **Decision Tree** | **96%** |
| **MLP Classifier** | **96%** |
| **XGBoost** | **96%** |
| **Gradient Boosting Classifier** | **94%** |
| **Logistic Regression** | **87%** |
| **SGD Classifier** | **87%** |
| **AdaBoost** | **85%** |
| **Gaussian Naive Bayes** | **80%** |

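Accuracy alone can flatter a model when the classes are imbalanced. As a rough point of comparison, the sketch below uses the per-class support counts from the detailed reports to compute the accuracy of a hypothetical baseline that labels every URL as `benign`. The `support` dictionary and the `majority_baseline_accuracy` helper are illustrative names, not part of the released code.

```python
# Per-class support counts, copied from the classification reports in this README.
support = {
    "benign": 85778,
    "defacement": 19104,
    "malware": 6521,
    "phishing": 18836,
}


def majority_baseline_accuracy(counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total


baseline = majority_baseline_accuracy(support)
print(f"Test set size: {sum(support.values())}")
print(f"Always-benign baseline accuracy: {baseline:.3f}")
```

Roughly two thirds of the test URLs are benign, so the per-class precision and recall in the detailed reports say more about a model's usefulness than the headline accuracy does.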
## Detailed Performance Reports

### AdaBoost
- **Accuracy:** 0.85
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
```

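The `macro avg` and `weighted avg` rows follow from the per-class rows: the macro average is the unweighted mean over the four classes, while the weighted average weights each class by its support. As a sanity check, the sketch below reproduces the two recall averages from the rounded AdaBoost figures above. The variable and function names are illustrative.

```python
# Rounded per-class recall and support, copied from the AdaBoost report.
recall = {"benign": 0.97, "defacement": 0.76, "malware": 0.74, "phishing": 0.42}
support = {"benign": 85778, "defacement": 19104, "malware": 6521, "phishing": 18836}


def macro_average(values):
    """Unweighted mean of the per-class values."""
    return sum(values.values()) / len(values)


def weighted_average(values, weights):
    """Mean of the per-class values, weighted by class support."""
    total = sum(weights.values())
    return sum(values[name] * weights[name] for name in values) / total


macro_recall = macro_average(recall)
weighted_recall = weighted_average(recall, support)
print(f"macro avg recall:    {macro_recall:.2f}")
print(f"weighted avg recall: {weighted_recall:.2f}")
```

The weighted recall matches the overall accuracy (0.85), because weighting per-class recall by support counts every correctly classified URL exactly once. The macro average (0.72) is lower because the weak phishing recall (0.42) counts as much as the strong benign recall.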
### Decision Tree
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239
```

### Extra Trees Classifier
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### Gaussian Naive Bayes
- **Accuracy:** 0.80
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239
```

### Gradient Boosting Classifier
- **Accuracy:** 0.94
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239
```

### Logistic Regression
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239
```

### MLP Classifier
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239
```

### Random Forest
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### SGD Classifier
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239
```

### XGBoost
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239
```

## Usage

To load a model in Python, use `joblib` or `pickle`, matching whichever library was used to save the file. In both examples, `X_test` stands for a feature matrix prepared the same way as the training data.

### Using joblib

```python
import joblib

# Load the model
model = joblib.load('models/random_forest.pkl')

# Make predictions; X_test must hold the same features, in the same order,
# that the model was trained on
prediction = model.predict(X_test)
```

### Using pickle

```python
import pickle

# Load the model
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# Make predictions; X_test must hold the same features, in the same order,
# that the model was trained on
prediction = model.predict(X_test)
```

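Pickle-based files, including those written by `joblib`, can execute arbitrary code when loaded, so it is prudent to load only files you trust. One possible safeguard, sketched below, is to compare a file's SHA-256 digest against a value obtained from a trusted source before unpickling it. The `sha256_of` and `load_verified` helpers and the stand-in object are hypothetical, used only to keep the example self-contained; this README does not list digests for the model files.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified(path, expected_sha256):
    """Unpickle `path` only if its digest matches `expected_sha256`."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise ValueError(f"Digest mismatch for {path}: got {actual}")
    with open(path, "rb") as f:
        return pickle.load(f)


# Demonstration with a stand-in object instead of a real model file.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "stand_in_model.pkl"
    stand_in = {"classes": ["benign", "defacement", "malware", "phishing"]}
    with open(model_path, "wb") as f:
        pickle.dump(stand_in, f)

    # In practice the trusted digest would come from the publisher, out of band.
    trusted_digest = sha256_of(model_path)
    loaded = load_verified(model_path, trusted_digest)
    print(loaded["classes"])

    # A digest that does not match the file is rejected before unpickling.
    rejected = False
    try:
        load_verified(model_path, "0" * 64)
    except ValueError:
        rejected = True
    print("mismatched digest rejected:", rejected)
```

The same check can precede `joblib.load`, since it inspects only the file's bytes and does not depend on how the model was serialized.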