---
license: mit
language: en
tags:
  - cybersecurity
  - malicious-url-detection
  - url-classification
  - machine-learning
  - phishing-detection
pipeline_tag: text-classification
---
# Malicious URL Detection Models

This directory contains trained machine learning models that classify a URL into one of four categories:
- **benign**
- **defacement**
- **malware**
- **phishing**

## Model Performance Summary

The table below lists each model's overall accuracy on the 130,239-URL test set, highest first:

| Model | Accuracy |
|-------|----------|
| Extra Trees Classifier | 97% |
| Random Forest | 97% |
| Decision Tree | 96% |
| MLP Classifier | 96% |
| XGBoost | 96% |
| Gradient Boosting Classifier | 94% |
| Logistic Regression | 87% |
| SGD Classifier | 87% |
| AdaBoost | 85% |
| Gaussian Naive Bayes | 80% |
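
Accuracy alone can flatter a model on imbalanced data. As a point of reference, the sketch below computes the accuracy of a hypothetical baseline that labels every URL `benign`, using the per-class support counts from the detailed reports. The `support` dictionary and the variable names are illustrative and are not part of the released models.

```python
# Per-class test-set counts, copied from the "support" column of the reports.
support = {
    "benign": 85778,
    "defacement": 19104,
    "malware": 6521,
    "phishing": 18836,
}

total = sum(support.values())  # 130239 test URLs

# A hypothetical baseline that always predicts the most frequent class
# is correct exactly as often as that class occurs in the test set.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(f"{majority_class}: {baseline_accuracy:.1%}")
```

Such a baseline would reach roughly 66% accuracy, so the per-class precision and recall in the reports are a better guide than the accuracy column when comparing models.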

## Detailed Performance Reports

### AdaBoost
- **Accuracy:** 0.85
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
```
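
To make the summary rows easier to read, here is a minimal sketch that recomputes the `macro avg` and `weighted avg` rows, and one F1 score, from the per-class values in the report above. The helper names (`macro_avg`, `weighted_avg`, `f1`) are illustrative. Because the inputs are already rounded to two decimals, the same calculation on other reports can differ from the printed value in the last digit.

```python
# Per-class rows from the report above: (precision, recall, f1, support).
rows = {
    "benign":     (0.90, 0.97, 0.93, 85778),
    "defacement": (0.82, 0.76, 0.79, 19104),
    "malware":    (0.55, 0.74, 0.63, 6521),
    "phishing":   (0.68, 0.42, 0.52, 18836),
}
SUPPORT = 3  # index of the support column


def macro_avg(col):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[col] for row in rows.values()) / len(rows)


def weighted_avg(col):
    """Mean over classes weighted by the number of test samples per class."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[col] * row[SUPPORT] for row in rows.values()) / total


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:<10} macro={macro_avg(col):.2f} weighted={weighted_avg(col):.2f}")

print(f"phishing f1 = {f1(0.68, 0.42):.2f}")
```

The support-weighted recall equals the overall accuracy (0.85 here), while the macro average is pulled down by the weaker minority classes.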

### Decision Tree
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239
```

### Extra Trees Classifier
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### Gaussian Naive Bayes
- **Accuracy:** 0.80
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239
```

### Gradient Boosting Classifier
- **Accuracy:** 0.94
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239
```

### Logistic Regression
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239
```

### MLP Classifier
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239
```

### Random Forest
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```

### SGD Classifier
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239
```

### XGBoost
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239
```
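
Phishing is the hardest class for every model in the reports above. The sketch below ranks the models by phishing recall and estimates how many of the 18,836 phishing URLs in the test set each one misses. The counts are approximations derived from the two-decimal recall values, and `estimated_missed` is an illustrative helper name.

```python
PHISHING_SUPPORT = 18836  # phishing URLs in the test set

# Phishing recall copied from each detailed report.
phishing_recall = {
    "AdaBoost": 0.42,
    "Decision Tree": 0.85,
    "Extra Trees Classifier": 0.86,
    "Gaussian Naive Bayes": 0.19,
    "Gradient Boosting Classifier": 0.78,
    "Logistic Regression": 0.42,
    "MLP Classifier": 0.83,
    "Random Forest": 0.87,
    "SGD Classifier": 0.40,
    "XGBoost": 0.84,
}


def estimated_missed(recall, support=PHISHING_SUPPORT):
    """Approximate number of phishing URLs not labeled as phishing."""
    return round((1 - recall) * support)


# Sort from highest to lowest phishing recall.
ranked = sorted(phishing_recall.items(), key=lambda item: item[1], reverse=True)

for name, recall in ranked:
    print(f"{name:<30}{recall:>6.2f}{estimated_missed(recall):>8}")
```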

## Usage

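To load a model in Python, use `joblib` or `pickle`. Both deserialize arbitrary Python objects and can execute code while doing so, so load only files you trust.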

### Using joblib

```python
import joblib

# Load the model
model = joblib.load('models/random_forest.pkl')

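# Make predictions. X_test is assumed to be a feature matrix with the same
# columns, in the same order, that the model was trained on.
prediction = model.predict(X_test)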
```

### Using pickle

```python
import pickle

# Load the model
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

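# Make predictions. X_test is assumed to be a feature matrix with the same
# columns, in the same order, that the model was trained on.
prediction = model.predict(X_test)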
```
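
A wrong path or an unexpected file is easier to diagnose if the loader fails early. The following is a minimal sketch of such a wrapper, assuming the model files are plain pickles as in the example above. `load_model` is a hypothetical helper, not something shipped with this repository.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Load a pickled model and check that it exposes a predict() method.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")

    with model_path.open("rb") as f:
        model = pickle.load(f)

    # Fail early if the file held something other than an estimator.
    if not callable(getattr(model, "predict", None)):
        raise TypeError(f"{model_path} does not contain an object with predict()")
    return model
```

With this helper, `load_model('models/random_forest.pkl')` returns the same object as the `pickle` example above, or raises a descriptive error if the file is missing or is not a model.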