---
license: mit
language: en
tags:
- cybersecurity
- malicious-url-detection
- url-classification
- machine-learning
- phishing-detection
pipeline_tag: text-classification
---
# Malicious URL Detection Models
This directory contains trained machine learning models for detecting malicious URLs. The models are trained to classify URLs into four categories:
- **benign**
- **defacement**
- **malware**
- **phishing**
## Model Performance Summary
The following table summarizes the accuracy of each model on the test set of 130,239 URLs (the total `support` in each detailed report):

| Model | Accuracy |
|-------|----------|
| **Extra Trees Classifier** | **97%** |
| **Random Forest** | **97%** |
| **Decision Tree** | **96%** |
| **MLP Classifier** | **96%** |
| **XGBoost** | **96%** |
| **Gradient Boosting Classifier** | **94%** |
| **Logistic Regression** | **87%** |
| **SGD Classifier** | **87%** |
| **AdaBoost** | **85%** |
| **Gaussian Naive Bayes** | **80%** |
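
Because the test set is imbalanced, with benign URLs making up most of it, these accuracies are easier to interpret against a trivial baseline. The Python sketch below uses the per-class `support` counts from the detailed reports to compute the accuracy of a hypothetical classifier that always predicts the most frequent class. It is an illustrative calculation, not one of the trained models.

```python
# Test-set class counts, copied from the "support" column of the detailed reports
support = {"benign": 85778, "defacement": 19104, "malware": 6521, "phishing": 18836}

total = sum(support.values())                     # number of test URLs
majority_label = max(support, key=support.get)    # most frequent class
baseline_accuracy = support[majority_label] / total

# Accuracy of a (hypothetical) model that always answers with the majority class
print(total, majority_label, f"{baseline_accuracy:.1%}")
```

It prints `130239 benign 65.9%`. Always answering "benign" would already score about 66%, so for the lower-ranked models the per-class recall on the minority classes (phishing in particular) is worth checking alongside overall accuracy.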
## Detailed Performance Reports
### AdaBoost
- **Accuracy:** 0.85
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.90      0.97      0.93     85778
  defacement       0.82      0.76      0.79     19104
     malware       0.55      0.74      0.63      6521
    phishing       0.68      0.42      0.52     18836

    accuracy                           0.85    130239
   macro avg       0.74      0.72      0.72    130239
weighted avg       0.84      0.85      0.84    130239
```
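The `macro avg` and `weighted avg` rows are derived from the per-class rows above them: the macro average is the plain mean over the four classes, while the weighted average weights each class by its `support`. The Python sketch below recomputes both from the AdaBoost numbers as a worked check. Because it starts from the rounded two-decimal values, agreement is only expected to two decimals.

```python
# Per-class rows from the AdaBoost report: (precision, recall, f1-score, support)
rows = {
    "benign":     (0.90, 0.97, 0.93, 85778),
    "defacement": (0.82, 0.76, 0.79, 19104),
    "malware":    (0.55, 0.74, 0.63, 6521),
    "phishing":   (0.68, 0.42, 0.52, 18836),
}

total = sum(r[3] for r in rows.values())  # 130239 test URLs


def macro(i):
    """Unweighted mean of column i: every class counts equally."""
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted(i):
    """Support-weighted mean of column i: large classes dominate."""
    return sum(r[i] * r[3] for r in rows.values()) / total


for name, i in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}: macro={macro(i):.2f} weighted={weighted(i):.2f}")
```

The recomputed macro values (0.74, 0.72, 0.72) and weighted values (0.84, 0.85, 0.84) match the report. The weighted recall always equals overall accuracy, since weighting each class's recall by its support reduces to correct predictions divided by the total number of predictions.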
### Decision Tree
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.98     19104
     malware       0.95      0.94      0.95      6521
    phishing       0.87      0.85      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.95      0.94      0.94    130239
weighted avg       0.96      0.96      0.96    130239
```
### Extra Trees Classifier
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.86      0.88     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```
### Gaussian Naive Bayes
- **Accuracy:** 0.80
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.86      0.90      0.88     85778
  defacement       0.67      0.99      0.80     19104
     malware       0.63      0.69      0.66      6521
    phishing       0.68      0.19      0.29     18836

    accuracy                           0.80    130239
   macro avg       0.71      0.69      0.66    130239
weighted avg       0.80      0.80      0.77    130239
```
### Gradient Boosting Classifier
- **Accuracy:** 0.94
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.96      0.99      0.97     85778
  defacement       0.92      0.97      0.94     19104
     malware       0.94      0.80      0.87      6521
    phishing       0.89      0.78      0.83     18836

    accuracy                           0.94    130239
   macro avg       0.93      0.88      0.90    130239
weighted avg       0.94      0.94      0.94    130239
```
### Logistic Regression
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.97      0.93     85778
  defacement       0.85      0.95      0.90     19104
     malware       0.81      0.69      0.74      6521
    phishing       0.77      0.42      0.55     18836

    accuracy                           0.87    130239
   macro avg       0.83      0.76      0.78    130239
weighted avg       0.87      0.87      0.86    130239
```
### MLP Classifier
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.98      0.98     85778
  defacement       0.97      0.97      0.97     19104
     malware       0.95      0.90      0.92      6521
    phishing       0.88      0.83      0.86     18836

    accuracy                           0.96    130239
   macro avg       0.94      0.92      0.93    130239
weighted avg       0.96      0.96      0.96    130239
```
### Random Forest
- **Accuracy:** 0.97
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.98      0.98      0.98     85778
  defacement       0.98      0.99      0.99     19104
     malware       0.98      0.94      0.96      6521
    phishing       0.91      0.87      0.89     18836

    accuracy                           0.97    130239
   macro avg       0.96      0.95      0.95    130239
weighted avg       0.97      0.97      0.97    130239
```
### SGD Classifier
- **Accuracy:** 0.87
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.89      0.96      0.93     85778
  defacement       0.83      0.95      0.89     19104
     malware       0.79      0.71      0.75      6521
    phishing       0.74      0.40      0.52     18836

    accuracy                           0.87    130239
   macro avg       0.81      0.76      0.77    130239
weighted avg       0.86      0.87      0.85    130239
```
### XGBoost
- **Accuracy:** 0.96
- **Report:**
```
              precision    recall  f1-score   support

      benign       0.97      0.99      0.98     85778
  defacement       0.97      0.99      0.98     19104
     malware       0.98      0.92      0.95      6521
    phishing       0.91      0.84      0.88     18836

    accuracy                           0.96    130239
   macro avg       0.96      0.93      0.95    130239
weighted avg       0.96      0.96      0.96    130239
```
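Each per-class row reports precision (correct predictions of a class divided by all predictions of that class), recall (correct predictions divided by the class's true count, its `support`) and F1 (the harmonic mean of the two). The reports follow the layout of scikit-learn's `classification_report`; the dependency-free Python sketch below computes the same per-class quantities from two label lists. The labels are made-up toy values, not taken from the real test set.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)       # true count per class
    predicted = Counter(y_pred)     # predicted count per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for label in labels:
        tp = true_pos[label]
        # Undefined ratios (division by zero) are reported as 0.0 here
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, support[label])
    return report


# Hypothetical toy labels, for illustration only
y_true = ["benign", "benign", "benign", "phishing", "phishing", "malware", "defacement"]
y_pred = ["benign", "benign", "phishing", "phishing", "benign", "malware", "defacement"]

for label, (p, r, f, s) in per_class_report(y_true, y_pred).items():
    print(f"{label:>12}  {p:.2f}  {r:.2f}  {f:.2f}  {s}")
```

For these toy labels, `phishing` gets precision, recall and F1 of 0.50: one of the two phishing predictions is correct, and one of the two true phishing URLs is found.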
## Usage
To load a model in Python, you can use `joblib` or `pickle`.
### Using joblib
```python
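import joblib

# Load the model
model = joblib.load('models/random_forest.pkl')

# Make predictions. X_test must hold inputs prepared the same way as the
# training data (the feature-extraction step is not covered in this README).
prediction = model.predict(X_test)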
```
### Using pickle
```python
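import pickle

# Load the model (pickle.load runs inside the `with` block, while the file is open)
with open('models/random_forest.pkl', 'rb') as f:
    model = pickle.load(f)

# Make predictions on inputs prepared the same way as the training data
prediction = model.predict(X_test)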
```