	---
	license: mit
	language: en
	tags:
	- cybersecurity
	- malicious-url-detection
	- url-classification
	- machine-learning
	- phishing-detection
	pipeline_tag: text-classification
	---
	# Malicious URL Detection Models

	This directory contains trained machine learning models for detecting malicious URLs. Each model classifies a URL into one of four categories:
	- benign
	- defacement
	- malware
	- phishing

	## Model Performance Summary

	The following table summarizes the accuracy of each model on the test dataset:

	| Model | Accuracy |
	|-------|----------|
	| Extra Trees Classifier | 97% |
	| Random Forest | 97% |
	| Decision Tree | 96% |
	| MLP Classifier | 96% |
	| XGBoost | 96% |
	| Gradient Boosting Classifier | 94% |
	| Logistic Regression | 87% |
	| SGD Classifier | 87% |
	| AdaBoost | 85% |
	| Gaussian Naive Bayes | 80% |
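
The test set is imbalanced: the support column in the detailed reports lists 85,778 benign URLs out of 130,239. Accuracy is therefore easier to interpret against a majority-class baseline. The following is a minimal sketch of that arithmetic, using only the support counts from this README; the variable names are illustrative.

```python
# Per-class test-set support, copied from the classification reports in this README.
support = {
    "benign": 85778,
    "defacement": 19104,
    "malware": 6521,
    "phishing": 18836,
}

total = sum(support.values())

# A classifier that always predicts the most frequent class achieves this accuracy.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(f"test URLs: {total}")
print(f"majority class: {majority_class}")
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")
```

Under these counts the baseline is roughly 66%, so the gap between a model's accuracy and that figure is a more informative comparison than the raw percentage alone.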

	## Detailed Performance Reports

	### AdaBoost
	- Accuracy: 0.85
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.90      0.97      0.93     85778
	  defacement      0.82      0.76      0.79     19104
	     malware      0.55      0.74      0.63      6521
	    phishing      0.68      0.42      0.52     18836

	    accuracy                          0.85    130239
	   macro avg      0.74      0.72      0.72    130239
	weighted avg      0.84      0.85      0.84    130239
	```
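
The `macro avg` and `weighted avg` rows can be reproduced from the per-class rows. The sketch below recomputes them from the rounded AdaBoost figures above, assuming the usual definitions: macro is the unweighted mean across classes, and weighted is the mean weighted by support. The helper names are illustrative.

```python
# Per-class (precision, recall, f1, support) from the AdaBoost report above.
report = {
    "benign":     (0.90, 0.97, 0.93, 85778),
    "defacement": (0.82, 0.76, 0.79, 19104),
    "malware":    (0.55, 0.74, 0.63, 6521),
    "phishing":   (0.68, 0.42, 0.52, 18836),
}

total_support = sum(row[3] for row in report.values())

def macro_avg(index):
    """Unweighted mean of one metric across the four classes."""
    return sum(row[index] for row in report.values()) / len(report)

def weighted_avg(index):
    """Mean of one metric, weighted by each class's support."""
    return sum(row[index] * row[3] for row in report.values()) / total_support

for name, index in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:>9}  macro={macro_avg(index):.4f}  weighted={weighted_avg(index):.4f}")
```

Because the inputs are already rounded to two decimals, the results agree with the reported rows only to within rounding. The macro averages fall below the weighted ones because the weak phishing and malware scores count equally with the large benign class.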

	### Decision Tree
	- Accuracy: 0.96
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.97      0.98      0.98     85778
	  defacement      0.98      0.99      0.98     19104
	     malware      0.95      0.94      0.95      6521
	    phishing      0.87      0.85      0.86     18836

	    accuracy                          0.96    130239
	   macro avg      0.95      0.94      0.94    130239
	weighted avg      0.96      0.96      0.96    130239
	```

	### Extra Trees Classifier
	- Accuracy: 0.97
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.97      0.98      0.98     85778
	  defacement      0.98      0.99      0.99     19104
	     malware      0.98      0.94      0.96      6521
	    phishing      0.91      0.86      0.88     18836

	    accuracy                          0.97    130239
	   macro avg      0.96      0.95      0.95    130239
	weighted avg      0.97      0.97      0.97    130239
	```

	### Gaussian Naive Bayes
	- Accuracy: 0.80
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.86      0.90      0.88     85778
	  defacement      0.67      0.99      0.80     19104
	     malware      0.63      0.69      0.66      6521
	    phishing      0.68      0.19      0.29     18836

	    accuracy                          0.80    130239
	   macro avg      0.71      0.69      0.66    130239
	weighted avg      0.80      0.80      0.77    130239
	```

	### Gradient Boosting Classifier
	- Accuracy: 0.94
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.96      0.99      0.97     85778
	  defacement      0.92      0.97      0.94     19104
	     malware      0.94      0.80      0.87      6521
	    phishing      0.89      0.78      0.83     18836

	    accuracy                          0.94    130239
	   macro avg      0.93      0.88      0.90    130239
	weighted avg      0.94      0.94      0.94    130239
	```

	### Logistic Regression
	- Accuracy: 0.87
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.89      0.97      0.93     85778
	  defacement      0.85      0.95      0.90     19104
	     malware      0.81      0.69      0.74      6521
	    phishing      0.77      0.42      0.55     18836

	    accuracy                          0.87    130239
	   macro avg      0.83      0.76      0.78    130239
	weighted avg      0.87      0.87      0.86    130239
	```

	### MLP Classifier
	- Accuracy: 0.96
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.97      0.98      0.98     85778
	  defacement      0.97      0.97      0.97     19104
	     malware      0.95      0.90      0.92      6521
	    phishing      0.88      0.83      0.86     18836

	    accuracy                          0.96    130239
	   macro avg      0.94      0.92      0.93    130239
	weighted avg      0.96      0.96      0.96    130239
	```

	### Random Forest
	- Accuracy: 0.97
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.98      0.98      0.98     85778
	  defacement      0.98      0.99      0.99     19104
	     malware      0.98      0.94      0.96      6521
	    phishing      0.91      0.87      0.89     18836

	    accuracy                          0.97    130239
	   macro avg      0.96      0.95      0.95    130239
	weighted avg      0.97      0.97      0.97    130239
	```

	### SGD Classifier
	- Accuracy: 0.87
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.89      0.96      0.93     85778
	  defacement      0.83      0.95      0.89     19104
	     malware      0.79      0.71      0.75      6521
	    phishing      0.74      0.40      0.52     18836

	    accuracy                          0.87    130239
	   macro avg      0.81      0.76      0.77    130239
	weighted avg      0.86      0.87      0.85    130239
	```

	### XGBoost
	- Accuracy: 0.96
	- Report:
	```
	             precision    recall  f1-score   support

	      benign      0.97      0.99      0.98     85778
	  defacement      0.97      0.99      0.98     19104
	     malware      0.98      0.92      0.95      6521
	    phishing      0.91      0.84      0.88     18836

	    accuracy                          0.96    130239
	   macro avg      0.96      0.93      0.95    130239
	weighted avg      0.96      0.96      0.96    130239
	```
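
The reports above appear to follow the layout of scikit-learn's `classification_report`. For readers unfamiliar with the columns, the following dependency-free sketch shows how per-class precision, recall, F1, and support are derived from true and predicted labels. The function name and the toy labels are invented for illustration and are not taken from these models' test set.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} from two label lists."""
    support = Counter(y_true)      # how often each label truly occurs
    pred_count = Counter(y_pred)   # how often each label was predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    metrics = {}
    for label in sorted(support):
        tp = true_pos[label]
        # Precision: of everything predicted as this label, how much was right.
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        # Recall: of everything that truly has this label, how much was found.
        recall = tp / support[label]
        # F1: harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1, support[label])
    return metrics

# Toy labels, invented for illustration only.
y_true = ["benign", "benign", "benign", "phishing", "phishing", "malware"]
y_pred = ["benign", "benign", "phishing", "phishing", "benign", "malware"]

metrics = per_class_metrics(y_true, y_pred)
for label, (p, r, f1, n) in metrics.items():
    print(f"{label:>10}  {p:.2f}  {r:.2f}  {f1:.2f}  {n}")
```

Read this way, the low phishing recall in several reports (for example 0.19 for Gaussian Naive Bayes) means that most phishing URLs in the test set were assigned to another class by that model.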

	## Usage

	To load a model in Python, you can use `joblib` or `pickle`.

	### Using joblib

	```python
	import joblib

	# Load the model
	model = joblib.load('models/random_forest.pkl')

	# Make predictions; X_test must be a feature matrix prepared the same way as the training data
	prediction = model.predict(X_test)
	```

	### Using pickle

	```python
	import pickle

	# Load the model
	with open('models/random_forest.pkl', 'rb') as f:
	    model = pickle.load(f)

	# Make predictions; X_test must be a feature matrix prepared the same way as the training data
	prediction = model.predict(X_test)
	```
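
Both snippets pass a numeric feature matrix (`X_test`) to `predict`, not raw URL strings. This README does not document the feature set the models were trained on, so the following is only a hypothetical illustration of turning URLs into numeric rows. The function `url_features` and every feature in it are assumptions; they would need to be replaced by the exact features used during training before the output could be passed to a loaded model.

```python
from urllib.parse import urlparse

def url_features(url):
    """Hypothetical lexical features; NOT the feature set used to train these models."""
    # Add a scheme when missing so urlparse can identify the hostname.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    return [
        len(url),                          # total URL length
        len(host),                         # hostname length
        url.count("."),                    # number of dots
        url.count("/"),                    # number of slashes
        sum(ch.isdigit() for ch in url),   # number of digits
        int(parsed.scheme == "https"),     # 1 if the URL uses HTTPS, else 0
    ]

# Example URLs chosen for illustration only.
urls = ["https://example.com/index.html", "example.org/login?id=123"]
X = [url_features(u) for u in urls]
print(X)
```

A model only gives meaningful predictions when the columns of the matrix match, in number and order, the columns it saw during training.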