	---
	license: mit
	tags:
	- phishing-detection
	- url-classification
	- text-classification
	- roberta
	pipeline_tag: text-classification
	datasets:
	- custom
	---

	# URL Phishing Classifier

	This model classifies a URL as phishing (label 1) or safe (label 0).

	## Model Description

	The model is roberta-base fine-tuned for binary sequence classification on a custom dataset of URLs.

	## Training Details

	- Base Model: roberta-base
	- Training Samples: 1629193
	- Validation Samples: 325839
	- Test Samples: 217226
	- Epochs: 5
	- Batch Size: 24
	- Learning Rate: 2e-05
	- Max Length: 256
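
As a quick sanity check on the figures above, the sketch below derives the split proportions and the number of training steps. It assumes the listed batch size of 24 is the global batch size, that no gradient accumulation was used, and that the final partial batch is kept; none of this is stated in the card.

```python
import math

# Sample counts and hyperparameters copied from the list above.
train_samples = 1_629_193
val_samples = 325_839
test_samples = 217_226
epochs = 5
batch_size = 24  # assumed to be the global batch size (not stated in the card)

total = train_samples + val_samples + test_samples
split = {
    "train": train_samples / total,
    "validation": val_samples / total,
    "test": test_samples / total,
}

# Assumes no gradient accumulation and that the final partial batch is kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

for name, fraction in split.items():
    print(f"{name}: {fraction:.1%}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

Under these assumptions the counts correspond to a 75/15/10 train/validation/test split.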

	## Evaluation Results

	### Test Set Metrics

	- Loss: 0.1483
	- Accuracy: 0.9463
	- F1: 0.9262
	- Precision: 0.9259
	- Recall: 0.9264
	- ROC AUC: 0.9890
	- True Positives: 73116
	- True Negatives: 132450
	- False Positives: 5851
	- False Negatives: 5809
	- Runtime (seconds): 142.5284
	- Samples Per Second: 1524.090
	- Steps Per Second: 31.755
	- Epoch: 5
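
The accuracy, precision, recall, and F1 values above follow directly from the four confusion counts. The sketch below recomputes them with the standard definitions, treating phishing (1) as the positive class; the helper name `binary_metrics` is illustrative and not part of this repository.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard binary-classification metrics with phishing (1) as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Test-set confusion counts copied from the list above.
test_metrics = binary_metrics(tp=73116, tn=132450, fp=5851, fn=5809)
for name, value in test_metrics.items():
    print(f"{name}: {value:.4f}")
```

The four counts sum to 217226, which matches the number of test samples listed under Training Details.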

	### Validation Set Metrics

	- Loss: 0.1483
	- Accuracy: 0.9455
	- F1: 0.9250
	- Precision: 0.9246
	- Recall: 0.9255
	- ROC AUC: 0.9888
	- True Positives: 109566
	- True Negatives: 198511
	- False Positives: 8940
	- False Negatives: 8822
	- Runtime (seconds): 195.9861
	- Samples Per Second: 1662.561
	- Steps Per Second: 34.640
	- Epoch: 5
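
The runtime and throughput figures in both metric lists can be cross-checked against the sample counts. The sketch below assumes the runtime is reported in seconds; the ratio of samples per second to steps per second then gives the number of samples processed per evaluation step, a value the card does not state explicitly.

```python
# Figures copied from the two metric lists above:
# (samples, runtime in seconds, reported samples/second, reported steps/second).
runs = {
    "test": (217_226, 142.5284, 1524.09, 31.755),
    "validation": (325_839, 195.9861, 1662.561, 34.64),
}

results = {}
for name, (samples, runtime_s, samples_per_s, steps_per_s) in runs.items():
    derived_throughput = samples / runtime_s
    # Samples processed per evaluation step, i.e. the implied evaluation batch size.
    implied_batch = samples_per_s / steps_per_s
    results[name] = (derived_throughput, implied_batch)
    print(f"{name}: {derived_throughput:.1f} samples/s, "
          f"about {implied_batch:.0f} samples per step")
```

Both sets work out to roughly 48 samples per step, twice the listed training batch size of 24. The card does not say whether this reflects a larger evaluation batch or multiple devices.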


	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	# Load model and tokenizer
	model_name = "nhellyercreek/url-phishing-classifier"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
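	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	model.eval()  # make sure dropout is disabled for inference

	# Example inference; max_length matches the training setting (256)
	url = "https://example.com/login"  # placeholder; replace with the URL to classify
	inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)
	with torch.no_grad():  # gradients are not needed at inference time
	    outputs = model(**inputs)
	predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

	# Get the predicted class and its probability
	predicted_class = predictions.argmax(dim=-1).item()
	confidence = predictions[0][predicted_class].item()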

	print(f"Predicted class: {predicted_class} (phishing=1, safe=0)")
	print(f"Confidence: {confidence:.4f}")
	```
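
For two classes, taking the argmax as in the snippet above is essentially the same as thresholding the phishing probability at 0.5. The sketch below shows the softmax step in plain Python and how a stricter threshold changes the decision. The helper names, the `threshold` parameter, and the example logits are all illustrative assumptions, not part of this model or its real outputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Return (label, phishing_probability); index 1 is the phishing class."""
    p_phishing = softmax(logits)[1]
    label = 1 if p_phishing >= threshold else 0
    return label, p_phishing


# Hypothetical logits for one URL in (safe, phishing) order; not real model output.
example_logits = [0.3, 1.1]
print(decide(example_logits))                 # default threshold of 0.5
print(decide(example_logits, threshold=0.9))  # stricter threshold
```

Raising the threshold trades recall for precision, which may suit deployments where false alarms are costly. A suitable value would need to be chosen on held-out data.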

	## Limitations

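	This model was trained on a custom dataset and may not generalize to phishing techniques or URL patterns that the dataset does not cover. It was trained with a maximum length of 256 tokens, so longer URLs are truncated. In production, treat its prediction as one signal alongside other security measures rather than as a sole verdict.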

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{nhellyercreek_url_phishing_classifier,
	title={{URL} Phishing Classifier},
	author={Your Name},
	year={2024},
	publisher={Hugging Face},
	howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier}}
	}
	```