

metadata

license: mit
tags:
  - phishing-detection
  - url-classification
  - text-classification
  - roberta
task: text-classification
datasets:
  - custom

URL Phishing Classifier

This model is fine-tuned for URL phishing classification. It classifies URLs as phishing (1) or safe (0).

Model Description

This model is roberta-base fine-tuned as a binary sequence classifier for URL phishing detection.

Training Details

Base Model: roberta-base
Training Samples: 1,629,193
Validation Samples: 325,839
Test Samples: 217,226
Epochs: 5
Batch Size: 24
Learning Rate: 2e-5
Max Length: 256 tokens
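The sample counts above correspond to a 75/15/10 train/validation/test split of 2,172,258 examples. The sketch below checks that arithmetic and estimates the number of optimizer steps. The step count is an assumption: it treats the batch size of 24 as the effective batch (one device, no gradient accumulation, last partial batch kept), which the card does not state.

```python
import math

# Sample counts reported in Training Details.
train, val, test = 1_629_193, 325_839, 217_226
batch_size, epochs = 24, 5

total = train + val + test
split = {name: n / total for name, n in [("train", train), ("val", val), ("test", test)]}

# Assumption: batch_size is the effective batch (single device, no gradient
# accumulation), and the final partial batch of each epoch counts as a step.
steps_per_epoch = math.ceil(train / batch_size)
total_steps = steps_per_epoch * epochs

print(f"total samples: {total:,}")
for name, frac in split.items():
    print(f"{name}: {frac:.1%}")
print(f"steps/epoch: {steps_per_epoch:,}, total steps: {total_steps:,}")
```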

Evaluation Results

Test Set Metrics

Loss: 0.1483
Accuracy: 0.9463
F1: 0.9262
Precision: 0.9259
Recall: 0.9264
ROC AUC: 0.9890
True Positives: 73,116
True Negatives: 132,450
False Positives: 5,851
False Negatives: 5,809
Runtime: 142.5284 s
Samples Per Second: 1524.09
Steps Per Second: 31.755
Epoch: 5
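Accuracy, precision, recall, and F1 follow directly from the four confusion-matrix counts. The sketch below recomputes them from the test-set counts, assuming phishing (class 1) is the positive class; the results match the reported test values to four decimal places. `metrics_from_counts` is a hypothetical helper written for illustration.

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Binary classification metrics, treating phishing (1) as the positive class."""
    precision = tp / (tp + fp)  # share of flagged URLs that are phishing
    recall = tp / (tp + fn)     # share of phishing URLs that were flagged
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Confusion counts from Test Set Metrics.
test = metrics_from_counts(tp=73_116, tn=132_450, fp=5_851, fn=5_809)
for name, value in test.items():
    print(f"{name}: {value:.4f}")
```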

Validation Set Metrics

Loss: 0.1483
Accuracy: 0.9455
F1: 0.9250
Precision: 0.9246
Recall: 0.9255
ROC AUC: 0.9888
True Positives: 109,566
True Negatives: 198,511
False Positives: 8,940
False Negatives: 8,822
Runtime: 195.9861 s
Samples Per Second: 1662.561
Steps Per Second: 34.64
Epoch: 5
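The runtime figures are in seconds: dividing each split's sample count by its runtime reproduces the reported samples per second, and dividing that by steps per second gives about 48 samples per evaluation step for both splits. The sketch below shows this arithmetic. The card does not document the evaluation batch size, so 48 is an inference from the reported numbers, not a stated setting; `eval_throughput` is a hypothetical helper.

```python
def eval_throughput(samples: int, runtime_s: float, steps_per_s: float) -> tuple[float, float]:
    """Return (samples per second, samples per evaluation step)."""
    samples_per_s = samples / runtime_s
    return samples_per_s, samples_per_s / steps_per_s

# (samples, runtime in seconds, steps per second) as reported for each split.
runs = {
    "test": (217_226, 142.5284, 31.7550),
    "validation": (325_839, 195.9861, 34.6400),
}

for name, (samples, runtime_s, steps_per_s) in runs.items():
    sps, per_step = eval_throughput(samples, runtime_s, steps_per_s)
    print(f"{name}: {sps:.2f} samples/s, ~{per_step:.1f} samples per step")
```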

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

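# Load model and tokenizer
model_name = "nhellyercreek/url-phishing-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Example inference on a single URL (placeholder; replace with the URL to check).
# max_length=256 matches the maximum sequence length used during training.
text = "http://example.com/login"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get prediction for the first (and only) input in the batch
predicted_class = predictions[0].argmax().item()
confidence = predictions[0][predicted_class].item()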

print(f"Predicted class: {predicted_class} (phishing=1, safe=0)")
print(f"Confidence: {confidence:.4f}")
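With two classes, taking the argmax of the softmax is the same as flagging a URL when P(phishing) exceeds 0.5 (apart from exact ties). Given the high ROC AUC, a deployment may prefer another operating point, for example a higher threshold to cut false positives at some cost in recall; any such threshold should be chosen on validation data. The sketch below is framework-free and assumes logit index 1 is phishing, matching the label convention above. `phishing_probability` and `classify` are hypothetical helpers, not part of the model or the Transformers API.

```python
import math

def phishing_probability(logits: list[float]) -> float:
    """Softmax over two logits [safe, phishing]; returns P(phishing)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def classify(logits: list[float], threshold: float = 0.5) -> int:
    """Return 1 (phishing) if P(phishing) >= threshold, else 0 (safe)."""
    return int(phishing_probability(logits) >= threshold)

# Hypothetical logits for one URL, e.g. outputs.logits[0].tolist() from the snippet above.
logits = [0.2, 1.0]
print(round(phishing_probability(logits), 4))
print(classify(logits), classify(logits, threshold=0.8))
```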

Limitations

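This model was trained on a custom dataset and may not generalize to phishing URLs that differ from that data. It scores only the text it is given and does not fetch or inspect the destination page. Even on the held-out test set, treating phishing (class 1) as the positive class, about 7.4% of phishing URLs were classified as safe (recall 0.9264) and about 4.2% of safe URLs were flagged as phishing (5,851 of 138,301). Use it as one signal alongside other security controls in production, not as the sole defense.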

Citation

If you use this model, please cite:

@misc{nhellyercreek_url_phishing_classifier,
  title={URL Phishing Classifier},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier}}
}