metadata
license: mit
tags:
  - phishing-detection
  - url-classification
  - text-classification
  - roberta
task: text-classification
datasets:
  - custom

URL Phishing Classifier

This model is fine-tuned for URL phishing classification. It classifies URLs as phishing (1) or safe (0).

Model Description

This model is roberta-base fine-tuned as a binary sequence classifier for phishing URL detection.

Training Details

  • Base Model: roberta-base
  • Training Samples: 1629193
  • Validation Samples: 325839
  • Test Samples: 217226
  • Epochs: 5
  • Batch Size: 24
  • Learning Rate: 2e-05
  • Max Length: 256
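The figures above imply a split ratio and an optimizer step count that the card does not state directly. A minimal sketch of that arithmetic follows; it assumes a single device, no gradient accumulation, and that the final partial batch is kept, none of which the card confirms.

```python
import math

# Sample counts, batch size, and epochs as reported in Training Details.
train_samples, val_samples, test_samples = 1_629_193, 325_839, 217_226
batch_size, epochs = 24, 5

total_samples = train_samples + val_samples + test_samples

# Share of the full dataset held by each split.
split_fractions = {
    "train": train_samples / total_samples,
    "validation": val_samples / total_samples,
    "test": test_samples / total_samples,
}

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

for name, fraction in split_fractions.items():
    print(f"{name}: {fraction:.2%}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

The reported counts correspond to roughly a 75/15/10 train/validation/test split. The step counts hold only under the stated assumptions.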

Evaluation Results

Test Set Metrics

  • Loss: 0.1483
  • Accuracy: 0.9463
  • F1: 0.9262
  • Precision: 0.9259
  • Recall: 0.9264
  • ROC AUC: 0.9890
  • True Positives: 73116
  • True Negatives: 132450
  • False Positives: 5851
  • False Negatives: 5809
  • Runtime: 142.5284 seconds
  • Samples Per Second: 1524.0900
  • Steps Per Second: 31.7550
  • Epoch: 5.0000
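The headline test metrics can be re-derived from the four confusion-matrix counts listed above. The sketch below does that using the standard definitions; the function name `classification_metrics` is illustrative and not part of the model's code, and the false positive rate is an extra quantity the card does not report.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # Share of safe URLs that were wrongly flagged as phishing.
        "false_positive_rate": fp / (fp + tn),
    }


# Counts reported for the test set.
test_metrics = classification_metrics(tp=73116, tn=132450, fp=5851, fn=5809)
for name, value in test_metrics.items():
    print(f"{name}: {value:.4f}")
```

The derived accuracy, precision, recall, and F1 agree with the reported values to four decimal places, and the four counts sum to the 217226 test samples.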

Validation Set Metrics

  • Loss: 0.1483
  • Accuracy: 0.9455
  • F1: 0.9250
  • Precision: 0.9246
  • Recall: 0.9255
  • ROC AUC: 0.9888
  • True Positives: 109566
  • True Negatives: 198511
  • False Positives: 8940
  • False Negatives: 8822
  • Runtime: 195.9861 seconds
  • Samples Per Second: 1662.5610
  • Steps Per Second: 34.6400
  • Epoch: 5.0000
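The runtime and throughput figures for both evaluation runs can be cross-checked against the sample counts. The sketch below assumes Runtime is in seconds, which the reported samples-per-second values are consistent with. The evaluation batch size is not stated in the card; the value computed here is an inference from the reported rates and may reflect a per-device batch multiplied across several devices.

```python
import math


def eval_throughput(samples: int, runtime_s: float, steps_per_s: float) -> dict:
    """Recompute throughput and infer the evaluation batch size."""
    samples_per_s = samples / runtime_s
    # Inferred, not reported: samples consumed per evaluation step.
    implied_batch = round(samples_per_s / steps_per_s)
    return {
        "samples_per_second": samples_per_s,
        "implied_batch_size": implied_batch,
        "implied_steps": math.ceil(samples / implied_batch),
    }


# Sample counts, runtimes, and step rates as reported in the card.
validation = eval_throughput(325_839, 195.9861, 34.6400)
test = eval_throughput(217_226, 142.5284, 31.7550)

print("validation:", validation)
print("test:", test)
```

Both runs point to the same effective evaluation batch size, which suggests they used the same evaluation settings.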

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "nhellyercreek/url-phishing-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

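# Example inference on a single URL (placeholder value)
text = "https://example.com/login"
# Truncate to 256 tokens, the max length used during training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)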

# Get prediction
predicted_class = predictions.argmax().item()
confidence = predictions[0][predicted_class].item()

print(f"Predicted class: {predicted_class} (phishing=1, safe=0)")
print(f"Confidence: {confidence:.4f}")
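The snippet above picks the class with the highest probability, which amounts to a 0.5 cutoff on the phishing probability. A deployment may prefer a different cutoff to trade false positives against false negatives. The sketch below shows one way to do that on the two raw logits, using plain Python so it runs without the model. The helper names and the 0.9 threshold are hypothetical and are not tuned or recommended by the model author; index 1 is taken to be the phishing class, as stated in the card.

```python
import math


def phishing_probability(logits):
    """Softmax probability of class 1 (phishing) from two raw logits."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    return exps[1] / sum(exps)


def classify(logits, threshold=0.5):
    """Return (label, probability); label is 1 if p(phishing) >= threshold."""
    probability = phishing_probability(logits)
    label = 1 if probability >= threshold else 0
    return label, probability


# Hypothetical logits in [safe, phishing] order, e.g. outputs.logits[0].tolist()
example_logits = [-0.4, 1.1]
print(classify(example_logits))                 # default 0.5 cutoff
print(classify(example_logits, threshold=0.9))  # stricter, hypothetical cutoff
```

Raising the threshold flags fewer URLs, which lowers false positives and raises false negatives. Any cutoff should be chosen on held-out data rather than taken from this example.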

Limitations

This model was trained on a custom dataset and may not generalize to phishing URLs that differ from its training data. Treat its prediction as one signal among several, and always use additional security measures in production environments.

Citation

If you use this model, please cite:

@misc{nhellyercreek_url_phishing_classifier,
  title={URL Phishing Classifier},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier}}
}