URL Phishing Classifier

This model is fine-tuned for URL phishing classification. It labels each URL as phishing (label 1) or safe (label 0).

Model Description

This model is based on roberta-base and has been fine-tuned as a binary classifier for phishing detection.

Training Details

  • Base Model: roberta-base
  • Training Samples: 1,629,193
  • Validation Samples: 325,839
  • Test Samples: 217,226
  • Epochs: 5
  • Batch Size: 24
  • Learning Rate: 2e-05
  • Max Length: 256 tokens
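For a rough sense of training scale, the figures above imply a number of optimizer steps. This is a minimal sketch, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in this card.

```python
import math

# Figures from the Training Details list above.
TRAIN_SAMPLES = 1_629_193
BATCH_SIZE = 24
EPOCHS = 5

# Assumptions (not stated in this card): single device, no gradient
# accumulation, and the final partial batch of each epoch is kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"steps per epoch: {steps_per_epoch}")    # 67884
print(f"total optimizer steps: {total_steps}")  # 339420
```

With multiple devices or gradient accumulation, the effective batch size grows and the step count shrinks proportionally.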

Evaluation Results

Test Set Metrics

  • Loss: 0.1483
  • Accuracy: 0.9463
  • F1: 0.9262
  • Precision: 0.9259
  • Recall: 0.9264
  • ROC AUC: 0.9890
  • True Positives: 73,116
  • True Negatives: 132,450
  • False Positives: 5,851
  • False Negatives: 5,809
  • Runtime: 142.5284 s
  • Samples Per Second: 1524.09
  • Steps Per Second: 31.755
  • Epoch: 5
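Accuracy, precision, recall, and F1 follow directly from the four confusion-matrix counts, with phishing (label 1) treated as the positive class. The sketch below recomputes them from the test-set counts; the helper name `classification_metrics` is illustrative, not part of any library.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Derive headline metrics from binary confusion-matrix counts (phishing = positive)."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)  # share of URLs flagged as phishing that really are phishing
    recall = tp / (tp + fn)     # share of phishing URLs that get flagged
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Test-set counts reported above.
test = classification_metrics(tp=73116, tn=132450, fp=5851, fn=5809)
for name, value in test.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, the output matches the reported test values (accuracy 0.9463, precision 0.9259, recall 0.9264, F1 0.9262). Passing the validation counts reproduces that section's values in the same way.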

Validation Set Metrics

  • Loss: 0.1483
  • Accuracy: 0.9455
  • F1: 0.9250
  • Precision: 0.9246
  • Recall: 0.9255
  • ROC AUC: 0.9888
  • True Positives: 109,566
  • True Negatives: 198,511
  • False Positives: 8,940
  • False Negatives: 8,822
  • Runtime: 195.9861 s
  • Samples Per Second: 1662.561
  • Steps Per Second: 34.64
  • Epoch: 5
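For a phishing filter, the two error rates are often more informative than accuracy: how many safe URLs get wrongly flagged (false positive rate) and how many phishing URLs slip through (false negative rate). This short sketch derives both from the reported counts for each split and checks that the four counts add up to the split size. The `error_rates` helper and `SPLITS` table are illustrative names.

```python
# Confusion-matrix counts as reported above, keyed by split.
SPLITS = {
    "test": {"tp": 73116, "tn": 132450, "fp": 5851, "fn": 5809, "samples": 217226},
    "validation": {"tp": 109566, "tn": 198511, "fp": 8940, "fn": 8822, "samples": 325839},
}

def error_rates(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate)."""
    fpr = fp / (fp + tn)  # safe URLs wrongly flagged as phishing
    fnr = fn / (fn + tp)  # phishing URLs wrongly passed as safe
    return fpr, fnr

for split, c in SPLITS.items():
    # Sanity check: the four cells should account for every sample in the split.
    assert c["tp"] + c["tn"] + c["fp"] + c["fn"] == c["samples"], split
    fpr, fnr = error_rates(c["tp"], c["tn"], c["fp"], c["fn"])
    print(f"{split}: FPR={fpr:.4f}  FNR={fnr:.4f}")
```

Because the two splits give nearly identical rates, the validation results are a reasonable proxy for held-out performance on data drawn from the same distribution.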

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "nhellyercreek/url-phishing-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Example inference: replace with the URL you want to classify
text = "http://example.com/login"
# Truncate to 256 tokens, the max length used during fine-tuning
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # no gradients needed for prediction
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get prediction for the single input (row 0 of the batch)
predicted_class = predictions[0].argmax().item()
confidence = predictions[0][predicted_class].item()

print(f"Predicted class: {predicted_class} (phishing=1, safe=0)")
print(f"Confidence: {confidence:.4f}")
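The snippet above takes the argmax of the softmax scores, which for two classes amounts to flagging a URL when P(phishing) exceeds 0.5. Moving that cut-off trades false positives against false negatives. Below is a minimal, dependency-free sketch of thresholding; `classify`, the threshold values, and the example logits are all hypothetical (with the real model, the logits would come from `outputs.logits[0].tolist()`).

```python
import math

# Label mapping following this card's convention (phishing=1, safe=0).
LABELS = {0: "safe", 1: "phishing"}

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits: list[float], threshold: float = 0.5) -> tuple[str, float]:
    """Flag as phishing when P(phishing) >= threshold; return (label, P(phishing))."""
    p_phishing = softmax(logits)[1]
    label = LABELS[1] if p_phishing >= threshold else LABELS[0]
    return label, p_phishing

# Made-up logits for one URL, standing in for outputs.logits[0].tolist().
example_logits = [0.2, 0.9]
print(classify(example_logits))                 # default 0.5 cut-off
print(classify(example_logits, threshold=0.8))  # stricter cut-off
```

Here P(phishing) is about 0.67, so the URL is flagged at the default cut-off but passed as safe at 0.8. A higher threshold lowers false positives at the cost of more missed phishing URLs. Any threshold chosen for production should be tuned on held-out data rather than taken from this sketch.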

Limitations

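This model was fine-tuned on a specific set of labeled URLs and may not generalize to phishing techniques or URL patterns that are absent from that data. Even on the held-out test set, it flagged about 4.2% of safe URLs as phishing and missed about 7.4% of phishing URLs. Use it alongside other security measures, not as the only safeguard, in production environments.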

Citation

If you use this model, please cite:

@misc{nhellyercreek_url_phishing_classifier,
  title={{URL} Phishing Classifier},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier}}
}

Model Files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32