---
license: mit
tags:
- phishing-detection
- url-classification
- text-classification
- roberta
task: text-classification
datasets:
- custom
---

# URL Phishing Classifier

This model is fine-tuned for URL phishing classification. It classifies URLs as phishing (1) or safe (0).

## Model Description

This model is based on **roberta-base** and has been fine-tuned for phishing detection on a custom dataset.

## Training Details

- **Base Model**: roberta-base
- **Training Samples**: 1629193
- **Validation Samples**: 325839
- **Test Samples**: 217226
- **Epochs**: 5
- **Batch Size**: 24
- **Learning Rate**: 2e-05
- **Max Length**: 256

## Evaluation Results

### Test Set Metrics

- **Loss**: 0.1483
- **Accuracy**: 0.9463
- **F1**: 0.9262
- **Precision**: 0.9259
- **Recall**: 0.9264
- **ROC AUC**: 0.9890
- **True Positives**: 73116
- **True Negatives**: 132450
- **False Positives**: 5851
- **False Negatives**: 5809
- **Runtime (s)**: 142.5284
- **Samples Per Second**: 1524.0900
- **Steps Per Second**: 31.7550
- **Epoch**: 5

### Validation Set Metrics

- **Loss**: 0.1483
- **Accuracy**: 0.9455
- **F1**: 0.9250
- **Precision**: 0.9246
- **Recall**: 0.9255
- **ROC AUC**: 0.9888
- **True Positives**: 109566
- **True Negatives**: 198511
- **False Positives**: 8940
- **False Negatives**: 8822
- **Runtime (s)**: 195.9861
- **Samples Per Second**: 1662.5610
- **Steps Per Second**: 34.6400
- **Epoch**: 5

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "nhellyercreek/url-phishing-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
text = "http://example.com/login"  # replace with the URL to classify
# max_length matches the 256-token limit used during training
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get prediction
predicted_class = predictions.argmax(dim=-1).item()
confidence = predictions[0][predicted_class].item()
print(f"Predicted class: {predicted_class} (phishing=1, safe=0)")
print(f"Confidence: {confidence:.4f}")
```

## Limitations

This model was trained on a custom dataset and may not generalize to all types of phishing attempts. Treat its prediction as one signal among several, and always use additional security measures in production environments.

## Citation

If you use this model, please cite:

```bibtex
@misc{nhellyercreek_url_phishing_classifier,
  title={URL Phishing Classifier},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/nhellyercreek/url-phishing-classifier}}
}
```
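
## Appendix: Checking the Reported Metrics

The accuracy, precision, recall, and F1 values listed under Evaluation Results can be re-derived from the reported confusion-matrix counts. The sketch below does that arithmetic with the standard binary-classification formulas, treating phishing (1) as the positive class. The helper name `binary_metrics` is hypothetical and is not part of this model's training code; only the four counts per split come from the card.

```python
def binary_metrics(tp, tn, fp, fn):
    """Derive standard binary-classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)   # share of predicted phishing that is phishing
    recall = tp / (tp + fn)      # share of actual phishing that was caught
    return {
        "support": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Confusion counts copied from the Test Set and Validation Set sections.
test_metrics = binary_metrics(tp=73116, tn=132450, fp=5851, fn=5809)
val_metrics = binary_metrics(tp=109566, tn=198511, fp=8940, fn=8822)

for split, metrics in (("test", test_metrics), ("validation", val_metrics)):
    summary = ", ".join(
        f"{name}={value:.4f}" for name, value in metrics.items() if name != "support"
    )
    print(f"{split} (n={metrics['support']}): {summary}")
```

The counts for each split sum to the sample totals listed under Training Details (217226 test, 325839 validation), and the derived values agree with the reported metrics to four decimal places. ROC AUC and loss cannot be checked this way because they depend on the per-sample scores rather than on the thresholded counts.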