metadata
language: en
license: other
tags:
  - security
  - phishing-detection
  - url-classification
  - xgboost

XGBoost Model for URL Phishing Detection

Model Details

  • Architecture: Gradient-boosted decision trees (XGBoost)
  • Input: A single URL string; features are computed from the string alone, with no external queries
  • Features: Lexical and structural URL features (lengths, symbol counts, digit ratio, IPv4 pattern, common phishing tokens, scheme/TLD heuristics)
  • Training data: PhiUSIIL_Phishing_URL_Dataset.csv
  • Intended use: Binary classification (phishing vs. legitimate)
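As a rough illustration of the feature families listed above, the following sketch derives a few lexical and structural features from a URL string using only the standard library. The function name `extract_features`, the feature names, and the `SUSPICIOUS_TOKENS` list are assumptions made for this example; the model's actual feature set lives in the repository code and may differ.

```python
import re
from urllib.parse import urlsplit

# Hypothetical token list for illustration; the model's real vocabulary is not given here.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")

# Matches a dotted-quad host such as 192.168.0.1 (octet ranges are not validated).
IPV4_RE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def extract_features(url: str) -> dict:
    """Compute a few URL-only features; no network access is performed."""
    # urlsplit only populates the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    is_ip = bool(IPV4_RE.match(host))
    lowered = url.lower()
    digits = sum(ch.isdigit() for ch in url)
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_at": url.count("@"),
        "digit_ratio": digits / len(url) if url else 0.0,
        "has_ipv4_host": int(is_ip),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
        "is_https": int(parts.scheme == "https"),
        # An IP host has no TLD; otherwise take the last dot-separated label.
        "tld": "" if is_ip or "." not in host else host.rsplit(".", 1)[-1],
    }


if __name__ == "__main__":
    print(extract_features("http://192.168.0.1/login"))
```

A dictionary like this would typically be converted to a fixed-order numeric vector (with the TLD encoded) before being passed to a tree model.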

Metrics (test)

  • Accuracy: 0.9952
  • Precision: 0.9928
  • Recall: 0.9989
  • F1: 0.9958
  • ROC-AUC: 0.9976
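F1 is the harmonic mean of precision and recall, so the reported figures can be cross-checked against each other. The short sketch below recomputes it from the precision and recall listed above; the helper name `f1_score` is chosen for this example only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in this card.
f1 = f1_score(0.9928, 0.9989)
print(round(f1, 4))  # 0.9958
```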

Usage

See README.md and inference.py for how to load the model and call predict_url().

Limitations and Biases

  • The model sees only the URL string, so attackers can evade it with URLs that avoid the lexical patterns it relies on.
  • Dataset shift and TLDs not seen during training may degrade performance.
  • Always validate on your own traffic before deployment.
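One way to run such a validation is to score a labeled sample of your own URLs and inspect the confusion counts. The sketch below assumes a `predict` callable that maps a URL to 0 or 1, and assumes 1 means phishing; both are conventions of this example, so check the label encoding used by your data and by the model before relying on the numbers.

```python
from typing import Callable, Iterable, Tuple


def evaluate(samples: Iterable[Tuple[str, int]],
             predict: Callable[[str], int]) -> dict:
    """Compute confusion counts and basic rates; label 1 = phishing (assumed)."""
    tp = fp = tn = fn = 0
    for url, label in samples:
        pred = predict(url)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of legitimate URLs that would be wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    # Stand-in predictor for demonstration; replace with the real model call.
    def toy_predict(url: str) -> int:
        return int("login" in url)

    data = [("http://a.example/login", 1), ("http://b.example/", 0)]
    print(evaluate(data, toy_predict))
```

The false positive rate is worth tracking separately, since even a small rate can translate into many blocked legitimate URLs at production traffic volumes.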

License

Provided for research and educational purposes. Users are responsible for ensuring compliance with local laws and organizational policies.