---
language: en
license: other
tags:
- security
- phishing-detection
- url-classification
- xgboost
---

# Random Forest / XGBoost Model for URL Phishing Detection

## Model Details

- Architecture: Gradient-boosted decision trees (XGBoost)
- Input: Single URL string (no external queries)
- Features: Lexical and structural URL features (lengths, symbol counts, digit ratio, IPv4 pattern, common phishing tokens, scheme/TLD heuristics)
- Training data: `PhiUSIIL_Phishing_URL_Dataset.csv`
- Intended use: Binary classification (phishing vs. legitimate)

## Metrics (test)

- Accuracy: 0.9952
- Precision: 0.9928
- Recall: 0.9989
- F1: 0.9958
- ROC-AUC: 0.9976

## Usage

See `README.md` and `inference.py` for loading and `predict_url()`.

## Limitations and Biases

- URL-only features can be evaded by sophisticated attackers.
- Dataset shifts and novel TLDs may degrade performance.
- Always validate on your own traffic before deployment.

## License

Provided for research/educational purposes. Ensure compliance with local laws and organizational policies.
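
## Example: Lexical Feature Sketch

The feature families listed under Model Details (lengths, symbol counts, digit ratio, IPv4 pattern, phishing tokens, scheme/TLD heuristics) can be illustrated with a minimal sketch that uses only the Python standard library. This is not the feature code shipped with the model: the function name `extract_url_features`, the feature names, and the `SUSPICIOUS_TOKENS` list are all assumptions made for illustration, and the real pipeline in `inference.py` may compute different or additional features.

```python
import re
from urllib.parse import urlsplit

# Hypothetical token list; the model's actual token vocabulary is not documented here.
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "account", "update")

# Loose dotted-quad check; it does not validate that each octet is <= 255.
IPV4_RE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def extract_url_features(url: str) -> dict:
    """Compute illustrative lexical features from a URL string, with no network access."""
    # urlsplit only fills in the host when a scheme separator is present,
    # so a default scheme is prepended for parsing purposes only.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    lowered = url.lower()
    host_is_ipv4 = bool(IPV4_RE.match(host))
    digit_count = sum(ch.isdigit() for ch in url)

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_at": url.count("@"),
        "digit_ratio": digit_count / len(url) if url else 0.0,
        "host_is_ipv4": int(host_is_ipv4),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
        "is_https": int(parts.scheme == "https"),
        # An IP-address host has no TLD, so it is left empty in that case.
        "tld": host.rsplit(".", 1)[-1] if "." in host and not host_is_ipv4 else "",
    }


if __name__ == "__main__":
    for example in ("https://example.com", "http://192.168.0.1/login"):
        print(example, extract_url_features(example))
```

In a setup like this, the numeric entries could be passed to a tree model directly, while a string feature such as `tld` would need to be encoded first. For actual predictions, use the `predict_url()` entry point referenced in the Usage section rather than this sketch.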