Perth0603 / Random-Forest-Model-for-PhishingDetection


Random-Forest-Model-for-PhishingDetection / model_card.md

---
language: en
license: other
tags:
- security
- phishing-detection
- url-classification
- xgboost
---

# Random Forest / XGBoost Model for URL Phishing Detection

## Model Details
- Architecture: Gradient-boosted decision trees (XGBoost)
- Input: A single URL string; features are computed from the string alone, with no external queries
- Features: Lexical and structural URL features (lengths, symbol counts, digit ratio, IPv4 pattern, common phishing tokens, scheme/TLD heuristics)
- Training data: `PhiUSIIL_Phishing_URL_Dataset.csv`
- Intended use: Binary classification (phishing vs. legitimate)
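
The feature families listed above can be illustrated with a minimal sketch. This is not the extraction code shipped with the model: the function name `extract_url_features`, the token list `PHISHING_TOKENS`, and the exact feature names are assumptions made for illustration, and the real feature set may differ.

```python
import re
from urllib.parse import urlsplit

# Hypothetical token list; the model card does not say which tokens are used.
PHISHING_TOKENS = ("login", "verify", "secure", "account", "update")

# Dotted-quad pattern (does not check that each octet is <= 255).
IPV4_RE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def extract_url_features(url: str) -> dict:
    """Compute illustrative lexical features from the URL string alone."""
    # Prepend a scheme for parsing only, so bare hosts still yield a hostname.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    is_ip = bool(IPV4_RE.match(host))
    lowered = url.lower()
    digits = sum(ch.isdigit() for ch in url)
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_at": url.count("@"),
        "digit_ratio": digits / len(url) if url else 0.0,
        "host_is_ipv4": int(is_ip),
        "num_phishing_tokens": sum(tok in lowered for tok in PHISHING_TOKENS),
        "is_https": int(url.lower().startswith("https://")),
        # An IP host has no TLD, so report a length of 0 in that case.
        "tld_length": 0 if is_ip or "." not in host else len(host.rsplit(".", 1)[-1]),
    }
```

Every value is numeric, so the dictionary could be turned into a row of a feature matrix for a tree-based classifier. No network access is needed, which matches the "no external queries" constraint.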

## Metrics (test set)
- Accuracy: 0.9952
- Precision: 0.9928
- Recall: 0.9989
- F1: 0.9958
- ROC-AUC: 0.9976
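
As a quick sanity check, F1 is the harmonic mean of precision and recall, so the reported value can be reproduced from the two figures above. The helper name `f1_from` is only illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f1_from(0.9928, 0.9989)
print(round(f1, 4))  # 0.9958, matching the reported F1
```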

## Usage
See `README.md` and `inference.py` for how to load the model and call `predict_url()`.

## Limitations and Biases
- URL-only features can be evaded by sophisticated attackers.
- Dataset shift and novel TLDs may degrade performance.
- Always validate on your own traffic before deployment.
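
One way to approach the validation step is a small harness that scores any URL classifier on a labeled sample of your own traffic. The sketch below is an assumption-laden example: `evaluate` is a hypothetical helper, the `predict` callable stands in for a wrapper around whatever `predict_url()` returns, and the label convention (1 = phishing, 0 = legitimate) is chosen here for illustration and may not match the model's.

```python
from typing import Callable, Iterable, Tuple


def evaluate(predict: Callable[[str], int],
             labeled_urls: Iterable[Tuple[str, int]]) -> dict:
    """Score a classifier on (url, label) pairs; assumes 1 = phishing, 0 = legitimate."""
    tp = fp = tn = fn = 0
    for url, label in labeled_urls:
        pred = predict(url)
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "confusion": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
    }


def toy_predict(url: str) -> int:
    """Stand-in predictor for demonstration only; replace with the real model."""
    return int("login" in url)
```

Comparing these numbers against the test-set metrics reported in this card gives a rough indication of how much performance shifts on local traffic.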

## License
Provided for research and educational purposes. Ensure compliance with local laws and organizational policies.