Perth0603
/

Random-Forest-Model-for-PhishingDetection

Perth0603 committed on Oct 1, 2025

Commit

b0ecf99

1 Parent(s): ab41a73

Upload README.md with huggingface_hub

README.md +52 -0

README.md ADDED

	@@ -0,0 +1,52 @@

+Random Forest / XGBoost Model for URL Phishing Detection
+This repository contains a trained tree-based classifier for detecting phishing URLs. The model was trained on `PhiUSIIL_Phishing_URL_Dataset.csv` using lightweight, URL-only lexical and structural features. On the held-out test split it reached an accuracy of 0.9952 and an F1 of 0.9958.
+Highlights
+- Backend: gradient-boosted trees via XGBoost (uses GPU if available; falls back to CPU).
+- Input: raw URL string only (no external DNS/WHOIS calls needed).
+- Features: length, character counts, digit ratio, IPv4 presence, common phishing tokens, scheme/TLD heuristics.
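The feature families listed above can be illustrated with a minimal sketch. This is not the repository's extraction code: the feature names, the token list, and the helper names `host_is_ipv4` and `extract_features` are assumptions made for illustration, and the real columns are whatever `feature_cols` in the bundle specifies.

```python
import re
from urllib.parse import urlparse

# Hypothetical token list; the tokens used in training are not listed in the README.
SUSPICIOUS_TOKENS = ("login", "secure", "account", "update", "verify")

# Four dot-separated groups of 1-3 digits; octet ranges are checked separately.
IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")


def host_is_ipv4(host: str) -> bool:
    """Return True if the host is a dotted-quad IPv4 address."""
    match = IPV4_RE.match(host)
    return bool(match) and all(int(octet) <= 255 for octet in match.groups())


def extract_features(url: str) -> dict:
    """Compute URL-only lexical features (illustrative names, no network calls)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    digits = sum(ch.isdigit() for ch in url)
    is_ip = host_is_ipv4(host)
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "digit_ratio": digits / len(url) if url else 0.0,
        "has_ipv4_host": int(is_ip),
        "num_suspicious_tokens": sum(tok in lowered for tok in SUSPICIOUS_TOKENS),
        "is_https": int(parsed.scheme == "https"),
        # No TLD is reported for IP hosts or single-label hosts.
        "tld": host.rsplit(".", 1)[-1] if "." in host and not is_ip else "",
    }


if __name__ == "__main__":
    print(extract_features("http://secure-login-account-update.example.com/session?id=123"))
```

Everything here is computed from the string alone, which matches the README's claim that no DNS or WHOIS lookups are needed.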
+Test metrics (from notebook)
+- accuracy: 0.9952
+- precision: 0.9928
+- recall: 0.9989
+- f1: 0.9958
+- roc_auc: 0.9976
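As a quick consistency check on the numbers above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_from` is an assumption for this sketch and is not part of the repository.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics list.
f1 = f1_from(0.9928, 0.9989)
print(round(f1, 4))  # 0.9958
```

The recomputed value agrees with the reported F1 to four decimal places.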
+Files
+- `rf_url_phishing_xgboost_bst.joblib`: joblib bundle with the trained model and metadata. Despite the `rf_` prefix, the bundled model is an XGBoost booster (`model_type`: `xgboost_bst`).
+- `inference.py`: helpers to load the bundle and run `predict_url()`.
+- `requirements.txt`: minimal dependencies for local inference.
+Quick start (local)
+1) Install dependencies
+```bash
+pip install -r requirements.txt
+```
+2) Predict a single URL
+```python
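+from inference import load_bundle, predict_url
+
+# Load the trained model and its metadata from the joblib bundle.
+bundle = load_bundle("rf_url_phishing_xgboost_bst.joblib")
+
+# Score a single raw URL string.
+result = predict_url(
+    url="http://secure-login-account-update.example.com/session?id=123",
+    bundle=bundle,
+    threshold=0.5,  # decision threshold applied to the model's score
+)
+print(result)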
+```
+Bundle contents
+The joblib bundle contains:
+- `model`: trained XGBoost booster
+- `feature_cols`: ordered list of feature names expected by the model
+- `url_col`: original URL column name
+- `label_col`: label column name used in training
+- `model_type`: string identifying the backend (here: `xgboost_bst`)
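Because `feature_cols` is an ordered list, a caller has to present features in exactly that order. The sketch below shows one way to check a loaded bundle and build a correctly ordered row. It is an assumption-laden illustration: `validate_bundle`, `to_feature_row`, and the stand-in values in `example_bundle` are hypothetical, and the real bundle would come from `joblib.load(...)` or the repository's `load_bundle()`.

```python
# Keys the README says the bundle contains.
REQUIRED_KEYS = ("model", "feature_cols", "url_col", "label_col", "model_type")


def validate_bundle(bundle: dict) -> list:
    """Check that all documented keys are present; return the feature order."""
    missing = [key for key in REQUIRED_KEYS if key not in bundle]
    if missing:
        raise KeyError(f"bundle is missing keys: {missing}")
    return list(bundle["feature_cols"])


def to_feature_row(features: dict, feature_cols: list) -> list:
    """Order a feature dict by feature_cols, ignoring any extra keys."""
    return [float(features[name]) for name in feature_cols]


# Stand-in bundle with placeholder values (not the real file's contents).
example_bundle = {
    "model": None,  # would be the trained XGBoost booster
    "feature_cols": ["url_length", "digit_ratio"],  # hypothetical names
    "url_col": "URL",  # placeholder
    "label_col": "label",  # placeholder
    "model_type": "xgboost_bst",
}

if __name__ == "__main__":
    cols = validate_bundle(example_bundle)
    print(to_feature_row({"digit_ratio": 0.05, "url_length": 61}, cols))
```

Keeping the column order in the bundle, rather than in the calling code, means the model and its expected input layout travel together in one file.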
+License
+This model is provided for research and educational purposes only. Evaluate thoroughly before use in production.