---

language: en
license: other
tags:
  - security
  - phishing-detection
  - url-classification
  - xgboost
---


# Random Forest / XGBoost Model for URL Phishing Detection

## Model Details
- Architecture: Gradient-boosted decision trees (XGBoost)
- Input: Single URL string (no external queries)
- Features: Lexical and structural URL features (lengths, symbol counts, digit ratio, IPv4 pattern, common phishing tokens, scheme/TLD heuristics)
- Training data: `PhiUSIIL_Phishing_URL_Dataset.csv`
- Intended use: Binary classification (phishing vs. legitimate)
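The feature families listed above can be illustrated with a small extractor that works from the URL string alone. This is a minimal sketch under stated assumptions, not the model's actual feature pipeline (that lives in `inference.py`). The function name `extract_features`, the token list `SUSPICIOUS_TOKENS`, and the `COMMON_TLDS` set are all hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical values for illustration; the model's real token list and TLD heuristics are not documented here.
SUSPICIOUS_TOKENS = ("login", "verify", "update", "secure", "account", "bank", "signin")
COMMON_TLDS = {"com", "org", "net", "edu", "gov"}
IPV4_RE = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")


def extract_features(url: str) -> dict:
    """Compute illustrative lexical/structural features without any network lookups."""
    # urlparse only finds the host when a scheme is present, so add one for parsing if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    lower = url.lower()

    # IPv4 host: four dotted groups, each in the range 0..255.
    m = IPV4_RE.match(host)
    is_ipv4 = bool(m) and all(0 <= int(g) <= 255 for g in m.groups())
    tld = "" if is_ipv4 else (host.rsplit(".", 1)[-1] if "." in host else "")

    digits = sum(ch.isdigit() for ch in url)
    return {
        # Length features
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        # Symbol counts
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_at": url.count("@"),
        "num_slashes": url.count("/"),
        # Digit ratio over the whole URL string
        "digit_ratio": digits / len(url) if url else 0.0,
        # IPv4 pattern in place of a domain name
        "is_ipv4_host": int(is_ipv4),
        # Count of common phishing tokens appearing anywhere in the URL
        "num_suspicious_tokens": sum(tok in lower for tok in SUSPICIOUS_TOKENS),
        # Scheme / TLD heuristics
        "is_https": int(parsed.scheme == "https"),
        "uncommon_tld": int(not is_ipv4 and tld not in COMMON_TLDS),
    }


print(extract_features("http://192.168.0.1/login/verify"))
```

Because every feature is derived from the string itself, extraction is fast and deterministic, which matches the "no external queries" design, at the cost of the evasion risk noted under Limitations.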

## Metrics (test set)
- Accuracy: 0.9952
- Precision: 0.9928
- Recall: 0.9989
- F1: 0.9958
- ROC-AUC: 0.9976
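For reference, the threshold-based metrics above follow the standard definitions from confusion-matrix counts, and the reported F1 is consistent with the reported precision and recall. The sketch below is illustrative; the helper names are hypothetical, and the card does not state which class (phishing or legitimate) was treated as positive when these numbers were computed.

```python
def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics for whichever class is labeled positive."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


def f1_from_pr(precision: float, recall: float) -> float:
    """F1 computed directly from precision and recall."""
    return _safe_div(2 * precision * recall, precision + recall)


# Consistency check against the reported test-set precision and recall.
print(round(f1_from_pr(0.9928, 0.9989), 4))  # → 0.9958
```

ROC-AUC is not computed here because it depends on the model's scores across all thresholds rather than on a single confusion matrix.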

## Usage
See `README.md` and `inference.py` for how to load the model and call `predict_url()`.

## Limitations and Biases
- URL-only features can be evaded by sophisticated attackers.
- Dataset shifts and novel TLDs may degrade performance.
- Always validate on your own traffic before deployment.
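One way to act on the dataset-shift and novel-TLD caveats is to break down errors on your own labeled traffic by TLD, so that segments the training data did not cover stand out. The sketch below is a hypothetical helper: `predict` is any callable mapping a URL to a 0/1 label (for example a thin wrapper you write around `predict_url()`, whose exact return type this card does not specify), and the toy predictor in the usage example is purely illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import urlparse


def error_rate_by_tld(
    labeled_urls: Iterable[Tuple[str, int]],
    predict: Callable[[str], int],
) -> Dict[str, float]:
    """Return the misclassification rate for each TLD in a labeled URL set."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for url, label in labeled_urls:
        host = urlparse(url if "://" in url else "http://" + url).hostname or ""
        if host and host.replace(".", "").isdigit():
            tld = "(ipv4)"  # bare IPv4 host: no TLD to report
        elif "." in host:
            tld = host.rsplit(".", 1)[-1]
        else:
            tld = "(none)"
        totals[tld] += 1
        if predict(url) != label:
            errors[tld] += 1
    return {tld: errors[tld] / totals[tld] for tld in totals}


# Toy predictor for illustration only (1 = phishing in this example).
toy_predict = lambda u: int("login" in u)
sample = [
    ("http://a.com/login", 1),
    ("http://b.com/", 0),
    ("http://c.zip/", 1),
]
print(error_rate_by_tld(sample, toy_predict))  # → {'com': 0.0, 'zip': 1.0}
```

A TLD whose error rate is well above the rest, or one that barely appears in your sample, is a signal to collect more labeled examples from that segment before relying on the model there.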

## License
Provided for research/educational purposes. Ensure compliance with local laws and organizational policies.