---
tags:
- fasttext
- phishing
- domain-classification
license: mit
language:
- en
datasets:
- mstfknn/phishing-domain-list-2m-plus
---

# Phishing Detection Model (FastText)

This is a lightweight FastText model that classifies domain names as either phishing or clean. It was trained in supervised mode with `wordNgrams=2`, so word bigrams are used as features alongside single tokens.

## Installation

Option 1: From Source

```
git clone https://github.com/facebookresearch/fastText.git
cd fastText
mkdir build && cd build
cmake ..
make
```

Option 2: Using pip (limited support)

```
pip install fasttext
```

⚠️ The pip version does not support all features. Compiling from source is recommended.

## Usage

```bash
# Predict a single domain
echo "carreeffoursa.site" | ./fasttext predict phishing_model.bin -
```
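
For more than one domain, the same binary can score a whole file. The sketch below assumes a hypothetical `domains.txt` with one domain per line and uses fastText's `predict-prob` command, which prints the top label together with its probability. The `FASTTEXT_BIN` and `MODEL` variables and the helper function names are illustrative and not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: batch classification with probabilities.
# FASTTEXT_BIN, MODEL and domains.txt are assumed names; adjust them to your setup.
FASTTEXT_BIN="${FASTTEXT_BIN:-./fasttext}"
MODEL="${MODEL:-phishing_model.bin}"

# Turn "__label__phishing 0.98" into "phishing 0.98".
strip_label_prefix() {
  sed 's/^__label__//'
}

# Print "<domain> <label> <probability>" for each line of the input file.
classify_file() {
  local input="$1"
  # predict-prob with k=1 prints the top label and its probability per input line.
  "$FASTTEXT_BIN" predict-prob "$MODEL" "$input" 1 \
    | strip_label_prefix \
    | paste -d ' ' "$input" -
}

# Only run when the binary, the model and the input file are all present.
if [ -x "$FASTTEXT_BIN" ] && [ -f "$MODEL" ] && [ -f domains.txt ]; then
  classify_file domains.txt
fi
```

Keeping the probability next to each label makes it possible to apply a confidence threshold downstream instead of trusting every top-1 prediction equally.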

## Training Info

- Framework: FastText
- Labels: `__label__phishing`, `__label__clean`
- Epochs: 10
- Learning rate: 0.5
- wordNgrams: 2
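
As a rough sketch, the settings above map onto the fastText command line as follows. The file names (`train_sample.txt`, `retrained_model`) and the two sample rows are hypothetical and only illustrate the usual label-first input format; the exact command and preprocessing used to train this model are not documented here.

```bash
#!/usr/bin/env bash
# Sketch: supervised training with the hyperparameters listed above.
# All file names are hypothetical, and the two sample rows are illustrative only.
FASTTEXT_BIN="${FASTTEXT_BIN:-./fasttext}"
TRAIN_FILE="${TRAIN_FILE:-train_sample.txt}"
OUTPUT_PREFIX="${OUTPUT_PREFIX:-retrained_model}"

# fastText supervised input: one example per line, label first, then the text.
cat > "$TRAIN_FILE" <<'EOF'
__label__phishing carreeffoursa.site
__label__clean example.com
EOF

# Train only if the binary is available; the model is written to "$OUTPUT_PREFIX.bin".
if [ -x "$FASTTEXT_BIN" ]; then
  "$FASTTEXT_BIN" supervised -input "$TRAIN_FILE" -output "$OUTPUT_PREFIX" \
    -epoch 10 -lr 0.5 -wordNgrams 2
fi
```

A separate output prefix is used so that an experiment does not overwrite the published `phishing_model.bin`.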

## Training Data

The model was trained on [mstfknn/phishing-domain-list-2m-plus](https://huggingface.co/datasets/mstfknn/phishing-domain-list-2m-plus), a dataset of 2,000,000 domain names, each labeled as either phishing or clean.

## Example

Input:

```
carreeffoursa.site
```

Output:

```
__label__phishing
```

## License

MIT

---

## Links

- 💻 [GitHub Repository](https://github.com/mstfknn/phishing-fasttext-model)
- 🐳 [Docker Hub Image](https://hub.docker.com/r/mstfknn/phishing-fasttext)