---
tags:
- fasttext
- phishing
- domain-classification
license: mit
language:
- en
datasets:
  - mstfknn/phishing-domain-list-2m-plus
---

# Phishing Detection Model (FastText)

This is a lightweight FastText model trained to classify domain names as either phishing or clean. It was trained in supervised mode with `wordNgrams=2`, so word bigrams are used as features alongside individual words.

## Installation

Option 1: From Source
```bash
git clone https://github.com/facebookresearch/fastText.git
cd fastText
mkdir build && cd build
cmake ..
make
```
Option 2: Using pip (limited support)
```bash
pip install fasttext
```
⚠️ The pip package installs the Python bindings rather than the `fasttext` command-line tool used in the examples below, so compiling from source is recommended.

## Usage

```bash
# Predict a single domain
echo "carreeffoursa.site" | ./fasttext predict phishing_model.bin -
```
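To act on a prediction inside a script, the printed label can be turned into an exit status. The sketch below is a minimal example that assumes the CLI's default output of one label per line (optionally followed by a probability); `is_phishing` is a hypothetical helper name, not part of fastText.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads fastText prediction output on stdin and succeeds
# (exit status 0) only when the first label on the first line is __label__phishing.
is_phishing() {
  local label=""
  # Only the first line matters for a single-domain query; tolerate a missing
  # trailing newline, but fail on empty input.
  IFS=' ' read -r label _ || [ -n "$label" ] || return 1
  [ "$label" = "__label__phishing" ]
}
```

For example: `echo "carreeffoursa.site" | ./fasttext predict phishing_model.bin - | is_phishing && echo "flagged as phishing"`.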

## Training Info

- Framework: FastText
- Labels: `__label__phishing`, `__label__clean`
- Epochs: 10
- Learning rate: 0.5
- wordNgrams: 2

## 📊 Training Data

The model was trained on [mstfknn/phishing-domain-list-2m-plus](https://huggingface.co/datasets/mstfknn/phishing-domain-list-2m-plus), a dataset of 2,000,000 domain names labeled as either phishing or clean.
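fastText's supervised mode expects one example per line: the label prefixed with `__label__`, followed by the text. The dataset's column layout is not described here, so this sketch assumes a hypothetical headerless CSV of `domain,label` rows with labels `phishing` or `clean`; `to_fasttext` is likewise a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts headerless "domain,label" CSV rows (assumed layout)
# read on stdin into fastText's supervised format, "__label__<label> <domain>".
# Carriage returns are stripped first so CRLF files convert cleanly.
to_fasttext() {
  tr -d '\r' | awk -F',' 'NF >= 2 && $1 != "" { printf "__label__%s %s\n", $2, $1 }'
}
```

With the converted output saved as, say, `train.txt`, a run using the hyperparameters listed under Training Info could look like `./fasttext supervised -input train.txt -output phishing_model -epoch 10 -lr 0.5 -wordNgrams 2`, which writes `phishing_model.bin`.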


## Example

Input:
```
carreeffoursa.site
```

Output:
```
__label__phishing
```
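The fastText CLI also provides a `predict-prob` subcommand, which prints each predicted label followed by its probability. If a stricter cut-off than the top label is wanted, the probability can be thresholded. This is a minimal sketch; `phishing_above` is a hypothetical helper and the 0.9 threshold in the example is arbitrary, not a value tuned for this model.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "label probability" lines (predict-prob output with k=1)
# and prints "phishing" only when the phishing label meets the threshold given as $1;
# every other line is reported as "clean".
phishing_above() {
  awk -v t="$1" '{ if ($1 == "__label__phishing" && $2 + 0 >= t + 0) print "phishing"; else print "clean" }'
}
```

For example: `echo "carreeffoursa.site" | ./fasttext predict-prob phishing_model.bin - 1 | phishing_above 0.9`.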

## License

MIT

---
## 🔗 Links
- 💻 [GitHub Repository](https://github.com/mstfknn/phishing-fasttext-model)
- 🐳 [Docker Hub Image](https://hub.docker.com/r/mstfknn/phishing-fasttext)