Upload folder using huggingface_hub
README.md
CHANGED
@@ -1,44 +1,44 @@
-# 🛡️ Phishing Domain Classifier (FastText)
-
-This repository contains a **FastText-based supervised classification model** trained to detect phishing domains.
-
-## 🚀 Model Overview
-
-- **Algorithm**: Facebook's [fastText](https://fasttext.cc/)
-- **Task**: Binary classification (`phishing` vs `clean`)
-- **Input format**: Domain names (e.g., `paypal-login.su`)
-- **Labels**: `__label__phishing`, `__label__clean`
-- **Features**:
-- Fast and lightweight
-- Trained with `wordNgrams = 2`
-- 10 epochs
-
 ---
+tags:
+- fasttext
+- phishing
+- domain-classification
+license: mit
+language:
+- en
+---
+
+# Phishing Detection Model (FastText)
 
-
+This is a lightweight FastText model trained to classify domain names as either phishing or clean. It uses supervised learning with `wordNgrams=2` for better n-gram feature coverage.
 
-
-phishing_model.bin # Trained model file (binary format)
-phishing_model.vec # Vector embeddings
-fasttext_train.txt # Training data file
-README.md # Documentation
+## Usage
 
+```bash
+# Predict a single domain
+echo "carreeffoursa.site" | ./fasttext predict phishing_model.bin -
+```
 
-
+## Training Info
 
-
+- Framework: FastText
+- Labels: `__label__phishing`, `__label__clean`
+- Epochs: 10
+- Learning rate: 0.5
+- wordNgrams: 2
 
-
-cd fastText
-mkdir build && cd build
-cmake ..
-make
+## Example
 
-
+Input:
+```
+carreeffoursa.site
+```
 
-
+Output:
+```
+__label__phishing
+```
 
-
+## License
 
-
-echo "carreeffoursa.site" | ./fasttext predict phishing_model.bin -
+MIT
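The hyperparameters listed under Training Info map onto the fastText command line roughly as sketched below. This is a minimal sketch, not the author's actual training script: the two sample rows, the `train_sample.txt` file name, and the `./fasttext` binary path are assumptions for illustration, and the contents of the real `fasttext_train.txt` are not shown in this commit. Training and prediction are skipped when the binary is not present.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample rows in the fastText supervised format: "<label> <text>".
# The labels come from the README; the rows themselves are illustrative only.
train_file="train_sample.txt"
printf '%s\n' \
  "__label__phishing paypal-login.su" \
  "__label__clean example.com" > "$train_file"

# Assumed location of the compiled fastText binary.
fasttext_bin="./fasttext"

if [ -x "$fasttext_bin" ]; then
  # Train with the settings listed in the README: 10 epochs, lr 0.5, word bigrams.
  # This writes phishing_model.bin and phishing_model.vec.
  "$fasttext_bin" supervised -input "$train_file" -output phishing_model \
    -epoch 10 -lr 0.5 -wordNgrams 2

  # Print the top label together with its probability.
  echo "carreeffoursa.site" | "$fasttext_bin" predict-prob phishing_model.bin - 1
else
  echo "fasttext binary not found at $fasttext_bin; skipping training" >&2
fi
```

Using `predict-prob` instead of `predict` returns a probability next to the label, which allows a decision threshold to be applied downstream. Note that `-wordNgrams` operates on whitespace-separated tokens.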