---
license: mit
language:
- en
tags:
- email
- spam
- spamdetection
---

# Spam Detection Neural Network (PyTorch)

[Python](https://www.python.org/)
[PyTorch](https://pytorch.org/)
[License](LICENSE)

A **simple, real-world spam detection neural network** built from scratch in **PyTorch**.
This model classifies SMS / short text messages as **Spam** or **Ham (Not Spam)**.

The project is **small, easy to understand, and perfect for learning**.
You can fork it, fine-tune it, and use it as a **starting point for your own projects**.

---

## Model Overview

- **Framework:** PyTorch
- **Architecture:** Fully Connected Neural Network (MLP)
- **Input:** Bag-of-Words text vectors
- **Output:** Binary classification (Spam / Ham)
- **Training:** From scratch, small dataset (~5,500 messages)

> Note: The dataset is intentionally small to keep things simple.
> You are encouraged to **fork the repo, add more data, and fine-tune the model**.

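
The contents of `model.py` are not reproduced in this README, so the following is only a minimal sketch of what a fully connected network over Bag-of-Words vectors can look like. The class name `SpamNNSketch`, the hidden size, and the vocabulary size are assumptions made for illustration and may differ from the project's actual `SpamNN`.

```python
import torch
from torch import nn


class SpamNNSketch(nn.Module):
    """Hypothetical MLP over bag-of-words vectors (not the repo's SpamNN)."""

    def __init__(self, vocab_size: int, hidden_size: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden_size),  # one input feature per vocabulary entry
            nn.ReLU(),
            nn.Linear(hidden_size, 1),           # single output for binary classification
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns raw logits of shape (batch, 1); apply sigmoid to get probabilities.
        return self.net(x)


# Assumed vocabulary size of 1000, chosen only for this example.
sketch_model = SpamNNSketch(vocab_size=1000)
batch = torch.zeros(4, 1000)          # four empty bag-of-words vectors
logits = sketch_model(batch)
probs = torch.sigmoid(logits)         # values between 0 and 1
print(logits.shape)
```

In this sketch the network returns logits, which pairs naturally with `BCEWithLogitsLoss` during training. A model that applies the sigmoid inside `forward` would instead be trained with `BCELoss`.
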
---

## Repository Structure

```
.
├── spam_nn.pth       # Trained PyTorch model weights
├── vectorizer.pkl    # CountVectorizer for text preprocessing
├── model.py          # Neural network architecture
├── config.json       # Model configuration
├── inference.py      # Inference / prediction script
└── README.md         # Documentation
```

---

## Usage

### Load Model

```python
import pickle

import torch

from model import SpamNN

# Load model architecture + weights
model = SpamNN()
# map_location="cpu" lets the weights load on machines without a GPU
model.load_state_dict(torch.load("spam_nn.pth", map_location="cpu"))
model.eval()  # inference mode

# Load the fitted vectorizer (only unpickle files you trust)
with open("vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
```

### Predict Messages

```python
def predict(text):
    # Convert the message to a bag-of-words vector with the fitted vectorizer
    vec = vectorizer.transform([text]).toarray()
    vec = torch.tensor(vec, dtype=torch.float32)

    with torch.no_grad():
        output = model(vec)

    # Scores above 0.35 are labeled Spam. This assumes the model returns a
    # probability; if it returns raw logits, apply torch.sigmoid(output) first.
    return "Spam" if output.item() > 0.35 else "Ham"

# Example
print(predict("Congratulations! You won $1000. Click now!"))
```

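
`predict` labels a message as Spam when the score exceeds 0.35 rather than the more common 0.5. A lower threshold generally catches more spam at the cost of more false alarms. The sketch below illustrates that trade-off on made-up scores and labels (these are not results from this model), and `precision_recall` is a hypothetical helper, not part of the repository.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) for the Spam class at a given threshold."""
    predicted = [score > threshold for score in scores]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))        # spam flagged as spam
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))        # ham flagged as spam
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = Spam, 0 = Ham), for illustration only.
scores = [0.05, 0.20, 0.40, 0.45, 0.60, 0.90]
labels = [0, 0, 0, 1, 1, 1]

for threshold in (0.35, 0.5):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, the 0.35 threshold catches every spam message but also flags one ham message, while 0.5 flags no ham but misses one spam message. Running the same comparison on a held-out split of real data is a reasonable way to choose a threshold.
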
---

## Training & Fine-Tuning

The model can be **improved and fine-tuned** by:

* Adding more data (larger SMS datasets)
* Adding bigrams to the vectorizer (`ngram_range=(1, 2)`)
* Adjusting class weights in `BCEWithLogitsLoss` (the `pos_weight` argument)
* Training for more epochs
* Using embeddings or an LSTM for contextual understanding

**Fork this repo and experiment freely**. Make it your own!

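
Two of the suggestions above, bigram features and class weighting, can be sketched as follows. This is not the project's training script: the six messages are toy data, `nn.Linear` is a stand-in for the real network, and the learning rate and epoch count are arbitrary assumptions.

```python
import torch
from torch import nn
from sklearn.feature_extraction.text import CountVectorizer

# Toy data for illustration only (1 = Spam, 0 = Ham).
texts = [
    "win a free prize now",
    "free cash click now",
    "are we still meeting today",
    "see you at lunch",
    "call me when you are home",
    "lunch at noon?",
]
labels = torch.tensor([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

# Unigrams and bigrams, so phrases such as "free prize" become features.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = torch.tensor(vectorizer.fit_transform(texts).toarray(), dtype=torch.float32)

# Weight the positive (Spam) class by the ham-to-spam ratio to offset imbalance.
num_spam = labels.sum()
num_ham = len(labels) - num_spam
pos_weight = (num_ham / num_spam).reshape(1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Stand-in model that outputs one logit per message.
model = nn.Linear(X.shape[1], 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(20):
    optimizer.zero_grad()
    logits = model(X).squeeze(1)       # shape (batch,), matching labels
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```

Changing `ngram_range` changes the vocabulary and therefore the input dimension, so the vectorizer has to be refitted and the network retrained, since the first layer of the saved weights will no longer match.
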
---

## Support the Project

If this project is helpful:

* **Give this repository a star**
* **Fork it and improve it**
* **Share it with others learning PyTorch**

> Following and starring helps me keep releasing open-source projects!

---

## Source Code & Updates

For the **full source code, training scripts, and future updates**,
please visit the **GitHub repository** linked to this project.

---

## License

This project is **open-source** and intended for **educational purposes**.
It is released under the MIT License.

---

## Hugging Face Friendly

You can also **upload this model to the Hugging Face Model Hub**.
Include `spam_nn.pth`, `vectorizer.pkl`, `config.json`, and `inference.py` to make it **ready for inference online**.
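
One possible way to do the upload is with the `huggingface_hub` client library. The sketch below assumes that library is installed and that you are already authenticated (for example with `huggingface-cli login`). The helper `upload_model` and the repo id in the commented call are hypothetical.

```python
from pathlib import Path

# Files this README lists as needed for online inference.
MODEL_FILES = ["spam_nn.pth", "vectorizer.pkl", "config.json", "inference.py"]


def upload_model(repo_id: str, folder: str = ".") -> None:
    """Upload the listed files from `folder` to a model repo on the Hub."""
    # Imported here so the rest of the module works without huggingface_hub.
    from huggingface_hub import HfApi

    api = HfApi()
    # Create the model repo if it does not exist yet.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    for name in MODEL_FILES:
        api.upload_file(
            path_or_fileobj=str(Path(folder) / name),
            path_in_repo=name,
            repo_id=repo_id,
            repo_type="model",
        )


# Hypothetical repo id; replace with your own before running.
# upload_model("your-username/spam-detection-nn")
```

The call is left commented out because it needs network access and valid credentials.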