---
license: mit
language:
- en
tags:
- email
- spam
- spamdetection
---
 
# 📩 Spam Detection Neural Network (PyTorch)

[![Python](https://img.shields.io/badge/python-3.10-blue.svg)](https://www.python.org/)
[![PyTorch](https://img.shields.io/badge/pytorch-2.1-red.svg)](https://pytorch.org/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

A **simple, real-world spam detection neural network** built from scratch in **PyTorch**.  
This model classifies SMS / short text messages as **Spam** or **Ham (Not Spam)**.

The project is **small, easy to understand, and perfect for learning**.  
You can fork it, fine-tune it, and use it as a **starting point for your own projects**.

---

## 🧠 Model Overview

- **Framework:** PyTorch  
- **Architecture:** Fully Connected Neural Network (MLP)  
- **Input:** Bag-of-Words text vectors  
- **Output:** Binary classification (Spam / Ham)  
- **Training:** From scratch, small dataset (~5,500 messages)  

> ⚠️ Note: The dataset is intentionally small to keep things simple.  
> You are encouraged to **fork the repo, add more data, and fine-tune the model**.
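
To make the overview concrete, here is a minimal sketch of a Bag-of-Words MLP classifier. The class name `SpamMLPSketch`, the hidden size, and the toy messages are assumptions made for illustration; the actual `SpamNN` defined in `model.py` may use different layers and sizes.

```python
import torch
from torch import nn
from sklearn.feature_extraction.text import CountVectorizer


class SpamMLPSketch(nn.Module):
    """Hypothetical MLP; the real SpamNN in model.py may differ."""

    def __init__(self, vocab_size: int, hidden_size: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),  # one raw score (logit) per message
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squash the logit into a spam probability in (0, 1)
        return torch.sigmoid(self.net(x))


# Toy corpus, made up for illustration only
messages = ["win a free prize now", "are we still meeting for lunch"]

# Bag-of-Words: each message becomes a vector of word counts
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(messages).toarray()
features = torch.tensor(counts, dtype=torch.float32)

model = SpamMLPSketch(vocab_size=features.shape[1])
model.eval()
with torch.no_grad():
    probs = model(features)  # shape: (num_messages, 1)

print(features.shape, probs.shape)
```

The input width equals the vectorizer's vocabulary size, so the saved vectorizer and the saved weights have to come from the same training run.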

---

## 📂 Repository Structure

```
.
├── spam_nn.pth        # Trained PyTorch model weights
├── vectorizer.pkl     # CountVectorizer for text preprocessing
├── model.py           # Neural network architecture
├── config.json        # Model configuration
├── inference.py       # Inference / prediction script
└── README.md          # Documentation
```

---

## 🚀 Usage

### Load Model

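```python
import pickle

import torch

from model import SpamNN

# Build the architecture, then load the trained weights onto the CPU
model = SpamNN()
model.load_state_dict(torch.load("spam_nn.pth", map_location="cpu"))
model.eval()  # inference mode: disables dropout and batch-norm updates

# Load the fitted CountVectorizer (only unpickle files you trust)
with open("vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
```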

### Predict Messages

```python
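def predict(text):
    """Classify a single message as "Spam" or "Ham"."""
    # Vectorize with the same Bag-of-Words vocabulary used in training
    vec = vectorizer.transform([text]).toarray()
    vec = torch.tensor(vec, dtype=torch.float32)

    with torch.no_grad():
        output = model(vec)

    # 0.35 is the decision threshold applied to the model's output score
    return "Spam" if output.item() > 0.35 else "Ham"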

# Example
print(predict("Congratulations! You won $1000. Click now!"))
```
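
The `0.35` cut-off only behaves as a probability threshold if the model's output has already passed through a sigmoid. Whether `SpamNN` does this is not shown here, so the sketch below is an assumption-laden illustration: `label_from_score`, `logit_threshold`, and the `is_logit` flag are hypothetical helpers, not part of this repository.

```python
import math


def logit_threshold(prob_threshold: float) -> float:
    """Return the logit cut-off equivalent to a probability cut-off."""
    return math.log(prob_threshold / (1.0 - prob_threshold))


def label_from_score(score: float, threshold: float = 0.35, is_logit: bool = False) -> str:
    """Map a model score to "Spam" or "Ham".

    `is_logit` is a hypothetical flag: set it when the score is a raw
    logit rather than a sigmoid probability.
    """
    prob = 1.0 / (1.0 + math.exp(-score)) if is_logit else score
    return "Spam" if prob > threshold else "Ham"


print(label_from_score(0.2))                 # 0.2 read as a probability
print(label_from_score(0.2, is_logit=True))  # 0.2 read as a logit, sigmoid is about 0.55
print(round(logit_threshold(0.35), 3))       # logit equivalent of the 0.35 cut-off
```

The same raw number can flip the label depending on how it is interpreted, so it is worth checking the final layer in `model.py` before tuning the threshold.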

---

## 🔧 Training & Fine-Tuning

The model can be **improved and fine-tuned** by:

* Adding more data (larger SMS datasets)
* Widening the n-gram range in `CountVectorizer` (e.g. `ngram_range=(1, 2)` to add bigrams)
* Adjusting class weights in `BCEWithLogitsLoss` (via its `pos_weight` argument)
* Training with more epochs
* Using embeddings or LSTM for contextual understanding
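
Two of the ideas above, bigram features and class weighting, can be sketched together as follows. The toy messages, layer sizes, learning rate, and epoch count are made up for illustration and are not the settings used to train `spam_nn.pth`. Note that `BCEWithLogitsLoss` expects raw logits, so this sketch's classifier has no final sigmoid.

```python
import torch
from torch import nn
from sklearn.feature_extraction.text import CountVectorizer

# Toy data, made up for illustration; 1 = spam, 0 = ham
texts = [
    "free prize click now",
    "see you at lunch",
    "can you call me later",
    "meeting moved to monday",
]
labels = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

# Unigrams and bigrams, so phrases like "click now" become features
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = torch.tensor(vectorizer.fit_transform(texts).toarray(), dtype=torch.float32)

# Weight the positive (spam) class by the ham-to-spam ratio
num_spam = labels.sum()
num_ham = labels.numel() - num_spam
pos_weight = (num_ham / num_spam).reshape(1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# The classifier returns raw logits (no sigmoid), as this loss requires
torch.manual_seed(0)
classifier = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(classifier(X), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

With three ham messages per spam message, `pos_weight` is 3, so each spam example counts three times as much in the loss as a ham example.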

💡 **Fork this repo and experiment freely**. Make it your own!

---

## 🌟 Support the Project

If this project is helpful:

* ⭐ **Give this repository a star**
* 🍴 **Fork it and improve it**
* 📢 **Share it with others learning PyTorch**

> Following and starring helps me keep releasing open-source projects!

---

## 📌 Source Code & Updates

For the **full source code, training scripts, and future updates**,
please visit the **GitHub repository** linked to this project.

---

## 📜 License

This project is **open-source**, intended for **educational purposes**,
and released under the MIT License.

---

## 🤗 Hugging Face Friendly

You can also **upload this model to the Hugging Face Model Hub**.
Include `spam_nn.pth`, `vectorizer.pkl`, `config.json`, and `inference.py` so the repository has everything needed for **online inference**.
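
One possible way to do the upload is with the `huggingface_hub` Python package, as sketched below. The helper `upload_model` and the example `repo_id` are hypothetical, and the sketch assumes you have already authenticated (for example with `huggingface-cli login`).

```python
from pathlib import Path

# Files this README suggests uploading
FILES = ["spam_nn.pth", "vectorizer.pkl", "config.json", "inference.py"]


def upload_model(repo_id: str, folder: str = ".") -> None:
    """Upload the model files to a Hub model repository.

    `repo_id` is a placeholder such as "your-username/spam-detection-nn".
    """
    # Fail early if any expected file is missing
    missing = [name for name in FILES if not (Path(folder) / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")

    # Imported here so the file check above works without the package installed
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    for name in FILES:
        api.upload_file(
            path_or_fileobj=str(Path(folder) / name),
            path_in_repo=name,
            repo_id=repo_id,
            repo_type="model",
        )


# Example call (needs network access and a logged-in account):
# upload_model("your-username/spam-detection-nn")
```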