---

language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---


# SMS Spam Detection with BERT

🎯 A high-performance SMS spam classifier built with BERT achieving **99.16% accuracy**.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:
- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |
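The four metrics come from the confusion counts of a binary classifier. The sketch below assumes SPAM is the positive class for precision, recall and F1 (the card implies this but does not state it) and computes all four from lists of gold and predicted labels. It also checks that the reported F1 is the harmonic mean of the reported precision and recall.

```python
def binary_metrics(y_true, y_pred, positive="SPAM"):
    """Accuracy, precision, recall and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# F1 is the harmonic mean of precision and recall, so the table's F1
# can be recomputed from the table's precision and recall.
p, r = 0.9730, 0.9643
print(f"F1 = {2 * p * r / (p + r):.4f}")  # F1 = 0.9686
```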

## Quick Start

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input (truncate to the 128-token maximum used during training)
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
    confidence = predictions[0, predicted_class].item()

# Map to label (index 0 = HAM, index 1 = SPAM)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {confidence:.4f})")
```

## Training Details

### Dataset
- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)
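Because the classes are imbalanced, accuracy on its own can look strong. The arithmetic below, using only the counts listed above, shows that a trivial classifier that always answers HAM would already reach about 86.6% accuracy, which is why precision, recall and F1 on the SPAM class are the more informative numbers.

```python
# Class counts from the SMS Spam Collection, as listed above.
counts = {"HAM": 4827, "SPAM": 747}
total = sum(counts.values())  # 5,574 messages

for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.1%})")

# Accuracy of a classifier that always predicts the majority class (HAM).
majority_baseline = max(counts.values()) / total
print(f"Majority-class baseline accuracy: {majority_baseline:.2%}")  # 86.60%
```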

### Training Configuration
- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW
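The hyperparameters above also fix the length of training. The sketch below collects them in a small (hypothetical) `TrainingConfig` dataclass and derives the number of optimizer steps. It assumes no gradient accumulation, that the last partial batch of each epoch is kept, and that the training split holds 80% of 5,574 messages rounded down; the card specifies none of these.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values taken from the Training Configuration list.
    base_model: str = "bert-base-uncased"
    max_seq_length: int = 128
    batch_size: int = 16
    learning_rate: float = 2e-5
    epochs: int = 3
    optimizer: str = "AdamW"


def optimizer_steps(num_examples: int, cfg: TrainingConfig) -> tuple[int, int]:
    """Return (steps per epoch, total steps), counting the last partial batch."""
    per_epoch = math.ceil(num_examples / cfg.batch_size)
    return per_epoch, per_epoch * cfg.epochs


cfg = TrainingConfig()
# Assumption: the training split is 80% of 5,574 messages, rounded down.
train_size = int(0.8 * 5574)
per_epoch, total_steps = optimizer_steps(train_size, cfg)
print(f"{train_size} training messages -> {per_epoch} steps/epoch, {total_steps} steps in total")
```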

### Data Split
- **Training:** 80%
- **Validation:** 20%
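The card does not say whether the 80/20 split was stratified or which random seed was used. With only 13.4% spam, stratifying keeps the spam share roughly equal in both parts. The sketch below is one hypothetical way to do that with the standard library (scikit-learn's `train_test_split(..., stratify=labels)` is a common alternative); the function name and seed are illustrative, and the placeholder messages stand in for the real dataset.

```python
import random
from collections import defaultdict


def stratified_split(messages, labels, val_fraction=0.2, seed=42):
    """Split (message, label) pairs so each label keeps about the same share in both parts."""
    by_label = defaultdict(list)
    for msg, label in zip(messages, labels):
        by_label[label].append((msg, label))

    rng = random.Random(seed)
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)  # per-class validation size
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    # Mix the classes back together so batches are not grouped by label.
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


# Placeholder data with the dataset's class counts.
labels = ["HAM"] * 4827 + ["SPAM"] * 747
messages = [f"msg {i}" for i in range(len(labels))]
train, val = stratified_split(messages, labels)
spam_share = sum(1 for _, y in val if y == "SPAM") / len(val)
print(len(train), len(val), f"{spam_share:.1%}")
```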

## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```
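The last two stages of the diagram reduce to a linear layer that maps the [CLS] vector to one logit per class, followed by a softmax. The toy sketch below uses made-up 4-dimensional values (bert-base's real hidden size is 768) purely to show that arithmetic. Assuming the standard Transformers `BertForSequenceClassification` head, the [CLS] state also passes through BERT's pooler (a dense layer with tanh) and dropout before this linear layer; the sketch leaves those steps out.

```python
import math


def classification_head(cls_vector, weights, bias):
    """Linear layer: logit_k = w_k . h + b_k, one row of weights per class."""
    return [sum(w * h for w, h in zip(row, cls_vector)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


LABELS = ["HAM", "SPAM"]  # index order as in the Quick Start example

# Toy values for illustration only, not taken from the trained model.
cls_vector = [0.5, -1.0, 0.25, 2.0]
weights = [
    [0.1, 0.4, -0.2, -0.3],   # HAM row
    [-0.2, -0.5, 0.3, 0.6],   # SPAM row
]
bias = [0.05, -0.05]

logits = classification_head(cls_vector, weights, bias)  # about [-0.95, 1.625]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
print(LABELS[pred], [round(p, 4) for p in probs])
```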

## Use Cases

✅ **Spam Filtering**: Automatically filter spam messages in messaging applications  
✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts  
✅ **Content Moderation**: Pre-screen messages in communication platforms  
✅ **Fraud Detection**: Identify suspicious messages in financial apps  

## Limitations

- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data; new spam patterns may emerge

## Ethical Considerations

⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages  
⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam  
⚠️ **Bias**: Model may reflect biases present in training data  
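One way to reduce the impact of false positives is to act on a SPAM prediction only when its score clears a threshold and leave everything else in the inbox. The sketch below works on results shaped like the pipeline output in the Quick Start (one `{'label': ..., 'score': ...}` dict per message). The `route_messages` helper, the 0.9 threshold and the example scores are hypothetical; a real threshold should be tuned on held-out data.

```python
def route_messages(messages, predictions, spam_threshold=0.9):
    """Split messages into (inbox, quarantine) based on text-classification results."""
    inbox, quarantine = [], []
    for msg, pred in zip(messages, predictions):
        if pred["label"] == "SPAM" and pred["score"] >= spam_threshold:
            quarantine.append(msg)
        else:
            # HAM predictions and low-confidence SPAM calls stay visible to the user.
            inbox.append(msg)
    return inbox, quarantine


messages = [
    "Congratulations! You've won a $1000 gift card!",
    "Hey, are we still meeting for lunch tomorrow?",
    "Reply YES to confirm your appointment",
]
# Hypothetical scores; with the model loaded: predictions = classifier(messages)
predictions = [
    {"label": "SPAM", "score": 0.9987},
    {"label": "HAM", "score": 0.9991},
    {"label": "SPAM", "score": 0.62},
]
inbox, quarantine = route_messages(messages, predictions)
print("inbox:", inbox)
print("quarantine:", quarantine)
```

Raising `spam_threshold` lowers the false-positive rate at the cost of letting more spam through, so the right value depends on how costly a missed legitimate message is in the target application.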

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

MIT License

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).

---

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.