---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---
# SMS Spam Detection with BERT
🎯 An SMS spam classifier fine-tuned from BERT that reaches **99.16% accuracy**.
## Model Description
This model is a fine-tuned `bert-base-uncased` classifier that detects spam in SMS text. It assigns each message one of two labels:
- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)
## Performance Metrics
| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |
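
The F1-score in the table is the harmonic mean of precision and recall, so it can be recomputed from the other two rows. The short check below is only a sanity calculation on the reported numbers; the helper name `f1_score` is introduced here for illustration and is not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above, converted from percent to fractions.
precision = 0.9730
recall = 0.9643

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 96.86%"
```

The recomputed value matches the reported 96.86%, so the three rows are mutually consistent.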
## Quick Start
### Using Transformers Pipeline
```python
from transformers import pipeline
# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")
# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```
### Using AutoModel and AutoTokenizer
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()
# Map to label
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```
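
To make the softmax and argmax steps above concrete, here is a dependency-free sketch of how two raw logits turn into the printed label and confidence. The logit values are hypothetical and chosen only for illustration; real values come from `outputs.logits`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["HAM", "SPAM"]  # same index order as the example above

logits = [2.0, -1.0]  # hypothetical logits, not produced by the model
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax

print(LABELS[predicted], round(probs[predicted], 4))  # prints "HAM 0.9526"
```

With only two classes, the winning probability depends only on the difference between the two logits, so a larger gap means a higher reported confidence.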
## Training Details
### Dataset
- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)
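
The class counts above imply a strongly imbalanced dataset. The sketch below recomputes the shares and the accuracy of a trivial baseline that labels every message HAM, which is a useful reference point when reading the accuracy figure. The function name `class_shares` is illustrative only.

```python
def class_shares(ham: int, spam: int) -> tuple[float, float]:
    """Return the (ham, spam) fractions of the whole dataset."""
    total = ham + spam
    return ham / total, spam / total


# Counts copied from the list above.
HAM_COUNT = 4827
SPAM_COUNT = 747

ham_share, spam_share = class_shares(HAM_COUNT, SPAM_COUNT)

# A classifier that always answers HAM is right exactly as often as HAM occurs.
majority_baseline_accuracy = ham_share

print(f"ham {ham_share:.1%}, spam {spam_share:.1%}")  # prints "ham 86.6%, spam 13.4%"
print(f"always-HAM baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because the always-HAM baseline already scores about 86.6% on the full dataset, precision and recall on the SPAM class are more informative than accuracy alone.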
### Training Configuration
- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW
### Data Split
- **Training:** 80%
- **Validation:** 20%
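
As a rough guide to the size of this training run, the split and the configuration above can be turned into example and step counts. This is a back-of-the-envelope sketch under stated assumptions: the 80% is taken over all 5,574 messages and rounded down, the last partial batch is kept, and there is no gradient accumulation. The card does not say how the split was drawn, so the exact numbers may differ.

```python
import math

TOTAL_MESSAGES = 5574  # from the Dataset section
TRAIN_PERCENT = 80     # from the Data Split section
BATCH_SIZE = 16        # from the Training Configuration section
EPOCHS = 3


def split_sizes(total: int, train_percent: int) -> tuple[int, int]:
    """Return (train, validation) sizes, rounding the training share down."""
    train = total * train_percent // 100
    return train, total - train


def optimizer_steps(n_train: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps when the final partial batch is kept."""
    return math.ceil(n_train / batch_size) * epochs


n_train, n_val = split_sizes(TOTAL_MESSAGES, TRAIN_PERCENT)
steps = optimizer_steps(n_train, BATCH_SIZE, EPOCHS)

print(n_train, n_val, steps)  # prints "4459 1115 837"
```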
## Model Architecture
```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```
## Use Cases
- **Spam Filtering**: Automatically filter spam messages in messaging applications
- **SMS Gateway Protection**: Protect users from phishing and scam attempts
- **Content Moderation**: Pre-screen messages in communication platforms
- **Fraud Detection**: Identify suspicious messages in financial apps
## Limitations
- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data; new spam patterns may emerge
## Ethical Considerations
- ⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages
- ⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam
- ⚠️ **Bias**: Model may reflect biases present in training data
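
One common way to limit the false-positive risk noted above is to act on a SPAM prediction only when its score clears a confidence threshold. The sketch below works on dictionaries shaped like the pipeline output shown in Quick Start. The helper `flag_spam` and the threshold of 0.9 are assumptions for illustration; a real threshold should be tuned on held-out data. Only the first sample result comes from this card, and the other two are made up.

```python
def flag_spam(result: dict, threshold: float = 0.9) -> bool:
    """Flag a message only if it is labeled SPAM with a score at or above the threshold."""
    return result["label"] == "SPAM" and result["score"] >= threshold


# Dictionaries in the same shape the text-classification pipeline returns.
results = [
    {"label": "SPAM", "score": 0.9987},  # example output from Quick Start
    {"label": "SPAM", "score": 0.62},    # hypothetical low-confidence prediction
    {"label": "HAM", "score": 0.99},     # hypothetical legitimate message
]

flags = [flag_spam(r) for r in results]
print(flags)  # prints "[True, False, False]"
```

Messages that are labeled SPAM but fall below the threshold could be delivered with a warning or queued for review rather than blocked outright.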
## Citation
If you use this model, please cite:
```bibtex
@misc{sms_spam_detection_bert_2026,
title={SMS Spam Detection with BERT},
author={niru-nny},
year={2026},
url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```
## License
MIT License
## Contact
For questions or feedback, please open a discussion on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).
---
**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.