---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---

# SMS Spam Detection with BERT 🎯

A high-performance SMS spam classifier built with BERT, achieving **99.16% accuracy**.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It classifies each message as either:

- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |

## Quick Start

### Using the Transformers Pipeline

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to label
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```

## Training Details

### Dataset

- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)

### Training Configuration

- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW

### Data Split

- **Training:** 80%
- **Validation:** 20%

## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```

## Use Cases

✅ **Spam Filtering**: Automatically filter spam messages in messaging applications

✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts

✅ **Content Moderation**: Pre-screen messages in communication platforms

✅ **Fraud Detection**: Identify suspicious messages in financial apps

## Limitations

- The model is trained specifically on English SMS messages
- It may not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- It was trained on historical data; new spam patterns may emerge

## Ethical Considerations

⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages

⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam

⚠️ **Bias**: The model may reflect biases present in its training data

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

MIT License

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).

---

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.
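## Appendix: How the Prediction Step Works

The softmax-and-argmax step in the AutoModel example above can be illustrated without loading the model at all: given a pair of raw logits, the predicted label and its confidence follow directly. This is a pure-Python sketch with made-up logits, not output from the actual model:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["HAM", "SPAM"]
logits = [-2.1, 3.4]  # hypothetical logits for a spam-like message
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(f"Prediction: {labels[pred]} (confidence: {probs[pred]:.4f})")
```

The same arithmetic is what `torch.nn.functional.softmax` and `torch.argmax` perform on the model's output tensor in the Quick Start snippet.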
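## Appendix: Computing the Reported Metrics

The four scores in the Performance Metrics table (accuracy, precision, recall, F1) can be sanity-checked on your own labeled data from raw predictions. This is a generic plain-Python sketch; the repository's actual evaluation code may use `scikit-learn` or the Hugging Face `evaluate` library instead:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (1 = SPAM, 0 = HAM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example with 8 hand-made labels, not the real evaluation set
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
# → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Precision is computed with SPAM as the positive class, matching the convention implied by the table above.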