---
language: en
license: mit
tags:
  - spam-detection
  - text-classification
  - sms
  - bert
  - transformers
datasets:
  - sms-spam-collection
metrics:
  - accuracy
  - precision
  - recall
  - f1
widget:
  - text: Congratulations! You've won a $1000 gift card. Click here to claim now!
    example_title: Spam Example
  - text: Hey, are we still meeting for lunch tomorrow at 12?
    example_title: Ham Example
  - text: URGENT! Your account has been suspended. Verify now to restore access.
    example_title: Spam Example 2
  - text: Thanks for your help today. I really appreciate it!
    example_title: Ham Example 2
---

# SMS Spam Detection with BERT

🎯 A high-performance SMS spam classifier built with BERT, achieving 99.16% accuracy.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam in SMS text. It classifies each message as either:

- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 99.16% |
| Precision | 97.30% |
| Recall    | 96.43% |
| F1-Score  | 96.86% |
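The reported F1-score can be cross-checked against the precision and recall above, since F1 is their harmonic mean:

```python
# Reported precision and recall (SPAM class).
precision = 0.9730
recall = 0.9643

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # ≈ 0.9686, consistent with the 96.86% in the table
```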

## Quick Start

### Using the Transformers Pipeline

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to label (assumes id 0 = HAM, id 1 = SPAM; model.config.id2label
# holds the authoritative mapping)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```
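The softmax step above simply normalizes the two raw logits into probabilities that sum to one. A minimal pure-Python illustration, using made-up logits rather than real model output:

```python
import math

# Hypothetical logits for the two classes [HAM, SPAM]; in practice these
# come from model(**inputs).logits.
logits = [-2.1, 3.4]

# Softmax: exponentiate (shifted by the max for numerical stability), then normalize.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
probs = [e / sum(exps) for e in exps]

predicted = max(range(len(probs)), key=probs.__getitem__)
labels = ["HAM", "SPAM"]
print(labels[predicted], round(probs[predicted], 4))
```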

## Training Details

### Dataset

- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)
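Given this class imbalance, a trivial classifier that always predicts HAM already reaches about 86.6% accuracy, which is why precision, recall, and F1 on the SPAM class are reported alongside accuracy. A quick check from the counts above:

```python
# Majority-class baseline from the dataset statistics above.
ham, spam = 4827, 747
total = ham + spam  # 5,574 messages

baseline_accuracy = ham / total
print(f"Always-HAM baseline: {baseline_accuracy:.1%}")  # ~86.6%
```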

### Training Configuration

- **Base Model:** bert-base-uncased
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW
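With these settings and an 80/20 split of the 5,574 messages, one epoch works out to roughly 279 optimizer steps. A back-of-the-envelope check (this assumes a plain 80% training split; the card does not publish the exact step counts):

```python
import math

# Rough step count from the dataset size, split, and batch size above.
total_messages = 5574
train_fraction = 0.8   # 80/20 split
batch_size = 16
epochs = 3

train_size = int(total_messages * train_fraction)     # ~4,459 messages
steps_per_epoch = math.ceil(train_size / batch_size)  # ~279 steps
total_steps = steps_per_epoch * epochs                # ~837 steps over 3 epochs
print(train_size, steps_per_epoch, total_steps)
```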

### Data Split

- **Training:** 80%
- **Validation:** 20%
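With an imbalanced dataset like this one, a stratified 80/20 split keeps the HAM/SPAM ratio the same in both partitions. A minimal pure-Python sketch (the card does not say whether the original split was stratified, so treat this as an illustration, not the original procedure):

```python
import random

def stratified_split(samples, train_frac=0.8, seed=42):
    """Split (text, label) pairs while preserving per-label ratios."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    train, val = [], []
    for items in by_label.values():
        rng.shuffle(items)
        cut = int(len(items) * train_frac)
        train.extend(items[:cut])
        val.extend(items[cut:])
    return train, val

# Toy data: 8 ham and 2 spam -> train gets 6 ham + 1 spam (7 total).
data = [(f"msg{i}", "ham") for i in range(8)] + [(f"ad{i}", "spam") for i in range(2)]
train, val = stratified_split(data)
print(len(train), len(val))  # 7 3
```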

## Model Architecture

Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
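The classification head at the end of this pipeline is a single linear layer applied to the [CLS] embedding, producing one logit per class. A toy illustration with small dimensions and random weights (BERT-base actually uses a 768-dimensional hidden state, and these are not the trained parameters):

```python
import random

random.seed(0)
hidden_size = 8   # toy size; BERT-base uses 768
num_labels = 2    # HAM, SPAM

# Hypothetical [CLS] embedding and head parameters.
cls = [random.uniform(-1, 1) for _ in range(hidden_size)]
W = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(num_labels)]
b = [0.0] * num_labels

# logits = W @ cls + b: one raw score per class, later softmaxed
# into HAM/SPAM probabilities.
logits = [sum(w_i * x_i for w_i, x_i in zip(row, cls)) + bias
          for row, bias in zip(W, b)]
print(logits)
```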

## Use Cases

- **Spam Filtering:** Automatically filter spam messages in messaging applications
- **SMS Gateway Protection:** Protect users from phishing and scam attempts
- **Content Moderation:** Pre-screen messages in communication platforms
- **Fraud Detection:** Identify suspicious messages in financial apps

## Limitations

- The model is trained specifically on English SMS messages
- It may not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- It was trained on historical data; new spam patterns may emerge over time

## Ethical Considerations

- ⚠️ **Privacy:** Ensure compliance with data protection regulations when processing user messages
- ⚠️ **False Positives:** Important legitimate messages may be incorrectly flagged as spam
- ⚠️ **Bias:** The model may reflect biases present in the training data

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or feedback, please open an issue on the model repository.

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.