---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: Spam Example
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: Ham Example
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: Spam Example 2
- text: "Thanks for your help today. I really appreciate it!"
  example_title: Ham Example 2
---

# SMS Spam Detection with BERT
🎯 A high-performance SMS spam classifier built on BERT, achieving 99.16% accuracy.
## Model Description
This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:
- HAM (legitimate message)
- SPAM (unwanted/spam message)
## Performance Metrics
| Metric | Score |
|---|---|
| Accuracy | 99.16% |
| Precision | 97.30% |
| Recall | 96.43% |
| F1-Score | 96.86% |
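As a quick sanity check, the F1 score in the table is consistent with the precision and recall above it, since F1 is their harmonic mean:

```python
precision = 0.9730
recall = 0.9643

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1 * 100:.2f}%")  # F1: 96.86%
```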
## Quick Start
### Using the Transformers Pipeline
```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```
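The pipeline returns a list of dicts in the shape shown above. As an illustrative sketch (the `is_spam` helper and its `threshold` parameter are hypothetical, not part of this repository), a downstream filter might turn that output into a boolean decision:

```python
def is_spam(prediction, threshold=0.5):
    """Decide spam/ham from one pipeline output dict.
    `threshold` is an illustrative confidence cutoff, not a model setting."""
    return prediction["label"] == "SPAM" and prediction["score"] >= threshold

# Sample output in the shape the pipeline returns:
sample = [{"label": "SPAM", "score": 0.9987}]
print(is_spam(sample[0]))                    # True
print(is_spam(sample[0], threshold=0.999))   # False (below the stricter cutoff)
```

Raising the threshold trades recall for precision, which matters when false positives (legitimate messages flagged as spam) are costly.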
### Using AutoModel and AutoTokenizer
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to label
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```
## Training Details

### Dataset
- Source: SMS Spam Collection Dataset
- Total Messages: 5,574
- Ham Messages: 4,827 (86.6%)
- Spam Messages: 747 (13.4%)
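Given this imbalance, a useful reference point is the majority-class baseline: a trivial classifier that always predicts HAM already gets roughly 86.6% accuracy, which is why the precision/recall/F1 figures above matter alongside raw accuracy. The arithmetic:

```python
ham, spam = 4827, 747
total = ham + spam

print(total)                            # 5574
print(f"ham share: {ham / total:.1%}")  # ham share: 86.6%
# An always-HAM classifier would score ~86.6% accuracy on this data,
# so the model's 99.16% should be read against that baseline.
```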
### Training Configuration
- Base Model: `bert-base-uncased`
- Max Sequence Length: 128 tokens
- Batch Size: 16
- Learning Rate: 2e-5
- Epochs: 3
- Optimizer: AdamW
### Data Split
- Training: 80%
- Validation: 20%
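With such skewed classes, an 80/20 split is usually stratified so both partitions keep the ham/spam ratio. The repository does not publish its split code; this `stratified_split` helper is an illustrative sketch of the idea, not the actual training script:

```python
import random

def stratified_split(messages, labels, train_frac=0.8, seed=42):
    """Split (message, label) pairs into train/validation sets while
    preserving the per-class ratio (a stratified 80/20 split).
    Illustrative sketch; not the code used to train this model."""
    rng = random.Random(seed)

    # Group messages by class label
    by_label = {}
    for msg, lab in zip(messages, labels):
        by_label.setdefault(lab, []).append(msg)

    # Shuffle and cut each class independently, then pool the partitions
    train, val = [], []
    for lab, msgs in by_label.items():
        rng.shuffle(msgs)
        cut = int(len(msgs) * train_frac)
        train += [(m, lab) for m in msgs[:cut]]
        val += [(m, lab) for m in msgs[cut:]]
    return train, val
```

Cutting each class separately guarantees the validation set contains spam examples even though spam is only ~13% of the data.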
## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```
## Use Cases

- ✅ **Spam Filtering**: Automatically filter spam messages in messaging applications
- ✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts
- ✅ **Content Moderation**: Pre-screen messages in communication platforms
- ✅ **Fraud Detection**: Identify suspicious messages in financial apps
## Limitations
- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data; new spam patterns may emerge
## Ethical Considerations

- ⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages
- ⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam
- ⚠️ **Bias**: Model may reflect biases present in training data
## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```
## License
MIT License
## Contact
For questions or feedback, please open an issue on the model repository.
Model Card: For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.