---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---

# SMS Spam Detection with BERT

🎯 A high-performance SMS spam classifier built with BERT, achieving **99.16% accuracy** on its validation split.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It classifies each message as either:
- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |
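
As a sanity check, the F1-score is the harmonic mean of precision and recall, and the table's values are mutually consistent (a quick arithmetic verification, not part of the model):

```python
# Precision and recall from the table above
precision, recall = 0.9730, 0.9643

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9686, matching the reported 96.86%
```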

## Quick Start

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Example output (score will vary): [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label (index 0 = HAM, index 1 = SPAM)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```

## Training Details

### Dataset
- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)
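
The class counts above show a noticeable imbalance. A small sketch of how the percentages follow from those counts, along with hypothetical inverse-frequency loss weights (whether this model used a weighted loss is not stated in the card):

```python
# Class counts from the SMS Spam Collection dataset (see above)
ham, spam = 4827, 747
total = ham + spam  # 5574

print(f"ham: {100 * ham / total:.1f}%, spam: {100 * spam / total:.1f}%")
# ham: 86.6%, spam: 13.4%

# Hypothetical inverse-frequency class weights, normalized so a perfectly
# balanced dataset would give both classes a weight of 1.0.
w_ham = total / (2 * ham)
w_spam = total / (2 * spam)
print(f"weights -> ham: {w_ham:.2f}, spam: {w_spam:.2f}")
# weights -> ham: 0.58, spam: 3.73
```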

### Training Configuration
- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW
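
The hyperparameters above could be expressed with the `transformers` `Trainer` API roughly as follows. This is a configuration sketch reconstructed from the table; the actual training script is not published, and all unlisted values are library defaults:

```python
from transformers import TrainingArguments

# Hyperparameters taken from the table above (output_dir is a placeholder)
training_args = TrainingArguments(
    output_dir="./sms-spam-bert",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    # Trainer uses AdamW by default; the 128-token max length is applied
    # at tokenization time, e.g. tokenizer(..., truncation=True, max_length=128)
)
```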

### Data Split
- **Training:** 80%
- **Validation:** 20%
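
An 80/20 split on an imbalanced dataset is usually stratified so both partitions keep the ham/spam ratio. The sketch below illustrates the idea in plain Python; whether the original split was stratified is not stated in the card:

```python
import random

def stratified_split(labels, train_frac=0.8, seed=42):
    """Split indices into train/validation while keeping each
    class's proportion roughly equal in both partitions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_frac)
        train += indices[:cut]
        val += indices[cut:]
    return train, val

# Toy labels mirroring the dataset's ~87/13 ham/spam ratio
labels = ["ham"] * 87 + ["spam"] * 13
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 79 21
```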

## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```

## Use Cases

- ✅ **Spam Filtering**: Automatically filter spam messages in messaging applications
- ✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts
- ✅ **Content Moderation**: Pre-screen messages in communication platforms
- ✅ **Fraud Detection**: Identify suspicious messages in financial apps
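
For filtering use cases, a common pattern (illustrative only, not part of this model) is to act only on high-confidence SPAM predictions, trading some recall for fewer false positives:

```python
def flag_spam(results, threshold=0.9):
    """Return messages whose SPAM score meets the threshold.
    `results` pairs each message with a prediction dict shaped like
    the transformers text-classification pipeline output above."""
    return [
        msg for msg, pred in results
        if pred["label"] == "SPAM" and pred["score"] >= threshold
    ]

# Mock predictions (same shape as the pipeline example above)
mock = [
    ("Win a free prize now!", {"label": "SPAM", "score": 0.99}),
    ("See you at 12?",        {"label": "HAM",  "score": 0.98}),
    ("Maybe a limited offer", {"label": "SPAM", "score": 0.62}),
]
print(flag_spam(mock))  # ['Win a free prize now!']
```

The 0.9 threshold here is an assumption; in practice it would be tuned on validation data against the cost of missing spam versus blocking a legitimate message.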

## Limitations

- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data; new spam patterns may emerge

## Ethical Considerations

- ⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages
- ⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam
- ⚠️ **Bias**: Model may reflect biases present in training data

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

MIT License

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).

---

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.