---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---

# SMS Spam Detection with BERT 🎯

A high-performance SMS spam classifier built with BERT, achieving **99.16% accuracy**.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It classifies each message as either:

- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |

## Quick Start

### Using the Transformers Pipeline

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Output: [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map to label
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```

## Training Details

### Dataset

- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)

### Training Configuration

- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW

### Data Split

- **Training:** 80%
- **Validation:** 20%

## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```

## Use Cases

✅ **Spam Filtering**: Automatically filter spam messages in messaging applications

✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts

✅ **Content Moderation**: Pre-screen messages in communication platforms

✅ **Fraud Detection**: Identify suspicious messages in financial apps

## Limitations

- The model is trained specifically on English SMS messages
- It may not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- It was trained on historical data; new spam patterns may emerge

## Ethical Considerations

⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages

⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam

⚠️ **Bias**: The model may reflect biases present in its training data

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

MIT License

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).

---

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.
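## Appendix: How the Prediction Step Works

The softmax-and-argmax step in the AutoModel example above can be illustrated without loading the model at all: given a pair of raw logits, the predicted label and its confidence follow directly. This is a pure-Python sketch with made-up logits, not output from the actual model:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["HAM", "SPAM"]
logits = [-2.1, 3.4]  # hypothetical logits for a spam-like message
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(f"Prediction: {labels[pred]} (confidence: {probs[pred]:.4f})")
```

The same arithmetic is what `torch.nn.functional.softmax` and `torch.argmax` perform on the model's output tensor in the Quick Start snippet.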
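## Appendix: Computing the Reported Metrics

The four scores in the Performance Metrics table (accuracy, precision, recall, F1) can be sanity-checked on your own labeled data from raw predictions. This is a generic plain-Python sketch; the repository's actual evaluation code may use `scikit-learn` or the Hugging Face `evaluate` library instead:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (1 = SPAM, 0 = HAM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example with 8 hand-made labels, not the real evaluation set
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
# → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Precision is computed with SPAM as the positive class, matching the convention implied by the table above.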