---
language: en
license: mit
tags:
- spam-detection
- text-classification
- sms
- bert
- transformers
datasets:
- sms-spam-collection
metrics:
- accuracy
- precision
- recall
- f1
widget:
- text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
  example_title: "Spam Example"
- text: "Hey, are we still meeting for lunch tomorrow at 12?"
  example_title: "Ham Example"
- text: "URGENT! Your account has been suspended. Verify now to restore access."
  example_title: "Spam Example 2"
- text: "Thanks for your help today. I really appreciate it!"
  example_title: "Ham Example 2"
---

# SMS Spam Detection with BERT

🎯 A high-performance SMS spam classifier built with BERT, achieving **99.16% accuracy** on its validation split.

## Model Description

This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It classifies each message as either:
- **HAM** (legitimate message)
- **SPAM** (unwanted/spam message)

## Performance Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 99.16% |
| **Precision** | 97.30% |
| **Recall** | 96.43% |
| **F1-Score** | 96.86% |
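
As a sanity check, the F1-score is the harmonic mean of precision and recall, and the table's values are mutually consistent (a quick arithmetic verification, not part of the model):

```python
# Precision and recall from the table above
precision, recall = 0.9730, 0.9643

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.9686, matching the reported 96.86%
```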

## Quick Start

### Using Transformers Pipeline

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub
classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")

# Classify a message
result = classifier("Congratulations! You've won a $1000 gift card!")
print(result)
# Example output (score will vary): [{'label': 'SPAM', 'score': 0.9987}]
```

### Using AutoModel and AutoTokenizer

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "niru-nny/SMS_Spam_Detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Hey, are we still meeting for lunch tomorrow?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label (index 0 = HAM, index 1 = SPAM)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
```

## Training Details

### Dataset
- **Source:** SMS Spam Collection Dataset
- **Total Messages:** 5,574
- **Ham Messages:** 4,827 (86.6%)
- **Spam Messages:** 747 (13.4%)
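
The class counts above show a noticeable imbalance. A small sketch of how the percentages follow from those counts, along with hypothetical inverse-frequency loss weights (whether this model used a weighted loss is not stated in the card):

```python
# Class counts from the SMS Spam Collection dataset (see above)
ham, spam = 4827, 747
total = ham + spam  # 5574

print(f"ham: {100 * ham / total:.1f}%, spam: {100 * spam / total:.1f}%")
# ham: 86.6%, spam: 13.4%

# Hypothetical inverse-frequency class weights, normalized so a perfectly
# balanced dataset would give both classes a weight of 1.0.
w_ham = total / (2 * ham)
w_spam = total / (2 * spam)
print(f"weights -> ham: {w_ham:.2f}, spam: {w_spam:.2f}")
# weights -> ham: 0.58, spam: 3.73
```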

### Training Configuration
- **Base Model:** `bert-base-uncased`
- **Max Sequence Length:** 128 tokens
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Epochs:** 3
- **Optimizer:** AdamW
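
The hyperparameters above could be expressed with the `transformers` `Trainer` API roughly as follows. This is a configuration sketch reconstructed from the table; the actual training script is not published, and all unlisted values are library defaults:

```python
from transformers import TrainingArguments

# Hyperparameters taken from the table above (output_dir is a placeholder)
training_args = TrainingArguments(
    output_dir="./sms-spam-bert",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    # Trainer uses AdamW by default; the 128-token max length is applied
    # at tokenization time, e.g. tokenizer(..., truncation=True, max_length=128)
)
```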

### Data Split
- **Training:** 80%
- **Validation:** 20%
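
An 80/20 split on an imbalanced dataset is usually stratified so both partitions keep the ham/spam ratio. The sketch below illustrates the idea in plain Python; whether the original split was stratified is not stated in the card:

```python
import random

def stratified_split(labels, train_frac=0.8, seed=42):
    """Split indices into train/validation while keeping each
    class's proportion roughly equal in both partitions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_frac)
        train += indices[:cut]
        val += indices[cut:]
    return train, val

# Toy labels mirroring the dataset's ~87/13 ham/spam ratio
labels = ["ham"] * 87 + ["spam"] * 13
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 79 21
```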

## Model Architecture

```
Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
```

## Use Cases

- ✅ **Spam Filtering**: Automatically filter spam messages in messaging applications
- ✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts
- ✅ **Content Moderation**: Pre-screen messages in communication platforms
- ✅ **Fraud Detection**: Identify suspicious messages in financial apps
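
For filtering use cases, a common pattern (illustrative only, not part of this model) is to act only on high-confidence SPAM predictions, trading some recall for fewer false positives:

```python
def flag_spam(results, threshold=0.9):
    """Return messages whose SPAM score meets the threshold.
    `results` pairs each message with a prediction dict shaped like
    the transformers text-classification pipeline output above."""
    return [
        msg for msg, pred in results
        if pred["label"] == "SPAM" and pred["score"] >= threshold
    ]

# Mock predictions (same shape as the pipeline example above)
mock = [
    ("Win a free prize now!", {"label": "SPAM", "score": 0.99}),
    ("See you at 12?",        {"label": "HAM",  "score": 0.98}),
    ("Maybe a limited offer", {"label": "SPAM", "score": 0.62}),
]
print(flag_spam(mock))  # ['Win a free prize now!']
```

The 0.9 threshold here is an assumption; in practice it would be tuned on validation data against the cost of missing spam versus blocking a legitimate message.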

## Limitations

- Model is trained specifically on English SMS messages
- May not generalize well to other languages or message formats
- Performance may vary on messages with heavy slang or abbreviations
- Trained on historical data; new spam patterns may emerge

## Ethical Considerations

- ⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages
- ⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam
- ⚠️ **Bias**: Model may reflect biases present in training data

## Citation

If you use this model, please cite:

```bibtex
@misc{sms_spam_detection_bert_2026,
  title={SMS Spam Detection with BERT},
  author={niru-nny},
  year={2026},
  url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
}
```

## License

MIT License

## Contact

For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).

---

**Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.