AventIQ-AI
/

SMS-Spam-Detection-Model

Safetensors

distilbert

AmanSengar committed on May 23, 2025

Commit

4405582

verified ·

1 Parent(s): 4163b86

Create README.md

Files changed (1)

README.md +116 -0

README.md ADDED

	@@ -0,0 +1,116 @@

+**🧠 SMSDetection-DistilBERT-SMS**
+A DistilBERT-based binary classifier fine-tuned on the SMS Spam Collection dataset. It classifies messages as either **spam** or **ham** (not spam). This model is suitable for real-world applications like mobile SMS spam filters, automated customer message triage, and telecom fraud detection.
+---
+✨ **Model Highlights**
+- 📌  Based on `distilbert-base-uncased`
+- 🔍 Fine-tuned on the SMS Spam Collection dataset
+- ⚡ Supports binary classification: Spam vs Not Spam
+- 💾 Lightweight and optimized for both CPU and GPU environments
+---
+🧠 Intended Uses
+- ✅ Mobile SMS spam filtering
+- ✅ Telecom customer service automation
+- ✅ Fraudulent message detection
+- ✅ User inbox categorization
+- ✅ Regulatory compliance monitoring
+---
+🚫 Limitations
+- ❌ Trained on English SMS messages only
+- ❌ May underperform on emails, social media texts, or non-English content
+- ❌ Not designed for multilingual datasets
+- ❌ Inputs are truncated at 128 tokens, so a slight performance dip is expected for longer messages
+---
+🏋️‍♂️ Training Details
+| Field          | Value                          |
+| -------------- | ------------------------------ |
+| **Base Model** | `distilbert-base-uncased`      |
+| **Dataset**    | SMS Spam Collection (UCI)      |
+| **Framework**  | PyTorch with 🤗 Transformers   |
+| **Epochs**     | 3                              |
+| **Batch Size** | 16                             |
+| **Max Length** | 128 tokens                     |
+| **Optimizer**  | AdamW                          |
+| **Loss**       | CrossEntropyLoss (sequence-level) |
+| **Device**     | Trained on CUDA-enabled GPU    |
+---
+📊 Evaluation Metrics
+| Metric                                          | Score |
+| ----------------------------------------------- | ----- |
+| Accuracy                                        | 0.99  |
+| F1-Score                                        | 0.96  |
+| Precision                                       | 0.98  |
+| Recall                                          | 0.93  |
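As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the card, assuming all three numbers refer to the same class (presumably spam). The helper name `f1_from_precision_recall` is illustrative and is not part of this repository. The rounded inputs give roughly 0.954, a gap from the reported 0.96 that is within what two-decimal rounding of precision and recall can produce.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the evaluation table in this card
f1 = f1_from_precision_recall(0.98, 0.93)
print(f"F1 from rounded precision/recall: {f1:.3f}")
```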
+---
+🚀 Usage
+```python
+import torch
+from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
+
+model_name = "AventIQ-AI/SMS-Spam-Detection-Model"
+tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
+model = DistilBertForSequenceClassification.from_pretrained(model_name)
+
+# Run on GPU when available, otherwise fall back to CPU
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+model.eval()
+
+def predict_sms(text):
+    """Classify a single SMS message as "spam" or "ham"."""
+    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    with torch.no_grad():
+        logits = model(**inputs).logits  # shape: (1, num_labels)
+    predicted = torch.argmax(logits, dim=-1).item()
+    return "spam" if predicted == 1 else "ham"
+
+# Test example
+print(predict_sms("You've won $1,000,000! Call now to claim your prize!"))
+```
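`predict_sms` returns only a hard label. For filtering use cases it can help to look at the spam probability and pick a decision threshold that trades precision against recall. The sketch below shows one way to do that on raw logits, so it runs without downloading the model. The helper `logits_to_predictions`, the `spam_threshold` parameter, and the example logits are illustrative assumptions, and the label mapping (index 1 = spam) simply mirrors `predict_sms`.

```python
import torch


def logits_to_predictions(logits: torch.Tensor, spam_threshold: float = 0.5):
    """Convert (batch, 2) logits into (label, spam_probability) pairs.

    Assumes index 1 is the spam class, as in predict_sms.
    """
    probs = torch.softmax(logits, dim=-1)  # normalize each row to probabilities
    spam_probs = probs[:, 1]
    return [
        ("spam" if p >= spam_threshold else "ham", float(p))
        for p in spam_probs
    ]


# Made-up logits standing in for model(**inputs).logits on two messages
example_logits = torch.tensor([[2.0, -1.0], [-0.5, 3.0]])
predictions = logits_to_predictions(example_logits)
for label, spam_prob in predictions:
    print(f"{label}: spam probability {spam_prob:.3f}")
```

Raising `spam_threshold` should reduce false positives (legitimate messages flagged as spam) at the cost of missing more spam. A suitable value would need to be chosen on held-out data.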
+---
+🧩 Quantization
+Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
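The card does not include the quantization script, so the following is only a minimal sketch of what eager-mode post-training static quantization looks like in PyTorch, applied to a toy two-layer classifier rather than to this DistilBERT model. `TinyClassifier`, the layer sizes, and the random calibration data are all hypothetical. A real run would calibrate on representative SMS inputs.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    get_default_qconfig,
    prepare,
)


class TinyClassifier(nn.Module):
    """Hypothetical stand-in model; not the architecture used in this repo."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> int8 at the model input
        self.fc1 = nn.Linear(16, 8)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(8, 2)
        self.dequant = DeQuantStub()  # int8 -> float at the model output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)


torch.manual_seed(0)
model = TinyClassifier().eval()

# Pick a quantized backend that this build of PyTorch supports
backend = "fbgemm" if "fbgemm" in torch.backends.quantized.supported_engines else "qnnpack"
torch.backends.quantized.engine = backend
model.qconfig = get_default_qconfig(backend)

# 1) Insert observers, 2) calibrate on sample data, 3) convert to int8 modules
prepared = prepare(model)
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(4, 16))  # random data stands in for real calibration inputs
quantized = convert(prepared)

with torch.no_grad():
    output = quantized(torch.randn(1, 16))
print(type(quantized.fc1).__name__, tuple(output.shape))
```

Static quantization fixes activation scales from the calibration pass, which is why the calibration data should resemble production traffic.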
+---
+🗂 Repository Structure
+```
+.
+├── model/               # Quantized model files
+├── tokenizer_config/    # Tokenizer and vocab files
+├── model.safetensors    # Fine-tuned model in safetensors format
+└── README.md            # Model card
+```
+---
+🤝 Contributing
+Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.