niru-nny committed · Commit 24debe0 · verified · 1 Parent(s): a357728

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +156 -1
  2. app.py +104 -0
README.md CHANGED
@@ -1,3 +1,158 @@
  ---
- license: bsd-3-clause
+ language: en
+ license: mit
+ tags:
+ - spam-detection
+ - text-classification
+ - sms
+ - bert
+ - transformers
+ datasets:
+ - sms-spam-collection
+ metrics:
+ - accuracy
+ - precision
+ - recall
+ - f1
+ widget:
+ - text: "Congratulations! You've won a $1000 gift card. Click here to claim now!"
+   example_title: "Spam Example"
+ - text: "Hey, are we still meeting for lunch tomorrow at 12?"
+   example_title: "Ham Example"
+ - text: "URGENT! Your account has been suspended. Verify now to restore access."
+   example_title: "Spam Example 2"
+ - text: "Thanks for your help today. I really appreciate it!"
+   example_title: "Ham Example 2"
  ---
+
+ # SMS Spam Detection with BERT
+
+ 🎯 A high-performance SMS spam classifier built with BERT achieving **99.16% accuracy**.
+
+ ## Model Description
+
+ This model is a fine-tuned BERT classifier designed to detect spam messages in SMS text. It can classify messages as either:
+ - **HAM** (legitimate message)
+ - **SPAM** (unwanted/spam message)
+
+ ## Performance Metrics
+
+ | Metric | Score |
+ |--------|-------|
+ | **Accuracy** | 99.16% |
+ | **Precision** | 97.30% |
+ | **Recall** | 96.43% |
+ | **F1-Score** | 96.86% |
+
+ ## Quick Start
+
+ ### Using Transformers Pipeline
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model
+ classifier = pipeline("text-classification", model="niru-nny/SMS_Spam_Detection")
+
+ # Classify a message
+ result = classifier("Congratulations! You've won a $1000 gift card!")
+ print(result)
+ # Output: [{'label': 'SPAM', 'score': 0.9987}]
+ ```
+
+ ### Using AutoModel and AutoTokenizer
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "niru-nny/SMS_Spam_Detection"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Prepare input
+ text = "Hey, are we still meeting for lunch tomorrow?"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+ # Get prediction
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=-1).item()
+
+ # Map to label
+ labels = ["HAM", "SPAM"]
+ print(f"Prediction: {labels[predicted_class]} (confidence: {predictions[0][predicted_class]:.4f})")
+ ```
+
+ ## Training Details
+
+ ### Dataset
+ - **Source:** SMS Spam Collection Dataset
+ - **Total Messages:** 5,574
+ - **Ham Messages:** 4,827 (86.6%)
+ - **Spam Messages:** 747 (13.4%)
+
+ ### Training Configuration
+ - **Base Model:** `bert-base-uncased`
+ - **Max Sequence Length:** 128 tokens
+ - **Batch Size:** 16
+ - **Learning Rate:** 2e-5
+ - **Epochs:** 3
+ - **Optimizer:** AdamW
+
+ ### Data Split
+ - **Training:** 80%
+ - **Validation:** 20%
+
+ ## Model Architecture
+
+ ```
+ Input Text → BERT Tokenizer → BERT Encoder (12 layers) → [CLS] Token → Classification Head → Output (HAM/SPAM)
+ ```
+
+ ## Use Cases
+
+ ✅ **Spam Filtering**: Automatically filter spam messages in messaging applications
+ ✅ **SMS Gateway Protection**: Protect users from phishing and scam attempts
+ ✅ **Content Moderation**: Pre-screen messages in communication platforms
+ ✅ **Fraud Detection**: Identify suspicious messages in financial apps
+
+ ## Limitations
+
+ - Model is trained specifically on English SMS messages
+ - May not generalize well to other languages or message formats
+ - Performance may vary on messages with heavy slang or abbreviations
+ - Trained on historical data; new spam patterns may emerge
+
+ ## Ethical Considerations
+
+ ⚠️ **Privacy**: Ensure compliance with data protection regulations when processing user messages
+ ⚠️ **False Positives**: Important legitimate messages might be incorrectly flagged as spam
+ ⚠️ **Bias**: Model may reflect biases present in training data
+
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{sms_spam_detection_bert_2026,
+   title={SMS Spam Detection with BERT},
+   author={niru-nny},
+   year={2026},
+   url={https://huggingface.co/niru-nny/SMS_Spam_Detection}
+ }
+ ```
+
+ ## License
+
+ MIT License
+
+ ## Contact
+
+ For questions or feedback, please open an issue on the [model repository](https://huggingface.co/niru-nny/SMS_Spam_Detection/discussions).
+
+ ---
+
+ **Model Card:** For detailed information about model development, evaluation, and responsible AI considerations, see the complete model card in the repository.
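
As a quick sanity check on the figures in the README diff above, the reported F1-score can be recomputed from the reported precision and recall, and the class shares from the message counts. The sketch below uses only numbers copied from the README; the always-HAM baseline is an illustrative assumption added here, not something the commit reports.

```python
# Figures copied from the README diff above.
precision = 0.9730
recall = 0.9643
ham, spam = 4827, 747

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Class shares in the full dataset.
total = ham + spam
ham_share = ham / total
spam_share = spam / total

# Accuracy of a trivial classifier that always predicts HAM
# (illustrative baseline, not a result from the commit).
majority_baseline = ham_share

print(f"F1: {f1:.4f}")
print(f"total: {total}, ham: {ham_share:.1%}, spam: {spam_share:.1%}")
print(f"always-HAM baseline accuracy: {majority_baseline:.1%}")
```

The recomputed values agree with the README (F1 of about 96.86%, 5,574 messages, 86.6% ham, 13.4% spam). Because most messages are ham, a classifier that never predicts spam would already reach about 86.6% accuracy, so the precision, recall, and F1 rows are arguably the more informative part of the metrics table.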
app.py ADDED
@@ -0,0 +1,104 @@
+ import gradio as gr
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "niru-nny/SMS_Spam_Detection"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ def classify_message(text):
+     """
+     Classify an SMS message as HAM or SPAM
+
+     Args:
+         text: Input SMS message text
+
+     Returns:
+         Dictionary with classification results
+     """
+     if not text or text.strip() == "":
+         return {"Error": "Please enter a message"}
+
+     # Tokenize input
+     inputs = tokenizer(
+         text,
+         return_tensors="pt",
+         truncation=True,
+         padding=True,
+         max_length=128
+     )
+
+     # Get prediction
+     with torch.no_grad():
+         outputs = model(**inputs)
+         predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+     # Extract probabilities
+     ham_prob = predictions[0][0].item()
+     spam_prob = predictions[0][1].item()
+
+     return {
+         "HAM (Legitimate)": ham_prob,
+         "SPAM": spam_prob
+     }
+
+ # Example messages
+ examples = [
+     ["Congratulations! You've won a $1000 gift card. Click here to claim now!"],
+     ["Hey, are we still meeting for lunch tomorrow at 12?"],
+     ["URGENT! Your account has been suspended. Verify now to restore access."],
+     ["Thanks for your help today. I really appreciate it!"],
+     ["FREE entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121"],
+     ["I'll call you later tonight after work."],
+     ["WINNER!! As a valued customer, you have been selected to receive £900 prize reward!"],
+     ["Can you pick up some milk on your way home?"],
+ ]
+
+ # Create Gradio interface
+ demo = gr.Interface(
+     fn=classify_message,
+     inputs=gr.Textbox(
+         lines=5,
+         placeholder="Enter SMS message here...",
+         label="SMS Message"
+     ),
+     outputs=gr.Label(num_top_classes=2, label="Classification Results"),
+     title="📱 SMS Spam Detection",
+     description="""
+ This classifier uses a fine-tuned BERT model to detect spam in SMS messages.
+
+ **Performance:** 99.16% Accuracy | 97.30% Precision | 96.43% Recall
+
+ Simply enter an SMS message below and the model will classify it as either legitimate (HAM) or spam (SPAM).
+ """,
+     examples=examples,
+     theme=gr.themes.Soft(),
+     article="""
+ ### About This Model
+
+ This model is trained on the SMS Spam Collection dataset and achieves state-of-the-art performance in spam detection.
+
+ **Model:** `niru-nny/SMS_Spam_Detection`
+ **Base Architecture:** BERT (bert-base-uncased)
+ **Dataset:** SMS Spam Collection (5,574 messages)
+
+ ### Use Cases
+ - 📧 Spam filtering in messaging apps
+ - 🛡️ Protection against phishing attempts
+ - 🔍 Content moderation
+ - 💰 Fraud detection
+
+ ### Tips for Best Results
+ - The model works best with English text messages
+ - Keep messages under 128 tokens; longer inputs are truncated
+ - The model is trained on SMS-style language (abbreviations, slang included)
+
+ ---
+
+ **License:** MIT | [Model Card](https://huggingface.co/niru-nny/SMS_Spam_Detection) | [GitHub](https://github.com/niru-nny)
+ """
+ )
+
+ if __name__ == "__main__":
+     demo.launch()
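
The post-processing step in `classify_message` can be illustrated without loading the model. The following is a minimal, dependency-free sketch of the softmax and label-mapping logic, assuming (as app.py does) that logit index 0 is HAM and index 1 is SPAM. The helper names `softmax` and `to_label_scores` and the example logits are hypothetical and do not come from the commit.

```python
import math

# Index order assumed by app.py: 0 -> HAM, 1 -> SPAM.
LABELS = ["HAM (Legitimate)", "SPAM"]


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_scores(logits):
    """Pair each probability with its label, mirroring the dict app.py returns."""
    return dict(zip(LABELS, softmax(logits)))


# Hypothetical logits for a message the model considers likely spam.
example_logits = [-2.0, 3.0]
scores = to_label_scores(example_logits)
print(scores)
```

If the model's `config.id2label` lists the classes in a different order, a hard-coded mapping like this one would silently swap the two classes, so checking that mapping before relying on fixed indices seems a sensible precaution.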