sibikrish committed · Commit 56c33f2 · verified · 1 Parent(s): cdaa963

Upload 7 files

Files changed (7)
  1. README.md +219 -13
  2. about.md +64 -0
  3. app.py +81 -0
  4. docker_app.py +111 -0
  5. feedback.csv +15 -0
  6. requirements.txt +12 -0
  7. streamlit_app.py +136 -0
README.md CHANGED
@@ -1,13 +1,219 @@
- ---
- title: Sms Spam Detection
- emoji: 👀
- colorFrom: purple
- colorTo: yellow
- sdk: streamlit
- sdk_version: 1.36.0
- app_file: app.py
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <p align="center">
+ <a href="https://github.com/Sibikrish3000/sms-spam-detection"><img src="https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/spam.png?raw=true" alt="sms spam image" width="500" height="280"></a>
+ </p>
+ <h1 align="center"> SMS Spam Detection Web Application </h1>
+
+ <p align="center">
+ This application leverages machine learning to detect spam messages
+ </p>
+
+ <p align="center">
+ <a href="https://github.com/Sibikrish3000/sms-spam-detection/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Sibikrish3000/sms-spam-detection" alt="GitHub license"></a>
+ <a href="https://github.com/Sibikrish3000/sms-spam-detection/stargazers"><img src="https://img.shields.io/github/stars/Sibikrish3000/sms-spam-detection?style=social" alt="GitHub stars"></a>
+ <a href="https://github.com/Sibikrish3000/sms-spam-detection/issues"><img src="https://img.shields.io/github/issues/Sibikrish3000/sms-spam-detection" alt="GitHub issues"></a>
+ </p>
+ <p align="center">
+ <a href="https://scikit-learn.org/"><img src="https://img.shields.io/badge/sklearn-darkorange.svg?style=flat&logo=scikit-learn&logoColor=white" alt="sklearn"></a>
+ <a href="https://www.python.org"><img src="https://img.shields.io/badge/Python-yellow.svg?style=flat&logo=python&logoColor=white" alt="language"></a>
+ <a href="https://fastapi.tiangolo.com/"><img src="https://img.shields.io/badge/FastAPI-darkgreen.svg?style=flat&logo=fastapi&logoColor=white" alt="fastapi"></a>
+ <a href="https://hub.docker.com/repository/docker/sibikrish3000/sms-spam-detection/"><img src="https://img.shields.io/badge/Docker-blue?style=flat&logo=docker&logoColor=white" alt="docker"></a>
+ <a href="https://www.streamlit.io"><img src="https://img.shields.io/badge/Streamlit-e63946?style=flat&logo=streamlit&logoColor=white" alt="streamlit"></a>
+ </p>
+
+
+ This repository contains a web application for detecting spam SMS messages. The application uses machine learning models (Extra Trees and Bernoulli Naive Bayes) to classify messages as spam or not spam. The app also allows users to provide feedback on the classification results, which can be used to retrain the models periodically.
+
+ [Dataset](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset)
+ ## Try on Streamlit
+ <p>
+ <a href="https://www.streamlit.io"><img src="https://img.shields.io/badge/Streamlit-e63946?style=flat&logo=streamlit&logoColor=white" alt="streamlit" width="160" height="50"></a>
+ </p>
+
+ ## Try on Hugging Face Spaces
+ <p>
+ <a href="https://huggingface.co/spaces/sibikrish/sms-spam-detection?theme=dark"><img src="https://img.shields.io/badge/Huggingface-white?style=flat&logo=huggingface&logoSize=auto" alt="huggingface" width="160" height="50"></a>
+ </p>
+
+ ### Features
+
+ - **Prediction**: Classify SMS messages as spam or not spam using Extra Trees or Bernoulli Naive Bayes models.
+ - **Feedback**: Users can provide feedback on the predictions to improve model performance.
+ - **Continuous Training**: The application supports periodic retraining of models using the feedback data.
+
+ ## Project Structure
+
+ ```
+ /sms-spam-detection
+
+ ├──/model
+ │   ├── BernoulliNB.pkl
+ │   └── Extra_Tree.pkl
+
+ ├──/static
+ │   └──/images
+
+ ├── app.py
+ ├── streamlit_app.py
+ ├── docker_app.py
+ ├── Dockerfile
+ ├── Dockerfile.fastapi
+ ├── docker-compose.yml
+ ├── requirements.txt
+ ```
+
+ - `app.py`: Defines the FastAPI application.
+ - `streamlit_app.py`: Defines the Streamlit web app.
+ - `docker_app.py`: Defines the Streamlit web app used with Docker.
+ - `Dockerfile`: Dockerfile for building the Docker image.
+ - `docker-compose.yml`: Docker Compose file for orchestrating the services.
+ - `requirements.txt`: List of dependencies.
+ - `model/`: Directory containing pre-trained machine learning models.
+ - `static/`: Directory containing static files such as images used in the interface.
+
+ ### Installation
+
+ 1. **Clone the repository**:
+     ```sh
+     git clone https://github.com/Sibikrish3000/sms-spam-detection.git
+     cd sms-spam-detection
+     ```
+
+ 2. **Install the required packages**:
+     ```sh
+     pip install -r requirements.txt
+     ```
+
+ 3. **Download NLTK data**:
+     ```sh
+     python -m nltk.downloader punkt
+     python -m nltk.downloader stopwords
+     ```
+
+ ## Run Locally
+
+ 1. **Start the FastAPI Server**:
+     ```sh
+     uvicorn app:app --host 0.0.0.0 --port 8000 --reload
+     ```
+
+ 2. **Run the Streamlit Application**:
+     ```sh
+     streamlit run streamlit_app.py
+     ```
+ ### Using Docker Compose
+
+ 1. Create the Docker network, then build and start the containers:
+     ```sh
+     docker network create AIservice
+     ```
+     ```sh
+     docker-compose up --build
+     ```
+
+ 2. Access the Streamlit web app at [http://localhost:8501](http://localhost:8501).
+
+ ### Using Docker image
+
+ ```sh
+ docker network create AIservice
+ ```
+ ```sh
+ docker pull sibikrish/sms-spam-detection:latest
+ docker run sibikrish/sms-spam-detection:latest
+ # or, detached with the Streamlit port published:
+ docker run -d -p 8501:8501 sibikrish/sms-spam-detection:latest
+ ```
+ ## Development
+ ### Running in a Gitpod Cloud Environment
+
+ **Click the button below to start a new development environment:**
+
+ [![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/Sibikrish3000/sms-spam-detection)
+
+ ### Usage
+
+ - **Enter SMS Message**: Input the SMS message you want to classify.
+ - **Select Model**: Choose between Extra Trees and Bernoulli Naive Bayes models.
+ - **Predict**: Click the "Predict" button to see the classification result.
+ - **Feedback**: Provide feedback on the prediction by marking the message as spam or not spam, then submit it.
+
+ ### Continuous Training (CT) in MLOps
+
+ Continuous Training (CT) keeps the machine learning models up to date with new data and feedback. Here are some suggestions for implementing CT for this application:
+
+ #### Online Learning
+
+ Online learning is suitable for scenarios where data arrives continuously and the model needs to update frequently.
+
+ - **Implementation**: Update the models incrementally as new labeled feedback arrives, for example with stochastic gradient descent or mini-batch updates. Some scikit-learn estimators (e.g., `SGDClassifier`, `BernoulliNB`) expose a `partial_fit()` method for this; `ExtraTreesClassifier` does not, so of the two models used here only Bernoulli Naive Bayes can be updated this way.
+ - **Benefits**: The model updates with each new piece of feedback, allowing it to adapt quickly to new patterns.
+ - **Challenges**: May require more careful tuning and monitoring to ensure model stability.
+
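A minimal sketch of the incremental update described in the Online Learning notes. It assumes a `HashingVectorizer`, which this repository is not known to use, so that the feature space stays fixed between updates; the `update` helper and the sample messages are hypothetical.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB

# Assumption: a stateless hashing vectorizer, so no vocabulary has to be refit
# when new feedback arrives. The repository's saved models may vectorize differently.
vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False, binary=True)
model = BernoulliNB()
CLASSES = [0, 1]  # 0 = not spam, 1 = spam, matching the labels in feedback.csv


def update(messages, labels):
    """Fold one batch of labeled feedback into the model without a full retrain."""
    X = vectorizer.transform(messages)
    # `classes` must list every possible label on the first call to partial_fit
    model.partial_fit(X, labels, classes=CLASSES)


# Two hypothetical feedback batches arriving at different times
update(["free cash prize call now", "see you at lunch"], [1, 0])
update(["win free entry text now", "are we still meeting today"], [1, 0])

prediction = model.predict(vectorizer.transform(["free prize call now"]))
print(prediction)
```

Because the vectorizer is stateless, only the classifier needs to be persisted after each update; a `TfidfVectorizer` would instead have to be refit whenever the vocabulary changes.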
+ #### Offline Learning
+
+ Offline learning involves retraining the model periodically with the accumulated feedback data.
+
+
+ - **Implementation**: Retrain the model at a fixed interval (e.g., daily, weekly) using the feedback data stored in the CSV file.
+ - **Benefits**: Simpler to implement and manage, as retraining can be scheduled during off-peak times.
+ - **Challenges**: The model updates less frequently than with online learning, which may delay the incorporation of new patterns.
+
+ #### Partial Fit
+
+ Partial fit combines aspects of both online and offline learning.
+
+ - **Implementation**: Use models that support the `partial_fit()` method. Collect feedback data over a period and then update the model in smaller batches.
+ - **Benefits**: Provides a balance between frequent updates and stability.
+ - **Challenges**: Requires careful management of the batch size and frequency of updates.
+
+ ### Example Workflow for Offline Learning with Periodic Retraining
+
+ 1. **Collect Feedback**: Save feedback data into a CSV file.
+ 2. **Scheduled Retraining**: Set up a cron job or similar scheduling tool to retrain the model every 10 days.
+ 3. **Model Update**: Load the feedback data, preprocess it, and retrain the model.
+ 4. **Save Model**: Save the retrained model to a file and replace the old model.
+
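The `feedback.csv` added in this commit contains exact duplicates and the same message labeled both 1 and 0 (for example `helooo`), so the "Model Update" step may benefit from a cleaning pass before retraining. The sketch below is one possible approach, not the repository's method: the majority-vote policy and the `clean_feedback` helper are assumptions for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def clean_feedback(csv_text):
    """Return {message: label}, keeping the majority label and dropping ties."""
    votes = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        votes[row["message"]][int(row["label"])] += 1

    cleaned = {}
    for message, counts in votes.items():
        if counts[0] == counts[1]:
            continue  # contradictory feedback with no majority: skip it
        cleaned[message] = 1 if counts[1] > counts[0] else 0
    return cleaned


# A few rows in the same format as feedback.csv
sample = """message,label
helooo,1
helooo,0
hi,0
hi,0
get new credit card low interest rate click appli,1
"""

cleaned = clean_feedback(sample)
print(cleaned)
```

Dropping ties is the conservative choice; a stricter policy could also require a minimum number of votes per message before it is used for training.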
+ #### Cron Job Example (Linux)
+
+ ```sh
+ # Open the crontab editor
+ crontab -e
+
+ # Add the following line to schedule retraining at midnight on days 1, 11, 21 and 31
+ # of each month (roughly every 10 days; the gap shrinks at the month boundary)
+ 0 0 */10 * * /usr/bin/python3 /path/to/your/retrain_script.py
+ ```
+
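A `*/10` step in the day-of-month field expands over the range 1 to 31 starting at 1, so it is not a strict 10-day cadence. The short sketch below lists the days it matches in a given month; `cron_step_days` is a hypothetical helper written for this illustration.

```python
import calendar


def cron_step_days(year, month, step=10):
    """Days of the month matched by a `*/step` day-of-month field."""
    last_day = calendar.monthrange(year, month)[1]
    # `*/step` expands over the field's full range 1-31, starting at 1
    return [day for day in range(1, 32, step) if day <= last_day]


january = cron_step_days(2024, 1)   # January has 31 days
february = cron_step_days(2024, 2)  # February 2024 has 29 days
print(january)   # [1, 11, 21, 31]
print(february)  # [1, 11, 21]
```

After a run on the 31st the next run is one day later, on the 1st. If a strict interval matters, one option is to run the job daily and have the script itself check how long ago the last retraining happened.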
+ ### Retraining Script Example
+
+ ```python
+ import pandas as pd
+ import joblib
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.ensemble import ExtraTreesClassifier
+ from sklearn.pipeline import make_pipeline
+
+ # Load feedback data
+ df = pd.read_csv('feedback.csv')
+
+ # Preprocess the messages
+ # Include your preprocessing function here
+
+ # Bundle the vectorizer with the classifier so that the saved model
+ # accepts raw text, which is what app.py passes to predict()
+ model = make_pipeline(TfidfVectorizer(), ExtraTreesClassifier())
+
+ # Retrain the model
+ X, y = df['message'], df['label']
+ model.fit(X, y)
+
+ # Save the retrained model
+ joblib.dump(model, 'Extra_Tree.pkl')
+ ```
+
+
+ ### License
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
about.md ADDED
@@ -0,0 +1,64 @@
+ ## SMS Spam Detection Web Application
+
+ This SMS Spam Detection Web Application uses machine learning models served as an API to identify potentially spam SMS messages. The app lets users assess message legitimacy based on content, providing an efficient way to filter out unwanted spam.
+
+ ### Features:
+
+ 1. **FastAPI Backend**: The backend of the application is implemented using FastAPI, a modern web framework for building APIs with Python. It exposes an endpoint `/predict` that accepts POST requests with SMS message data and returns predictions. Another endpoint `/feedback` allows users to provide feedback on the predictions.
+
+ 2. **Streamlit Frontend**: The frontend of the application is implemented using Streamlit, a Python library that allows for the creation of customizable UI components for machine learning models. Users interact with the application through a user-friendly interface where they can input SMS messages and receive predictions.
+
+ 3. **Models**: The application uses Extra Trees (`ExtraTreesClassifier`) and Bernoulli Naive Bayes models for spam detection.
+
+ 4. **Feedback Mechanism**: Users can provide feedback on the predictions, indicating whether a message was correctly classified as spam or not. This feedback is stored and used to improve the model over time.
+
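A hedged sketch of what a `/predict` call looks like from a client, based on the endpoint signature in `app.py` (JSON body with a `message` field, model chosen through the `model` query parameter). The `build_predict_request` helper is hypothetical, and the request is only built here, not sent, so the snippet runs without the API being up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "http://127.0.0.1:8000"  # address used by the Streamlit scripts in this commit


def build_predict_request(message, model="NaiveBayes"):
    """Build (but do not send) a POST request for the /predict endpoint."""
    url = f"{API_URL}/predict?{urlencode({'model': model})}"
    body = json.dumps({"message": message}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


request = build_predict_request("free entri 1000 cash draw", model="ExtraTree")
print(request.full_url)
print(request.data)
# With the API running, urllib.request.urlopen(request) would return a JSON
# body of the form {"prediction": 0} or {"prediction": 1}.
```

The Streamlit scripts preprocess the text (lowercasing, stop-word removal, stemming) before calling the API, so a standalone client would likely need to send text in the same form to get comparable predictions.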
+ ### Usage:
+
+ - Users can run the application locally by starting the FastAPI server and the Streamlit script.
+ - They can interact with the application through the Streamlit interface in their web browser, inputting SMS messages and receiving predictions.
+ - The application provides predictions in real time, using machine learning models trained on historical SMS data.
+
+ ### Deployment:
+
+ - The application can be deployed locally or on a cloud platform using Docker. Docker containers encapsulate both the FastAPI backend and the Streamlit frontend, making deployment straightforward.
+ - Additionally, the application can be deployed to a hosted platform such as Vercel or Heroku, using their respective deployment methods.
+
+ ### Future Improvements:
+
+ 1. Enhance model performance by fine-tuning hyperparameters or using more sophisticated models.
+ 2. Add more features to improve prediction accuracy.
+ 3. Implement user authentication and authorization for secure access to the application.
+ 4. Integrate with a database to store feedback examples for analysis and model improvement.
+
+ ### Development:
+
+ - Developers can extend and enhance the application by adding new features, improving model accuracy, or optimizing performance.
+ - The codebase is modular and well-structured, facilitating easy maintenance and collaboration among developers.
+
+ Overall, this SMS Spam Detection application provides a practical solution for identifying potentially spam messages, helping users keep their inboxes clean and efficient.
+
+ ## License
+
+ This project is licensed under the MIT License. See the [LICENSE](https://github.com/Sibikrish3000/sms-spam-detection/blob/main/LICENSE) file for details.
+
+ The Jupyter notebook, trained model, and accompanying documentation, including Dockerfiles, FastAPI script, and Streamlit interface script, can be accessed through the GitHub repository linked below:
+
+ <p>
+ <a href="https://github.com/Sibikrish3000/sms-spam-detection"><img src="https://img.shields.io/badge/Github%20Repository-white.svg?style=flat&logo=github&logoColor=black" alt="Github repo"></a>
+ </p>
+
+ ![size](https://img.shields.io/github/repo-size/Sibikrish3000/sms-spam-detection)
+
+ Please feel free to explore and utilize these resources for SMS spam detection purposes.
+
+ ### [@Sibi krishnamoorthy](https://sibikrish3000.github.io/portfolio/)
+ ___
+
+ <h5 align="center">
+ Sibi krishnamoorthy
+ </h5><p align="center">
+ A Data Science enthusiast with a passion for Machine Learning and Artificial Intelligence
+ </p><p style="color:teal" align="center">
+ &copy; Sibikrish. All rights reserved 2024
+ </p>
+
app.py ADDED
@@ -0,0 +1,81 @@
+ from fastapi import FastAPI, Query
+ from pydantic import BaseModel
+ from fastapi.responses import HTMLResponse
+ from fastapi.staticfiles import StaticFiles
+ from nltk.tokenize import word_tokenize
+ from nltk.stem import PorterStemmer
+ from nltk.corpus import stopwords
+ import pandas as pd
+ import joblib
+ import re
+ import os
+ import uvicorn
+
+ app = FastAPI(title="SMS Spam Detection API",
+               description="""An API that uses a machine learning model to detect spam messages""",
+               version="1.0.0", debug=True)
+ app.mount("/static", StaticFiles(directory="static"), name="static")
+
+ @app.get('/',response_class=HTMLResponse)
+ def running():
+     text='''
+     <html>
+     <head>
+     <link rel="icon" type="image/x-icon" href="static/images/api.png">
+     <title>SMS Spam Detection API</title>
+     </head>
+     <body>
+     <div>
+     <h1>SMS Spam Detection API</h1>
+     <a href="https://github.com/Sibikrish3000/">Github repository</a>
+     </div>
+     </body>
+     </html>
+     '''
+     return text
+
+ class Message(BaseModel):
+     message: str
+
+ class Feedback(BaseModel):
+     message: str
+     is_spam: bool
+
+ # Load pre-trained models
+ EXTRA_TREE_MODEL = joblib.load('models/Extra_Tree.pkl')
+ BERNOULLINB_MODEL = joblib.load('models/BernoulliNB.pkl')
+ FEEDBACK_CSV = 'feedback.csv'
+
+ # Names used by preprocess_message(); same setup as in streamlit_app.py
+ stemmer = PorterStemmer()
+ try:
+     stop_words = set(stopwords.words('english'))
+ except Exception as e:
+     print(f"An error occurred while loading NLTK stopwords: {e}")
+     stop_words = set()
+
+ def preprocess_message(message):
+     # Replace non-word characters, lowercase, drop stop words and stem the rest
+     message = re.sub(r'\W', ' ', message)
+     tokens = word_tokenize(message.lower())
+     stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+     return " ".join(stemmed_words)
+
+ @app.post('/predict')
+ async def predict(message: Message, model: str = Query(...)):
+
+     # The client is expected to send an already preprocessed message
+     if model == 'ExtraTree':
+         prediction = EXTRA_TREE_MODEL.predict([message.message])[0]
+     elif model == 'NaiveBayes':
+         prediction = BERNOULLINB_MODEL.predict([message.message])[0]
+     else:
+         return {"error": "Invalid model selection"}
+
+     return {"prediction": int(prediction)}
+
+ feedback_data = []
+
+ @app.post('/feedback')
+ async def feedback(feedback:Feedback):
+     processed_message = feedback.message
+     label = 1 if feedback.is_spam else 0
+     feedback_data.append((processed_message, label))
+     df = pd.DataFrame(feedback_data, columns=['message', 'label'])
+     # Write the header only when the file is first created, then append
+     if not os.path.exists(FEEDBACK_CSV):
+         df.to_csv(FEEDBACK_CSV, index=False)
+     else:
+         df.to_csv(FEEDBACK_CSV, mode='a', header=False, index=False)
+     feedback_data.clear()
+     return {'message': 'Feedback Received'}
+
+ #if __name__ == '__main__':
+     #uvicorn.run(app, host='127.0.0.1', port=8000)
docker_app.py ADDED
@@ -0,0 +1,111 @@
+ import streamlit as st
+ import requests
+ from nltk.tokenize import word_tokenize
+ from nltk.stem import PorterStemmer
+ from nltk.corpus import stopwords
+ import re
+ from PIL import Image
+ #import nltk
+
+ icon =Image.open("static/images/icon.png")
+ about = open("about.md")
+ st.set_page_config(
+     page_title="SMS SPAM DETECTION",
+     page_icon=icon,
+     layout='wide'
+ )
+ st.markdown(
+     f"""
+     <style>
+     .stApp {{
+         background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+         background-attachment: fixed;
+         background-repeat: no-repeat;
+         background-size: cover;
+     }}
+     .st-emotion-cache-1avcm0n{{
+         background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+         background-size: cover;
+         background-attachment: fixed;
+         background-repeat: no-repeat;
+     }}
+     </style>
+     """,
+     unsafe_allow_html=True
+ )
+
+
+
+ stemmer = PorterStemmer()
+ # Attempt to load stopwords with error handling
+ try:
+     stop_words = set(stopwords.words('english'))
+ except Exception as e:
+     print(f"An error occurred while loading NLTK stopwords: {e}")
+     stop_words = set()
+
+ def preprocess_message(message):
+     # Replace non-word characters, lowercase, drop stop words and stem the rest
+     message = re.sub(r'\W', ' ', message)
+     tokens = word_tokenize(message.lower())
+     stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+     return " ".join(stemmed_words)
+
+ # Main Streamlit app
+ def main():
+
+     st.title('SMS Spam Detection Webapp')
+     st.image('static/images/spam.png', width=720)
+
+     st.subheader('SMS Spam Detection Webapp Using FastAPI')
+
+
+     message = st.text_area('Enter your SMS message here:')
+     model = st.selectbox('Select Model:', ("ExtraTree", "NaiveBayes"))
+
+     processed_message = preprocess_message(message)
+     payload = {"message": processed_message}
+
+     if st.button('Predict'):
+         if message:
+             response = requests.post(f'http://127.0.0.1:8000/predict?model={model}', json=payload)
+             if response.status_code == 200:
+                 prediction = response.json().get("prediction", "Error")
+                 if prediction == 1:
+                     st.error("The message is classified as **spam**.")
+                 else:
+                     st.success("The message is classified as **not spam**.")
+             else:
+                 st.error("Error in prediction. Please try again.")
+         else:
+             st.error("Please enter a message")
+
+     st.write("Feedback")
+     is_spam = st.checkbox("Is it Spam", value=False)
+     if st.button("Submit Feedback"):
+         if message:
+             feedback_payload = {
+                 "message": processed_message,
+                 "is_spam": is_spam
+             }
+             feedback_response = requests.post("http://127.0.0.1:8000/feedback", json=feedback_payload)
+
+             if feedback_response.status_code == 200:
+                 st.success("Thank you for your feedback!")
+             else:
+                 st.error("Error in submitting feedback. Please try again.")
+         else:
+             st.error("Please enter a feedback message.")
+
+     with st.expander("About"):
+         st.title("SMS Spam Detection Webapp")
+         st.markdown(about.read(),unsafe_allow_html=True)
+     st.warning("Please enter a message before pressing the buttons")
+
+     st.markdown('---')
+     st.markdown('@Sibi krishnamoorthy')
+ if __name__ == "__main__":
+     main()
+
+
+
+
feedback.csv ADDED
@@ -0,0 +1,15 @@
+ message,label
+ "URGENT! Your mobile number has won £2000 cash prize. To claim, call 09066364589.",1
+ get new credit card low interest rate click appli,1
+ free entri 1000 cash draw text win 12345 hurri c appli,1
+ free entri 1000 cash draw text win 12345 hurri c appli,0
+ congratul 1 000 walmart gift card go http bit ly 123456 claim,1
+ helooo,1
+ helooo,0
+ hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhrtryyyyyyi,1
+ hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhrtryyyyyyi,1
+ heloo,0
+ heyyy,0
+ hi,0
+ hei,0
+ hi,0
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ pillow
+ fastapi
+ uvicorn
+ streamlit
+ requests
+ psutil
+ numpy
+ joblib
+ scikit-learn
+ pandas
+ nltk
+ pydantic
streamlit_app.py ADDED
@@ -0,0 +1,136 @@
+ import time
+ import requests
+ import threading
+ import uvicorn
+ import psutil
+ from app import app
+ from nltk.tokenize import word_tokenize
+ from nltk.stem import PorterStemmer
+ from nltk.corpus import stopwords
+ import re
+ from PIL import Image
+ #import nltk
+ import streamlit as st
+
+ icon =Image.open("static/images/icon.png")
+ about = open("about.md")
+ st.set_page_config(
+     page_title="SMS SPAM DETECTION",
+     page_icon=icon,
+     layout='wide'
+ )
+ st.markdown(
+     f"""
+     <style>
+     .stApp {{
+         background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+         background-attachment: fixed;
+         background-repeat: no-repeat;
+         background-size: cover;
+     }}
+     .st-emotion-cache-1avcm0n{{
+         background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+         background-size: cover;
+         background-attachment: fixed;
+         background-repeat: no-repeat;
+     }}
+     </style>
+     """,
+     unsafe_allow_html=True
+ )
+
+
+ def close_port(port):
+     # Terminate whichever process is listening on the given local port
+     for conn in psutil.net_connections(kind='inet'):
+         if conn.laddr.port == port:
+             print(f"Closing port {port} by terminating PID {conn.pid}")
+             process = psutil.Process(conn.pid)
+             process.terminate()
+ def run_fastapi():
+     try:
+         uvicorn.run(app, host="0.0.0.0", port=8000)
+     except Exception as e:
+         print(f'Error running fastapi:{e}')
+         close_port(8000)
+
+ # Serve the FastAPI app from a background thread of the Streamlit process
+ fastapi_thread = threading.Thread(target=run_fastapi)
+ fastapi_thread.daemon = True
+ fastapi_thread.start()
+ time.sleep(2)
+ # Check if NLTK data is downloaded, download if not
+
+
+
+ stemmer = PorterStemmer()
+ # Attempt to load stopwords with error handling
+ try:
+     stop_words = set(stopwords.words('english'))
+ except Exception as e:
+     print(f"An error occurred while loading NLTK stopwords: {e}")
+     stop_words = set()
+
+ def preprocess_message(message):
+     # Replace non-word characters, lowercase, drop stop words and stem the rest
+     message = re.sub(r'\W', ' ', message)
+     tokens = word_tokenize(message.lower())
+     stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+     return " ".join(stemmed_words)
+
+ # Main Streamlit app
+ def main():
+     #st-emotion-cache-1avcm0n
+     st.title('SMS Spam Detection Webapp')
+     st.image('static/images/spam.png',width=720)
+     st.subheader('SMS Spam Detection Webapp Using FastAPI')
+
+
+     message = st.text_area('Enter your SMS message here:')
+     model = st.selectbox('Select Model:', ("ExtraTree", "NaiveBayes"))
+
+     processed_message = preprocess_message(message)
+     payload = {"message": processed_message}
+
+     if st.button('Predict'):
+
+         if message:
+             response = requests.post(f'http://127.0.0.1:8000/predict?model={model}', json=payload)
+             if response.status_code == 200:
+                 prediction = response.json().get("prediction", "Error")
+                 if prediction == 1:
+                     st.error("The message is classified as **spam**.")
+                 else:
+                     st.success("The message is classified as **not spam**.")
+             else:
+                 st.error("Error in prediction. Please try again.")
+         else:
+             st.error("Please enter a message")
+
+     st.write("Feedback")
+     is_spam = st.checkbox("Is it Spam", value=False)
+     if st.button("Submit Feedback"):
+         if message:
+             feedback_payload = {
+                 "message": processed_message,
+                 "is_spam": is_spam
+             }
+             feedback_response = requests.post("http://127.0.0.1:8000/feedback", json=feedback_payload)
+
+             if feedback_response.status_code == 200:
+                 st.success("Thank you for your feedback!")
+             else:
+                 st.error("Error in submitting feedback. Please try again.")
+         else:
+             st.error("Please enter a feedback message.")
+
+     with st.expander("About"):
+         st.title("SMS Spam Detection Webapp")
+         st.markdown(about.read(),unsafe_allow_html=True)
+     st.warning("Please enter a message before pressing the buttons")
+
+     st.markdown('---')
+     st.markdown('@Sibi krishnamoorthy')
+ if __name__ == "__main__":
+     main()
+     fastapi_thread.join()
+
+
+
+