Spaces: sibikrish/sms-spam-detection

sibikrish committed on Jun 29, 2024

Commit

56c33f2

1 Parent(s): cdaa963

Upload 7 files

Files changed (7)

README.md +219 -13
about.md +64 -0
app.py +81 -0
docker_app.py +111 -0
feedback.csv +15 -0
requirements.txt +12 -0
streamlit_app.py +136 -0

README.md CHANGED

@@ -1,13 +1,219 @@
----
-title: Sms Spam Detection
-emoji: 👀
-colorFrom: purple
-colorTo: yellow
-sdk: streamlit
-sdk_version: 1.36.0
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+<p align="center">
+<a href = "https://github.com/Sibikrish3000/sms-spam-detection" > <img src = "https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/spam.png?raw=true" alt = "sms spam image"  width=500 height=280> </a>
+</p>
+<h1 align="center"> SMS Spam Detection Web Application </h1>
+<p align="center">
+This application leverages machine learning to detect spam messages
+</p>
+<p align="center">
+<a href="https://github.com/Sibikrish3000/sms-spam-detection/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Sibikrish3000/sms-spam-detection" alt="GitHub license"></a>
+<a href="https://github.com/Sibikrish3000/sms-spam-detection/stargazers"><img src="https://img.shields.io/github/stars/Sibikrish3000/sms-spam-detection?style=social" alt="GitHub stars"></a>
+<a href="https://github.com/Sibikrish3000/sms-spam-detection/issues"><img src="https://img.shields.io/github/issues/Sibikrish3000/sms-spam-detection" alt="GitHub issues"></a>
+</p>
+<p align="center">
+<a href="https://scikit-learn.org/"><img src="https://img.shields.io/badge/sklearn-darkorange.svg?style=flat&logo=scikit-learn&logoColor=white" alt="sklearn"></a>
+<a href="https://www.python.org"><img src="https://img.shields.io/badge/Python-yellow.svg?style=flat&logo=python&logoColor=white" alt="language"></a>
+<a href="https://fastapi.tiangolo.com/"><img src="https://img.shields.io/badge/FastAPI-darkgreen.svg?style=flat&logo=fastapi&logoColor=white" alt="fastapi"></a> <a href="https://hub.docker.com/repository/docker/sibikrish3000/sms-spam-detection/"><img src="https://img.shields.io/badge/Docker-blue?style=flat&logo=docker&logoColor=white" alt="docker"></a>
+<a href="https://www.streamlit.io"><img src="https://img.shields.io/badge/Streamlit-e63946?style=flat&logo=streamlit&logoColor=white" alt="streamlit"></a>
+</p>
+This repository contains a web application for detecting spam SMS messages. The application uses machine learning models (Extra Trees and Bernoulli Naive Bayes) to classify messages as spam or not spam. The app also allows users to provide feedback on the classification results, which can be used to retrain the models periodically.
+[Dataset](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset)
+## Try on Streamlit
+<p>
+<a href="https://www.streamlit.io"><img src="https://img.shields.io/badge/Streamlit-e63946?style=flat&logo=streamlit&logoColor=linear-gradient(360deg, #f093fb 0%, #f5576c 100%)" alt="streamlit" width="160" height="50" ></a>
+</p>
+## Try on Hugging Face Spaces
+<p>
+<a href="https://huggingface.co/spaces/sibikrish/sms-spam-detection?theme=dark"><img src="https://img.shields.io/badge/Huggingface-white?style=flat&logo=huggingface&logoSize=amd" alt="huggingface" width="160" height="50" ></a>
+</p>
+### Features
+- **Prediction**: Classify SMS messages as spam or not spam using Extra Trees or Bernoulli Naive Bayes models.
+- **Feedback**: Users can provide feedback on the predictions to improve model performance.
+- **Continuous Training**: The application supports periodic retraining of models using the feedback data.
+## Project Structure
+```
+/sms-spam-detection
+│
+├──/model
+│   ├── BernoulliNB.pkl
+│   └── Extra_Tree.pkl
+│
+├──/static
+│   └──/images
+│
+├── app.py
+├── streamlit_app.py
+├── docker_app.py
+├── Dockerfile
+├── Dockerfile.fastapi
+├── docker-compose.yml
+├── requirements.txt
+```
+- `app.py`: Defines the FastAPI application.
+- `streamlit_app.py`: Defines the Streamlit web app.
+- `docker_app.py`: Streamlit web app used in the Docker setup.
+- `Dockerfile`: Dockerfile for building the Docker image.
+- `docker-compose.yml`: Docker Compose file for orchestrating the services.
+- `requirements.txt`: List of dependencies.
+- `model/`: Directory containing pre-trained machine learning models.
+- `static/`: Directory containing static files such as images used in the interface.
+### Installation
+1. **Clone the repository**:
+    ```sh
+    git clone https://github.com/Sibikrish3000/sms-spam-detection.git
+    cd sms-spam-detection
+    ```
+2. **Install the required packages**:
+    ```sh
+    pip install -r requirements.txt
+    ```
+3. **Download NLTK data**:
+    ```sh
+    python -m nltk.downloader punkt
+    python -m nltk.downloader stopwords
+    ```
+## Run Locally
+1. **Start the FastAPI Server**:
+    ```sh
+    uvicorn app:app --host 0.0.0.0 --port 8000 --reload
+    ```
+2. **Run the Streamlit Application**:
+    ```sh
+    streamlit run streamlit_app.py
+    ```
+### Using Docker Compose
+1. Create the network, then build and start the containers:
+    ```sh
+    docker network create AIservice
+    docker-compose up --build
+    ```
+2. Access the Streamlit web app at [http://localhost:8501](http://localhost:8501).
+### Using Docker image
+```sh
+docker network create AIservice
+```
+```sh
+docker pull sibikrish/sms-spam-detection:latest
+docker run sibikrish/sms-spam-detection:latest  # or, detached with port 8501 published:
+docker run -d -p 8501:8501 sibikrish/sms-spam-detection:latest
+```
+## Development
+### Running in a Gitpod Cloud Environment
+**Click the button below to start a new development environment:**
+[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/Sibikrish3000/sms-spam-detection)
+### Usage
+- **Enter SMS Message**: Input the SMS message you want to classify.
+- **Select Model**: Choose between Extra Trees and Bernoulli Naive Bayes models.
+- **Predict**: Click the "Predict" button to see the classification result.
+- **Feedback**: Provide feedback on the prediction by marking the message as spam or not spam and submit.
+### Continuous Training (CT) in MLOps
+Continuous Training (CT) ensures that the machine learning models stay up-to-date with new data and feedback. Here are some suggestions for implementing CT for this application:
+#### Online Learning
+Online learning is suitable for scenarios where data arrives continuously, and the model needs to update frequently.
+- **Implementation**: Update the model incrementally as new labeled feedback arrives, using stochastic gradient descent or mini-batch updates. In scikit-learn, the `partial_fit()` method available on some estimators (e.g., `SGDClassifier`, `BernoulliNB`) performs this kind of incremental update. `ExtraTreesClassifier` does not provide `partial_fit()`, so it has to be retrained in full.
+- **Benefits**: The model updates with each new feedback, allowing it to adapt quickly to new patterns.
+- **Challenges**: May require more careful tuning and monitoring to ensure model stability.
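The incremental update described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes scikit-learn is installed, swaps in a `HashingVectorizer` (which is stateless, so the feature space does not change as feedback arrives), and uses a hypothetical `update` helper and made-up example messages.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB

# A stateless vectorizer needs no fitting, so the feature space stays fixed
# as new feedback arrives (a fitted TfidfVectorizer would not guarantee that).
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False,
                               binary=True, norm=None)
model = BernoulliNB()
CLASSES = [0, 1]  # 0 = not spam, 1 = spam


def update(model, messages, labels):
    """Fold one batch of labeled feedback into the model (hypothetical helper)."""
    X = vectorizer.transform(messages)
    # `classes` is required on the first call; passing the same list later is allowed.
    model.partial_fit(X, labels, classes=CLASSES)
    return model


# Two small feedback batches, applied one after the other.
update(model, ["free cash prize win", "see you at lunch"], [1, 0])
update(model, ["claim your free prize now", "are we still on for lunch"], [1, 0])

predictions = model.predict(vectorizer.transform(["free prize", "see you at lunch"]))
print(predictions.tolist())
```

Each batch here contains both classes; updating with a single class before the other has ever been seen leaves one class with a zero count, which is one of the stability issues the monitoring note above refers to.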
+#### Offline Learning
+Offline learning involves retraining the model periodically with the accumulated feedback data.
+- **Implementation**: Retrain the model at a fixed interval (e.g., daily, weekly) using the feedback data stored in the CSV file.
+- **Benefits**: Simpler to implement and manage, as retraining can be scheduled during off-peak times.
+- **Challenges**: Model updates less frequently compared to online learning, which may delay the incorporation of new patterns.
+#### Partial Fit
+Partial fit combines aspects of both online and offline learning.
+- **Implementation**: Use models that support the `partial_fit()` method. Collect feedback data over a period and then update the model in smaller batches.
+- **Benefits**: Provides a balance between frequent updates and stability.
+- **Challenges**: Requires careful management of the batch size and frequency of updates.
+### Example Workflow for Offline Learning with Periodic Retraining
+1. **Collect Feedback**: Save feedback data into a CSV file.
+2. **Scheduled Retraining**: Set up a cron job or similar scheduling tool to retrain the model every 10 days.
+3. **Model Update**: Load the feedback data, preprocess it, and retrain the model.
+4. **Save Model**: Save the retrained model to a file and replace the old model.
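Before step 3, the feedback may need cleaning, because the same message can be submitted more than once with conflicting labels. The sketch below uses only the standard library and assumes the `message,label` column layout of the feedback CSV; the helper names and the majority-vote rule (ties are dropped) are illustrative assumptions, not part of the project.

```python
import csv
import io
from collections import Counter, defaultdict


def read_feedback(handle):
    """Read (message, label) pairs from an open CSV file with a header row."""
    reader = csv.DictReader(handle)
    return [(row["message"], row["label"]) for row in reader]


def clean_feedback(rows):
    """Collapse repeated messages to their majority label and drop ties."""
    votes = defaultdict(Counter)
    for message, label in rows:
        votes[message.strip()][int(label)] += 1
    cleaned = []
    for message, counts in votes.items():
        if not message:
            continue  # skip empty messages
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # conflicting feedback with no majority
        cleaned.append((message, ranked[0][0]))
    return cleaned


# In-memory stand-in for the feedback file, so the sketch runs on its own.
sample = io.StringIO(
    "message,label\n"
    "free entri 1000 cash draw,1\n"
    "free entri 1000 cash draw,0\n"
    "hi,0\n"
    "hi,0\n"
    "get new credit card,1\n"
)
cleaned = clean_feedback(read_feedback(sample))
print(cleaned)
```

With a real file, `read_feedback(open("feedback.csv", newline=""))` would replace the in-memory sample.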
+#### Cron Job Example (Linux)
+```sh
+# Open the crontab editor
+crontab -e
+# Add the following line to run retraining at midnight on days 1, 11, 21, and 31 of each month (roughly every 10 days)
+0 0 */10 * * /usr/bin/python3 /path/to/your/retrain_script.py
+```
+### Retraining Script Example
+```python
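+import pandas as pd
+import joblib
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.ensemble import ExtraTreesClassifier
+from sklearn.pipeline import make_pipeline
+# Load feedback data (columns: message, label)
+df = pd.read_csv('feedback.csv')
+# Drop empty rows and exact duplicates before training
+df = df.dropna(subset=['message', 'label']).drop_duplicates()
+# Preprocess the messages
+# Include your preprocessing function here
+# Bundle the vectorizer with the classifier so the saved model accepts raw text;
+# saving the classifier alone would lose the fitted vocabulary
+model = make_pipeline(TfidfVectorizer(), ExtraTreesClassifier())
+# Retrain the model
+model.fit(df['message'], df['label'])
+# Save the retrained pipeline
+joblib.dump(model, 'Extra_Tree.pkl')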
+```
+### License
+This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

about.md ADDED

	@@ -0,0 +1,64 @@

+## SMS Spam Detection Web Application
+This SMS Spam Detection Web Application leverages Machine Learning models served as an API to identify potentially spam SMS messages. The app empowers users to assess message legitimacy based on their content, providing an efficient way to filter out unwanted spam.
+### Features:
+1. **FastAPI Backend**: The backend of the application is implemented using FastAPI, a modern web framework for building APIs with Python. It exposes an endpoint `/predict` that accepts POST requests with SMS message data and returns predictions. Another endpoint `/feedback` allows users to provide feedback on the predictions.
+2. **Streamlit Frontend**: The frontend of the application is implemented using Streamlit, a Python library that allows for the creation of customizable UI components for machine learning models. Users interact with the application through a user-friendly interface where they can input SMS messages and receive predictions.
+3. **Models**: The application uses Extra Trees (`ExtraTreesClassifier`) and Bernoulli Naive Bayes models for spam detection.
+4. **Feedback Mechanism**: Users can provide feedback on the predictions, indicating whether a message was correctly classified as spam or not. This feedback is stored and used to improve the model over time.
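As a rough illustration of how a client might call the two endpoints described above, the sketch below builds the HTTP requests with the standard library. The base URL, the `model` query parameter, and the JSON field names follow the scripts in this commit; the `build_*` helper names are hypothetical, and nothing is sent unless a server is actually running.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local FastAPI server


def build_predict_request(message, model, base_url=BASE_URL):
    """Build a POST /predict request; `model` is 'ExtraTree' or 'NaiveBayes'."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_feedback_request(message, is_spam, base_url=BASE_URL):
    """Build a POST /feedback request carrying the user's label."""
    body = json.dumps({"message": message, "is_spam": bool(is_spam)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/feedback",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


predict_request = build_predict_request("free entri 1000 cash draw", "ExtraTree")
feedback_request = build_feedback_request("free entri 1000 cash draw", True)
print(predict_request.get_method(), predict_request.full_url)

# Sending requires the API to be running, for example:
# with urllib.request.urlopen(predict_request) as response:
#     print(json.load(response))
```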
+### Usage:
+- Users can run the application locally by starting the FastAPI server and launching the Streamlit script.
+- They can interact with the application through the Streamlit interface in their web browser, inputting SMS messages and receiving predictions.
+- The application provides predictions in real-time, leveraging machine learning models trained on historical SMS data.
+### Deployment:
+- The application can be deployed locally or on a cloud platform using Docker. Docker containers encapsulate both the FastAPI backend and the Streamlit frontend, making deployment straightforward.
+- Additionally, the application can be deployed to a hosting platform such as Vercel or Heroku, using their respective deployment methods.
+### Future Improvements:
+1. Enhance model performance by fine-tuning hyperparameters or using more sophisticated models.
+2. Add more features to improve prediction accuracy.
+3. Implement user authentication and authorization for secure access to the application.
+4. Integrate with a database to store feedback examples for analysis and model improvement.
+### Development:
+- Developers can extend and enhance the application by adding new features, improving model accuracy, or optimizing performance.
+- The codebase is modular and well-structured, facilitating easy maintenance and collaboration among developers.
+Overall, this SMS Spam Detection application provides a practical solution for identifying potentially spam messages, helping users keep their inboxes clean and efficient.
+## License
+This project is licensed under the MIT License. See the [LICENSE](https://github.com/Sibikrish3000/sms-spam-detection/blob/main/LICENSE)  file for details.
+The Jupyter notebook, trained model, and accompanying documentation, including Dockerfiles, FastAPI script, and Streamlit Interface script, can be accessed through the GitHub repository linked below:
+<p>
+<a href="https://github.com/Sibikrish3000/sms-spam-detection"><img src=https://img.shields.io/badge/Github%20Repository-white.svg?style=flat&logo=github&logoColor=black alt="Github repo"></a>
+</p>
+![size](https://img.shields.io/github/repo-size/Sibikrish3000/sms-spam-detection)
+Please feel free to explore and utilize these resources for SMS spam detection purposes.
+### [@Sibi krishnamoorthy](https://sibikrish3000.github.io/portfolio/)
+___
+<h5 align="center">
+Sibi krishnamoorthy
+</h5><p align="center">
+A Data Science enthusiast with a passion for Machine Learning and Artificial Intelligence
+</p><p style="color:teal" align="center">
+&copy; Sibikrish. All rights reserved 2024
+</p>

app.py ADDED

	@@ -0,0 +1,81 @@

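+from fastapi import FastAPI, Query
+from pydantic import BaseModel
+from fastapi.responses import HTMLResponse
+from fastapi.staticfiles import StaticFiles
+from nltk.tokenize import word_tokenize
+from nltk.stem import PorterStemmer
+from nltk.corpus import stopwords
+import pandas as pd
+import joblib
+import re
+import os
+import uvicorn
+# Stemmer and stop words used by preprocess_message
+stemmer = PorterStemmer()
+try:
+    stop_words = set(stopwords.words('english'))
+except LookupError:
+    # NLTK stopwords corpus is not downloaded; fall back to no stop-word filtering
+    stop_words = set()
+app = FastAPI(title="SMS Spam Detection API",
+    description="""An API that uses a machine learning model to detect spam messages""",
+    version="1.0.0", debug=True)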
+app.mount("/static", StaticFiles(directory="static"), name="static")
+@app.get('/',response_class=HTMLResponse)
+def running():
+    text='''
+    <html>
+    <head>
+    <link rel="icon" type="image/x-icon" href="static/images/api.png">
+    <title>SMS Spam Detection API</title>
+    </head>
+    <body>
+    <div>
+    <h1>SMS Spam Detection API</h1>
+        <a href="https://github.com/Sibikrish3000/sms-spam-detection">GitHub repository</a>
+    </div>
+    </body>
+    </html>
+    '''
+    return text
+class Message(BaseModel):
+    message: str
+class Feedback(BaseModel):
+    message: str
+    is_spam: bool
+# Load pre-trained models
+EXTRA_TREE_MODEL = joblib.load('models/Extra_Tree.pkl')
+BERNOULLINB_MODEL = joblib.load('models/BernoulliNB.pkl')
+FEEDBACK_CSV = 'feedback.csv'
+def preprocess_message(message):
+    message = re.sub(r'\W', ' ', message)
+    tokens = word_tokenize(message.lower())
+    stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+    return " ".join(stemmed_words)
+@app.post('/predict')
+async def predict(message: Message, model: str = Query(...)):
+    if model == 'ExtraTree':
+        prediction = EXTRA_TREE_MODEL.predict([message.message])[0]
+    elif model == 'NaiveBayes':
+        prediction = BERNOULLINB_MODEL.predict([message.message])[0]
+    else:
+        return {"error": "Invalid model selection"}
+    return {"prediction": int(prediction)}
+feedback_data = []
+@app.post('/feedback')
+async def feedback(feedback:Feedback):
+    processed_message = feedback.message
+    label = 1 if feedback.is_spam else 0
+    feedback_data.append((processed_message, label))
+    df = pd.DataFrame(feedback_data, columns=['message', 'label'])
+    if not os.path.exists(FEEDBACK_CSV):
+        df.to_csv(FEEDBACK_CSV, index=False)
+    else:
+        df.to_csv(FEEDBACK_CSV, mode='a', header=False, index=False)
+    feedback_data.clear()
+    return {'message': 'Feedback Received'}
+#if __name__ == '__main__':
+    #uvicorn.run(app, host='127.0.0.1', port=8000)

docker_app.py ADDED

	@@ -0,0 +1,111 @@

+import streamlit as st
+import requests
+from nltk.tokenize import word_tokenize
+from nltk.stem import PorterStemmer
+from nltk.corpus import stopwords
+import re
+from PIL import Image
+#import nltk
+icon =Image.open("static/images/icon.png")
+about = open("about.md")
+st.set_page_config(
+        page_title="SMS SPAM DETECTION",
+        page_icon=icon,
+        layout='wide'
+    )
+st.markdown(
+        f"""
+            <style>
+            .stApp {{
+                background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+                background-attachment: fixed;
+                background-repeat: no-repeat;
+                background-size: cover;
+            }}
+             .st-emotion-cache-1avcm0n{{
+            background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+            background-size: cover;
+            background-attachment: fixed;
+            background-repeat: no-repeat;
+            }}
+            </style>
+            """,
+        unsafe_allow_html=True
+    )
+stemmer = PorterStemmer()
+# Attempt to load stopwords with error handling
+try:
+    stop_words = set(stopwords.words('english'))
+except Exception as e:
+    print(f"An error occurred while loading NLTK stopwords: {e}")
+    stop_words = set()
+def preprocess_message(message):
+    message = re.sub(r'\W', ' ', message)
+    tokens = word_tokenize(message.lower())
+    stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+    return " ".join(stemmed_words)
+# Main Streamlit app
+def main():
+    st.title('SMS Spam Detection Webapp')
+    st.image('static/images/spam.png', width=720)
+    st.subheader('SMS Spam Detection Webapp Using FastAPI')
+    message = st.text_area('Enter your SMS message here:')
+    model = st.selectbox('Select Model:', ("ExtraTree", "NaiveBayes"))
+    processed_message = preprocess_message(message)
+    payload = {"message": processed_message}
+    if st.button('Predict'):
+        if message:
+            response = requests.post(f'http://127.0.0.1:8000/predict?model={model}', json=payload)
+            if response.status_code == 200:
+                prediction = response.json().get("prediction", "Error")
+                if prediction == 1:
+                    st.error("The message is classified as **spam**.")
+                else:
+                    st.success("The message is classified as **not spam**.")
+            else:
+                st.error("Error in prediction. Please try again.")
+        else:
+            st.error("Please enter a message")
+    st.write("Feedback")
+    is_spam = st.checkbox("Is it Spam", value=False)
+    if st.button("Submit Feedback"):
+        if message:
+            feedback_payload = {
+                "message": processed_message,
+                "is_spam": is_spam
+            }
+            feedback_response = requests.post("http://127.0.0.1:8000/feedback", json=feedback_payload)
+            if feedback_response.status_code == 200:
+                st.success("Thank you for your feedback!")
+            else:
+                st.error("Error in submitting feedback. Please try again.")
+        else:
+            st.error("Please enter a feedback message.")
+    with st.expander("About"):
+        st.title("SMS Spam Detection Webapp")
+        st.markdown(about.read(),unsafe_allow_html=True)
+    st.warning("Please enter a message before pressing the buttons.")
+    st.markdown('---')
+    st.markdown('@Sibi krishnamoorthy')
+if __name__ == "__main__":
+    main()

feedback.csv ADDED

	@@ -0,0 +1,15 @@

+message,label
+"URGENT! Your mobile number has won £2000 cash prize. To claim, call 09066364589.",1
+get new credit card low interest rate click appli,1
+free entri 1000 cash draw text win 12345 hurri c appli,1
+free entri 1000 cash draw text win 12345 hurri c appli,0
+congratul 1 000 walmart gift card go http bit ly 123456 claim,1
+helooo,1
+helooo,0
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhrtryyyyyyi,1
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhrtryyyyyyi,1
+heloo,0
+heyyy,0
+hi,0
+hei,0
+hi,0

requirements.txt ADDED

	@@ -0,0 +1,12 @@

+pillow
+fastapi
+uvicorn
+streamlit
+requests
+psutil
+numpy
+joblib
+scikit-learn
+pandas
+nltk
+pydantic

streamlit_app.py ADDED

	@@ -0,0 +1,136 @@

+import time
+import requests
+import threading
+import psutil
+import uvicorn
+from app import app
+from nltk.tokenize import word_tokenize
+from nltk.stem import PorterStemmer
+from nltk.corpus import stopwords
+import re
+from PIL import Image
+#import nltk
+import streamlit as st
+icon =Image.open("static/images/icon.png")
+about = open("about.md")
+st.set_page_config(
+        page_title="SMS SPAM DETECTION",
+        page_icon=icon,
+        layout='wide'
+    )
+st.markdown(
+        f"""
+            <style>
+            .stApp {{
+                background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+                background-attachment: fixed;
+                background-repeat: no-repeat;
+                background-size: cover;
+            }}
+             .st-emotion-cache-1avcm0n{{
+            background-image: url('https://github.com/Sibikrish3000/sms-spam-detection/blob/main/static/images/cover.jpg?raw=true');
+            background-size: cover;
+            background-attachment: fixed;
+            background-repeat: no-repeat;
+            }}
+            </style>
+            """,
+        unsafe_allow_html=True
+    )
+def close_port(port):
+    for conn in psutil.net_connections(kind='inet'):
+        if conn.laddr.port == port:
+            print(f"Closing port {port} by terminating PID {conn.pid}")
+            process = psutil.Process(conn.pid)
+            process.terminate()
+def run_fastapi():
+    try:
+        uvicorn.run(app, host="0.0.0.0", port=8000)
+    except Exception as e:
+        print(f'Error running fastapi:{e}')
+        close_port(8000)
+fastapi_thread = threading.Thread(target=run_fastapi)
+fastapi_thread.daemon = True
+fastapi_thread.start()
+time.sleep(2)
+# NLTK data (punkt, stopwords) must already be downloaded; see the README installation steps
+stemmer = PorterStemmer()
+# Attempt to load stopwords with error handling
+try:
+    stop_words = set(stopwords.words('english'))
+except Exception as e:
+    print(f"An error occurred while loading NLTK stopwords: {e}")
+    stop_words = set()
+def preprocess_message(message):
+    message = re.sub(r'\W', ' ', message)
+    tokens = word_tokenize(message.lower())
+    stemmed_words = [stemmer.stem(token) for token in tokens if token not in stop_words]
+    return " ".join(stemmed_words)
+# Main Streamlit app
+def main():
+    #st-emotion-cache-1avcm0n
+    st.title('SMS Spam Detection Webapp')
+    st.image('static/images/spam.png',width=720)
+    st.subheader('SMS Spam Detection Webapp Using FastAPI')
+    message = st.text_area('Enter your SMS message here:')
+    model = st.selectbox('Select Model:', ("ExtraTree", "NaiveBayes"))
+    processed_message = preprocess_message(message)
+    payload = {"message": processed_message}
+    if st.button('Predict'):
+        if message:
+            response = requests.post(f'http://127.0.0.1:8000/predict?model={model}', json=payload)
+            if response.status_code == 200:
+                prediction = response.json().get("prediction", "Error")
+                if prediction == 1:
+                    st.error("The message is classified as **spam**.")
+                else:
+                    st.success("The message is classified as **not spam**.")
+            else:
+                st.error("Error in prediction. Please try again.")
+        else:
+            st.error("Please enter a message")
+    st.write("Feedback")
+    is_spam = st.checkbox("Is it Spam", value=False)
+    if st.button("Submit Feedback"):
+        if message:
+            feedback_payload = {
+                "message": processed_message,
+                "is_spam": is_spam
+            }
+            feedback_response = requests.post("http://127.0.0.1:8000/feedback", json=feedback_payload)
+            if feedback_response.status_code == 200:
+                st.success("Thank you for your feedback!")
+            else:
+                st.error("Error in submitting feedback. Please try again.")
+        else:
+            st.error("Please enter a feedback message.")
+    with st.expander("About"):
+        st.title("SMS Spam Detection Webapp")
+        st.markdown(about.read(),unsafe_allow_html=True)
+    st.warning("Please enter a message before pressing the buttons.")
+    st.markdown('---')
+    st.markdown('@Sibi krishnamoorthy')
+if __name__ == "__main__":
+    main()
+    fastapi_thread.join()