title: Sentiment Sleuth
emoji: π
colorFrom: red
colorTo: red
sdk: streamlit
app_port: 8501
tags:
- streamlit
pinned: false
short_description: ML-Powered Amazon Review Sentiment Analysis
license: mit
sdk_version: 1.55.0
Sentiment Sleuth
Overview
Sentiment Sleuth performs sentiment analysis on Amazon product reviews using classical machine-learning models. The project includes data processing and feature engineering notebooks, multiple trained classifiers saved as joblib artifacts, a TF-IDF vectorizer, and a Streamlit UI for analyzing custom review text.
Key components in the repository:
- Interactive app: app.py (Streamlit)
- Saved models: data/models/*.joblib
- Vectorizer and precomputed TF-IDF sparse matrices: data/vectorizers/
- Processed datasets and samples: data/processed/ and data/samples/
- Notebooks: notebooks/ (EDA, preprocessing, feature engineering, and model notebooks)
- Documentation: docs/ (research notes, project definition, workflow, and report)
The Streamlit app loads saved artifacts via src.utils.helpers and exposes multiple classifiers (Logistic Regression, Naive Bayes, SVM variants, KNN, Decision Trees, Random Forest, SGD, XGBoost and LightGBM) so you can compare predictions and confidence scores side-by-side.
Key Features
- Multiple Models: Compare results from several traditional classifiers (Logistic Regression, Naive Bayes, SVMs, KNN, Decision Trees, Random Forests, SGD, XGBoost, LightGBM).
- Reusable Artifacts: TF-IDF vectorizer and trained models are persisted under data/vectorizers/ and data/models/ for fast local inference.
- Notebooks for Reproducibility: Step-by-step Jupyter notebooks for data acquisition, EDA, preprocessing, feature engineering, and model training are included under notebooks/.
Setup
- Prerequisites: Before running this project, ensure you have Git and Conda installed; the steps below use both.
- Clone the Repository
git clone https://github.com/elsayedelmandoh/sentiment-sleuth
cd sentiment-sleuth
- Create Conda Environment
# Create & activate the environment
conda create -n envname python=3.12 -y
conda activate envname
# Install pip and project dependencies
conda install pip -y
pip install -r requirements.txt
- Environment Variables
Create a .env file at the project root and add any necessary API keys or configuration variables.
HF Hub runtime download (recommended for Spaces)
To avoid committing large model artifacts to the repository, you can host them on the Hugging Face Hub and let the Streamlit app download them at runtime. Set the following environment variables in your .env or in the Space settings:
- HF_ASSETS_REPO: The Hugging Face repository id (e.g. username/repo-name) that contains the artifact files.
- HF_ASSETS_REPO_TYPE (optional): Use dataset if your assets were uploaded as a dataset rather than a model/repo.
The app will attempt to load local files from data/models/ and data/vectorizers/ first. If a file is missing and HF_ASSETS_REPO is set and huggingface_hub is installed, the app will download the missing file into data/remote_cache/ and then load it from there. This keeps your Git repository small and lets Hugging Face host large binaries.
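The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function name `resolve_asset` and its injectable `download` parameter are hypothetical, and it assumes the fetch goes through `huggingface_hub.hf_hub_download` with `local_dir` pointing at the cache directory.

```python
import os
from pathlib import Path


def resolve_asset(rel_path, project_root=".", cache_dir="data/remote_cache", download=None):
    """Return a local path for rel_path, preferring local files over the Hub.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    local = Path(project_root) / rel_path
    if local.exists():
        return local  # the local copy wins, no network access needed

    repo_id = os.environ.get("HF_ASSETS_REPO")
    if not repo_id:
        raise FileNotFoundError(
            f"{rel_path} is missing locally and HF_ASSETS_REPO is not set"
        )

    if download is None:
        # Imported lazily so the app still starts when huggingface_hub is absent
        from huggingface_hub import hf_hub_download as download

    fetched = download(
        repo_id=repo_id,
        filename=rel_path,
        repo_type=os.environ.get("HF_ASSETS_REPO_TYPE") or "model",
        local_dir=cache_dir,  # the file lands under cache_dir/rel_path
    )
    return Path(fetched)
```

The returned path can then be passed to `joblib.load`. Making `download` injectable is a design choice for this sketch so the fallback can be exercised without network access.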
Example .env entries:
HF_ASSETS_REPO=your-username/sentiment-artifacts
HF_ASSETS_REPO_TYPE=dataset # optional
Advanced runtime configuration
You can control which files the app attempts to load and where downloaded assets are cached using these optional environment variables:
HF_ASSET_FILES: optional comma-separated list of asset paths (relative to repo). If set, this list overrides the built-in default asset filenames. Example:
HF_ASSET_FILES=data/models/10_random_forest_classifier.joblib,data/vectorizers/tfidf_vectorizer.joblib
ASSET_CACHE_DIR: optional path where downloaded artifacts are cached. Default: data/remote_cache.
Example .env with overrides:
HF_ASSETS_REPO=your-username/sentiment-artifacts
HF_ASSET_FILES=data/models/10_random_forest_classifier.joblib,data/vectorizers/tfidf_vectorizer.joblib
ASSET_CACHE_DIR=data/remote_cache
Behavior summary:
- The app uses the asset list from settings (defaults are provided). If HF_ASSET_FILES is set in the environment, it becomes the active list.
- When an asset is missing locally and HF_ASSETS_REPO is set, the app will download it into ASSET_CACHE_DIR and then load from the cache.
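As an illustration of the override behavior summarized above, the environment variables could be read along these lines. This is a sketch under assumptions: `load_asset_settings` and `DEFAULT_ASSET_FILES` are hypothetical names, and the default list shown here only reuses the two filenames from the example above, not the project's real defaults.

```python
import os

# Hypothetical defaults; the real list lives in the project's settings.
DEFAULT_ASSET_FILES = [
    "data/vectorizers/tfidf_vectorizer.joblib",
    "data/models/10_random_forest_classifier.joblib",
]


def load_asset_settings(env=None):
    """Build the active asset configuration from environment variables."""
    env = os.environ if env is None else env
    raw = env.get("HF_ASSET_FILES", "")
    # Split on commas, trimming whitespace and dropping empty entries
    files = [item.strip() for item in raw.split(",") if item.strip()]
    return {
        # HF_ASSET_FILES, when set, replaces the defaults entirely
        "asset_files": files or list(DEFAULT_ASSET_FILES),
        "cache_dir": env.get("ASSET_CACHE_DIR") or "data/remote_cache",
        "repo_id": env.get("HF_ASSETS_REPO") or None,
    }
```

Passing a plain dictionary as `env` makes the parsing easy to check without touching the real environment.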
To upload artifacts to the Hub, you can use the huggingface_hub CLI or Python API. Example (Python):
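from huggingface_hub import HfApi
api = HfApi()
# Create the repo if it does not exist, then upload the artifact folder (or use `hf` CLI commands)
api.create_repo("your-username/sentiment-artifacts", repo_type="dataset", exist_ok=True)
api.upload_folder(folder_path="data/models", path_in_repo="data/models", repo_id="your-username/sentiment-artifacts", repo_type="dataset")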
Note: If you run the app locally without setting HF_ASSETS_REPO, ensure the data/models/ and data/vectorizers/ files exist locally.
Usage
This project uses Streamlit for the interactive UI. Start the app locally with the following command:
# Run via Streamlit
streamlit run app.py
When the app starts, open the local URL printed in your terminal (usually http://localhost:8501) and paste an Amazon review into the text area to see per-model sentiment predictions and confidence scores.
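Not every classifier family listed above exposes class probabilities in scikit-learn (a linear SVM, for instance, offers `decision_function` but not `predict_proba`), so a per-model confidence score needs a fallback. The sketch below shows one possible approach and is not taken from the app: `predict_with_confidence` is a hypothetical helper, it assumes a binary sentiment task, and squashing the margin with a sigmoid yields an uncalibrated score rather than a true probability.

```python
import math


def predict_with_confidence(model, features):
    """Return (label, confidence) for a single vectorized review.

    Hypothetical helper; assumes a scikit-learn style model and binary labels.
    """
    label = model.predict(features)[0]
    if hasattr(model, "predict_proba"):
        # Probability of the most likely class
        confidence = float(max(model.predict_proba(features)[0]))
    elif hasattr(model, "decision_function"):
        # Signed distance to the decision boundary, mapped into [0.5, 1.0)
        margin = float(model.decision_function(features)[0])
        confidence = 1.0 / (1.0 + math.exp(-abs(margin)))
    else:
        confidence = None  # the model exposes no usable score
    return label, confidence
```

Calling this once per loaded model on the same TF-IDF row gives the rows of a side-by-side comparison table.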
Model artifacts and vectorizers are loaded from data/models/ and data/vectorizers/. If the vectorizer or model files are missing, the app will show an error message pointing to the expected files.
Reproducibility & Notebooks:
The notebooks/ directory contains step-by-step analysis and model training notebooks. Key notebooks:
- 01_data_acquisition.ipynb: dataset loading and brief description
- 02_eda.ipynb: exploratory data analysis
- 03_data_preprocessing.ipynb: cleaning and preprocessing
- 04_feature_engineering.ipynb: TF-IDF vectorization and feature prep
- 05_logistic_regression.ipynb through 13_lightgbm.ipynb: one notebook per model
- 14_comparsion.ipynb: model comparison and summary
Use these notebooks to retrain or refine models and regenerate the joblib artifacts saved in data/models/.
Contributing
Contributions are welcome! If you'd like to improve this project, please follow these steps:
- Fork the repository.
- Create a branch for your feature or bug fix (git checkout -b feature/my-new-feature).
- Commit your changes with clear messages (git commit -m 'add some feature').
- Push to your fork (git push origin feature/my-new-feature).
- Open a pull request.
Please include reproducible steps and, if applicable, updated notebooks or scripts to regenerate models.
Authors
Elsayed Elmandoh - NLP Engineer
- Connect on LinkedIn & X Linktree
Mohamed Kamal - AI Engineer
- Connect on LinkedIn
Mahmoud Magdy - Information Security Engineer
- Connect on LinkedIn