Spaces:
Sleeping
Sleeping
A newer version of the Streamlit SDK is available:
1.55.0
metadata
license: mit
title: AutoML
sdk: streamlit
emoji: ๐ป
colorFrom: yellow
colorTo: green
sdk_version: 1.45.1
AutoML & Explainability Web Application
This Streamlit web application empowers users to perform end-to-end machine learning tasks with ease. Upload your data, automatically train and compare various models, understand their predictions through SHAP explainability, and export the best model for your needs.
๐ฏ Core Objectives
- Accessibility: Enable users of all technical backgrounds to leverage machine learning.
- Automation: Streamline the ML pipeline from data ingestion to model evaluation.
- Transparency: Provide clear insights into model behavior using SHAP.
- Efficiency: Quickly identify the best-performing model for a given dataset.
โจ Key Features
- Flexible Data Upload:
- Supports
.csvand.xlsxfiles. - Option to upload a single file (for automatic train/test splitting) or separate training and testing files.
- Supports
- Data Preprocessing:
- Automatic handling of missing values (imputation).
- Encoding of categorical features.
- Optional scaling of numeric features.
- Target Column & Problem Type Detection:
- Easy selection of the target variable.
- Automatic detection of problem type (Classification/Regression).
- Auto-detection of common target column names.
- Automated Model Training & Comparison:
- Trains a suite of models tailored to the problem type:
- Classification: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, SVM, K-Nearest Neighbors, Gaussian Naive Bayes.
- Regression: Linear Regression, Ridge Regression, ElasticNet, Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, SVR, K-Nearest Neighbors Regressor.
- Displays a leaderboard with key performance metrics (Accuracy, F1, AUC for classification; R2, MSE for regression).
- Trains a suite of models tailored to the problem type:
- Model Explainability (XAI):
- Utilizes SHAP (SHapley Additive exPlanations) for the best model.
- Global feature importance plots.
- Detailed SHAP summary plots (e.g., beeswarm) and individual prediction explanations (waterfall plots coming soon).
- Model Export: Download the trained best model (including preprocessing steps) as a
.joblibfile for deployment or further use.
โ๏ธ Setup & Installation
- Prerequisites: Python 3.7+ installed.
- Clone the Repository (Optional):
Alternatively, ensure# git clone <your_repository_url> # If you have it on Git # cd AutoML-WebAppapp.pyandrequirements.txtare in your project directory. - Create and Activate Virtual Environment (Recommended):
python3 -m venv venv source venv/bin/activate # macOS/Linux # venv\Scripts\activate # Windows - Install Dependencies:
pip install -r requirements.txt
๐ Running the Application
- Navigate to your project directory in the terminal.
- Run the Streamlit app:
streamlit run app.py - Open your browser and go to the URL provided (usually
http://localhost:8501).
๐ฎ Upcoming Features & Enhancements
We are continuously working to improve this AutoML application. Here are some features on our roadmap:
- Advanced Preprocessing Options:
- User control over imputation strategies (mean, median, mode, constant).
- More encoding techniques (e.g., One-Hot Encoding, Target Encoding).
- Feature selection techniques.
- Hyperparameter Tuning:
- Integration of GridSearchCV or RandomizedSearchCV for optimizing model hyperparameters.
- User interface to define search spaces.
- Expanded Model Support:
- LightGBM, XGBoost, CatBoost for both classification and regression.
- Basic Time Series forecasting models (e.g., ARIMA, Prophet) if applicable data is provided.
- Enhanced Evaluation & Visualization:
- Interactive Confusion Matrix, ROC/AUC curves, Precision-Recall curves for classification.
- Residual plots, Actual vs. Predicted plots for regression.
- Cross-validation score details.
- Deployment & Integration:
- Option to generate a simple Flask API endpoint for the exported model.
- Dockerization support for easier deployment.
- User Experience & Robustness:
- More detailed error handling and user guidance.
- Saving and loading of experiment configurations.
- Support for larger datasets (optimizations for memory and speed).
- Advanced Explainability:
- Individual prediction explanations (waterfall plots).
- Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots.
- Data Insights:
- Automated exploratory data analysis (EDA) report generation.
This application is actively developed, with assistance from AI pair programming.