---
license: mit
title: AutoML
sdk: streamlit
emoji: 💻
colorFrom: yellow
colorTo: green
sdk_version: 1.45.1
---
# AutoML & Explainability Web Application
This Streamlit web application runs an end-to-end machine learning workflow in the browser: upload your data, automatically train and compare several models, inspect their predictions with SHAP explainability, and export the best model.
## 🎯 Core Objectives
* **Accessibility**: Enable users of all technical backgrounds to leverage machine learning.
* **Automation**: Streamline the ML pipeline from data ingestion to model evaluation.
* **Transparency**: Provide clear insights into model behavior using SHAP.
* **Efficiency**: Quickly identify the best-performing model for a given dataset.
## ✨ Key Features
* **Flexible Data Upload**:
    * Supports `.csv` and `.xlsx` files.
    * Option to upload a single file (for automatic train/test splitting) or separate training and testing files.
* **Data Preprocessing**:
    * Automatic handling of missing values (imputation).
    * Encoding of categorical features.
    * Optional scaling of numeric features.
* **Target Column & Problem Type Detection**:
    * Easy selection of the target variable.
    * Automatic detection of problem type (Classification/Regression).
    * Auto-detection of common target column names.
* **Automated Model Training & Comparison**:
    * Trains a suite of models tailored to the problem type:
        * **Classification**: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, SVM, K-Nearest Neighbors, Gaussian Naive Bayes.
        * **Regression**: Linear Regression, Ridge Regression, ElasticNet, Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, SVR, K-Nearest Neighbors Regressor.
    * Displays a leaderboard with key performance metrics (Accuracy, F1, AUC for classification; R2, MSE for regression).
* **Model Explainability (XAI)**:
    * Utilizes SHAP (SHapley Additive exPlanations) for the best model.
    * Global feature importance plots.
    * Detailed SHAP summary plots (e.g., beeswarm) and individual prediction explanations (waterfall plots coming soon).
* **Model Export**: Download the trained best model (including preprocessing steps) as a `.joblib` file for deployment or further use.
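
The preprocessing, model comparison, and export features listed above can be sketched with scikit-learn as follows. This is a minimal illustration, not the code in `app.py`: the helper names `build_preprocessor` and `compare_models`, the median and most-frequent imputation strategies, the use of `OrdinalEncoder`, the synthetic dataset, and the choice of three of the listed classifiers are all assumptions made for the example.

```python
import io

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_preprocessor(X, scale_numeric=True):
    """Hypothetical helper: impute, encode categoricals, optionally scale."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = [c for c in X.columns if c not in numeric_cols]

    numeric_steps = [("impute", SimpleImputer(strategy="median"))]
    if scale_numeric:
        numeric_steps.append(("scale", StandardScaler()))

    categorical_steps = [
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Assumed encoder; categories unseen during fit are mapped to -1.
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value",
                                  unknown_value=-1)),
    ]
    return ColumnTransformer([
        ("num", Pipeline(numeric_steps), numeric_cols),
        ("cat", Pipeline(categorical_steps), categorical_cols),
    ])


def compare_models(X, y, random_state=0):
    """Hypothetical helper: fit each candidate, return (leaderboard, best)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    candidates = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=50, random_state=random_state
        ),
    }
    rows, fitted = [], {}
    for name, model in candidates.items():
        # Preprocessing lives inside the pipeline, so it is exported too.
        pipe = Pipeline([("prep", build_preprocessor(X_train)),
                         ("model", model)])
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred, average="weighted"),
        })
        fitted[name] = pipe
    leaderboard = (pd.DataFrame(rows)
                   .sort_values("accuracy", ascending=False)
                   .reset_index(drop=True))
    return leaderboard, fitted[leaderboard.loc[0, "model"]]


# Synthetic data with missing values, standing in for an uploaded file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["A", "B", "C"], n),
})
df.loc[::10, "age"] = np.nan
df.loc[::15, "city"] = np.nan
df["target"] = (df["income"] > 50_000).astype(int)

leaderboard, best = compare_models(df.drop(columns="target"), df["target"])
print(leaderboard)

# Serialize the best pipeline (preprocessing + model) to an in-memory buffer.
buffer = io.BytesIO()
joblib.dump(best, buffer)
```

Because the preprocessor is a step of the fitted pipeline, loading the exported file with `joblib.load` should yield an object that accepts raw, unprocessed rows. In a Streamlit app, the bytes from `buffer.getvalue()` could plausibly be handed to `st.download_button` to offer the `.joblib` download.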
## ⚙️ Setup & Installation
1. **Prerequisites**: Python 3.9+ installed (required by Streamlit 1.45.1, the `sdk_version` pinned above).
2. **Clone the Repository (Optional)**:
    ```bash
    git clone <your_repository_url>
    cd AutoML-WebApp
    ```
    Alternatively, ensure `app.py` and `requirements.txt` are in your project directory.
3. **Create and Activate Virtual Environment (Recommended)**:
    ```bash
    python3 -m venv venv
    source venv/bin/activate      # macOS/Linux
    # venv\Scripts\activate       # Windows
    ```
4. **Install Dependencies**:
    ```bash
    pip install -r requirements.txt
    ```
## 🚀 Running the Application
1. Navigate to your project directory in the terminal.
2. Run the Streamlit app:
    ```bash
    streamlit run app.py
    ```
3. Open your browser and go to the URL provided (usually `http://localhost:8501`).
## 🔮 Upcoming Features & Enhancements
We are continuously working to improve this AutoML application. Here are some features on our roadmap:
* **Advanced Preprocessing Options**:
    * User control over imputation strategies (mean, median, mode, constant).
    * More encoding techniques (e.g., One-Hot Encoding, Target Encoding).
    * Feature selection techniques.
* **Hyperparameter Tuning**:
    * Integration of GridSearchCV or RandomizedSearchCV for optimizing model hyperparameters.
    * User interface to define search spaces.
* **Expanded Model Support**:
    * LightGBM, XGBoost, CatBoost for both classification and regression.
    * Basic Time Series forecasting models (e.g., ARIMA, Prophet) if applicable data is provided.
* **Enhanced Evaluation & Visualization**:
    * Interactive Confusion Matrix, ROC/AUC curves, Precision-Recall curves for classification.
    * Residual plots, Actual vs. Predicted plots for regression.
    * Cross-validation score details.
* **Deployment & Integration**:
    * Option to generate a simple Flask API endpoint for the exported model.
    * Dockerization support for easier deployment.
* **User Experience & Robustness**:
    * More detailed error handling and user guidance.
    * Saving and loading of experiment configurations.
    * Support for larger datasets (optimizations for memory and speed).
* **Advanced Explainability**:
    * Individual prediction explanations (waterfall plots).
    * Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots.
* **Data Insights**:
    * Automated exploratory data analysis (EDA) report generation.
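
As a rough idea of what the planned hyperparameter tuning could look like, here is a small `GridSearchCV` sketch. The roadmap does not specify which models or search spaces will be supported, so the estimator, the `param_grid` values, the F1 scoring, and the synthetic data from `make_classification` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data, standing in for an uploaded dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical search space; a UI could let users edit these lists.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

# Exhaustively evaluate all 4 combinations with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```

`RandomizedSearchCV` takes the same estimator and a `param_distributions` argument instead, sampling a fixed number of combinations rather than trying them all, which tends to scale better for larger search spaces.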
---
_This application is actively developed, with assistance from AI pair programming._