File size: 5,014 Bytes
3b3fc39
 
 
 
 
 
 
48d3d8a
3b3fc39
d98380b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b9a4e9a
d98380b
b9a4e9a
d98380b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b9a4e9a
d98380b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
---
license: mit
title: AutoML
sdk: streamlit
emoji: 💻
colorFrom: yellow
colorTo: green
sdk_version: 1.45.1
---
# AutoML & Explainability Web Application

This Streamlit web application empowers users to perform end-to-end machine learning tasks with ease. Upload your data, automatically train and compare various models, understand their predictions through SHAP explainability, and export the best model for your needs.

## 🎯 Core Objectives

*   **Accessibility**: Enable users of all technical backgrounds to leverage machine learning.
*   **Automation**: Streamline the ML pipeline from data ingestion to model evaluation.
*   **Transparency**: Provide clear insights into model behavior using SHAP.
*   **Efficiency**: Quickly identify the best-performing model for a given dataset.

## ✨ Key Features

*   **Flexible Data Upload**: 
    *   Supports `.csv` and `.xlsx` files.
    *   Option to upload a single file (for automatic train/test splitting) or separate training and testing files.
*   **Data Preprocessing**: 
    *   Automatic handling of missing values (imputation).
    *   Encoding of categorical features.
    *   Optional scaling of numeric features.
*   **Target Column & Problem Type Detection**: 
    *   Easy selection of the target variable.
    *   Automatic detection of problem type (Classification/Regression).
    *   Auto-detection of common target column names.
*   **Automated Model Training & Comparison**: 
    *   Trains a suite of models tailored to the problem type:
        *   **Classification**: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, SVM, K-Nearest Neighbors, Gaussian Naive Bayes.
        *   **Regression**: Linear Regression, Ridge Regression, ElasticNet, Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, SVR, K-Nearest Neighbors Regressor.
    *   Displays a leaderboard with key performance metrics (Accuracy, F1, AUC for classification; R2, MSE for regression).
*   **Model Explainability (XAI)**: 
    *   Utilizes SHAP (SHapley Additive exPlanations) for the best model.
    *   Global feature importance plots.
    *   Detailed SHAP summary plots (e.g., beeswarm) and individual prediction explanations (waterfall plots coming soon).
*   **Model Export**: Download the trained best model (including preprocessing steps) as a `.joblib` file for deployment or further use.

## ⚙️ Setup & Installation

1.  **Prerequisites**: Python 3.7+ installed.
2.  **Clone the Repository (Optional)**:
    ```bash
    # git clone <your_repository_url> # If you have it on Git
    # cd AutoML-WebApp
    ```
    Alternatively, ensure `app.py` and `requirements.txt` are in your project directory.
3.  **Create and Activate Virtual Environment (Recommended)**:
    ```bash
    python3 -m venv venv
    source venv/bin/activate  # macOS/Linux
    # venv\Scripts\activate    # Windows
    ```
4.  **Install Dependencies**:
    ```bash
    pip install -r requirements.txt
    ```

## 🚀 Running the Application

1.  Navigate to your project directory in the terminal.
2.  Run the Streamlit app:
    ```bash
    streamlit run app.py
    ```
3.  Open your browser and go to the URL provided (usually `http://localhost:8501`).

## 🔮 Upcoming Features & Enhancements

We are continuously working to improve this AutoML application. Here are some features on our roadmap:

*   **Advanced Preprocessing Options**: 
    *   User control over imputation strategies (mean, median, mode, constant).
    *   More encoding techniques (e.g., One-Hot Encoding, Target Encoding).
    *   Feature selection techniques.
*   **Hyperparameter Tuning**: 
    *   Integration of GridSearchCV or RandomizedSearchCV for optimizing model hyperparameters.
    *   User interface to define search spaces.
*   **Expanded Model Support**: 
    *   LightGBM, XGBoost, CatBoost for both classification and regression.
    *   Basic Time Series forecasting models (e.g., ARIMA, Prophet) if applicable data is provided.
*   **Enhanced Evaluation & Visualization**: 
    *   Interactive Confusion Matrix, ROC/AUC curves, Precision-Recall curves for classification.
    *   Residual plots, Actual vs. Predicted plots for regression.
    *   Cross-validation score details.
*   **Deployment & Integration**: 
    *   Option to generate a simple Flask API endpoint for the exported model.
    *   Dockerization support for easier deployment.
*   **User Experience & Robustness**: 
    *   More detailed error handling and user guidance.
    *   Saving and loading of experiment configurations.
    *   Support for larger datasets (optimizations for memory and speed).
*   **Advanced Explainability**: 
    *   Individual prediction explanations (waterfall plots).
    *   Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE) plots.
*   **Data Insights**: 
    *   Automated exploratory data analysis (EDA) report generation.

---
_This application is actively developed, with assistance from AI pair programming._