Commit 993cfb9
Parent(s): 444d31b
Initial Commit

Files changed:
- README.md +190 -1
- app.py +153 -0
- requirements.txt +16 -0
README.md
CHANGED

Previous version:

```yaml
---
title: AutoML
emoji: 📈
colorFrom: yellow
colorTo: pink
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
short_description: Automated ML model comparison with LazyPredict and MCP integ
---
```

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

New version:
---
title: AutoML - MCP Hackathon
emoji: 📈
colorFrom: yellow
colorTo: pink
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
tags:
  - machine-learning
  - mcp
  - hackathon
  - automl
  - lazypredict
  - gradio
  - mcp-server-track
  - agent-demo-track
short_description: Automated ML model comparison with LazyPredict and MCP integ
---

# 🤖 AutoML - MCP Hackathon Submission

**Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration**

## 🏆 Hackathon Track

**Agents & MCP Hackathon - Track 1: MCP Tool / Server**

## 🌟 Key Features

### Core ML Capabilities
- **📤 Dual Data Input**: Support for both local CSV file uploads and public URL data sources
- **🎯 Auto Problem Detection**: Automatically determines regression vs classification tasks
- **🤖 Multi-Algorithm Comparison**: LazyPredict-powered comparison of 20+ ML algorithms
- **📊 Automated EDA**: Comprehensive dataset profiling with ydata-profiling
- **💾 Best Model Export**: Download the top-performing model as a pickle file
- **📈 Performance Visualization**: Interactive charts showing model comparison results

### 🚀 Advanced Features
- **🌐 URL Data Loading**: Direct data loading from public CSV URLs with robust error handling
- **🔄 Agent-Friendly Interface**: Designed for both human users and AI agent interactions
- **📊 Interactive Dashboards**: Real-time model performance metrics and visualizations
- **🔍 Smart Error Handling**: Comprehensive validation and user feedback
- **💻 MCP Server Integration**: Full Model Context Protocol server implementation

## 🛠️ How It Works

AutoML provides a streamlined pipeline for automated machine learning:

### Core Functions

1. **`load_data(file_input)`** - Universal data loader that handles:
   - Local CSV file uploads through Gradio's file component
   - Public CSV URLs over HTTP/HTTPS
   - Robust error handling and validation
   - Automatic format detection and parsing

2. **`analyze_and_model(df, target_column)`** - Core ML pipeline that:
   - Generates comprehensive EDA reports using ydata-profiling
   - Automatically detects the task type (classification vs regression) from the number of unique target values
   - Trains and evaluates multiple models using LazyPredict
   - Selects the best-performing model based on the appropriate metric
   - Creates publication-ready visualizations comparing model performance
   - Exports the best model as a serialized pickle file

3. **`run_pipeline(data_source, target_column)`** - Main orchestration function that:
   - Validates all inputs and provides clear error messages
   - Coordinates the entire ML workflow from data loading to model export
   - Generates AI-powered explanations of results
   - Returns all outputs in a format suited to both UI and API consumption
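In sketch form, the dispatch inside `load_data` amounts to the following (simplified from the implementation in `app.py`: the real version buffers the uploaded bytes and reports failures via `gr.Warning`; here `requests` is imported lazily so only the URL path needs it):

```python
import io
import pandas as pd

def load_data(file_input):
    """Simplified sketch: dispatch between a file upload and a URL string."""
    if file_input is None:
        return None
    # Gradio uploads arrive as temp-file objects exposing a .name path
    if hasattr(file_input, "name"):
        return pd.read_csv(file_input.name)
    # A plain string starting with "http" is treated as a public CSV URL
    if isinstance(file_input, str) and file_input.startswith("http"):
        import requests  # only needed on the URL path
        resp = requests.get(file_input, timeout=30)
        resp.raise_for_status()
        return pd.read_csv(io.StringIO(resp.text))
    return None  # unrecognized input type
```

Anything that is neither an upload object nor an `http(s)` URL falls through to `None`, which `run_pipeline` then surfaces as a loading error.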

### Agent-Friendly Design
- **Single Entry Point**: The `run_pipeline()` function serves as the primary interface for AI agents
- **Flexible Input Handling**: Automatically determines whether the input is a file path or a URL
- **Comprehensive Output**: Returns all generated artifacts (models, reports, visualizations)
- **Error Resilience**: Robust error handling with informative feedback

## 🚀 Quick Start

### Running the Application

The project includes two main application files:

#### Primary Application: `app.py` (Recommended)
```bash
# Install dependencies
pip install -r requirements.txt

# Run the main application
python app.py
```

### Web Interface
1. **Choose Data Source**:
   - **Local Upload**: Use the file upload component to select a CSV file from your computer
   - **URL Input**: Enter a public CSV URL (e.g., from GitHub, a data repository, or cloud storage)
2. **Specify Target**: Enter the exact name of your target column (case-sensitive)
3. **Run Analysis**: Click "Run Analysis & AutoML" to start the pipeline
4. **Review Results**:
   - View the detected task type (classification/regression)
   - Examine model performance metrics in the interactive table
   - Download the comprehensive EDA report (HTML format)
   - Download the best-performing model (pickle format)
   - View the model comparison visualization

### Installation & Setup
```bash
# Clone the repository
git clone [repository-url]
cd AutoML

# Install dependencies
pip install -r requirements.txt
```

### Server Configuration
The application launches with the following settings:
- **Host**: `0.0.0.0` (accessible from any network interface)
- **Port**: `7860` (the default Gradio port)
- **MCP Server**: Enabled for AI agent integration
- **API Documentation**: Available at the `/docs` endpoint
- **Browser Launch**: Automatic browser opening enabled

## 🎯 Current Implementation

### 1. LazyPredict Integration
- **Automated Model Training**: Trains 20+ algorithms automatically
- **Performance Comparison**: Side-by-side evaluation of all models
- **Best Model Selection**: Automatically selects the top performer by accuracy (classification) or R² (regression)

### 2. Comprehensive EDA
- **ydata-profiling**: Generates detailed dataset analysis reports
- **Automatic Insights**: Data quality, distributions, correlations, and missing values
- **Interactive Reports**: Downloadable HTML reports with comprehensive statistics

### 3. Smart Task Detection
- **Classification**: Automatically detected when the target has ≤10 unique values
- **Regression**: Automatically detected for continuous target variables
- **Adaptive Metrics**: Uses the appropriate evaluation metric for each task type
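The detection rule is the one-line uniqueness heuristic from `app.py`, pulled out here as a sketch:

```python
import pandas as pd

def detect_task(y: pd.Series, max_classes: int = 10) -> str:
    """Heuristic from app.py: few distinct target values -> classification."""
    return "classification" if y.nunique() <= max_classes else "regression"
```

Note the heuristic's limits: a categorical target with more than 10 classes is treated as regression, and an integer-coded continuous target with few distinct values is treated as classification.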

### 4. Model Persistence
- **Pickle Export**: Save trained models for future use
- **Model Reuse**: Load and apply models to new datasets
- **Production Ready**: Serialized models ready for deployment
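Reusing the exported `.pkl` can be sketched as follows. A scikit-learn model stands in for whatever estimator LazyPredict selects, and the path is a hypothetical example; in practice the unpickling environment should have the same library versions the model was saved with:

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a stand-in "best model" (the real one comes out of LazyPredict)
X, y = make_classification(n_samples=100, n_features=4, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export it the same way the app does
path = os.path.join(tempfile.gettempdir(), "best_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in production: reload and predict on new rows
with open(path, "rb") as f:
    loaded = pickle.load(f)
preds = loaded.predict(X[:5])
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code from a malicious file.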

## 🏆 Demo Scenarios

### College Placement Analysis
- Load the project's `collegePlace.csv` dataset from its public URL: https://raw.githubusercontent.com/daniel-was-taken/Placement-Prediction/refs/heads/master/collegePlace.csv
- Analyze student placement outcomes
- Automatic feature analysis and model comparison
- Export the trained model for future predictions

### URL-Based Data Analysis
- Use public dataset URLs for instant analysis
- Examples: government open data, research datasets, cloud-hosted files
- No local upload size limits when loading data from a URL

## 🚀 Technologies Used

- **Frontend**: Gradio 4.0+ with the Soft theme and MCP server integration
- **AutoML Engine**: LazyPredict for automated model comparison and evaluation
- **EDA Framework**: ydata-profiling for comprehensive dataset analysis and reporting
- **ML Libraries**: scikit-learn, XGBoost, LightGBM (via the LazyPredict ecosystem)
- **Visualization**: Matplotlib and Seaborn for model comparison charts and statistical plots
- **Data Processing**: pandas and numpy for efficient data manipulation and preprocessing
- **Model Persistence**: pickle for model serialization and export
- **Web Requests**: the requests library for robust URL-based data loading
- **MCP Integration**: Model Context Protocol server for AI agent compatibility
- **File Handling**: tempfile for temporary file management

## 📈 Current Features

- **🔄 Dual Input Support**: Upload local CSV files or provide public URLs for data loading
- **🤖 One-Click AutoML**: Complete ML pipeline from data upload to trained model export
- **🎯 Intelligent Task Detection**: Automatic classification vs regression detection based on the target variable
- **📊 Multi-Algorithm Comparison**: Simultaneous comparison of 20+ algorithms with LazyPredict
- **📋 Comprehensive EDA**: Detailed dataset profiling with statistical analysis and data quality reports
- **💾 Model Export**: Download the best-performing model as a pickle file for deployment
- **📈 Performance Visualization**: Clear charts showing algorithm comparison and performance metrics
- **🌐 MCP Server Integration**: Full Model Context Protocol support for AI assistant integration
- **🛡️ Robust Error Handling**: Comprehensive validation with informative user feedback
- **🎨 Modern UI**: Clean, responsive interface for both human and agent interactions

## 🎯 Hackathon Submission Highlights

1. **🤖 LazyPredict Integration**: Automated comparison of 20+ ML algorithms with minimal configuration
2. **🧠 Smart Automation**: Intelligent task detection, data validation, and model selection
3. **📊 Comprehensive Analysis**: ydata-profiling powered EDA reports with statistical insights
4. **👥 Dual Interface Design**: Optimized for both human users and AI agent interactions
5. **🌐 MCP Server Implementation**: Full Model Context Protocol integration for agent workflows
6. **🔄 Flexible Data Loading**: Support for both local uploads and URL-based data sources
7. **📈 Production Ready**: Exportable models, comprehensive documentation, and robust error handling
8. **🎨 Modern UI/UX**: Clean Gradio interface with an intuitive workflow and clear feedback

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

app.py
ADDED

```python
import gradio as gr
import pandas as pd
import io
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split
from ydata_profiling import ProfileReport
import tempfile
import requests

def load_data(file_input):
    """Loads CSV data from either a local file upload or a public URL."""
    if file_input is None:
        return None
    try:
        # For local file uploads, file_input is a temporary file object
        if hasattr(file_input, 'name'):
            file_path = file_input.name
            with open(file_path, 'rb') as f:
                file_bytes = f.read()
            df = pd.read_csv(io.BytesIO(file_bytes))
        # For URL text input
        elif isinstance(file_input, str) and file_input.startswith('http'):
            response = requests.get(file_input)
            response.raise_for_status()
            df = pd.read_csv(io.StringIO(response.text))
        else:
            return None
        return df
    except Exception as e:
        gr.Warning(f"Failed to load or parse data: {e}")
        return None

def analyze_and_model(df, target_column):
    """Performs EDA, model training, and visualization."""
    profile = ProfileReport(df, title="EDA Report", minimal=True)
    with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as temp_html:
        profile.to_file(temp_html.name)
        profile_path = temp_html.name

    X = df.drop(columns=[target_column])
    y = df[target_column]
    task = "classification" if y.nunique() <= 10 else "regression"
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LazyClassifier(ignore_warnings=True, verbose=0) if task == "classification" else LazyRegressor(ignore_warnings=True, verbose=0)
    models, _ = model.fit(X_train, X_test, y_train, y_test)

    sort_metric = "Accuracy" if task == "classification" else "R-Squared"
    best_model_name = models.sort_values(by=sort_metric, ascending=False).index[0]
    best_model = model.models[best_model_name]

    with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as temp_pkl:
        pickle.dump(best_model, temp_pkl)
        pickle_path = temp_pkl.name

    plt.figure(figsize=(10, 6))
    plot_column = "Accuracy" if task == "classification" else "R-Squared"
    sns.barplot(x=models[plot_column].head(10), y=models.head(10).index)
    plt.title(f"Top 10 Models by {plot_column}")
    plt.tight_layout()
    with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as temp_png:
        plt.savefig(temp_png.name)
        plot_path = temp_png.name
    plt.close()

    models_reset = models.reset_index().rename(columns={'index': 'Model'})
    return profile_path, task, models_reset, plot_path, pickle_path

def run_pipeline(data_source, target_column):
    """
    Drives the entire application; exposed as the primary tool for the MCP server.

    :param data_source: A local file path (from gr.File) or a URL (from gr.Textbox).
    :param target_column: The name of the target column for prediction.
    """
    # --- 1. Input Validation ---
    if not data_source or not target_column:
        error_msg = "Error: Data source and target column must be provided."
        gr.Warning(error_msg)
        return None, error_msg, None, None, None, "Please provide all inputs."

    gr.Info("Starting analysis...")

    # --- 2. Data Loading ---
    df = load_data(data_source)
    if df is None:
        return None, "Error: Could not load data.", None, None, None, None

    if target_column not in df.columns:
        error_msg = f"Error: Target column '{target_column}' not found in the dataset. Available: {list(df.columns)}"
        gr.Warning(error_msg)
        return None, error_msg, None, None, None, None

    # --- 3. Analysis and Modeling ---
    profile_path, task, models_df, plot_path, pickle_path = analyze_and_model(df, target_column)

    # --- 4. Explanation ---
    best_model_name = models_df.iloc[0]['Model']
    llm_explanation = f"AI explanation for the '{task}' task: The top performing model was **{best_model_name}**."

    gr.Info("Analysis complete!")
    return profile_path, task, models_df, plot_path, pickle_path, llm_explanation

# --- Gradio UI ---
with gr.Blocks(title="AutoML Trainer", theme=gr.themes.Soft()) as demo:
    gr.Markdown("## 🤖 AutoML Trainer")
    gr.Markdown("Enter a CSV data source (local file or public URL) and a target column to run the analysis. This interface is friendly to both humans and AI agents.")

    with gr.Row():
        with gr.Column(scale=1):
            # gr.File supports uploads and is compatible with agents
            file_input = gr.File(label="Upload Local CSV File")
            url_input = gr.Textbox(label="Or Enter Public CSV URL", placeholder="e.g., https://.../data.csv")
            target_column_input = gr.Textbox(label="Enter Target Column Name", placeholder="e.g., approved")
            run_button = gr.Button("Run Analysis & AutoML", variant="primary")

        with gr.Column(scale=2):
            task_output = gr.Textbox(label="Detected Task", interactive=False)
            llm_output = gr.Textbox(label="AI Explanation (WIP)", lines=3, interactive=False)
            metrics_output = gr.Dataframe(label="Model Performance Metrics")

    with gr.Row():
        vis_output = gr.Image(label="Top Models Comparison")
        with gr.Column():
            eda_output = gr.File(label="Download Full EDA Report")
            model_output = gr.File(label="Download Best Model (.pkl)")

    # The single click event that powers the whole app:
    # a helper decides whether to use the file or the URL input
    def process_inputs(file_data, url_data, target):
        data_source = file_data if file_data is not None else url_data
        return run_pipeline(data_source, target)

    run_button.click(
        fn=process_inputs,
        inputs=[file_input, url_input, target_column_input],
        outputs=[eda_output, task_output, metrics_output, vis_output, model_output, llm_output]
    )

demo.launch(
    server_name="0.0.0.0",
    server_port=7860,
    share=True,
    show_api=True,
    inbrowser=True,
    mcp_server=True
)
```

requirements.txt
ADDED

```text
mcp>=1.9.2
openai>=1.0.0
python-dotenv>=1.0.0
gradio>=4.0.0
Pillow>=10.0.0
scikit-learn>=1.3.0
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
plotly>=5.0.0
xgboost>=1.7.0
lightgbm>=3.3.0
shap>=0.42.0
lazypredict>=0.2.12
ydata-profiling>=4.0.0
```