kashh65 committed on
Commit
d4ee3e0
·
verified ·
1 Parent(s): 890025a

Upload README2.md

  1. README2.md +311 -0
README2.md ADDED
@@ -0,0 +1,311 @@
1
+ <!-- Custom header with green glow effect -->
2
+ <p align="center">
3
+ <img src="header.svg" alt="AutoML - Automated Machine Learning Platform" width="800" />
4
+ </p>
5
+
6
+
7
+ <p align="center">
8
+ <a href="https://github.com/username/Auto-ML/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License: MIT"></a>
9
+ <a href="https://www.python.org/"><img src="https://img.shields.io/badge/Made%20with-Python-1f425f.svg" alt="Made with Python"></a>
10
+ <a href="https://streamlit.io/"><img src="https://img.shields.io/badge/Made%20with-Streamlit-FF4B4B.svg" alt="Made with Streamlit"></a>
11
+ <a href="https://scikit-learn.org/"><img src="https://img.shields.io/badge/Made%20with-Scikit--Learn-F7931E.svg" alt="Made with Scikit-Learn"></a>
12
+ </p>
13
+
14
+ <p align="center">
15
+ <a href="https://pandas.pydata.org/"><img src="https://img.shields.io/badge/Made%20with-Pandas-150458.svg" alt="Made with Pandas"></a>
16
+ <a href="https://numpy.org/"><img src="https://img.shields.io/badge/Made%20with-NumPy-013243.svg" alt="Made with NumPy"></a>
17
+ <a href="https://matplotlib.org/"><img src="https://img.shields.io/badge/Made%20with-Matplotlib-11557c.svg" alt="Made with Matplotlib"></a>
18
+ <a href="https://seaborn.pydata.org/"><img src="https://img.shields.io/badge/Made%20with-Seaborn-3776AB.svg" alt="Made with Seaborn"></a>
19
+ <a href="https://plotly.com/"><img src="https://img.shields.io/badge/Made%20with-Plotly-3F4F75.svg" alt="Made with Plotly"></a>
20
+ <a href="https://xgboost.readthedocs.io/"><img src="https://img.shields.io/badge/Made%20with-XGBoost-0073B7.svg" alt="Made with XGBoost"></a>
21
+ </p>
22
+
23
+ <p align="center">
24
+ <a href="https://python.langchain.com/"><img src="https://img.shields.io/badge/Made%20with-LangChain-00A86B.svg" alt="Made with LangChain"></a>
25
+ <a href="https://smith.langchain.com/"><img src="https://img.shields.io/badge/Monitored%20with-LangSmith-7742DD.svg" alt="Monitored with LangSmith"></a>
26
+ <a href="https://ai.google.dev/"><img src="https://img.shields.io/badge/Powered%20by-Google%20Gemini-4285F4.svg" alt="Powered by Google Gemini"></a>
27
+ <a href="https://groq.com/"><img src="https://img.shields.io/badge/Powered%20by-Groq-6236FF.svg" alt="Powered by Groq"></a>
28
+ <a href="https://www.python-dotenv.org/"><img src="https://img.shields.io/badge/Made%20with-python--dotenv-2E7D32.svg" alt="Made with python-dotenv"></a>
29
+ <a href="https://pickle.readthedocs.io/"><img src="https://img.shields.io/badge/Uses-pickle-8BC34A.svg" alt="Uses pickle"></a>
30
+ </p>
31
+
32
+ <p align="center">
33
+ <b>AutoML</b> automates the end-to-end process of applying machine learning to real-world problems. It simplifies model selection, hyperparameter tuning, and model export, making machine learning more accessible to non-specialists.
34
+ </p>
35
+
36
+ ## 🔗 Live Demo
37
+
38
+ <p align="center">
39
+ <a href="https://automl-demo.streamlit.app" target="_blank">
40
+ <img src="https://img.shields.io/badge/Try%20the%20Demo-00B8D9?style=for-the-badge&logo=streamlit&logoColor=white" alt="Try the Demo" />
41
+ </a>
42
+ </p>
43
+
44
+ <p align="center">
45
+ Check out the live demo of AutoML and experience the power of automated machine learning firsthand!
46
+ </p>
47
+
48
+ ## 🎬 Video Showcase
49
+
50
+ <p align="center">
51
+ <video width="800" controls>
52
+ <source src="demo-video.mp4" type="video/mp4">
53
+ Your browser does not support the video tag.
54
+ </video>
55
+ </p>
56
+
57
+ <p align="center">
58
+ <em>See AutoML in action: This demonstration shows how to analyze data, train models, and get AI-powered insights in minutes!</em>
59
+ </p>
60
+
61
+ ## ✨ Features
62
+
63
+ - 📊 **Data Visualization and Analysis**: Interactive visualizations to understand your data
64
+ - Correlation heatmaps
65
+ - Distribution plots
66
+ - Feature importance charts
67
+ - Pair plots for relationship analysis
68
+
69
+ - 🧹 **Automated Data Cleaning and Preprocessing**: Handle missing values, outliers, and feature engineering
70
+ - Automatic detection and handling of missing values
71
+ - Outlier detection and treatment
72
+ - Feature scaling and normalization
73
+ - Categorical encoding (One-Hot, Label, Target encoding)
74
+
75
+ - 🤖 **Multiple ML Model Selection**: Choose from a variety of models or let AutoML select the best one
76
+ - Classification models: Logistic Regression, Random Forest, XGBoost, SVC, Decision Tree, KNN, Gradient Boosting, AdaBoost, Gaussian Naive Bayes, QDA, LDA
77
+ - Regression models: Linear Regression, Random Forest, XGBoost, SVR, Decision Tree, KNN, ElasticNet, Gradient Boosting, AdaBoost, Bayesian Ridge, Ridge, Lasso
78
+
79
+ - ⚙️ **Hyperparameter Tuning**: Optimize model performance with advanced tuning techniques
80
+ - Supports hyperparameter tuning for 20+ models
81
+ - Supports 10+ hyperparameter tuning techniques
82
+
83
+
84
+ - 📈 **Model Performance Evaluation**: Comprehensive metrics and visualizations
85
+ - Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC, Confusion Matrix
86
+ - Regression: MAE, MSE, RMSE, R², Residual Plots
87
+
88
+ - 🔍 **AI-powered Data Insights**: Leverage Google's Gemini for intelligent data analysis
89
+ - Natural language explanations of model decisions
90
+ - Automated feature importance interpretation
91
+ - Data quality assessment
92
+ - Trend identification and anomaly detection
93
+
94
+ - 🧠 **LLM Fine-Tuning and Download**: Access and utilize pre-trained language models
95
+ - Download fine-tuned LLMs for specific domains
96
+ - Customize existing models for your specific use case
97
+ - Access to various model sizes (small, medium, large)
98
+ - Seamless integration with your data processing pipeline
99
+
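The metrics named under Model Performance Evaluation can all be computed with scikit-learn. The following is a minimal sketch on made-up labels and predictions, not the application's own evaluation code; the variable names (`clf_metrics`, `reg_metrics`) are illustrative. RMSE is taken as the square root of MSE.

```python
import math

from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# Classification: true labels, hard predictions, and predicted probabilities
# for the positive class (toy values, chosen for illustration only).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]

clf_metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_score),  # needs scores, not hard labels
}
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

# Regression: true targets and predictions (toy values).
r_true = [3.0, 5.0, 7.0, 9.0]
r_pred = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(r_true, r_pred)
reg_metrics = {
    "mae": mean_absolute_error(r_true, r_pred),
    "mse": mse,
    "rmse": math.sqrt(mse),
    "r2": r2_score(r_true, r_pred),
}

print(clf_metrics)
print(cm)
print(reg_metrics)
```

Note that ROC-AUC is computed from probability scores, while the other classification metrics use thresholded predictions.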
100
+ ## 🚀 Installation
101
+
102
+ ### Prerequisites
103
+
104
+ - Python 3.8 or higher
105
+ - Google API key for Gemini, used for data insights and DataFrame cleaning
106
+ - Groq API key, used for LLM-based analysis of test results
107
+ - LangSmith API key, used for monitoring LLM calls
108
+
109
+ ### Setup
110
+
111
+ 1. Clone the repository:
112
+ ```bash
113
+ git clone <repository-url>
114
+ cd Auto-ML
115
+ ```
116
+
117
+ 2. Create a virtual environment:
118
+ ```bash
119
+ python -m venv venv
120
+ source venv/bin/activate # On Windows: venv\Scripts\activate
121
+ ```
122
+
123
+ 3. Install dependencies:
124
+ ```bash
125
+ pip install -r requirements.txt
126
+ ```
127
+
128
+ 4. Set up your environment variables:
129
+ ```bash
130
+ # Create a .env file with your Google API key as well as other keys
131
+ echo "GOOGLE_API_KEY=your_api_key_here" > .env
132
+ ```
133
+
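Since the application depends on several keys, it can help to check them before starting it. The following is a minimal sketch, not part of the project: only `GOOGLE_API_KEY` comes from the step above, while `GROQ_API_KEY` and `LANGCHAIN_API_KEY` are assumed names for the Groq and LangSmith keys, and `missing_keys` is a hypothetical helper. It assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os

# GOOGLE_API_KEY is named in the setup step; the other two names are
# assumptions for the Groq and LangSmith keys and may differ in this project.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "GROQ_API_KEY", "LANGCHAIN_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing environment variables:", ", ".join(absent))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.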
134
+ ## 🎮 Usage
135
+
136
+ Start the application:
137
+
138
+ ```bash
139
+ streamlit run app.py
140
+ ```
141
+
142
+ ### Quick Start Guide
143
+
144
+ 1. **Upload Data**: Upload your CSV file
145
+ - Supported format: CSV
146
+ - Automatic data type detection
147
+ - Preview of first few rows
148
+
149
+ 2. **Explore Data**: Visualize and understand your dataset
150
+ - Summary statistics
151
+ - Correlation analysis
152
+ - Distribution visualization
153
+ - Missing value analysis
154
+
155
+ 3. **Preprocess**: Clean and transform your data
156
+ - Handle missing values (imputation strategies)
157
+ - Remove or transform outliers
158
+ - Feature scaling options
159
+ - Encoding categorical variables
160
+
161
+ 4. **Train Models**: Select models and tune hyperparameters
162
+ - Choose target variable and features
163
+ - Select machine learning algorithms
164
+ - Configure hyperparameter search space
165
+ - Set evaluation metrics
166
+
167
+ 5. **Evaluate**: Compare model performance
168
+ - Performance metrics visualization
169
+ - Feature importance analysis
170
+ - Model comparison dashboard
171
+ - Cross-validation results
172
+
173
+ 6. **Deploy**: Export your model
174
+ - Download the trained model as a pickle file
175
+
176
+
177
+
178
+
179
+ ## 🧩 Project Structure
180
+
181
+ ```
182
+ Auto-ML/
183
+ ├── app.py                  # Main Streamlit application
184
+ ├── requirements.txt        # Project dependencies
185
+ ├── .env                    # Environment variables (API keys)
186
+ ├── README.md               # Project documentation
187
+ ├── models/                 # Saved model files
188
+ ├── logs/                   # Application logs
189
+ └── src/                    # Source code
190
+     ├── __init__.py         # Package initialization
191
+     ├── preprocessing/      # Data preprocessing modules
192
+     │   ├── __init__.py
193
+     │   └── ...             # Data cleaning, transformation
194
+     ├── training/           # Model training modules
195
+     │   ├── __init__.py
196
+     │   └── ...             # Model training, evaluation
197
+     ├── ui/                 # User interface components
198
+     │   ├── __init__.py
199
+     │   └── ...             # Streamlit UI elements
200
+     └── utils/              # Utility functions
201
+         ├── __init__.py
202
+         └── ...             # Helper functions
203
+ ```
204
+
205
+
206
+
207
+ # Preprocessing Pipelines
208
+
209
+ 1\. Data Ingestion Pipeline
210
+ ---------------------------
211
+
212
+ **Purpose:** Collects raw data from multiple sources (CSV, databases, APIs).
213
+
214
+ * Reads structured/unstructured data
215
+ * Handles missing values and duplicates
216
+ * Converts raw data into a clean DataFrame
217
+
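For the CSV case, the ingestion steps above might look like the following with pandas. This is a minimal sketch on an in-memory CSV with made-up columns, not the project's ingestion module; database and API sources are not covered.

```python
import io

import pandas as pd

# In-memory stand-in for an uploaded CSV file (hypothetical columns).
raw_csv = io.StringIO(
    "id,age,city\n"
    "1,25,Pune\n"
    "2,,Delhi\n"
    "2,,Delhi\n"
    "3,40,\n"
)

df = pd.read_csv(raw_csv)

# Drop exact duplicate rows; pandas treats matching NaNs as equal here.
df = df.drop_duplicates().reset_index(drop=True)

# Count missing values per column so later steps can decide how to impute.
missing_per_column = df.isna().sum()

print(df.shape)
print(missing_per_column.to_dict())
```

Missing values are only counted at this stage; imputing or dropping them is left to the cleaning pipeline.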
218
+ 2\. Data Cleaning & Preprocessing Pipeline
219
+ ------------------------------------------
220
+
221
+ **Purpose:** Transforms raw data into a machine-learning-ready format.
222
+
223
+ * **Cleans Data:** Handles NaNs, outliers, and standardizes columns
224
+ * **Encodes Categorical Features:** One-hot encoding, label encoding
225
+ * **Scales Numerical Data:** MinMaxScaler, StandardScaler
226
+
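One common way to combine the cleaning, encoding, and scaling steps is a scikit-learn `ColumnTransformer`. The sketch below assumes median imputation for numeric columns and most-frequent imputation for categorical ones; the column names and strategies are illustrative choices, not the project's actual configuration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with hypothetical column names and some missing values.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "income": [50_000.0, 62_000.0, np.nan, 48_000.0],
        "city": ["Pune", "Delhi", np.nan, "Pune"],
    }
)

# Numeric columns: fill NaNs with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill NaNs with the most frequent value, then one-hot encode.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [("num", numeric, ["age", "income"]), ("cat", categorical, ["city"])],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(df)
print(X.shape)
```

Swapping `StandardScaler` for `MinMaxScaler` changes only the `scale` step; the rest of the pipeline stays the same.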
227
+
228
+
229
+
230
+ 3\. Model Selection & Training Pipeline
231
+ ---------------------------------------
232
+
233
+ **Purpose:** Automates the process of selecting and training models.
234
+
235
+ * **Multiple Algorithms:** Trains XGBoost, RandomForest, Deep Learning models
236
+ * **Hyperparameter Optimization:** Finds the best config for each model
237
+
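The selection and tuning loop can be sketched with scikit-learn's `GridSearchCV`. Grid search is only one possible technique, used here as a stand-in because the tuning methods the application supports are not listed; the candidate models, search spaces, and synthetic data are likewise illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic classification data standing in for a preprocessed dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models with small, illustrative search spaces.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
}

# Tune each candidate with 3-fold cross-validation on the training split.
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = search

# Pick the model with the best cross-validated score, then check it on held-out data.
best_name = max(results, key=lambda n: results[n].best_score_)
best = results[best_name]
print(best_name, best.best_params_, round(best.score(X_test, y_test), 3))
```

Selecting on the cross-validated score and reporting the test score separately keeps the held-out split out of the selection step.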
238
+
239
+
240
+ 4\. Model Deployment Pipeline
241
+ -----------------------------
242
+
243
+ **Purpose:** Makes the model available for real-world usage.
244
+
245
+ * Exports the model (Pickle, ONNX, TensorFlow SavedModel)
246
+ * Lets you download the exported model after training
247
+
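For the pickle format, export and reload amount to a serialize/deserialize round trip. The sketch below uses a small scikit-learn `LinearRegression` as a stand-in for a trained model and keeps the bytes in memory; in a Streamlit app such bytes could be handed to `st.download_button`, though how the project wires this up is an assumption.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for a trained model: fit y = 2x + 1 on four points.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Serialize to bytes (what a download would contain) and load it back.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

print(restored.predict(np.array([[4.0]])))
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.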
248
+
249
+
250
+ # Feedback and Fallback Mechanism
251
+
252
+ AutoML implements a robust feedback and fallback system to ensure reliability:
253
+
254
+ 1. **Data Cleaning Validation**: The system validates all cleaning operations and provides feedback on the changes made
255
+ - Automatic detection of cleaning effectiveness
256
+ - Detailed logs of transformations applied to the data
257
+
258
+ 2. **LLM Fallback Mechanism**: For AI-powered insights and data analysis
259
+ - Primary attempt uses advanced LLMs (Google Gemini/Groq)
260
+ - Automatic fallback to rule-based algorithms if LLM fails
261
+ - Graceful degradation to ensure core functionality remains available
262
+ - Error logging and reporting for continuous improvement
263
+ - LangSmith integration for monitoring and tracking all LLM calls
264
+
265
+ 3. **Error Feedback Loop**: Intelligent error handling during data cleaning
266
+ - Automatically captures errors that occur during data cleaning operations
267
+ - Sends error context to LLM to generate refined cleaning code
268
+ - Re-executes the improved cleaning process
269
+ - Iterative refinement ensures robust data preparation even with challenging datasets
270
+
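The error feedback loop and the rule-based fallback described above can be sketched together in plain Python. Everything here is hypothetical: `clean_with_feedback`, `rule_based_clean`, and the stand-in cleaners are illustrative names, and the real system would call Gemini or Groq where the stubs raise or return.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("automl.cleaning")

Rows = List[dict]
# A cleaner takes rows plus the previous error message (None on the first try).
Cleaner = Callable[[Rows, Optional[str]], Rows]


def rule_based_clean(rows: Rows) -> Rows:
    """Fallback: drop rows that contain any missing (None) value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def clean_with_feedback(rows: Rows, llm_clean: Cleaner, max_attempts: int = 3) -> Rows:
    """Try the LLM-backed cleaner, feeding each error back into the next attempt."""
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return llm_clean(rows, last_error)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = f"{type(exc).__name__}: {exc}"
            logger.warning("Cleaning attempt %d failed: %s", attempt, last_error)
    logger.error("LLM cleaning failed %d times; using rule-based fallback", max_attempts)
    return rule_based_clean(rows)


def flaky_llm_clean(rows: Rows, previous_error: Optional[str]) -> Rows:
    """Stand-in for an LLM call: fails until it has seen an error message once."""
    if previous_error is None:
        raise ValueError("column 'age' has mixed types")
    return [{**row, "age": int(row["age"])} for row in rows if row["age"] is not None]


def always_failing(rows: Rows, previous_error: Optional[str]) -> Rows:
    """Stand-in for an unavailable LLM service."""
    raise RuntimeError("LLM unavailable")


data = [{"age": "31"}, {"age": None}, {"age": 40}]
recovered = clean_with_feedback(data, flaky_llm_clean)
fallback = clean_with_feedback(data, always_failing)
print(recovered)
print(fallback)
```

Capping the number of attempts keeps a persistently failing LLM from blocking the pipeline, and the fallback guarantees that some cleaned output is always returned.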
271
+ ## 🤝 Contributing
272
+
273
+ We welcome contributions!
274
+
275
+ ### Development Setup
276
+
277
+ 1. Fork the repository
278
+ 2. Create a feature branch
279
+ 3. Install development dependencies:
280
+ ```bash
281
+ pip install -r requirements-dev.txt
282
+ ```
283
+ 4. Make your changes
284
+ 5. Run tests:
285
+ ```bash
286
+ pytest
287
+ ```
288
+ 6. Submit a pull request
289
+
290
+ ## 📄 License
291
+
292
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
293
+
294
+ ## 🙏 Acknowledgements
295
+
296
+ - [Streamlit](https://streamlit.io/) for the interactive web framework
297
+ - [Scikit-learn](https://scikit-learn.org/) for machine learning algorithms
298
+ - [Pandas](https://pandas.pydata.org/) for data manipulation
299
+ - [Plotly](https://plotly.com/) for interactive visualizations
300
+ - [Google Gemini](https://ai.google.dev/) for AI-powered insights
301
+ - [XGBoost](https://xgboost.readthedocs.io/) for gradient boosting
302
+ - [Seaborn](https://seaborn.pydata.org/) for statistical visualizations
303
+ - [LangChain](https://python.langchain.com/) for large language model integration
304
+ - [LangSmith](https://smith.langchain.com/) for LLM call tracking and monitoring
305
+ - [Groq](https://groq.com/) for fast LLM inference
306
+
307
+ ---
308
+
309
+ <p align="center">
310
+ Made with ❤️ by Akash Anandani
311
+ </p>