title: AutoML
sdk: docker
emoji: 🚀
colorFrom: red
colorTo: purple
AutoML Project
Overview
This project is an Automated Machine Learning (AutoML) platform that streamlines the machine learning workflow for CSV datasets. It is aimed at students and researchers who need a quick way to build demos for their data science and AI projects. It combines automated data cleaning, supervised and unsupervised model training, an AI-powered data assistant, and an interactive web frontend for user interaction and visualization.
Features
- Automated Data Cleaning: Utilities to preprocess and clean raw datasets, ensuring data quality for model training.
- Supervised Learning Models: Implementation and integration of various supervised machine learning algorithms.
- Unsupervised Learning Models: Support for unsupervised learning techniques for tasks like clustering and dimensionality reduction.
- AI Data Assistant (Agentic Capability): A Retrieval Augmented Generation (RAG) based AI assistant designed to help users interact with and understand their CSV datasets. This component demonstrates agentic capabilities by intelligently processing natural language queries, retrieving relevant information from the dataset, and assisting with data exploration and analysis.
- Interactive Web Frontend: A user-friendly web interface built with HTML, CSS, and JavaScript for interacting with the AutoML functionalities and visualizing results.
- Data Visualization: Tools to generate insightful charts and graphs from processed data and model outputs.
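As an illustration of what the automated data cleaning step might involve, here is a minimal sketch using only the Python standard library. It is not the project's actual implementation: the function names `clean_rows` and `clean_csv_text` are hypothetical, and the chosen rules (trim whitespace, drop empty and duplicate rows, fill missing numeric cells with the column median) are assumptions about a typical cleaning pass. It also assumes every row has the same number of fields as the header.

```python
import csv
import io
import statistics


def clean_rows(rows):
    """Trim cells, drop empty and duplicate rows, fill numeric gaps with the median."""
    # Strip whitespace from every cell; treat a missing cell (None) as empty.
    trimmed = [{k: (v or "").strip() for k, v in row.items()} for row in rows]

    # Drop rows with no data at all, then drop exact duplicates (order preserved).
    seen = set()
    unique = []
    for row in trimmed:
        key = tuple(row.items())
        if any(row.values()) and key not in seen:
            seen.add(key)
            unique.append(row)

    if not unique:
        return []

    # A column counts as numeric if every non-empty value parses as a float.
    for col in unique[0]:
        values = [r[col] for r in unique if r[col] != ""]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            continue  # Non-numeric column: leave missing cells untouched.
        if not numbers:
            continue
        median = statistics.median(numbers)
        for r in unique:
            if r[col] == "":
                r[col] = str(median)
    return unique


def clean_csv_text(text):
    """Parse CSV text (hypothetical helper) and return the cleaned rows."""
    return clean_rows(list(csv.DictReader(io.StringIO(text))))
```

Calling `clean_csv_text` on the contents of an uploaded CSV returns a list of dictionaries, one per surviving row, which could then be handed to model training or visualization code.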
Project Structure
The project is organized into the following main files and directories:
- .env: Environment variables, including API keys.
- app.py: The main application entry point.
- config.py: Configuration settings for the application.
- frontend/: Contains the static files for the web-based user interface (HTML, CSS, JavaScript, images).
- models/: Houses the implementations for supervised and unsupervised machine learning models.
- rag/: Contains modules related to the Retrieval Augmented Generation (RAG) system, including memory management and query processing.
- utils/: Utility functions for data cleaning, metrics calculation, and other common tasks.
- visuals/: Modules dedicated to data visualization and chart generation.
Installation
To set up the project locally, follow these steps:
- Clone the repository:
    git clone https://github.com/Al1Abdullah/AutoML.git
    cd AutoML
- Create a virtual environment (recommended):
    python -m venv venv
    # On Windows
    .\venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Configure API keys: Create or update the .env file in the root directory with your Groq API key:
    GROQ_API_KEY="YOUR_GROQ_API_KEY_HERE"
  Similarly, update groq_config.json with your Groq API key:
    { "GROQ_API_KEY": "YOUR_GROQ_API_KEY_HERE" }
  Note: Replace "YOUR_GROQ_API_KEY_HERE" with your actual Groq API key. Do not commit your actual API keys to version control.
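To show how the two key locations above could be read at startup, here is a hedged sketch. The function name `load_groq_api_key` is hypothetical and is not taken from the project's `config.py`. It assumes the values in `.env` have already been exported into the process environment (for example by a loader such as python-dotenv; whether the project uses one is not stated here), and it falls back to `groq_config.json` otherwise.

```python
import json
import os
from pathlib import Path

# The placeholder from the setup instructions; treated the same as "not set".
PLACEHOLDER = "YOUR_GROQ_API_KEY_HERE"


def _usable(value):
    """Return the stripped key, or an empty string if it is missing or a placeholder."""
    value = str(value or "").strip()
    return "" if value == PLACEHOLDER else value


def load_groq_api_key(config_path="groq_config.json", env=None):
    """Look up GROQ_API_KEY in the environment first, then in the JSON config file."""
    env = os.environ if env is None else env

    key = _usable(env.get("GROQ_API_KEY"))
    if not key:
        path = Path(config_path)
        if path.is_file():
            with path.open(encoding="utf-8") as f:
                key = _usable(json.load(f).get("GROQ_API_KEY"))

    if not key:
        # Fail early with a clear message instead of sending an invalid key.
        raise RuntimeError(
            "GROQ_API_KEY is not set in the environment or in " + str(config_path)
        )
    return key
```

Failing fast when the key is missing or still the placeholder makes a misconfigured setup obvious at launch rather than at the first assistant query.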
Usage
To run the AutoML application:
- Activate your virtual environment (if not already active).
- Run the main application file:
    python app.py
(Further instructions on how to access the web frontend depend on how app.py serves it. If it's a Flask/Django app, it would typically mention a local server address.)
Technologies Used
- Python: Core programming language.
- HTML, CSS, JavaScript: For the frontend development.
- Git: Version control.
- Groq API: For AI-powered functionalities (e.g., data assistant).
- CatBoost: A machine learning library (implied by the catboost_info directory).
Future Enhancements (Autonomous System Potential)
The architecture of this project, particularly the RAG-based AI Data Assistant, lays the groundwork for developing more autonomous capabilities. Future enhancements could involve integrating more complex decision-making processes, self-correction mechanisms, and broader task automation, moving towards a more fully autonomous AutoML system.
Contributing
Contributions are welcome! Please feel free to fork the repository, create a new branch, and submit a pull request for any improvements or bug fixes.
License
This project is licensed under the MIT License. See the LICENSE file, if present, for more details.