---
title: AutoML
sdk: docker
emoji: 🚀
colorFrom: red
colorTo: purple
---
# AutoML Project

## Overview

This project is an Automated Machine Learning (AutoML) platform that streamlines the machine learning workflow for **CSV-formatted datasets**. It is aimed at students and researchers who need a quick setup for **demo sessions** in their data science and AI projects. It combines automated data cleaning, supervised and unsupervised model training, an AI-powered data assistant, and an interactive web frontend for user interaction and visualization.

## Features

*   **Automated Data Cleaning:** Utilities to preprocess and clean raw datasets, ensuring data quality for model training.
*   **Supervised Learning Models:** Implementation and integration of various supervised machine learning algorithms.
*   **Unsupervised Learning Models:** Support for unsupervised learning techniques for tasks like clustering and dimensionality reduction.
*   **AI Data Assistant (Agentic Capability):** A Retrieval-Augmented Generation (RAG) assistant that helps users interact with and understand their **CSV datasets**. It processes natural language queries, retrieves relevant information from the dataset, and assists with data exploration and analysis.
*   **Interactive Web Frontend:** A user-friendly web interface built with HTML, CSS, and JavaScript for interacting with the AutoML functionalities and visualizing results.
*   **Data Visualization:** Tools to generate insightful charts and graphs from processed data and model outputs.
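
As a rough illustration of what the automated data cleaning step involves, the sketch below strips whitespace, drops incomplete rows, and removes exact duplicates from CSV text. It is not the project's actual `utils/` code: `clean_rows` is a hypothetical helper, and the cleaning rules are assumptions chosen for the example.

```python
import csv
import io


def clean_rows(csv_text):
    """Return (header, rows) after basic cleaning of CSV text.

    Hypothetical helper for illustration; the rules here are assumptions.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        values = [value.strip() for value in row]
        # Skip blank lines, rows with the wrong column count, or rows with empty cells.
        if len(values) != len(header) or any(value == "" for value in values):
            continue
        key = tuple(values)
        if key in seen:
            # Drop exact duplicates of rows already kept.
            continue
        seen.add(key)
        cleaned.append(values)
    return header, cleaned


if __name__ == "__main__":
    sample = "name, age\nAda, 36\nAda, 36\nBob,\n"
    print(clean_rows(sample))
```

A real pipeline would likely also handle type inference and imputation; dropping incomplete rows is only the simplest possible policy.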

## Project Structure

The project is organized into the following main files and directories:

*   `.env`: Environment variables, including API keys.
*   `app.py`: The main application entry point.
*   `config.py`: Configuration settings for the application.
*   `frontend/`: Contains the static files for the web-based user interface (HTML, CSS, JavaScript, images).
*   `models/`: Houses the implementations for supervised and unsupervised machine learning models.
*   `rag/`: Contains modules for the Retrieval-Augmented Generation (RAG) system, including memory management and query processing.
*   `utils/`: Utility functions for data cleaning, metrics calculation, and other common tasks.
*   `visuals/`: Modules dedicated to data visualization and chart generation.
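
To make the retrieval part of the `rag/` description more concrete, here is a minimal sketch of how rows relevant to a question could be selected from a CSV file and packed into a prompt. It assumes simple keyword overlap as the relevance score, which stands in for whatever retrieval method the project actually uses; `retrieve_rows` and `build_prompt` are hypothetical names, and no call to the Groq API is made.

```python
import csv
import io
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_rows(csv_text, query, top_k=2):
    """Return up to top_k rows whose cell values share tokens with the query.

    Hypothetical helper: keyword overlap is an assumed stand-in for the
    project's real retrieval logic.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    query_tokens = tokenize(query)
    scored = []
    for index, row in enumerate(reader):
        row_text = " ".join(v for v in row.values() if isinstance(v, str))
        score = len(query_tokens & tokenize(row_text))
        if score > 0:
            scored.append((score, index, row))
    # Highest overlap first; ties keep the original row order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [row for _, _, row in scored[:top_k]]


def build_prompt(question, rows):
    """Format the retrieved rows as context followed by the question."""
    context = "\n".join(
        ", ".join(f"{key}={value}" for key, value in row.items()) for row in rows
    )
    return f"Context rows:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    data = "city,product,sales\nParis,tea,120\nLima,coffee,300\nOslo,tea,80\n"
    question = "How much coffee was sold in Lima?"
    print(build_prompt(question, retrieve_rows(data, question)))
```

The resulting prompt string is what would be sent to a language model; keeping retrieval separate from prompt construction makes it easier to swap in a different scoring method later.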

## Installation

To set up the project locally, follow these steps:

1.  **Clone the repository:**
    ```bash
    git clone https://github.com/Al1Abdullah/AutoML.git
    cd AutoML
    ```

2.  **Create a virtual environment (recommended):**
    ```bash
    python -m venv venv
    # On Windows
    .\venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
    ```

3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

4.  **Configure API Keys:**
    Create or update the `.env` file in the root directory with your Groq API key:
    ```
    GROQ_API_KEY="YOUR_GROQ_API_KEY_HERE"
    ```
    Similarly, update `groq_config.json` with your Groq API key:
    ```json
    {
      "GROQ_API_KEY": "YOUR_GROQ_API_KEY_HERE"
    }
    ```
    **Note:** Replace `"YOUR_GROQ_API_KEY_HERE"` with your actual Groq API key. Do not commit your actual API keys to version control.
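
The two key locations above can be read with the standard library alone. The following is a sketch of one possible lookup order (process environment, then `.env`, then `groq_config.json`); that order, and the helper names `parse_env_text` and `load_groq_key`, are assumptions for illustration and may differ from what `config.py` actually does.

```python
import json
import os
from pathlib import Path


def parse_env_text(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_groq_key(env_path=".env", json_path="groq_config.json"):
    """Return the Groq API key from the first source that provides one.

    Hypothetical helper; the lookup order is an assumption.
    """
    key = os.environ.get("GROQ_API_KEY")
    if key:
        return key
    if Path(env_path).is_file():
        text = Path(env_path).read_text(encoding="utf-8")
        key = parse_env_text(text).get("GROQ_API_KEY")
        if key:
            return key
    if Path(json_path).is_file():
        with open(json_path, encoding="utf-8") as handle:
            key = json.load(handle).get("GROQ_API_KEY")
        if key:
            return key
    raise RuntimeError("GROQ_API_KEY not found in environment, .env, or groq_config.json")


if __name__ == "__main__":
    try:
        print("Key loaded, length:", len(load_groq_key()))
    except RuntimeError as error:
        print(error)
```

Printing only the key length, rather than the key itself, avoids leaking the secret into logs.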

## Usage

To run the AutoML application:

1.  **Activate your virtual environment** (if not already active).
2.  **Run the main application file:**
    ```bash
    python app.py
    ```
    How you reach the web frontend depends on how `app.py` serves it. If it starts a local web server, the address is typically printed to the console on startup.

## Technologies Used

*   **Python:** Core programming language.
*   **HTML, CSS, JavaScript:** For the frontend development.
*   **Git:** Version control.
*   **Groq API:** For AI-powered functionalities (e.g., data assistant).
*   **CatBoost:** Machine learning library; its use is inferred from the `catboost_info` directory.

## Future Enhancements (Autonomous System Potential)

The architecture of this project, particularly the RAG-based AI Data Assistant, lays the groundwork for developing more autonomous capabilities. Future enhancements could involve integrating more complex decision-making processes, self-correction mechanisms, and broader task automation, moving towards a more fully autonomous AutoML system.

## Contributing

Contributions are welcome! Please feel free to fork the repository, create a new branch, and submit a pull request for any improvements or bug fixes.

## License

This project is licensed under the MIT License. See the `LICENSE` file, if present, for details.