---
title: AutoML
sdk: docker
emoji: 🚀
colorFrom: red
colorTo: purple
---

# AutoML Project

## Overview

This project is a comprehensive Automated Machine Learning (AutoML) platform that streamlines the machine learning workflow for **CSV-formatted datasets**. It is aimed particularly at students and researchers who need a quick way to run **demo sessions** for their data science and AI projects. It combines automated data cleaning, supervised and unsupervised model training, an AI-powered data assistant, and an interactive web frontend for user interaction and visualization.

## Features

* **Automated Data Cleaning:** Utilities to preprocess and clean raw datasets, ensuring data quality for model training.
* **Supervised Learning Models:** Implementation and integration of various supervised machine learning algorithms.
* **Unsupervised Learning Models:** Support for unsupervised learning techniques for tasks such as clustering and dimensionality reduction.
* **AI Data Assistant (Agentic Capability):** A Retrieval Augmented Generation (RAG) based AI assistant that helps users interact with and understand their **CSV datasets**. It demonstrates agentic capabilities by processing natural language queries, retrieving relevant information from the dataset, and assisting with data exploration and analysis.
* **Interactive Web Frontend:** A user-friendly web interface built with HTML, CSS, and JavaScript for using the AutoML functionality and visualizing results.
* **Data Visualization:** Tools to generate charts and graphs from processed data and model outputs.

## Project Structure

The project is organized into the following main files and directories:

* `.env`: Environment variables, including API keys.
* `app.py`: The main application entry point.
* `config.py`: Configuration settings for the application.
* `frontend/`: Static files for the web-based user interface (HTML, CSS, JavaScript, images).
* `models/`: Implementations of the supervised and unsupervised machine learning models.
* `rag/`: Modules for the Retrieval Augmented Generation (RAG) system, including memory management and query processing.
* `utils/`: Utility functions for data cleaning, metrics calculation, and other common tasks.
* `visuals/`: Modules for data visualization and chart generation.

## Installation

To set up the project locally, follow these steps:

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Al1Abdullah/AutoML.git
   cd AutoML
   ```

2. **Create a virtual environment (recommended):**

   ```bash
   python -m venv venv
   # On Windows
   .\venv\Scripts\activate
   # On macOS/Linux
   source venv/bin/activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure API Keys:**
   Create or update the `.env` file in the root directory with your Groq API key:

   ```
   GROQ_API_KEY="YOUR_GROQ_API_KEY_HERE"
   ```

   Similarly, update `groq_config.json` with your Groq API key:

   ```json
   {
     "GROQ_API_KEY": "YOUR_GROQ_API_KEY_HERE"
   }
   ```

   **Note:** Replace `"YOUR_GROQ_API_KEY_HERE"` with your actual Groq API key. Do not commit your actual API keys to version control.

## Usage

To run the AutoML application:

1. **Activate your virtual environment** (if it is not already active).
2. **Run the main application file:**

   ```bash
   python app.py
   ```

How you reach the web frontend depends on how `app.py` serves it. If it starts a local web server (for example, with Flask or Django), open the local address printed in the console.

## Technologies Used

* **Python:** Core programming language.
* **HTML, CSS, JavaScript:** Frontend development.
* **Git:** Version control.
* **Groq API:** AI-powered functionality (e.g., the data assistant).
* **CatBoost:** A gradient-boosting machine learning library, apparently used for model training (the repository contains a `catboost_info` directory, which CatBoost creates by default to log training runs).
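
Installation step 4 stores the same Groq API key in both `.env` and `groq_config.json`. How `config.py` actually reads it is not documented here, so the following is only a hypothetical, standard-library-only sketch of such a loader. The function name `load_groq_api_key`, the lookup order (environment variable, then `.env`, then `groq_config.json`), and treating the `YOUR_GROQ_API_KEY_HERE` placeholder as "not set" are all assumptions, not the project's code.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Placeholder value from the installation instructions; treated as "no key" (assumption).
_PLACEHOLDER = "YOUR_GROQ_API_KEY_HERE"


def _usable(value: str | None) -> str | None:
    """Return the value only if it is non-empty and not the placeholder."""
    return value if value and value != _PLACEHOLDER else None


def _read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no variable expansion)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quotes, as in GROQ_API_KEY="...".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_groq_api_key(root: Path = Path(".")) -> str | None:
    """Hypothetical loader: environment first, then .env, then groq_config.json."""
    # 1. An exported environment variable takes precedence.
    key = _usable(os.environ.get("GROQ_API_KEY"))
    if key:
        return key
    # 2. Fall back to the .env file in the project root.
    key = _usable(_read_dotenv(root / ".env").get("GROQ_API_KEY"))
    if key:
        return key
    # 3. Finally, try groq_config.json in the project root.
    config_path = root / "groq_config.json"
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        return _usable(data.get("GROQ_API_KEY"))
    return None
```

Checking the environment first would let a container (such as a Docker-based deployment) supply the key as an environment variable without shipping `.env` or `groq_config.json` at all; the two files then act only as local-development fallbacks.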

## Future Enhancements (Autonomous System Potential)

The architecture of this project, particularly the RAG-based AI Data Assistant, lays the groundwork for more autonomous capabilities. Future enhancements could integrate more complex decision-making, self-correction mechanisms, and broader task automation, moving toward a more fully autonomous AutoML system.

## Contributing

Contributions are welcome! Feel free to fork the repository, create a new branch, and submit a pull request with any improvements or bug fixes.

## License

This project is licensed under the MIT License. See the `LICENSE` file, if present, for details.