# 🤖 Auto-FineTune-Ops

> **Autonomous End-to-End LLM Fine-Tuning Pipeline**
>
> From raw data to production API in one click. No ML expertise required.

[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.32+-red.svg)](https://streamlit.io/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

---

## 🎯 What Is This?

Auto-FineTune-Ops is a **no-code/low-code platform** that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles:

1.  **Data Ingestion:** Upload CSV, JSON, or JSONL files.
2.  **Advanced Preprocessing:** 10+ modules for cleaning, PII redaction, deduplication, and formatting.
3.  **Hybrid Training:** Train locally on GPU (Unsloth/LoRA) or generate a **Google Colab Notebook** for free cloud GPU training.
4.  **AI Judge Evaluation:** Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge.
5.  **One-Click Deployment:** Export your trained model as a production-ready FastAPI endpoint.

**All accessible via a premium, easy-to-use Streamlit Dashboard.**

---

## ✨ Key Features

### 🧠 Intelligent Preprocessing
- **Text Cleaning:** Remove HTML, URLs, emojis, normalize whitespace.
- **PII Filter:** Redact emails, phone numbers, API keys.
- **Deduplication:** Remove exact and semantic (TF-IDF) duplicates.
- **Quality Filters:** Filter by length, language, toxicity.
- **Balancing:** Oversample/undersample classes for classification tasks.
- **Export Formats:** Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats.

### ⚡ Flexible Training Workflows
- **Local GPU:** Uses **Unsloth** for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory).
- **Google Colab Fallback:** Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation.
- **Custom Models:** Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.).

### ⚖️ Multi-Provider AI Judge
Evaluate models head-to-head using:
- **OpenAI** (GPT-4o, GPT-4-turbo)
- **Anthropic** (Claude 3.5 Sonnet, Opus)
- **Google** (Gemini 1.5 Pro)
- **Groq** (Llama 3, Mixtral)
- **Custom Endpoints** (Ollama, vLLM)

---

## 🚀 Quick Start

### 1. Installation

```bash
# Clone the repository
git clone https://github.com/your-username/Auto-FineTune-Ops.git
cd Auto-FineTune-Ops

# Create a virtual environment
python -m venv venv
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 2. Launch the Dashboard

```bash
streamlit run app.py
```

Open your browser to the URL shown (usually `http://localhost:8501`).

---

## 🛠️ Workflow Guide

### Step 1: Data Upload
- Upload your raw `CSV` or `JSON` file containing instruction-response pairs.
- The app automatically detects columns like `instruction`, `input`, `output`.
- Preview full dataset with pagination.

### Step 2: Preprocessing
- Configure cleaning rules (HTML removal, lowercase, etc.).
- Set PII filters (mask emails/phones).
- Enable semantic deduplication.
- Click **Run Pipeline** to clean and format your data.

### Step 3: Training
- **If you have a GPU:** Select a base model (e.g., Llama-3-8b) and click **Start Training**.
- **If you have no GPU:**
    1.  Download the preprocessed data.
    2.  Download the generated `Colab Notebook`.
    3.  Run training on Google Colab (Free Tier).
    4.  Upload the fine-tuned model results back to the app.

### Step 4: Evaluation
- Compare your fine-tuned model vs. the base model.
- Select an AI Judge (e.g., GPT-4o).
- Visualize win rates and quality scores (Accuracy, Helpfulness, Tone).

### Step 5: Deployment
- Deploy your model locally as a REST API:
  ```bash
  python scripts/deploy.py --model ./output/models/your_model --port 8000
  ```
- Or push to HuggingFace Hub directly from the dashboard.

---

## 🏗️ Project Structure

```
ml_oops/
├── app.py                     # 🚀 Main Streamlit Dashboard
├── main.py                    # 🧠 CLI Orchestrator (Headless mode)
├── requirements.txt           # Dependencies
├── agents/                    # Core Logic Agents
│   ├── data_architect.py      # Data Analysis & Cleaning
│   ├── training_pilot.py      # Fine-Tuning Logic
│   └── the_judge.py           # Evaluation Logic
├── preprocessing/             # Advanced Preprocessing Modules
│   ├── text_cleaning.py       # Regex & Normalization
│   ├── pii_filter.py          # PII Redaction
│   ├── deduplication.py       # Semantic Dedupe
│   └── ...
├── configs/                   # Configuration Files
└── output/                    # Artifacts (Models, Logs, Reports)
```

---
## live App
https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/

## 🤝 Contributing

Contributions are welcome! Please read `CONTRIBUTING.md` for details on our code of conduct and the process for submitting pull requests.

## 📜 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

<div align="center">
  <b>Built for modern ML teams.</b><br>
  <i>Replace weeks of manual engineering with minutes of automated ops.</i>
</div>