Auto-FineTune-Ops / README.md
aneeb15's picture
Revise README content and add live app link
941865d unverified
# πŸ€– Auto-FineTune-Ops
> **Autonomous End-to-End LLM Fine-Tuning Pipeline**
>
> From raw data to production API in one click. No ML expertise required.
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.32+-red.svg)](https://streamlit.io/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
---
## 🎯 What Is This?
Auto-FineTune-Ops is a **no-code/low-code platform** that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles:
1. **Data Ingestion:** Upload CSV, JSON, or JSONL files.
2. **Advanced Preprocessing:** 10+ modules for cleaning, PII redaction, deduplication, and formatting.
3. **Hybrid Training:** Train locally on GPU (Unsloth/LoRA) or generate a **Google Colab Notebook** for free cloud GPU training.
4. **AI Judge Evaluation:** Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge.
5. **One-Click Deployment:** Export your trained model as a production-ready FastAPI endpoint.
**All accessible via a premium, easy-to-use Streamlit Dashboard.**
---
## ✨ Key Features
### 🧠 Intelligent Preprocessing
- **Text Cleaning:** Remove HTML, URLs, emojis, normalize whitespace.
- **PII Filter:** Redact emails, phone numbers, API keys.
- **Deduplication:** Remove exact and semantic (TF-IDF) duplicates.
- **Quality Filters:** Filter by length, language, toxicity.
- **Balancing:** Oversample/undersample classes for classification tasks.
- **Export Formats:** Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats.
### ⚑ Flexible Training Workflows
- **Local GPU:** Uses **Unsloth** for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory).
- **Google Colab Fallback:** Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation.
- **Custom Models:** Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.).
### βš–οΈ Multi-Provider AI Judge
Evaluate models head-to-head using:
- **OpenAI** (GPT-4o, GPT-4-turbo)
- **Anthropic** (Claude 3.5 Sonnet, Opus)
- **Google** (Gemini 1.5 Pro)
- **Groq** (Llama 3, Mixtral)
- **Custom Endpoints** (Ollama, vLLM)
---
## πŸš€ Quick Start
### 1. Installation
```bash
# Clone the repository
git clone https://github.com/your-username/Auto-FineTune-Ops.git
cd Auto-FineTune-Ops
# Create a virtual environment
python -m venv venv
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
```
### 2. Launch the Dashboard
```bash
streamlit run app.py
```
Open your browser to the URL shown (usually `http://localhost:8501`).
---
## πŸ› οΈ Workflow Guide
### Step 1: Data Upload
- Upload your raw `CSV` or `JSON` file containing instruction-response pairs.
- The app automatically detects columns like `instruction`, `input`, `output`.
- Preview full dataset with pagination.
### Step 2: Preprocessing
- Configure cleaning rules (HTML removal, lowercase, etc.).
- Set PII filters (mask emails/phones).
- Enable semantic deduplication.
- Click **Run Pipeline** to clean and format your data.
### Step 3: Training
- **If you have a GPU:** Select a base model (e.g., Llama-3-8b) and click **Start Training**.
- **If you have no GPU:**
1. Download the preprocessed data.
2. Download the generated `Colab Notebook`.
3. Run training on Google Colab (Free Tier).
4. Upload the fine-tuned model results back to the app.
### Step 4: Evaluation
- Compare your fine-tuned model vs. the base model.
- Select an AI Judge (e.g., GPT-4o).
- Visualize win rates and quality scores (Accuracy, Helpfulness, Tone).
### Step 5: Deployment
- Deploy your model locally as a REST API:
```bash
python scripts/deploy.py --model ./output/models/your_model --port 8000
```
- Or push to HuggingFace Hub directly from the dashboard.
---
## πŸ—οΈ Project Structure
```
ml_oops/
β”œβ”€β”€ app.py # πŸš€ Main Streamlit Dashboard
β”œβ”€β”€ main.py # 🧠 CLI Orchestrator (Headless mode)
β”œβ”€β”€ requirements.txt # Dependencies
β”œβ”€β”€ agents/ # Core Logic Agents
β”‚ β”œβ”€β”€ data_architect.py # Data Analysis & Cleaning
β”‚ β”œβ”€β”€ training_pilot.py # Fine-Tuning Logic
β”‚ └── the_judge.py # Evaluation Logic
β”œβ”€β”€ preprocessing/ # Advanced Preprocessing Modules
β”‚ β”œβ”€β”€ text_cleaning.py # Regex & Normalization
β”‚ β”œβ”€β”€ pii_filter.py # PII Redaction
β”‚ β”œβ”€β”€ deduplication.py # Semantic Dedupe
β”‚ └── ...
β”œβ”€β”€ configs/ # Configuration Files
└── output/ # Artifacts (Models, Logs, Reports)
```
---
## live App
https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/
## 🀝 Contributing
Contributions are welcome! Please read `CONTRIBUTING.md` for details on our code of conduct and the process for submitting pull requests.
## πŸ“œ License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
---
<div align="center">
<b>Built for modern ML teams.</b><br>
<i>Replace weeks of manual engineering with minutes of automated ops.</i>
</div>