Spaces:

aneeb15
/

Auto-FineTune-Ops

Configuration error

App Files Files Community

Auto-FineTune-Ops / README.md

aneeb15

Revise README content and add live app link

941865d unverified 4 months ago

preview code

raw

history blame contribute delete

5.55 kB

	# 🤖 Auto-FineTune-Ops

	> Autonomous End-to-End LLM Fine-Tuning Pipeline
	>
	> From raw data to production API in one click. No ML expertise required.

	[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
	[![Streamlit](https://img.shields.io/badge/Streamlit-1.32+-red.svg)](https://streamlit.io/)
	[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

	---

	## 🎯 What Is This?

	Auto-FineTune-Ops is a no-code/low-code platform that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles:

	1. Data Ingestion: Upload CSV, JSON, or JSONL files.
	2. Advanced Preprocessing: 10+ modules for cleaning, PII redaction, deduplication, and formatting.
	3. Hybrid Training: Train locally on GPU (Unsloth/LoRA) or generate a Google Colab Notebook for free cloud GPU training.
	4. AI Judge Evaluation: Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge.
	5. One-Click Deployment: Export your trained model as a production-ready FastAPI endpoint.

	All accessible via a premium, easy-to-use Streamlit Dashboard.

	---

	## ✨ Key Features

	### 🧠 Intelligent Preprocessing
	- Text Cleaning: Remove HTML, URLs, emojis, normalize whitespace.
	- PII Filter: Redact emails, phone numbers, API keys.
	- Deduplication: Remove exact and semantic (TF-IDF) duplicates.
	- Quality Filters: Filter by length, language, toxicity.
	- Balancing: Oversample/undersample classes for classification tasks.
	- Export Formats: Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats.

	### ⚡ Flexible Training Workflows
	- Local GPU: Uses Unsloth for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory).
	- Google Colab Fallback: Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation.
	- Custom Models: Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.).

	### ⚖️ Multi-Provider AI Judge
	Evaluate models head-to-head using:
	- OpenAI (GPT-4o, GPT-4-turbo)
	- Anthropic (Claude 3.5 Sonnet, Opus)
	- Google (Gemini 1.5 Pro)
	- Groq (Llama 3, Mixtral)
	- Custom Endpoints (Ollama, vLLM)

	---

	## 🚀 Quick Start

	### 1. Installation

	```bash
	# Clone the repository
	git clone https://github.com/your-username/Auto-FineTune-Ops.git
	cd Auto-FineTune-Ops

	# Create a virtual environment
	python -m venv venv
	# Windows:
	.\venv\Scripts\activate
	# Mac/Linux:
	source venv/bin/activate

	# Install dependencies
	pip install -r requirements.txt
	```

	### 2. Launch the Dashboard

	```bash
	streamlit run app.py
	```

	Open your browser to the URL shown (usually `http://localhost:8501`).

	---

	## 🛠️ Workflow Guide

	### Step 1: Data Upload
	- Upload your raw `CSV` or `JSON` file containing instruction-response pairs.
	- The app automatically detects columns like `instruction`, `input`, `output`.
	- Preview full dataset with pagination.

	### Step 2: Preprocessing
	- Configure cleaning rules (HTML removal, lowercase, etc.).
	- Set PII filters (mask emails/phones).
	- Enable semantic deduplication.
	- Click Run Pipeline to clean and format your data.

	### Step 3: Training
	- If you have a GPU: Select a base model (e.g., Llama-3-8b) and click Start Training.
	- If you have no GPU:
	1. Download the preprocessed data.
	2. Download the generated `Colab Notebook`.
	3. Run training on Google Colab (Free Tier).
	4. Upload the fine-tuned model results back to the app.

	### Step 4: Evaluation
	- Compare your fine-tuned model vs. the base model.
	- Select an AI Judge (e.g., GPT-4o).
	- Visualize win rates and quality scores (Accuracy, Helpfulness, Tone).

	### Step 5: Deployment
	- Deploy your model locally as a REST API:
	```bash
	python scripts/deploy.py --model ./output/models/your_model --port 8000
	```
	- Or push to HuggingFace Hub directly from the dashboard.

	---

	## 🏗️ Project Structure

	```
	ml_oops/
	├── app.py # 🚀 Main Streamlit Dashboard
	├── main.py # 🧠 CLI Orchestrator (Headless mode)
	├── requirements.txt # Dependencies
	├── agents/ # Core Logic Agents
	│ ├── data_architect.py # Data Analysis & Cleaning
	│ ├── training_pilot.py # Fine-Tuning Logic
	│ └── the_judge.py # Evaluation Logic
	├── preprocessing/ # Advanced Preprocessing Modules
	│ ├── text_cleaning.py # Regex & Normalization
	│ ├── pii_filter.py # PII Redaction
	│ ├── deduplication.py # Semantic Dedupe
	│ └── ...
	├── configs/ # Configuration Files
	└── output/ # Artifacts (Models, Logs, Reports)
	```

	---
	## live App
	https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/

	## 🤝 Contributing

	Contributions are welcome! Please read `CONTRIBUTING.md` for details on our code of conduct and the process for submitting pull requests.

	## 📜 License

	This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

	---

	<div align="center">
	<b>Built for modern ML teams.</b><br>
	<i>Replace weeks of manual engineering with minutes of automated ops.</i>
	</div>