Spaces:

aneeb15
/

Auto-FineTune-Ops

Configuration error

App Files Files Community

Auto-FineTune-Ops / README.md

aneeb15

Revise README content and add live app link

941865d unverified 4 months ago

preview code

raw

history blame contribute delete

5.55 kB

🤖 Auto-FineTune-Ops

Autonomous End-to-End LLM Fine-Tuning Pipeline

From raw data to production API in one click. No ML expertise required.

🎯 What Is This?

Auto-FineTune-Ops is a no-code/low-code platform that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles:

Data Ingestion: Upload CSV, JSON, or JSONL files.
Advanced Preprocessing: 10+ modules for cleaning, PII redaction, deduplication, and formatting.
Hybrid Training: Train locally on GPU (Unsloth/LoRA) or generate a Google Colab Notebook for free cloud GPU training.
AI Judge Evaluation: Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge.
One-Click Deployment: Export your trained model as a production-ready FastAPI endpoint.

All accessible via a premium, easy-to-use Streamlit Dashboard.

✨ Key Features

🧠 Intelligent Preprocessing

Text Cleaning: Remove HTML, URLs, emojis, normalize whitespace.
PII Filter: Redact emails, phone numbers, API keys.
Deduplication: Remove exact and semantic (TF-IDF) duplicates.
Quality Filters: Filter by length, language, toxicity.
Balancing: Oversample/undersample classes for classification tasks.
Export Formats: Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats.

⚡ Flexible Training Workflows

Local GPU: Uses Unsloth for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory).
Google Colab Fallback: Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation.
Custom Models: Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.).

⚖️ Multi-Provider AI Judge

Evaluate models head-to-head using:

OpenAI (GPT-4o, GPT-4-turbo)
Anthropic (Claude 3.5 Sonnet, Opus)
Google (Gemini 1.5 Pro)
Groq (Llama 3, Mixtral)
Custom Endpoints (Ollama, vLLM)

🚀 Quick Start

1. Installation

# Clone the repository
git clone https://github.com/your-username/Auto-FineTune-Ops.git
cd Auto-FineTune-Ops

# Create a virtual environment
python -m venv venv
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Launch the Dashboard

streamlit run app.py

Open your browser to the URL shown (usually http://localhost:8501).

🛠️ Workflow Guide

Step 1: Data Upload

Upload your raw CSV or JSON file containing instruction-response pairs.
The app automatically detects columns like instruction, input, output.
Preview full dataset with pagination.

Step 2: Preprocessing

Configure cleaning rules (HTML removal, lowercase, etc.).
Set PII filters (mask emails/phones).
Enable semantic deduplication.
Click Run Pipeline to clean and format your data.

Step 3: Training

If you have a GPU: Select a base model (e.g., Llama-3-8b) and click Start Training.
If you have no GPU:
1. Download the preprocessed data.
2. Download the generated Colab Notebook.
3. Run training on Google Colab (Free Tier).
4. Upload the fine-tuned model results back to the app.

Step 4: Evaluation

Compare your fine-tuned model vs. the base model.
Select an AI Judge (e.g., GPT-4o).
Visualize win rates and quality scores (Accuracy, Helpfulness, Tone).

Step 5: Deployment

Deploy your model locally as a REST API:

python scripts/deploy.py --model ./output/models/your_model --port 8000

Or push to HuggingFace Hub directly from the dashboard.

🏗️ Project Structure

ml_oops/
├── app.py                     # 🚀 Main Streamlit Dashboard
├── main.py                    # 🧠 CLI Orchestrator (Headless mode)
├── requirements.txt           # Dependencies
├── agents/                    # Core Logic Agents
│   ├── data_architect.py      # Data Analysis & Cleaning
│   ├── training_pilot.py      # Fine-Tuning Logic
│   └── the_judge.py           # Evaluation Logic
├── preprocessing/             # Advanced Preprocessing Modules
│   ├── text_cleaning.py       # Regex & Normalization
│   ├── pii_filter.py          # PII Redaction
│   ├── deduplication.py       # Semantic Dedupe
│   └── ...
├── configs/                   # Configuration Files
└── output/                    # Artifacts (Models, Logs, Reports)

live App

https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built for modern ML teams.
Replace weeks of manual engineering with minutes of automated ops.