Auto-FineTune-Ops / README.md
aneeb15's picture
Revise README content and add live app link
941865d unverified

πŸ€– Auto-FineTune-Ops

Autonomous End-to-End LLM Fine-Tuning Pipeline

From raw data to production API in one click. No ML expertise required.

Python 3.10+ Streamlit License: MIT


🎯 What Is This?

Auto-FineTune-Ops is a no-code/low-code platform that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles:

  1. Data Ingestion: Upload CSV, JSON, or JSONL files.
  2. Advanced Preprocessing: 10+ modules for cleaning, PII redaction, deduplication, and formatting.
  3. Hybrid Training: Train locally on GPU (Unsloth/LoRA) or generate a Google Colab Notebook for free cloud GPU training.
  4. AI Judge Evaluation: Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge.
  5. One-Click Deployment: Export your trained model as a production-ready FastAPI endpoint.

All accessible via a premium, easy-to-use Streamlit Dashboard.


✨ Key Features

🧠 Intelligent Preprocessing

  • Text Cleaning: Remove HTML, URLs, emojis, normalize whitespace.
  • PII Filter: Redact emails, phone numbers, API keys.
  • Deduplication: Remove exact and semantic (TF-IDF) duplicates.
  • Quality Filters: Filter by length, language, toxicity.
  • Balancing: Oversample/undersample classes for classification tasks.
  • Export Formats: Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats.

⚑ Flexible Training Workflows

  • Local GPU: Uses Unsloth for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory).
  • Google Colab Fallback: Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation.
  • Custom Models: Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.).

βš–οΈ Multi-Provider AI Judge

Evaluate models head-to-head using:

  • OpenAI (GPT-4o, GPT-4-turbo)
  • Anthropic (Claude 3.5 Sonnet, Opus)
  • Google (Gemini 1.5 Pro)
  • Groq (Llama 3, Mixtral)
  • Custom Endpoints (Ollama, vLLM)

πŸš€ Quick Start

1. Installation

# Clone the repository
git clone https://github.com/your-username/Auto-FineTune-Ops.git
cd Auto-FineTune-Ops

# Create a virtual environment
python -m venv venv
# Windows:
.\venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Launch the Dashboard

streamlit run app.py

Open your browser to the URL shown (usually http://localhost:8501).


πŸ› οΈ Workflow Guide

Step 1: Data Upload

  • Upload your raw CSV or JSON file containing instruction-response pairs.
  • The app automatically detects columns like instruction, input, output.
  • Preview full dataset with pagination.

Step 2: Preprocessing

  • Configure cleaning rules (HTML removal, lowercase, etc.).
  • Set PII filters (mask emails/phones).
  • Enable semantic deduplication.
  • Click Run Pipeline to clean and format your data.

Step 3: Training

  • If you have a GPU: Select a base model (e.g., Llama-3-8b) and click Start Training.
  • If you have no GPU:
    1. Download the preprocessed data.
    2. Download the generated Colab Notebook.
    3. Run training on Google Colab (Free Tier).
    4. Upload the fine-tuned model results back to the app.

Step 4: Evaluation

  • Compare your fine-tuned model vs. the base model.
  • Select an AI Judge (e.g., GPT-4o).
  • Visualize win rates and quality scores (Accuracy, Helpfulness, Tone).

Step 5: Deployment

  • Deploy your model locally as a REST API:
    python scripts/deploy.py --model ./output/models/your_model --port 8000
    
  • Or push to HuggingFace Hub directly from the dashboard.

πŸ—οΈ Project Structure

ml_oops/
β”œβ”€β”€ app.py                     # πŸš€ Main Streamlit Dashboard
β”œβ”€β”€ main.py                    # 🧠 CLI Orchestrator (Headless mode)
β”œβ”€β”€ requirements.txt           # Dependencies
β”œβ”€β”€ agents/                    # Core Logic Agents
β”‚   β”œβ”€β”€ data_architect.py      # Data Analysis & Cleaning
β”‚   β”œβ”€β”€ training_pilot.py      # Fine-Tuning Logic
β”‚   └── the_judge.py           # Evaluation Logic
β”œβ”€β”€ preprocessing/             # Advanced Preprocessing Modules
β”‚   β”œβ”€β”€ text_cleaning.py       # Regex & Normalization
β”‚   β”œβ”€β”€ pii_filter.py          # PII Redaction
β”‚   β”œβ”€β”€ deduplication.py       # Semantic Dedupe
β”‚   └── ...
β”œβ”€β”€ configs/                   # Configuration Files
└── output/                    # Artifacts (Models, Logs, Reports)

live App

https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/

🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

πŸ“œ License

This project is licensed under the MIT License - see the LICENSE file for details.


Built for modern ML teams.
Replace weeks of manual engineering with minutes of automated ops.