# 🤖 Auto-FineTune-Ops > **Autonomous End-to-End LLM Fine-Tuning Pipeline** > > From raw data to production API in one click. No ML expertise required. [![Python 3.10+](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/) [![Streamlit](https://img.shields.io/badge/Streamlit-1.32+-red.svg)](https://streamlit.io/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE) --- ## 🎯 What Is This? Auto-FineTune-Ops is a **no-code/low-code platform** that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles: 1. **Data Ingestion:** Upload CSV, JSON, or JSONL files. 2. **Advanced Preprocessing:** 10+ modules for cleaning, PII redaction, deduplication, and formatting. 3. **Hybrid Training:** Train locally on GPU (Unsloth/LoRA) or generate a **Google Colab Notebook** for free cloud GPU training. 4. **AI Judge Evaluation:** Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge. 5. **One-Click Deployment:** Export your trained model as a production-ready FastAPI endpoint. **All accessible via a premium, easy-to-use Streamlit Dashboard.** --- ## ✨ Key Features ### 🧠 Intelligent Preprocessing - **Text Cleaning:** Remove HTML, URLs, emojis, normalize whitespace. - **PII Filter:** Redact emails, phone numbers, API keys. - **Deduplication:** Remove exact and semantic (TF-IDF) duplicates. - **Quality Filters:** Filter by length, language, toxicity. - **Balancing:** Oversample/undersample classes for classification tasks. - **Export Formats:** Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats. ### ⚡ Flexible Training Workflows - **Local GPU:** Uses **Unsloth** for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory). - **Google Colab Fallback:** Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation. - **Custom Models:** Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.). ### ⚖️ Multi-Provider AI Judge Evaluate models head-to-head using: - **OpenAI** (GPT-4o, GPT-4-turbo) - **Anthropic** (Claude 3.5 Sonnet, Opus) - **Google** (Gemini 1.5 Pro) - **Groq** (Llama 3, Mixtral) - **Custom Endpoints** (Ollama, vLLM) --- ## 🚀 Quick Start ### 1. Installation ```bash # Clone the repository git clone https://github.com/your-username/Auto-FineTune-Ops.git cd Auto-FineTune-Ops # Create a virtual environment python -m venv venv # Windows: .\venv\Scripts\activate # Mac/Linux: source venv/bin/activate # Install dependencies pip install -r requirements.txt ``` ### 2. Launch the Dashboard ```bash streamlit run app.py ``` Open your browser to the URL shown (usually `http://localhost:8501`). --- ## 🛠️ Workflow Guide ### Step 1: Data Upload - Upload your raw `CSV` or `JSON` file containing instruction-response pairs. - The app automatically detects columns like `instruction`, `input`, `output`. - Preview full dataset with pagination. ### Step 2: Preprocessing - Configure cleaning rules (HTML removal, lowercase, etc.). - Set PII filters (mask emails/phones). - Enable semantic deduplication. - Click **Run Pipeline** to clean and format your data. ### Step 3: Training - **If you have a GPU:** Select a base model (e.g., Llama-3-8b) and click **Start Training**. - **If you have no GPU:** 1. Download the preprocessed data. 2. Download the generated `Colab Notebook`. 3. Run training on Google Colab (Free Tier). 4. Upload the fine-tuned model results back to the app. ### Step 4: Evaluation - Compare your fine-tuned model vs. the base model. - Select an AI Judge (e.g., GPT-4o). - Visualize win rates and quality scores (Accuracy, Helpfulness, Tone). ### Step 5: Deployment - Deploy your model locally as a REST API: ```bash python scripts/deploy.py --model ./output/models/your_model --port 8000 ``` - Or push to HuggingFace Hub directly from the dashboard. --- ## 🏗️ Project Structure ``` ml_oops/ ├── app.py # 🚀 Main Streamlit Dashboard ├── main.py # 🧠 CLI Orchestrator (Headless mode) ├── requirements.txt # Dependencies ├── agents/ # Core Logic Agents │ ├── data_architect.py # Data Analysis & Cleaning │ ├── training_pilot.py # Fine-Tuning Logic │ └── the_judge.py # Evaluation Logic ├── preprocessing/ # Advanced Preprocessing Modules │ ├── text_cleaning.py # Regex & Normalization │ ├── pii_filter.py # PII Redaction │ ├── deduplication.py # Semantic Dedupe │ └── ... ├── configs/ # Configuration Files └── output/ # Artifacts (Models, Logs, Reports) ``` --- ## live App https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/ ## 🤝 Contributing Contributions are welcome! Please read `CONTRIBUTING.md` for details on our code of conduct and the process for submitting pull requests. ## 📜 License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. ---
Built for modern ML teams.
Replace weeks of manual engineering with minutes of automated ops.