Spaces:
Configuration error
Configuration error
| # π€ Auto-FineTune-Ops | |
| > **Autonomous End-to-End LLM Fine-Tuning Pipeline** | |
| > | |
| > From raw data to production API in one click. No ML expertise required. | |
| [](https://www.python.org/) | |
| [](https://streamlit.io/) | |
| [](LICENSE) | |
| --- | |
| ## π― What Is This? | |
| Auto-FineTune-Ops is a **no-code/low-code platform** that automates the entire lifecycle of fine-tuning Large Language Models (LLMs). It handles: | |
| 1. **Data Ingestion:** Upload CSV, JSON, or JSONL files. | |
| 2. **Advanced Preprocessing:** 10+ modules for cleaning, PII redaction, deduplication, and formatting. | |
| 3. **Hybrid Training:** Train locally on GPU (Unsloth/LoRA) or generate a **Google Colab Notebook** for free cloud GPU training. | |
| 4. **AI Judge Evaluation:** Compare your fine-tuned model against the base model using GPT-4, Claude 3.5, Gemini, or Groq as a judge. | |
| 5. **One-Click Deployment:** Export your trained model as a production-ready FastAPI endpoint. | |
| **All accessible via a premium, easy-to-use Streamlit Dashboard.** | |
| --- | |
| ## β¨ Key Features | |
| ### π§ Intelligent Preprocessing | |
| - **Text Cleaning:** Remove HTML, URLs, emojis, normalize whitespace. | |
| - **PII Filter:** Redact emails, phone numbers, API keys. | |
| - **Deduplication:** Remove exact and semantic (TF-IDF) duplicates. | |
| - **Quality Filters:** Filter by length, language, toxicity. | |
| - **Balancing:** Oversample/undersample classes for classification tasks. | |
| - **Export Formats:** Auto-convert to OpenAI Chat, Completion, or Classification JSONL formats. | |
| ### β‘ Flexible Training Workflows | |
| - **Local GPU:** Uses **Unsloth** for ultra-fast 4-bit LoRA fine-tuning (2x faster, 70% less memory). | |
| - **Google Colab Fallback:** Don't have a GPU? The app generates a ready-to-run Colab notebook for you. Download models back to the app for evaluation. | |
| - **Custom Models:** Fine-tune any HuggingFace model (Llama 3, Mistral, Gemma, Phi-3, etc.). | |
| ### βοΈ Multi-Provider AI Judge | |
| Evaluate models head-to-head using: | |
| - **OpenAI** (GPT-4o, GPT-4-turbo) | |
| - **Anthropic** (Claude 3.5 Sonnet, Opus) | |
| - **Google** (Gemini 1.5 Pro) | |
| - **Groq** (Llama 3, Mixtral) | |
| - **Custom Endpoints** (Ollama, vLLM) | |
| --- | |
| ## π Quick Start | |
| ### 1. Installation | |
| ```bash | |
| # Clone the repository | |
| git clone https://github.com/your-username/Auto-FineTune-Ops.git | |
| cd Auto-FineTune-Ops | |
| # Create a virtual environment | |
| python -m venv venv | |
| # Windows: | |
| .\venv\Scripts\activate | |
| # Mac/Linux: | |
| source venv/bin/activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| ``` | |
| ### 2. Launch the Dashboard | |
| ```bash | |
| streamlit run app.py | |
| ``` | |
| Open your browser to the URL shown (usually `http://localhost:8501`). | |
| --- | |
| ## π οΈ Workflow Guide | |
| ### Step 1: Data Upload | |
| - Upload your raw `CSV` or `JSON` file containing instruction-response pairs. | |
| - The app automatically detects columns like `instruction`, `input`, `output`. | |
| - Preview full dataset with pagination. | |
| ### Step 2: Preprocessing | |
| - Configure cleaning rules (HTML removal, lowercase, etc.). | |
| - Set PII filters (mask emails/phones). | |
| - Enable semantic deduplication. | |
| - Click **Run Pipeline** to clean and format your data. | |
| ### Step 3: Training | |
| - **If you have a GPU:** Select a base model (e.g., Llama-3-8b) and click **Start Training**. | |
| - **If you have no GPU:** | |
| 1. Download the preprocessed data. | |
| 2. Download the generated `Colab Notebook`. | |
| 3. Run training on Google Colab (Free Tier). | |
| 4. Upload the fine-tuned model results back to the app. | |
| ### Step 4: Evaluation | |
| - Compare your fine-tuned model vs. the base model. | |
| - Select an AI Judge (e.g., GPT-4o). | |
| - Visualize win rates and quality scores (Accuracy, Helpfulness, Tone). | |
| ### Step 5: Deployment | |
| - Deploy your model locally as a REST API: | |
| ```bash | |
| python scripts/deploy.py --model ./output/models/your_model --port 8000 | |
| ``` | |
| - Or push to HuggingFace Hub directly from the dashboard. | |
| --- | |
| ## ποΈ Project Structure | |
| ``` | |
| ml_oops/ | |
| βββ app.py # π Main Streamlit Dashboard | |
| βββ main.py # π§ CLI Orchestrator (Headless mode) | |
| βββ requirements.txt # Dependencies | |
| βββ agents/ # Core Logic Agents | |
| β βββ data_architect.py # Data Analysis & Cleaning | |
| β βββ training_pilot.py # Fine-Tuning Logic | |
| β βββ the_judge.py # Evaluation Logic | |
| βββ preprocessing/ # Advanced Preprocessing Modules | |
| β βββ text_cleaning.py # Regex & Normalization | |
| β βββ pii_filter.py # PII Redaction | |
| β βββ deduplication.py # Semantic Dedupe | |
| β βββ ... | |
| βββ configs/ # Configuration Files | |
| βββ output/ # Artifacts (Models, Logs, Reports) | |
| ``` | |
| --- | |
| ## live App | |
| https://aneebnaqvi15-auto-finetune-ops-app-1xmv11.streamlit.app/ | |
| ## π€ Contributing | |
| Contributions are welcome! Please read `CONTRIBUTING.md` for details on our code of conduct and the process for submitting pull requests. | |
| ## π License | |
| This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. | |
| --- | |
| <div align="center"> | |
| <b>Built for modern ML teams.</b><br> | |
| <i>Replace weeks of manual engineering with minutes of automated ops.</i> | |
| </div> | |