---
title: AutoML MLOps Pipeline
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
---

# 🤖 AutoML MLOps Pipeline

Production-ready end-to-end AutoML pipeline with MLflow tracking, comprehensive monitoring, and automated orchestration.

[![CI Pipeline](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml)
[![Docker Build](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml)

## 🚀 Features

- **🤖 AutoML**: AutoGluon, FLAML, PyCaret integration
- **📊 MLflow Tracking**: DagsHub integration with comprehensive metrics
- **🔍 Monitoring**: Drift detection, prediction logging, performance tracking
- **📈 Observability**: Prometheus metrics & Grafana dashboards
- **🔄 Orchestration**: Airflow DAGs for automated scheduling
- **🐳 Docker**: Complete containerization with docker-compose
- **⚡ FastAPI**: RESTful API with 11+ endpoints
- **🎯 CI/CD**: GitHub Actions for automated testing and deployment

## 📋 Pipeline Stages

1. **Data Ingestion** - Load and validate dataset
2. **Data Validation** - Schema validation and quality checks
3. **Data Transformation** - Feature engineering and preprocessing
4. **AutoML Training** - Multi-framework model training
5. **Model Evaluation** - Comprehensive metrics and validation
6. **Model Comparison** - Best model selection
7. **Model Pusher** - Production model deployment

## 🛠️ Tech Stack

- **ML Frameworks**: AutoGluon, FLAML, PyCaret
- **API**: FastAPI, Uvicorn
- **Tracking**: MLflow, DagsHub
- **Monitoring**: Prometheus, Grafana, Evidently AI
- **Orchestration**: Apache Airflow
- **Containerization**: Docker, Docker Compose
- **CI/CD**: GitHub Actions

## 📦 Quick Start

### Local Development

```bash
# Clone repository
git clone https://github.com/Abeshith/AutoML-MLOps-PipeLine.git
cd AutoML-MLOps-PipeLine

# Create virtual environment
python -m venv automlenv
source automlenv/bin/activate  # On Windows: automlenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env
# Edit .env with your credentials

# Run training pipeline
python scripts/train.py

# Start API server
python scripts/serve.py --reload
```

### Docker Deployment

```bash
# Start all services
docker-compose up -d

# Access services
# API: http://localhost:8000/docs
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```

## 🌐 API Endpoints

### Prediction

```bash
POST /predict
{
  "age": 45,
  "sex": 1,
  "cp": 2,
  "trestbps": 130,
  "chol": 250,
  "fbs": 0,
  "restecg": 1,
  "thalach": 150,
  "exang": 0,
  "oldpeak": 2.5,
  "slope": 2,
  "ca": 0,
  "thal": 2
}
```

### Training

```bash
POST /train
GET /train/status
```

### Monitoring

```bash
GET /monitoring/metrics              # Prometheus metrics
GET /monitoring/health/drift         # Drift detection status
GET /monitoring/performance/summary
GET /monitoring/reports/daily
```

## 📊 Model Performance

- **Validation Accuracy**: 88.84%
- **Test Accuracy**: 88.68%
- **ROC-AUC**: 95.48%
- **Best Model**: WeightedEnsemble_L3

## 🔧 Utility Scripts

```bash
# Train model
python scripts/train.py

# Evaluate model
python scripts/evaluate.py --model-path

# Start API server
python scripts/serve.py --host 0.0.0.0 --port 8000 --reload

# Initialize Airflow
python scripts/init_db.py
```

## 🔄 Airflow Orchestration

```bash
# Set AIRFLOW_HOME
export AIRFLOW_HOME=$(pwd)/airflow

# Initialize database
python scripts/init_db.py

# Start services
airflow scheduler  # Terminal 1
airflow webserver  # Terminal 2

# Access UI: http://localhost:8080
```

## 📈 Monitoring Stack

- **Drift Detection**: KS test for numerical features
- **Prediction Logging**: JSONL format with threading
- **Performance Tracking**: Batch-level metrics
- **Report Generation**: Daily/weekly JSON reports
- **Prometheus Metrics**: Request count, latency, accuracy, drift status
- **Grafana Dashboards**: 5-panel visualization

## 🐳 Docker Services

- **FastAPI App** (8000): Main ML API
- **Prometheus** (9090): Metrics collection
- **Grafana** (3000): Visualization dashboards

## 🔐 Environment Variables

```env
MLFLOW_TRACKING_URI=your_dagshub_uri
DAGSHUB_TOKEN=your_token
```

## 📚 Documentation

- [Docker Setup](DOCKER.md)
- [Scripts Usage](scripts/README.md)
- [CI/CD Workflows](.github/workflows/README.md)
- [Airflow Guide](airflow/README.md)

## 🧪 CI/CD Pipeline

### Automated Workflows

- **CI**: Lint with flake8, format check with black
- **Docker Build**: Build and push to GitHub Container Registry
- **HuggingFace Deploy**: Auto-deploy to Spaces on push

### Container Images

```bash
docker pull ghcr.io/abeshith/automl-mlops-pipeline:latest
```

## 📊 Project Structure

```
AutoML-MLOps-PipeLine/
├── src/mlpipeline/       # Core pipeline components
├── app/                  # FastAPI application
├── config/               # Configuration files
├── scripts/              # Utility scripts
├── airflow/              # Airflow DAGs
├── monitoring/           # Monitoring components
├── observability/        # Prometheus/Grafana configs
├── notebooks/            # Jupyter notebooks
├── Dockerfile            # Container definition
├── docker-compose.yaml   # Multi-service orchestration
└── requirements.txt      # Python dependencies
```

⭐ Star this repo if you find it helpful!
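
For anyone calling the `POST /predict` endpoint listed under API Endpoints from Python, the request can be sketched with the standard library alone. The helper name `build_predict_request` is hypothetical and not part of this repo, the base URL assumes the Docker defaults above, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a JSON POST request for the /predict endpoint (hypothetical helper)."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sample payload copied from the Prediction section of this README.
sample = {
    "age": 45, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 2.5, "slope": 2, "ca": 0, "thal": 2,
}
request = build_predict_request("http://localhost:8000", sample)

# Sending requires a running API server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```
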
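
The Monitoring Stack section mentions a KS test for numerical features. As a rough illustration of that idea (not the code in `monitoring/`), the two-sample Kolmogorov-Smirnov statistic can be computed with the standard library and compared against the usual large-sample critical value. The function names and the `alpha=0.05` default are assumptions made for this sketch.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    largest_gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def ks_drift_detected(reference, current, alpha=0.05):
    """Flag drift when the statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m))
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Illustrative data only: a feature whose values shift upward by 50.
training_values = list(range(100))
live_values = [value + 50 for value in training_values]
drifted = ks_drift_detected(training_values, live_values)
```

A production check would more likely rely on a library routine such as `scipy.stats.ks_2samp`, which also returns a p-value. The critical value used here is only an approximation for reasonably large samples.
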
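
Prediction logging is described above as "JSONL format with threading". One plausible reading, sketched below, is that concurrent request handlers append one JSON object per line while a lock keeps the lines from interleaving. The class name `PredictionLogger` and the record fields are hypothetical and may differ from what the repo actually does.

```python
import json
import threading
import time


class PredictionLogger:
    """Append prediction records to a JSONL file, one JSON object per line."""

    def __init__(self, path):
        self.path = path
        # The lock serialises writes coming from concurrent request threads.
        self._lock = threading.Lock()

    def log(self, features, prediction):
        record = {
            "timestamp": time.time(),
            "features": features,
            "prediction": prediction,
        }
        line = json.dumps(record)
        with self._lock:
            with open(self.path, "a", encoding="utf-8") as handle:
                handle.write(line + "\n")


def read_log(path):
    """Load every record back from the JSONL file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Opening the file on every call keeps the sketch simple. A long-running service might instead hold the file open or hand records to a background writer thread.
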