| title: AutoML MLOps Pipeline | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 8000 | |
| pinned: false | |
| license: mit | |
| # 🤖 AutoML MLOps Pipeline | |
| Production-ready end-to-end AutoML pipeline with MLflow tracking, comprehensive monitoring, and automated orchestration. | |
| [CI workflow](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml) | |
| [Docker build workflow](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml) | |
| ## Features | |
| - **🤖 AutoML**: AutoGluon, FLAML, PyCaret integration | |
| - **MLflow Tracking**: DagsHub integration with comprehensive metrics | |
| - **Monitoring**: Drift detection, prediction logging, performance tracking | |
| - **Observability**: Prometheus metrics & Grafana dashboards | |
| - **Orchestration**: Airflow DAGs for automated scheduling | |
| - **🐳 Docker**: Complete containerization with docker-compose | |
| - **⚡ FastAPI**: RESTful API with 11+ endpoints | |
| - **🎯 CI/CD**: GitHub Actions for automated testing and deployment | |
| ## Pipeline Stages | |
| 1. **Data Ingestion** - Load and validate dataset | |
| 2. **Data Validation** - Schema validation and quality checks | |
| 3. **Data Transformation** - Feature engineering and preprocessing | |
| 4. **AutoML Training** - Multi-framework model training | |
| 5. **Model Evaluation** - Comprehensive metrics and validation | |
| 6. **Model Comparison** - Best model selection | |
| 7. **Model Pusher** - Production model deployment | |
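The stages above run in a fixed order, each consuming the output of the previous one. As a minimal sketch of that idea (not the repository's actual implementation), the runner below chains stage functions over a shared state dictionary. The names `run_pipeline`, `ingest`, and `validate`, and the sample rows, are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the current state and returns an updated copy of it.
Stage = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_pipeline(stages: List[Tuple[str, Stage]], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run each stage in order and record the names of completed stages."""
    state = dict(context)  # shallow copy so the caller's dict is not replaced
    completed = []
    for name, stage in stages:
        state = stage(state)  # an exception here stops the pipeline
        completed.append(name)
    state["completed"] = completed
    return state


def ingest(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would load the dataset from storage.
    rows = [{"age": 45, "target": 1}, {"age": 61, "target": 0}]
    return {**state, "rows": rows}


def validate(state: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder schema check: every row must have exactly the expected columns.
    expected = {"age", "target"}
    bad = [row for row in state["rows"] if set(row) != expected]
    if bad:
        raise ValueError(f"{len(bad)} rows failed schema validation")
    return {**state, "validated": True}


STAGES = [("data_ingestion", ingest), ("data_validation", validate)]
result = run_pipeline(STAGES, {})
```

Passing state explicitly keeps each stage testable on its own; the remaining stages (transformation, training, evaluation, comparison, pusher) would be appended to `STAGES` in the same way.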
| ## 🛠️ Tech Stack | |
| - **ML Frameworks**: AutoGluon, FLAML, PyCaret | |
| - **API**: FastAPI, Uvicorn | |
| - **Tracking**: MLflow, DagsHub | |
| - **Monitoring**: Prometheus, Grafana, Evidently AI | |
| - **Orchestration**: Apache Airflow | |
| - **Containerization**: Docker, Docker Compose | |
| - **CI/CD**: GitHub Actions | |
| ## Quick Start | |
| ### Local Development | |
| ```bash | |
| # Clone repository | |
| git clone https://github.com/Abeshith/AutoML-MLOps-PipeLine.git | |
| cd AutoML-MLOps-PipeLine | |
| # Create virtual environment | |
| python -m venv automlenv | |
| source automlenv/bin/activate # On Windows: automlenv\Scripts\activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set environment variables | |
| cp .env.example .env | |
| # Edit .env with your credentials | |
| # Run training pipeline | |
| python scripts/train.py | |
| # Start API server | |
| python scripts/serve.py --reload | |
| ``` | |
| ### Docker Deployment | |
| ```bash | |
| # Start all services | |
| docker-compose up -d | |
| # Access services | |
| # API: http://localhost:8000/docs | |
| # Prometheus: http://localhost:9090 | |
| # Grafana: http://localhost:3000 (admin/admin) | |
| ``` | |
| ## API Endpoints | |
| ### Prediction | |
| ```bash | |
| POST /predict | |
| { | |
| "age": 45, | |
| "sex": 1, | |
| "cp": 2, | |
| "trestbps": 130, | |
| "chol": 250, | |
| "fbs": 0, | |
| "restecg": 1, | |
| "thalach": 150, | |
| "exang": 0, | |
| "oldpeak": 2.5, | |
| "slope": 2, | |
| "ca": 0, | |
| "thal": 2 | |
| } | |
| ``` | |
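For calling the endpoint from Python, the following is a minimal sketch using only the standard library. It assumes the API is reachable at `http://localhost:8000` (the port listed under Docker Services); the helper name `build_predict_request` is hypothetical, and the response format is not documented here, so the send step is left commented out.

```python
import json
import urllib.request


def build_predict_request(base_url: str, features: dict) -> urllib.request.Request:
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same feature values as the example payload above.
sample = {
    "age": 45, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 2.5, "slope": 2, "ca": 0, "thal": 2,
}
request = build_predict_request("http://localhost:8000", sample)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```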
| ### Training | |
| ```bash | |
| POST /train | |
| GET /train/status | |
| ``` | |
| ### Monitoring | |
| ```bash | |
| GET /monitoring/metrics # Prometheus metrics | |
| GET /monitoring/health/drift # Drift detection status | |
| GET /monitoring/performance/summary | |
| GET /monitoring/reports/daily | |
| ``` | |
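The `/monitoring/metrics` endpoint is described as serving Prometheus metrics, which are normally exposed in a line-based text format. As a rough sketch for quick checks without a Prometheus server, the parser below reads such text into a dictionary. It assumes samples carry no trailing timestamp, and the metric names in the example are hypothetical, not taken from this project.

```python
def parse_prometheus_text(text: str) -> dict:
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamp assumed).
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Hypothetical scrape output, for illustration only.
example = """
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/predict"} 42
model_drift_detected 0
"""
metrics = parse_prometheus_text(example)
```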
| ## Model Performance | |
| - **Validation Accuracy**: 88.84% | |
| - **Test Accuracy**: 88.68% | |
| - **ROC-AUC**: 95.48% | |
| - **Best Model**: WeightedEnsemble_L3 | |
| ## 🔧 Utility Scripts | |
| ```bash | |
| # Train model | |
| python scripts/train.py | |
| # Evaluate model | |
| python scripts/evaluate.py --model-path <path> | |
| # Start API server | |
| python scripts/serve.py --host 0.0.0.0 --port 8000 --reload | |
| # Initialize Airflow | |
| python scripts/init_db.py | |
| ``` | |
| ## Airflow Orchestration | |
| ```bash | |
| # Set AIRFLOW_HOME | |
| export AIRFLOW_HOME=$(pwd)/airflow | |
| # Initialize database | |
| python scripts/init_db.py | |
| # Start services | |
| airflow scheduler # Terminal 1 | |
| airflow webserver # Terminal 2 | |
| # Access UI: http://localhost:8080 | |
| ``` | |
| ## Monitoring Stack | |
| - **Drift Detection**: Kolmogorov-Smirnov (KS) test on numerical features | |
| - **Prediction Logging**: JSONL format with threading | |
| - **Performance Tracking**: Batch-level metrics | |
| - **Report Generation**: Daily/weekly JSON reports | |
| - **Prometheus Metrics**: Request count, latency, accuracy, drift status | |
| - **Grafana Dashboards**: 5-panel visualization | |
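To illustrate the KS-based drift check listed above, here is a minimal pure-Python sketch of a two-sample Kolmogorov-Smirnov test for one numerical feature. It is not the project's monitoring code: the function names are hypothetical, and the decision rule uses the standard large-sample critical value rather than an exact p-value.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    d = 0.0
    # The largest gap between two step functions occurs at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def drift_detected(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    # c(alpha) = sqrt(-ln(alpha / 2) / 2), about 1.358 for alpha = 0.05.
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Synthetic data: the "current" window is shifted relative to the reference.
reference_values = list(range(100))
shifted_values = [x + 50 for x in range(100)]
```

In practice a library routine such as `scipy.stats.ks_2samp` would typically replace the hand-written statistic; the version here only avoids the extra dependency.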
| ## 🐳 Docker Services | |
| - **FastAPI App** (8000): Main ML API | |
| - **Prometheus** (9090): Metrics collection | |
| - **Grafana** (3000): Visualization dashboards | |
| ## Environment Variables | |
| ```env | |
| MLFLOW_TRACKING_URI=your_dagshub_uri | |
| DAGSHUB_TOKEN=your_token | |
| ``` | |
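A missing variable is easier to diagnose at startup than in the middle of a training run. The sketch below shows one way to check the two variables above before use; the helper name `load_tracking_config` and the example values are hypothetical, and how the project itself reads these variables is not shown here.

```python
import os

REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DAGSHUB_TOKEN")


def load_tracking_config(env=None):
    """Return the required variables, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping instead of the real environment.
config = load_tracking_config(
    {"MLFLOW_TRACKING_URI": "https://example.invalid/mlflow", "DAGSHUB_TOKEN": "dummy-token"}
)
```

Calling `load_tracking_config()` with no argument reads from `os.environ`, so it picks up values exported in the shell or loaded from `.env` by whatever loader the application uses.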
| ## Documentation | |
| - [Docker Setup](DOCKER.md) | |
| - [Scripts Usage](scripts/README.md) | |
| - [CI/CD Workflows](.github/workflows/README.md) | |
| - [Airflow Guide](airflow/README.md) | |
| ## 🧪 CI/CD Pipeline | |
| ### Automated Workflows | |
| - **CI**: Lint with flake8, format check with black | |
| - **Docker Build**: Build and push to GitHub Container Registry | |
| - **HuggingFace Deploy**: Auto-deploy to Spaces on push | |
| ### Container Images | |
| ```bash | |
| docker pull ghcr.io/abeshith/automl-mlops-pipeline:latest | |
| ``` | |
| ## Project Structure | |
| ``` | |
| AutoML-MLOps-PipeLine/ | |
| ├── src/mlpipeline/ # Core pipeline components | |
| ├── app/ # FastAPI application | |
| ├── config/ # Configuration files | |
| ├── scripts/ # Utility scripts | |
| ├── airflow/ # Airflow DAGs | |
| ├── monitoring/ # Monitoring components | |
| ├── observability/ # Prometheus/Grafana configs | |
| ├── notebooks/ # Jupyter notebooks | |
| ├── Dockerfile # Container definition | |
| ├── docker-compose.yaml # Multi-service orchestration | |
| └── requirements.txt # Python dependencies | |
| ``` | |
| ⭐ Star this repo if you find it helpful! | |