---
title: AutoML MLOps Pipeline
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
---
# 🤖 AutoML MLOps Pipeline
Production-ready end-to-end AutoML pipeline with MLflow tracking, comprehensive monitoring, and automated orchestration.
[![CI Pipeline](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml)
[![Docker Build](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml)
## 🚀 Features
- **🤖 AutoML**: AutoGluon, FLAML, PyCaret integration
- **📊 MLflow Tracking**: DagsHub integration with comprehensive metrics
- **🔍 Monitoring**: Drift detection, prediction logging, performance tracking
- **📈 Observability**: Prometheus metrics & Grafana dashboards
- **🔄 Orchestration**: Airflow DAGs for automated scheduling
- **🐳 Docker**: Complete containerization with docker-compose
- **⚡ FastAPI**: RESTful API with 11+ endpoints
- **🎯 CI/CD**: GitHub Actions for automated testing and deployment
## 📋 Pipeline Stages
1. **Data Ingestion** - Load and validate dataset
2. **Data Validation** - Schema validation and quality checks
3. **Data Transformation** - Feature engineering and preprocessing
4. **AutoML Training** - Multi-framework model training
5. **Model Evaluation** - Comprehensive metrics and validation
6. **Model Comparison** - Best model selection
7. **Model Pusher** - Production model deployment
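
The stages above run in sequence, each consuming what the previous one produced. As a minimal sketch of that idea (not the repository's actual implementation), the runner below chains stage functions through a shared context dictionary. The names `run_pipeline`, `ingest`, and `validate` and the toy rows are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; the function takes and returns a context dict.
Stage = Tuple[str, Callable[[Dict[str, Any]], Dict[str, Any]]]


def run_pipeline(stages: List[Stage], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run stages in order, handing each stage the context returned by the last."""
    for name, stage in stages:
        context = stage(context)
        # Record which stages finished, so a failure is easy to locate.
        context.setdefault("completed", []).append(name)
    return context


def ingest(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for data ingestion: two toy rows.
    return {**ctx, "rows": [{"age": 45, "target": 1}, {"age": 61, "target": 0}]}


def validate(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical schema check: every row must carry both fields.
    if not all("age" in row and "target" in row for row in ctx["rows"]):
        raise ValueError("schema check failed")
    return ctx


result = run_pipeline(
    [("data_ingestion", ingest), ("data_validation", validate)],
    {},
)
print(result["completed"])
```

A stage that raises stops the run at that point, so later stages never see unvalidated data.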
## 🛠️ Tech Stack
- **ML Frameworks**: AutoGluon, FLAML, PyCaret
- **API**: FastAPI, Uvicorn
- **Tracking**: MLflow, DagsHub
- **Monitoring**: Prometheus, Grafana, Evidently AI
- **Orchestration**: Apache Airflow
- **Containerization**: Docker, Docker Compose
- **CI/CD**: GitHub Actions
## 📦 Quick Start
### Local Development
```bash
# Clone repository
git clone https://github.com/Abeshith/AutoML-MLOps-PipeLine.git
cd AutoML-MLOps-PipeLine
# Create virtual environment
python -m venv automlenv
source automlenv/bin/activate # On Windows: automlenv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set environment variables
cp .env.example .env
# Edit .env with your credentials
# Run training pipeline
python scripts/train.py
# Start API server
python scripts/serve.py --reload
```
### Docker Deployment
```bash
# Start all services
docker-compose up -d
# Access services
# API: http://localhost:8000/docs
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```
## 🌐 API Endpoints
### Prediction
```bash
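# POST /predict (assumes the API is running locally on port 8000)
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "age": 45,
    "sex": 1,
    "cp": 2,
    "trestbps": 130,
    "chol": 250,
    "fbs": 0,
    "restecg": 1,
    "thalach": 150,
    "exang": 0,
    "oldpeak": 2.5,
    "slope": 2,
    "ca": 0,
    "thal": 2
  }'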
```
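
The same prediction request can be built from Python. The sketch below uses only the standard library and assumes the API is reachable at `http://localhost:8000` (the port listed under Docker Deployment). The helper name `build_predict_request` is hypothetical, and the response format is not documented here, so the example prints the raw body instead of assuming its fields.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Feature values copied from the request shown above.
sample = {
    "age": 45, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 2.5, "slope": 2, "ca": 0, "thal": 2,
}

request = build_predict_request(sample)

# Sending the request needs a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```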
### Training
```bash
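# POST /train: start a training run
curl -X POST http://localhost:8000/train
# GET /train/status: check training status
curl http://localhost:8000/train/status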
```
### Monitoring
```bash
curl http://localhost:8000/monitoring/metrics        # Prometheus metrics
curl http://localhost:8000/monitoring/health/drift   # Drift detection status
curl http://localhost:8000/monitoring/performance/summary
curl http://localhost:8000/monitoring/reports/daily
```
## 📊 Model Performance
- **Validation Accuracy**: 88.84%
- **Test Accuracy**: 88.68%
- **ROC-AUC**: 95.48%
- **Best Model**: WeightedEnsemble_L3
## 🔧 Utility Scripts
```bash
# Train model
python scripts/train.py
# Evaluate model
python scripts/evaluate.py --model-path <path>
# Start API server
python scripts/serve.py --host 0.0.0.0 --port 8000 --reload
# Initialize Airflow
python scripts/init_db.py
```
## 🔄 Airflow Orchestration
```bash
# Set AIRFLOW_HOME
export AIRFLOW_HOME=$(pwd)/airflow
# Initialize database
python scripts/init_db.py
# Start services
airflow scheduler # Terminal 1
airflow webserver # Terminal 2
# Access UI: http://localhost:8080
```
## 📈 Monitoring Stack
- **Drift Detection**: KS test for numerical features
- **Prediction Logging**: JSONL format with threading
- **Performance Tracking**: Batch-level metrics
- **Report Generation**: Daily/weekly JSON reports
- **Prometheus Metrics**: Request count, latency, accuracy, drift status
- **Grafana Dashboards**: 5-panel visualization
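
To illustrate the KS-based drift check listed above, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of a reference sample and a current sample. It is not the project's monitoring code. The function names and the `threshold=0.3` cutoff are hypothetical; a production check would more likely compare a p-value, for example from `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The CDFs only change at observed values, so checking those is sufficient.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a (hypothetical) fixed threshold."""
    return ks_statistic(reference, current) > threshold


# Toy feature values: training-time sample vs. recent production sample.
training_ages = [41, 45, 52, 58, 63]
recent_ages = [66, 70, 71, 74, 79]
print(has_drifted(training_ages, recent_ages))
```

A fixed threshold keeps the sketch dependency-free, but it ignores sample size; a p-value-based test adjusts for that.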
## 🐳 Docker Services
- **FastAPI App** (8000): Main ML API
- **Prometheus** (9090): Metrics collection
- **Grafana** (3000): Visualization dashboards
## 🔐 Environment Variables
```env
MLFLOW_TRACKING_URI=your_dagshub_uri
DAGSHUB_TOKEN=your_token
```
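
Missing credentials are easier to diagnose at startup than in the middle of a training run. The sketch below shows one way to check the two variables above before doing any work. `load_settings` and `REQUIRED_VARS` are hypothetical names, not part of the repository, and the example values are placeholders.

```python
import os
from typing import Dict, Mapping

# The two variables documented above.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DAGSHUB_TOKEN")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping; call load_settings() to read the real environment.
settings = load_settings(
    {"MLFLOW_TRACKING_URI": "your_dagshub_uri", "DAGSHUB_TOKEN": "your_token"}
)
print(sorted(settings))
```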
## 📚 Documentation
- [Docker Setup](DOCKER.md)
- [Scripts Usage](scripts/README.md)
- [CI/CD Workflows](.github/workflows/README.md)
- [Airflow Guide](airflow/README.md)
## 🧪 CI/CD Pipeline
### Automated Workflows
- **CI**: Lint with flake8, format check with black
- **Docker Build**: Build and push to GitHub Container Registry
- **HuggingFace Deploy**: Auto-deploy to Spaces on push
### Container Images
```bash
docker pull ghcr.io/abeshith/automl-mlops-pipeline:latest
```
## 📊 Project Structure
```
AutoML-MLOps-PipeLine/
├── src/mlpipeline/       # Core pipeline components
├── app/                  # FastAPI application
├── config/               # Configuration files
├── scripts/              # Utility scripts
├── airflow/              # Airflow DAGs
├── monitoring/           # Monitoring components
├── observability/        # Prometheus/Grafana configs
├── notebooks/            # Jupyter notebooks
├── Dockerfile            # Container definition
├── docker-compose.yaml   # Multi-service orchestration
└── requirements.txt      # Python dependencies
```
⭐ Star this repo if you find it helpful!