	---
	title: AutoML MLOps Pipeline
	emoji: 🤖
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 8000
	pinned: false
	license: mit
	---

	# 🤖 AutoML MLOps Pipeline

	Production-ready end-to-end AutoML pipeline with MLflow tracking, comprehensive monitoring, and automated orchestration.

	[![CI Pipeline](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml)
	[![Docker Build](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml)

	## 🚀 Features

	- 🤖 AutoML: AutoGluon, FLAML, PyCaret integration
	- 📊 MLflow Tracking: DagsHub integration with comprehensive metrics
	- 🔍 Monitoring: Drift detection, prediction logging, performance tracking
	- 📈 Observability: Prometheus metrics & Grafana dashboards
	- 🔄 Orchestration: Airflow DAGs for automated scheduling
	- 🐳 Docker: Complete containerization with docker-compose
	- ⚡ FastAPI: RESTful API with 11+ endpoints
	- 🎯 CI/CD: GitHub Actions for automated testing and deployment

	## 📋 Pipeline Stages

	1. Data Ingestion - Load and validate dataset
	2. Data Validation - Schema validation and quality checks
	3. Data Transformation - Feature engineering and preprocessing
	4. AutoML Training - Multi-framework model training
	5. Model Evaluation - Comprehensive metrics and validation
	6. Model Comparison - Best model selection
	7. Model Pusher - Production model deployment
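
The stages above run strictly in order, each one consuming the output of the previous one. Here is a minimal sketch of that sequencing. The stage functions are hypothetical placeholders; the real components live under `src/mlpipeline/` and their names and signatures are not shown in this README.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

# Stage names mirror the list above; the identifiers themselves are assumptions.
STAGE_NAMES = [
    "data_ingestion",
    "data_validation",
    "data_transformation",
    "automl_training",
    "model_evaluation",
    "model_comparison",
    "model_pusher",
]


def passthrough(context: Context) -> Context:
    """Placeholder stage that returns its input unchanged."""
    return context


def run_pipeline(
    stages: List[Tuple[str, Stage]], context: Context
) -> Tuple[Context, List[str]]:
    """Run stages in order and stop at the first failure, naming the failed stage."""
    completed: List[str] = []
    for name, stage in stages:
        try:
            context = stage(context)
        except Exception as exc:
            raise RuntimeError(f"stage '{name}' failed after {completed}") from exc
        completed.append(name)
    return context, completed


# "example.csv" is a made-up dataset name used only for illustration.
stages = [(name, passthrough) for name in STAGE_NAMES]
final_context, completed = run_pipeline(stages, {"dataset": "example.csv"})
print(completed)
```

Wrapping each stage in the same error handler means a failure in, say, validation is reported with the stage name, and later stages such as the model pusher never run on bad data.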

	## 🛠️ Tech Stack

	- ML Frameworks: AutoGluon, FLAML, PyCaret
	- API: FastAPI, Uvicorn
	- Tracking: MLflow, DagsHub
	- Monitoring: Prometheus, Grafana, Evidently AI
	- Orchestration: Apache Airflow
	- Containerization: Docker, Docker Compose
	- CI/CD: GitHub Actions

	## 📦 Quick Start

	### Local Development

	```bash
	# Clone repository
	git clone https://github.com/Abeshith/AutoML-MLOps-PipeLine.git
	cd AutoML-MLOps-PipeLine

	# Create virtual environment
	python -m venv automlenv
	source automlenv/bin/activate # On Windows: automlenv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt

	# Set environment variables
	cp .env.example .env
	# Edit .env with your credentials

	# Run training pipeline
	python scripts/train.py

	# Start API server
	python scripts/serve.py --reload
	```

	### Docker Deployment

	```bash
	# Start all services
	docker-compose up -d

	# Access services
	# API: http://localhost:8000/docs
	# Prometheus: http://localhost:9090
	# Grafana: http://localhost:3000 (admin/admin)
	```

	## 🌐 API Endpoints

	### Prediction
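	```bash
	# Assumes the API is running locally on port 8000
	curl -X POST http://localhost:8000/predict \
	  -H "Content-Type: application/json" \
	  -d '{
	    "age": 45,
	    "sex": 1,
	    "cp": 2,
	    "trestbps": 130,
	    "chol": 250,
	    "fbs": 0,
	    "restecg": 1,
	    "thalach": 150,
	    "exang": 0,
	    "oldpeak": 2.5,
	    "slope": 2,
	    "ca": 0,
	    "thal": 2
	  }'
	```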
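
The same request can be built from Python. This is a sketch using only the standard library, assuming the API is reachable at `http://localhost:8000` and accepts a JSON body; the helper name `build_predict_request` is hypothetical, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Feature names copied from the example request body above.
FEATURES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]


def build_predict_request(
    features: dict, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Validate that every feature is present and build a POST /predict request."""
    missing = [name for name in FEATURES if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    body = json.dumps({name: features[name] for name in FEATURES}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample = {
    "age": 45, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250, "fbs": 0,
    "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 2.5, "slope": 2,
    "ca": 0, "thal": 2,
}
request = build_predict_request(sample)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Checking for missing features on the client side gives a clearer error than whatever validation failure the server would return.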

	### Training
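	```bash
	# Start a training run (assumes the API is running locally on port 8000)
	curl -X POST http://localhost:8000/train

	# Check training status
	curl http://localhost:8000/train/status
	```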
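
Training is started with one call and checked with another, so a client typically polls `/train/status` until the run finishes. The sketch below shows that polling loop. The response schema is an assumption: it supposes a JSON object with a `status` field that eventually becomes `"completed"` or `"failed"`, which this README does not confirm. The status fetcher is injected so the loop can be tried without a server.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal values for the hypothetical "status" field.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_training(
    fetch_status: Callable[[], Dict[str, Any]],
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until it reports a terminal state or max_polls is reached."""
    for _ in range(max_polls):
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        sleep(poll_seconds)
    raise TimeoutError(f"training did not finish after {max_polls} polls")


# Simulated responses stand in for real GET /train/status calls.
fake_responses = iter(
    [{"status": "running"}, {"status": "running"}, {"status": "completed"}]
)
result = wait_for_training(lambda: next(fake_responses), sleep=lambda seconds: None)
print(result)
```

In real use, `fetch_status` would issue the GET request and decode the JSON body; the field names should be adjusted to whatever the endpoint actually returns.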

	### Monitoring
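	```bash
	# Assumes the API is running locally on port 8000
	curl http://localhost:8000/monitoring/metrics               # Prometheus metrics
	curl http://localhost:8000/monitoring/health/drift          # Drift detection status
	curl http://localhost:8000/monitoring/performance/summary   # Performance summary
	curl http://localhost:8000/monitoring/reports/daily         # Daily report
	```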
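
The `/monitoring/metrics` endpoint serves Prometheus metrics, which use a plain-text format of one sample per line. For quick checks outside Prometheus, a small parser is enough. This is a simplified sketch, not a full implementation of the exposition format, and the metric names in the sample text are invented for illustration; the names this API actually exposes are not listed in this README.

```python
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Map each series (metric name plus labels) to its sample value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series key.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        # The first field is the value; an optional timestamp may follow.
        samples[series] = float(fields[0])
    return samples


# Hypothetical metric names, used only to exercise the parser.
sample_text = """
# HELP api_requests_total Total requests received
# TYPE api_requests_total counter
api_requests_total{endpoint="/predict"} 42
drift_detected 0
"""
metrics = parse_prometheus_text(sample_text)
print(metrics)
```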

	## 📊 Model Performance

	- Validation Accuracy: 88.84%
	- Test Accuracy: 88.68%
	- ROC-AUC: 95.48%
	- Best Model: WeightedEnsemble_L3
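
For readers unfamiliar with the two metrics reported above, here is a sketch of how accuracy and ROC-AUC are defined, written in plain Python on toy data. It does not reproduce the project's evaluation code or its dataset; the reported numbers come from the author's own runs.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win. This pairwise form is O(n^2), fine for small data.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted probabilities, not the project's data.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.75
```

Accuracy depends on the decision threshold (0.5 here), while ROC-AUC only depends on how the scores rank positives against negatives, which is why the two numbers can differ substantially.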

	## 🔧 Utility Scripts

	```bash
	# Train model
	python scripts/train.py

	# Evaluate model
	python scripts/evaluate.py --model-path <path>

	# Start API server
	python scripts/serve.py --host 0.0.0.0 --port 8000 --reload

	# Initialize the Airflow database
	python scripts/init_db.py
	```

	## 🔄 Airflow Orchestration

	```bash
	# Set AIRFLOW_HOME
	export AIRFLOW_HOME=$(pwd)/airflow

	# Initialize database
	python scripts/init_db.py

	# Start services
	airflow scheduler # Terminal 1
	airflow webserver # Terminal 2

	# Access UI: http://localhost:8080
	```

	## 📈 Monitoring Stack

	- Drift Detection: Kolmogorov-Smirnov (KS) test for numerical features
	- Prediction Logging: JSONL format with threading
	- Performance Tracking: Batch-level metrics
	- Report Generation: Daily/weekly JSON reports
	- Prometheus Metrics: Request count, latency, accuracy, drift status
	- Grafana Dashboards: 5-panel visualization
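
To illustrate the KS-based drift check listed above, here is a dependency-free sketch of the two-sample KS statistic with a large-sample critical value. It is an assumption about how such a check can work, not the project's implementation, which may rely on a library and a different threshold; the function names and the `alpha` default are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    largest_gap = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / n
        cdf_cur = bisect_right(cur, x) / m
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def drift_detected(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Toy feature values: the "shifted" sample has moved upward by 50.
reference = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in reference]

print(drift_detected(reference, reference))  # False
print(drift_detected(reference, shifted))    # True
```

The critical value is an approximation that works best for reasonably large samples; for small batches an exact p-value from a statistics library is a safer choice.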

	## 🐳 Docker Services

	- FastAPI App (8000): Main ML API
	- Prometheus (9090): Metrics collection
	- Grafana (3000): Visualization dashboards

	## 🔐 Environment Variables

	```env
	MLFLOW_TRACKING_URI=your_dagshub_uri
	DAGSHUB_TOKEN=your_token
	```
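
A missing credential is easier to diagnose at startup than halfway through a training run. The sketch below reads the two variables above and fails early if either is unset. The helper name `load_settings` is hypothetical, and the project may load its configuration differently (for example through a `.env` loader).

```python
import os
from typing import Dict, Mapping, Optional

# Variable names taken from the block above.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DAGSHUB_TOKEN")


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the required settings, raising if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Placeholder values passed explicitly so the example does not depend on the shell.
settings = load_settings(
    {"MLFLOW_TRACKING_URI": "your_dagshub_uri", "DAGSHUB_TOKEN": "your_token"}
)
print(sorted(settings))
```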

	## 📚 Documentation

	- [Docker Setup](DOCKER.md)
	- [Scripts Usage](scripts/README.md)
	- [CI/CD Workflows](.github/workflows/README.md)
	- [Airflow Guide](airflow/README.md)

	## 🧪 CI/CD Pipeline

	### Automated Workflows
	- CI: Lint with flake8, format check with black
	- Docker Build: Build and push to GitHub Container Registry
	- HuggingFace Deploy: Auto-deploy to Spaces on push

	### Container Images
	```bash
	docker pull ghcr.io/abeshith/automl-mlops-pipeline:latest
	```

	## 📊 Project Structure

	```
	AutoML-MLOps-PipeLine/
	├── src/mlpipeline/ # Core pipeline components
	├── app/ # FastAPI application
	├── config/ # Configuration files
	├── scripts/ # Utility scripts
	├── airflow/ # Airflow DAGs
	├── monitoring/ # Monitoring components
	├── observability/ # Prometheus/Grafana configs
	├── notebooks/ # Jupyter notebooks
	├── Dockerfile # Container definition
	├── docker-compose.yaml # Multi-service orchestration
	└── requirements.txt # Python dependencies
	```

	⭐ Star this repo if you find it helpful!