---
title: AutoML MLOps Pipeline
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: mit
---

# 🤖 AutoML MLOps Pipeline

Production-ready end-to-end AutoML pipeline with MLflow tracking, comprehensive monitoring, and automated orchestration.

[![CI Pipeline](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/ci.yaml)
[![Docker Build](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml/badge.svg)](https://github.com/Abeshith/AutoML-MLOps-PipeLine/actions/workflows/docker-build.yaml)

## 🚀 Features

- **🤖 AutoML**: AutoGluon, FLAML, PyCaret integration
- **📊 MLflow Tracking**: DagsHub integration with comprehensive metrics
- **🔍 Monitoring**: Drift detection, prediction logging, performance tracking
- **📈 Observability**: Prometheus metrics & Grafana dashboards
- **🔄 Orchestration**: Airflow DAGs for automated scheduling
- **🐳 Docker**: Complete containerization with docker-compose
- **⚡ FastAPI**: RESTful API with 11+ endpoints
- **🎯 CI/CD**: GitHub Actions for automated testing and deployment

## 📋 Pipeline Stages

1. **Data Ingestion** - Load and validate dataset
2. **Data Validation** - Schema validation and quality checks
3. **Data Transformation** - Feature engineering and preprocessing
4. **AutoML Training** - Multi-framework model training
5. **Model Evaluation** - Comprehensive metrics and validation
6. **Model Comparison** - Best model selection
7. **Model Pusher** - Production model deployment
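
The stages above run in order, each building on the output of the previous one. A minimal sketch of that chaining follows, assuming each stage is a function that takes and returns a shared context dictionary. The names `make_stub`, `STAGES`, and `run_pipeline` are hypothetical and are not taken from `src/mlpipeline`; the stage bodies are placeholders.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def make_stub(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran (hypothetical)."""
    def stage(ctx: Context) -> Context:
        completed = list(ctx.get("completed", []))
        completed.append(name)
        return {**ctx, "completed": completed}
    return stage


# Stage names mirror the list above; the implementations are stubs.
STAGES: List[Tuple[str, Stage]] = [
    (name, make_stub(name))
    for name in (
        "data_ingestion",
        "data_validation",
        "data_transformation",
        "automl_training",
        "model_evaluation",
        "model_comparison",
        "model_pusher",
    )
]


def run_pipeline(stages: List[Tuple[str, Stage]],
                 ctx: Optional[Context] = None) -> Context:
    """Run stages in order and stop at the first failure, naming the stage."""
    current: Context = dict(ctx or {})
    for name, stage in stages:
        try:
            current = stage(current)
        except Exception as exc:
            raise RuntimeError(f"stage '{name}' failed") from exc
    return current


if __name__ == "__main__":
    print(run_pipeline(STAGES)["completed"])
```

Stopping at the first failed stage keeps a model from being pushed when an earlier check did not pass; the real pipeline may handle failures differently.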

## 🛠️ Tech Stack

- **ML Frameworks**: AutoGluon, FLAML, PyCaret
- **API**: FastAPI, Uvicorn
- **Tracking**: MLflow, DagsHub
- **Monitoring**: Prometheus, Grafana, Evidently AI
- **Orchestration**: Apache Airflow
- **Containerization**: Docker, Docker Compose
- **CI/CD**: GitHub Actions

## 📦 Quick Start

### Local Development

```bash
# Clone repository
git clone https://github.com/Abeshith/AutoML-MLOps-PipeLine.git
cd AutoML-MLOps-PipeLine

# Create virtual environment
python -m venv automlenv
source automlenv/bin/activate  # On Windows: automlenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set environment variables
cp .env.example .env
# Edit .env with your credentials

# Run training pipeline
python scripts/train.py

# Start API server
python scripts/serve.py --reload
```

### Docker Deployment

```bash
# Start all services
docker-compose up -d

# Access services
# API: http://localhost:8000/docs
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```

## 🌐 API Endpoints

### Prediction
```bash
POST /predict
{
  "age": 45,
  "sex": 1,
  "cp": 2,
  "trestbps": 130,
  "chol": 250,
  "fbs": 0,
  "restecg": 1,
  "thalach": 150,
  "exang": 0,
  "oldpeak": 2.5,
  "slope": 2,
  "ca": 0,
  "thal": 2
}
```
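
As a rough illustration of calling this endpoint from Python, the sketch below posts the same payload using only the standard library. It assumes the API is running at `http://localhost:8000` as in the Quick Start; `build_request` and `predict` are hypothetical helper names, and the shape of the response is not documented here, so it is returned as parsed JSON without further interpretation.

```python
import json
import urllib.request

# Assumption: the API from the Quick Start is listening on localhost:8000.
API_URL = "http://localhost:8000/predict"

# Same feature values as the example payload above.
SAMPLE = {
    "age": 45, "sex": 1, "cp": 2, "trestbps": 130, "chol": 250,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0,
    "oldpeak": 2.5, "slope": 2, "ca": 0, "thal": 2,
}


def build_request(features: dict, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_request(features, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the server running, a call would look like:
#   print(predict(SAMPLE))
```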

### Training
```bash
POST /train
GET /train/status
```

### Monitoring
```bash
GET /monitoring/metrics          # Prometheus metrics
GET /monitoring/health/drift     # Drift detection status
GET /monitoring/performance/summary
GET /monitoring/reports/daily
```

## 📊 Model Performance

- **Validation Accuracy**: 88.84%
- **Test Accuracy**: 88.68%
- **ROC-AUC**: 95.48%
- **Best Model**: WeightedEnsemble_L3

## 🔧 Utility Scripts

```bash
# Train model
python scripts/train.py

# Evaluate model
python scripts/evaluate.py --model-path <path>

# Start API server
python scripts/serve.py --host 0.0.0.0 --port 8000 --reload

# Initialize Airflow
python scripts/init_db.py
```

## 🔄 Airflow Orchestration

```bash
# Set AIRFLOW_HOME
export AIRFLOW_HOME=$(pwd)/airflow

# Initialize database
python scripts/init_db.py

# Start services
airflow scheduler  # Terminal 1
airflow webserver  # Terminal 2

# Access UI: http://localhost:8080
```

## 📈 Monitoring Stack

- **Drift Detection**: KS test for numerical features
- **Prediction Logging**: JSONL format with threading
- **Performance Tracking**: Batch-level metrics
- **Report Generation**: Daily/weekly JSON reports
- **Prometheus Metrics**: Request count, latency, accuracy, drift status
- **Grafana Dashboards**: 5-panel visualization
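
To make the drift check concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov test for one numerical feature, written with the standard library only. It is not the code in `monitoring/`: the function names and the `alpha=0.05` threshold are assumptions, and the p-value uses the asymptotic Kolmogorov series, which is only approximate for small samples.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(reference: Sequence[float],
                  current: Sequence[float]) -> Tuple[float, float]:
    """Return (KS statistic, approximate p-value) for two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # The largest gap between the two empirical CDFs occurs at an observed value.
    d = max(
        abs(bisect_right(ref, x) / n - bisect_right(cur, x) / m)
        for x in ref + cur
    )

    # Asymptotic p-value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges poorly here and Q(lam) is effectively 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


def has_drifted(reference: Sequence[float],
                current: Sequence[float],
                alpha: float = 0.05) -> bool:
    """Flag drift when the p-value falls below alpha (assumed threshold)."""
    _, p_value = ks_two_sample(reference, current)
    return p_value < alpha
```

Where SciPy is available, `scipy.stats.ks_2samp` computes the same statistic and a more accurate p-value.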

## 🐳 Docker Services

- **FastAPI App** (8000): Main ML API
- **Prometheus** (9090): Metrics collection
- **Grafana** (3000): Visualization dashboards

## 🔐 Environment Variables

```env
MLFLOW_TRACKING_URI=your_dagshub_uri
DAGSHUB_TOKEN=your_token
```
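
A small sketch of how these two variables might be checked at startup, assuming they have already been exported into the process environment (whether the project loads `.env` automatically is not stated here). `REQUIRED_VARS` and `load_tracking_settings` are hypothetical names.

```python
import os
from typing import Dict, Mapping

# The two variables listed above; nothing else is assumed to be required.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DAGSHUB_TOKEN")


def load_tracking_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the tracking settings, failing early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the names of the missing variables tends to be easier to debug than an authentication error raised later during training.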

## 📚 Documentation

- [Docker Setup](DOCKER.md)
- [Scripts Usage](scripts/README.md)
- [CI/CD Workflows](.github/workflows/README.md)
- [Airflow Guide](airflow/README.md)

## 🧪 CI/CD Pipeline

### Automated Workflows
- **CI**: Lint with flake8, format check with black
- **Docker Build**: Build and push to GitHub Container Registry
- **HuggingFace Deploy**: Auto-deploy to Spaces on push

### Container Images
```bash
docker pull ghcr.io/abeshith/automl-mlops-pipeline:latest
```

## 📊 Project Structure

```
AutoML-MLOps-PipeLine/
├── src/mlpipeline/          # Core pipeline components
├── app/                      # FastAPI application
├── config/                   # Configuration files
├── scripts/                  # Utility scripts
├── airflow/                  # Airflow DAGs
├── monitoring/               # Monitoring components
├── observability/            # Prometheus/Grafana configs
├── notebooks/                # Jupyter notebooks
├── Dockerfile                # Container definition
├── docker-compose.yaml       # Multi-service orchestration
└── requirements.txt          # Python dependencies
```

⭐ Star this repo if you find it helpful!