Spaces:

Hodfa71
/

argus-mlops

Sleeping

App Files Files Community

argus-mlops / README.md

hodfa840

Fix scroll reset for HF Spaces double-iframe context

1aa566a about 2 months ago

preview code

raw

history blame contribute delete

12.2 kB

	---
	title: Argus ML Observability Platform
	emoji: 🚕
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	license: mit
	short_description: MLOps drift detection and auto-retraining platform
	---

	# Argus

	> Production ML models fail silently. Argus detects the failure, explains why, and fixes it automatically.

	[![Live Demo](https://img.shields.io/badge/Live%20Demo-Hugging%20Face-FFD21E?logo=huggingface&logoColor=black)](https://huggingface.co/spaces/Hodfa71/argus-mlops)
	[![Python](https://img.shields.io/badge/Python-3.10%2B-blue?logo=python&logoColor=white)](https://python.org)
	[![FastAPI](https://img.shields.io/badge/FastAPI-0.111-009688?logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com)
	[![Streamlit](https://img.shields.io/badge/Streamlit-1.33-FF4B4B?logo=streamlit&logoColor=white)](https://streamlit.io)
	[![MLflow](https://img.shields.io/badge/MLflow-2.12-0194E2?logo=mlflow&logoColor=white)](https://mlflow.org)
	[![scikit-learn](https://img.shields.io/badge/scikit--learn-1.4-F7931E?logo=scikit-learn&logoColor=white)](https://scikit-learn.org)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

	[Try the live demo on Hugging Face Spaces](https://huggingface.co/spaces/Hodfa71/argus-mlops)

	---

	## Architecture

	![Architecture](assets/architecture.png)

	Argus is a production-grade ML monitoring platform built around a NYC taxi trip-duration prediction service. It demonstrates the full MLOps lifecycle:

	- Serves low-latency predictions through a FastAPI REST API
	- Monitors model health continuously with PSI + KS-test drift detection
	- Handles 24-hour delayed ground truth using request-ID matching
	- Explains why drift occurred via root-cause analysis (PSI × feature importance)
	- Retrains automatically only when both drift and performance degradation are confirmed
	- Runs safe champion-challenger model promotion — no regression allowed
	- Tracks every experiment in MLflow
	- Visualises everything in a dark-themed Streamlit dashboard

	---

	## Live Dashboard

	![Argus dashboard demo](assets/demo.gif)

	---

	## Dashboard Preview

	![Dashboard Preview](assets/dashboard_preview.png)

	---

	## Drift Detection in Action

	![Drift Detection](assets/drift_detection.png)

	Per-feature PSI breakdown showing distribution shift between reference and live windows.
	Features are color-coded: green = stable, amber = moderate, red = significant drift.

	---

	## Before vs. After Retraining

	![Performance Recovery](assets/before_after_retraining.png)

	RMSE degrades during the drift window, then recovers once the challenger is promoted.
	The right panel shows rolling degradation percentage relative to the baseline.

	---

	## Feature Importance Tracking

	![Feature Importance](assets/feature_importance.png)

	Side-by-side champion vs. challenger feature importance with delta view.

	---

	## PSI Heatmap

	![PSI Heatmap](assets/psi_heatmap.png)

	Feature drift intensity across consecutive monitoring windows.
	Dark = stable, amber = moderate, red = significant drift.

	---

	## Live Demo

	\| Service \| URL \|
	\|---------\|-----\|
	\| API Docs (Swagger) \| `https://argus-ml-api.onrender.com/docs` \|
	\| Monitoring Dashboard \| `https://argus-dashboard.streamlit.app` \|
	\| Health Check \| `https://argus-ml-api.onrender.com/health` \|

	> Deploy your own instance: [deployment/DEPLOYMENT.md](deployment/DEPLOYMENT.md)

	---

	## Quick Start

	```bash
	# 1. Clone
	git clone https://github.com/YOUR_USERNAME/argus.git
	cd argus

	# 2. Install
	python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
	pip install -r requirements.txt

	# 3. Train initial model + build reference dataset
	python scripts/train_initial_model.py

	# 4. Full end-to-end demo (no running API required)
	python scripts/demo.py
	```

	Expected demo output:

	```
	STEP 4: Inject Drift
	Drift type : SUDDEN (abrupt distribution shift)
	trip_distance mean: 3.21 -> 12.86 (delta=+9.65)
	pickup_hour mean: 11.4 -> 16.8 (delta=+5.4)

	STEP 5: Drift Detection
	Feature drift detected : True
	Drifted features : ['trip_distance', 'pickup_hour']
	Performance degraded : True (+48.6%)

	STEP 6: Root-Cause Analysis
	Primary cause : trip_distance (RCA score=0.521)
	Action : retrain_immediately

	STEP 9: Summary
	{
	"drift_detected": true,
	"root_cause": ["trip_distance", "pickup_hour"],
	"performance_drop": "48.6%",
	"action": "retrain_immediately",
	"model_promoted": true,
	"new_model_improvement": "11.23%"
	}
	```

	---

	## API Reference

	### Start the server

	```bash
	uvicorn app:app --reload --port 8000
	# Swagger UI: http://localhost:8000/docs
	```

	### Predict trip duration

	```bash
	curl -X POST http://localhost:8000/predict \
	-H "Content-Type: application/json" \
	-d '{
	"passenger_count": 2,
	"trip_distance": 3.5,
	"pickup_hour": 8,
	"pickup_dow": 1,
	"pickup_month": 3,
	"pickup_is_weekend": 0,
	"rate_code_id": 1,
	"payment_type": 1,
	"pu_location_zone": 10,
	"do_location_zone": 25,
	"vendor_id": 1
	}'
	```

	```json
	{
	"request_id": "3f4a1b2c-...",
	"predicted_duration_min": 14.23,
	"model_version": "a3b7f921",
	"timestamp": "2024-03-15T09:42:11Z"
	}
	```

	### Submit delayed ground truth

	```bash
	curl -X POST http://localhost:8000/predict/feedback \
	-H "Content-Type: application/json" \
	-d '{"request_id": "3f4a1b2c-...", "actual_duration_min": 16.5}'
	```

	### Drift status

	```bash
	curl http://localhost:8000/monitor/drift
	```

	```json
	{
	"drift_detected": true,
	"root_cause": ["trip_distance", "pickup_hour"],
	"performance_drop": "18.3%",
	"action": "retraining_triggered",
	"rca_details": [
	{"feature": "trip_distance", "psi": 0.312, "importance": 0.421, "rca_score": 0.444},
	{"feature": "pickup_hour", "psi": 0.241, "importance": 0.187, "rca_score": 0.286}
	]
	}
	```

	### Performance metrics

	```bash
	curl http://localhost:8000/monitor/metrics
	```

	### Trigger retraining manually

	```bash
	curl -X POST http://localhost:8000/monitor/retrain
	```

	---

	## Drift Simulation

	```bash
	# Gradual shift over 500 windows
	python scripts/simulate_drift.py --drift-type gradual --steps 500

	# Sudden distribution shift (severity 2x)
	python scripts/simulate_drift.py --drift-type sudden --severity 2.0

	# Seasonal oscillation
	python scripts/simulate_drift.py --drift-type seasonal --steps 1000

	# Mixed drift with 30-step feedback lag
	python scripts/simulate_drift.py --drift-type mixed --feedback-lag 30
	```

	---

	## Docker Stack

	```bash
	docker compose up --build

	# API: http://localhost:8000
	# Dashboard: http://localhost:8501
	# MLflow UI: http://localhost:5000
	```

	---

	## Project Structure

	```
	argus/
	├── src/
	│ ├── api/
	│ │ ├── main.py # FastAPI app factory + lifespan
	│ │ ├── schemas.py # Pydantic request/response models
	│ │ └── routes/
	│ │ ├── predict.py # POST /predict, POST /predict/feedback
	│ │ ├── monitor.py # GET /monitor/drift, /metrics, /history
	│ │ └── health.py # GET /health
	│ │
	│ ├── data/
	│ │ ├── generator.py # Synthetic NYC taxi data generator
	│ │ ├── drift_simulator.py # Gradual / Sudden / Seasonal drift engine
	│ │ └── preprocessing.py # Stateless feature pipeline
	│ │
	│ ├── models/
	│ │ ├── trainer.py # GradientBoosting + MLflow experiment tracking
	│ │ ├── registry.py # File-based champion/challenger registry
	│ │ └── evaluator.py # Side-by-side model comparison
	│ │
	│ ├── monitoring/
	│ │ ├── drift_detector.py # PSI + KS-test; performance drift detection
	│ │ ├── performance_monitor.py # Rolling window + delayed feedback matching
	│ │ └── root_cause_analyzer.py # PSI × importance ranking
	│ │
	│ └── retraining/
	│ ├── trigger.py # Dual-gate retraining decision logic
	│ └── pipeline.py # Retrain → compare → promote workflow
	│
	├── dashboard/
	│ └── app.py # Streamlit dashboard (5 pages)
	│
	├── tests/
	│ ├── test_drift_detection.py
	│ ├── test_api.py
	│ └── test_retraining.py
	│
	├── scripts/
	│ ├── train_initial_model.py # Bootstrap: generate data + train champion
	│ ├── simulate_drift.py # Production traffic + drift simulator
	│ ├── demo.py # Full in-process walkthrough
	│ └── generate_assets.py # Regenerate README images
	│
	├── configs/
	│ └── config.yaml # All thresholds, hyperparams, paths
	│
	├── assets/ # Generated plots for README
	├── deployment/ # Render + Streamlit Cloud guides
	├── Dockerfile
	├── Dockerfile.dashboard
	├── docker-compose.yml
	└── requirements.txt
	```

	---

	## Design Highlights

	### 1. Delayed Feedback Handling

	Production labels rarely arrive instantly — a taxi duration is only known after the trip ends.
	A fraud label can take days to confirm. Argus handles this with a request-ID matching system:

	```
	Prediction time Label arrival
	\| \|
	v v
	[predict] ── 24h gap ─────► [feedback]
	\| \|
	└───────── request_id ──────┘
	(matched)
	```

	Performance metrics are only updated when the label arrives and is correctly attributed to
	the prediction at the time the features were observed.

	---

	### 2. Root-Cause Drift Analysis

	Most drift detectors stop at a Boolean flag. Argus answers a more useful question:
	which features drifted, and how much do they actually matter to the model?

	```
	RCA Score = PSI × (1 + model feature importance)
	```

	A feature with high PSI but zero importance scores low — safe to ignore.
	A feature with moderate PSI but high importance scores high — retrain urgently.

	---

	### 3. Dual-Gate Retraining Trigger

	Retraining fires only when all four conditions are met simultaneously:

	```
	Gate 1 Feature drift detected (PSI >= 0.20)
	Gate 2 Performance degraded (RMSE > baseline + 15%)
	Gate 3 Enough new samples (>= 1000 since last retrain)
	Gate 4 Cooldown elapsed (>= 6 hours)
	```

	This prevents:
	- Retraining on transient anomalies (drift without performance impact)
	- Retraining when performance drops for non-drift reasons
	- Oscillation loops (cooldown window)
	- Retraining on statistically insufficient data

	---

	### 4. Champion-Challenger Promotion

	Every retrain produces a challenger model evaluated against the champion on a held-out
	set. The challenger is only promoted if it improves RMSE by at least 5%. This ensures the
	production model never regresses.

	---

	## Configuration

	All thresholds and paths are in [configs/config.yaml](configs/config.yaml):

	```yaml
	monitoring:
	drift:
	psi_threshold: 0.2 # flag when PSI >= 0.2
	ks_pvalue_threshold: 0.05 # flag when p-value < 0.05
	performance:
	degradation_threshold: 0.15 # retrain when RMSE +15% vs baseline

	retraining:
	cooldown_hours: 6
	min_samples_since_last_retrain: 1000

	model:
	evaluation:
	promotion_threshold: 0.05 # challenger must beat champion by 5%
	hyperparams:
	n_estimators: 200
	max_depth: 6
	learning_rate: 0.05
	```

	---

	## Tests

	```bash
	pytest tests/ -v

	# Individual suites
	pytest tests/test_drift_detection.py -v
	pytest tests/test_retraining.py -v
	pytest tests/test_api.py -v
	```

	---

	## MLflow Tracking

	```bash
	mlflow ui --backend-store-uri data/mlruns --port 5000
	# Open: http://localhost:5000
	```

	Each run logs hyperparameters, RMSE/MAE/R2, feature importances, the trained model
	artifact, and custom tags for trigger reason and root cause.

	---

	## License

	MIT — see [LICENSE](LICENSE).