PayShield-ML / docs /DEVELOPMENT.md

Sibi Krishnamoorthy

prod

8a08300 26 days ago

	# 🛠️ Development & MLOps Guide

	This guide provides detailed instructions on setting up the environment, running experiments, and developing the PayShield-ML system.

	---

	## 🏗️ 1. Environment Setup

	The project uses `uv` for fast Python package and project management.

	### Prerequisites
	- [uv](https://github.com/astral-sh/uv) installed
	- Docker & Docker Compose
	- Redis (can be run via Docker)

	### Installation
	```bash
	# Sync dependencies and create virtual environment
	uv sync

	# Activate the environment
	source .venv/bin/activate
	```

	---

	## 📊 2. MLflow Tracking

	We use MLflow to track hyperparameters, metrics, and model artifacts.

	### Start MLflow Server
	Run this in a separate terminal to view the UI:
	```bash
	uv run mlflow ui --host 0.0.0.0 --port 5000
	```
	Then access the dashboard at [http://localhost:5000](http://localhost:5000).

	---

	## 🚂 3. Model Training Pipeline

	The training script handles data ingestion, feature engineering, cross-validation, and MLflow logging.

	### Basic Training
	```bash
	uv run python src/models/train.py --data_path data/fraud_sample.csv
	```

	### Advanced Training with Custom Params
	```bash
	uv run python src/models/train.py \
	--data_path data/fraud_sample.csv \
	--experiment_name fraud_v2 \
	--min_recall 0.85 \
	--output_dir models/
	```

	| Argument | Description | Default |
	| :--- | :--- | :--- |
	| `--data_path` | Path to CSV/Parquet training data | (Required) |
	| `--params_path` | Path to model config YAML | `configs/model_config.yaml` |
	| `--experiment_name` | MLflow experiment grouping | `fraud_detection` |
	| `--min_recall` | Target recall for threshold optimization | `0.80` |
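
As a rough illustration of how the documented flags and defaults fit together, the sketch below mirrors the table with `argparse`. It is not the actual contents of `src/models/train.py`; the function name `build_parser` is hypothetical, and `--output_dir` is left out because its default is not documented here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Train the fraud model (sketch)")
    parser.add_argument("--data_path", required=True,
                        help="Path to CSV/Parquet training data")
    parser.add_argument("--params_path", default="configs/model_config.yaml",
                        help="Path to model config YAML")
    parser.add_argument("--experiment_name", default="fraud_detection",
                        help="MLflow experiment grouping")
    parser.add_argument("--min_recall", type=float, default=0.80,
                        help="Target recall for threshold optimization")
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--data_path", "data/fraud_sample.csv"])
print(args.experiment_name, args.min_recall)  # prints: fraud_detection 0.8
```

Only `--data_path` is mandatory; every other flag falls back to the default listed in the table.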

	---

	## 🔌 4. Running Services Locally

	Run the services directly on the host for rapid iteration without rebuilding Docker images.

	### Start Redis (Required)
	```bash
	docker run -d --name payshield-redis -p 6379:6379 redis:7-alpine
	```

	### Start FastAPI Backend
	```bash
	# Running with hot-reload
	uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
	```
	- Swagger Docs: [http://localhost:8000/docs](http://localhost:8000/docs)
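
The API expects the trained artifacts `models/fraud_model.pkl` and `models/threshold.json` to be on disk before it starts. A small pre-flight check, run from the repository root, can catch a missing file early. This is a minimal sketch; the helper name `missing_artifacts` is hypothetical and not part of the project.

```python
from pathlib import Path

# Artifact paths as named in this guide.
REQUIRED_ARTIFACTS = ("models/fraud_model.pkl", "models/threshold.json")


def missing_artifacts(root: str = ".") -> list[str]:
    """Return the required artifact paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (base / rel).is_file()]


missing = missing_artifacts()
if missing:
    print("Missing artifacts, run the training pipeline first:", ", ".join(missing))
else:
    print("All model artifacts found.")
```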

	### Start Streamlit Dashboard
	```bash
	export API_URL="http://localhost:8000/v1/predict"
	uv run streamlit run src/frontend/app.py --server.port 8501
	```
	- Dashboard: [http://localhost:8501](http://localhost:8501)
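
The exported `API_URL` tells the dashboard where to send prediction requests. The sketch below shows one way a client could read that variable and build a JSON `POST` request using only the standard library. The fallback URL matches the export above; the payload field `amount` is a made-up placeholder, since the real request schema is defined by the API and not documented here.

```python
import json
import os
import urllib.request

DEFAULT_API_URL = "http://localhost:8000/v1/predict"


def build_predict_request(payload: dict) -> urllib.request.Request:
    """Build a JSON POST request aimed at the URL in API_URL."""
    url = os.environ.get("API_URL", DEFAULT_API_URL)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: replace with the fields the API actually expects.
request = build_predict_request({"amount": 120.50})
print(request.get_method(), request.full_url)

# To send it (requires the API to be running):
# with urllib.request.urlopen(request, timeout=5) as resp:
#     print(json.load(resp))
```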

	---

	## 🐳 5. Full-Stack Development (Docker Compose)

	The easiest way to replicate a production-like environment.

	### Build and Launch
	```bash
	docker-compose up --build
	```

	### Useful Commands
	```bash
	# Run in background
	docker-compose up -d

	# Check all service logs
	docker-compose logs -f

	# Stop and remove containers
	docker-compose down
	```

	### Service Map
	| Service | Port | Endpoint |
	| :--- | :--- | :--- |
	| API | 8000 | `http://localhost:8000` |
	| Dashboard | 8501 | `http://localhost:8501` |
	| Redis | 6379 | `localhost:6379` |
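
To confirm that each service in the map is accepting connections, a quick TCP probe is often enough. The sketch below uses the ports from the table; the helper name `is_port_open` is hypothetical. An open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports taken from the service map above.
SERVICES = {"API": 8000, "Dashboard": 8501, "Redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in SERVICES.items():
    status = "up" if is_port_open("localhost", port) else "down"
    print(f"{name:<10} {port:<5} {status}")
```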

	---

	## 🧪 6. Testing

	We use `pytest` for unit and integration tests.

	```bash
	# Run all tests
	uv run pytest

	# Run with coverage report
	uv run pytest --cov=src
	```

	---
	Note: Before starting the API, make sure the `models/fraud_model.pkl` and `models/threshold.json` artifacts exist. The training pipeline generates them.
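
For intuition about what `models/threshold.json` might contain, the sketch below picks the highest decision threshold that still reaches a target recall, which is one common reading of "threshold optimization" for `--min_recall`. It is an assumption, not the project's actual algorithm: the function name `pick_threshold`, the toy data, and the JSON key `threshold` are all hypothetical.

```python
import json


def pick_threshold(y_true, scores, min_recall):
    """Return the highest threshold whose recall is at least min_recall.

    A sample is flagged as fraud when its score is >= the threshold.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("y_true contains no positive samples")
    # Try candidate thresholds from the highest score down to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        if true_pos / positives >= min_recall:
            return threshold
    # Only reachable when min_recall > 1, since the lowest score gives recall 1.0.
    raise ValueError("min_recall must be at most 1.0")


# Toy labels and scores, invented for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.65, 0.80, 0.90, 0.10]
threshold = pick_threshold(y_true, scores, min_recall=0.80)
print(json.dumps({"threshold": threshold}))  # prints: {"threshold": 0.55}
```

Choosing the highest qualifying threshold flags the fewest transactions while still meeting the recall target, which generally keeps false positives down.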