# 🛠️ Development & MLOps Guide

This guide explains how to set up the environment, track experiments, train models, and run and test the PayShield-ML system locally.

---

## 🏗️ 1. Environment Setup

The project uses `uv` for lightning-fast Python package and project management.

### Prerequisites
- [uv](https://github.com/astral-sh/uv) installed
- Docker & Docker Compose
- Redis (can be run via Docker)

### Installation
```bash
# Sync dependencies and create virtual environment
uv sync

# Activate the environment
source .venv/bin/activate
```

---

## 📊 2. MLflow Tracking

We use MLflow to track hyperparameters, metrics, and model artifacts.

### Start MLflow Server
Run this in a separate terminal to view the UI:
```bash
uv run mlflow ui --host 0.0.0.0 --port 5000
```
Then access the dashboard at [http://localhost:5000](http://localhost:5000).

---

## 🚂 3. Model Training Pipeline

The training script handles data ingestion, feature engineering, cross-validation, and MLflow logging.

### Basic Training
```bash
uv run python src/models/train.py --data_path data/fraud_sample.csv
```

### Advanced Training with Custom Params
```bash
uv run python src/models/train.py \
    --data_path data/fraud_sample.csv \
    --experiment_name fraud_v2 \
    --min_recall 0.85 \
    --output_dir models/
```

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--data_path` | Path to CSV/Parquet training data | (Required) |
| `--params_path` | Path to model config YAML | `configs/model_config.yaml` |
| `--experiment_name` | MLflow experiment grouping | `fraud_detection` |
| `--min_recall` | Target recall for threshold optimization | `0.80` |
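
The `--min_recall` option sets a target recall for threshold optimization. One common way to do this is to pick the highest score threshold whose recall on validation data still meets the target. The sketch below illustrates that idea only and is not necessarily what `train.py` does; the function name `pick_threshold` and the sample labels and scores are hypothetical.

```python
from typing import Sequence


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_recall: float = 0.80
) -> float:
    """Return the highest threshold whose recall is at least min_recall.

    A transaction counts as flagged when its score >= threshold.
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive label to compute recall")

    # Walk candidate thresholds from the highest score down. Recall can only
    # grow as the threshold drops, so the first hit is the highest valid one.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(
            1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold
        )
        if true_pos / positives >= min_recall:
            return threshold

    # Only reached if min_recall > 1.0; fall back to flagging everything.
    return min(scores)


# Hypothetical validation labels (1 = fraud) and model scores.
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6]

print(pick_threshold(y_true, scores, min_recall=0.80))  # 0.6
print(pick_threshold(y_true, scores, min_recall=0.85))  # 0.35
```

Raising the target from 0.80 to 0.85 forces the threshold down from 0.6 to 0.35 in this toy data, which catches the last fraud case at the cost of also flagging a legitimate transaction scored 0.4.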

---

## 🔌 4. Running Services Locally

Run the services directly on your machine for rapid iteration without rebuilding Docker images.

### Start Redis (Required)
```bash
docker run -d --name payshield-redis -p 6379:6379 redis:7-alpine
```

### Start FastAPI Backend
```bash
# Running with hot-reload
uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```
- **Swagger Docs:** [http://localhost:8000/docs](http://localhost:8000/docs)

### Start Streamlit Dashboard
```bash
export API_URL="http://localhost:8000/v1/predict"
uv run streamlit run src/frontend/app.py --server.port 8501
```
- **Dashboard:** [http://localhost:8501](http://localhost:8501)
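
The `API_URL` export tells the dashboard where to send prediction requests. As a minimal sketch of how a frontend could resolve that setting, assuming it reads the variable from the environment and falls back to the local endpoint, consider the helper below. The name `resolve_api_url` is hypothetical, and the real `src/frontend/app.py` may handle this differently.

```python
import os
from urllib.parse import urlparse

# Local default, matching the value exported in the command above.
DEFAULT_API_URL = "http://localhost:8000/v1/predict"


def resolve_api_url(env=None) -> str:
    """Return the prediction endpoint, checking that it is an http(s) URL."""
    env = os.environ if env is None else env
    url = env.get("API_URL", DEFAULT_API_URL).strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"API_URL is not a valid http(s) URL: {url!r}")
    return url


print(resolve_api_url({}))  # falls back to DEFAULT_API_URL
```

Failing early on a malformed URL gives a clearer error than a connection failure on the first prediction request.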

---

## 🐳 5. Full-Stack Development (Docker Compose)

Docker Compose is the easiest way to replicate a production-like environment.

### Build and Launch
```bash
docker-compose up --build
```

### Useful Commands
```bash
# Run in background
docker-compose up -d

# Check all service logs
docker-compose logs -f

# Stop and remove containers
docker-compose down
```

### Service Map
| Service | Port | Endpoint |
| :--- | :--- | :--- |
| **API** | 8000 | `http://localhost:8000` |
| **Dashboard** | 8501 | `http://localhost:8501` |
| **Redis** | 6379 | `localhost:6379` |
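
To confirm that the services in the map are actually listening, a small TCP probe can help. This is a sketch using only the standard library; the helper names `is_port_open` and `check_services` are hypothetical, and the probe assumes the ports are published on `localhost` as listed in the table. An open port only shows that something accepts connections there, not that the service is healthy.

```python
import socket

# Ports taken from the service map; adjust if your Compose file differs.
SERVICES = {"API": 8000, "Dashboard": 8501, "Redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_services()` after `docker-compose up -d` returns a dictionary such as `{"API": True, ...}` that shows which containers have not come up yet.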

---

## 🧪 6. Testing

We use `pytest` for unit and integration tests.

```bash
# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=src
```

---
**Note:** Make sure the `models/fraud_model.pkl` and `models/threshold.json` artifacts exist before starting the API. The training pipeline generates both files.
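
The check in the note can be automated with a short preflight script. The sketch below assumes it runs from the project root and that the two paths named in the note are the only required artifacts; the helper name `missing_artifacts` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Artifact paths named in the note, relative to the project root.
REQUIRED_ARTIFACTS = ("models/fraud_model.pkl", "models/threshold.json")


def missing_artifacts(root: str = ".") -> list:
    """Return the required artifact paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (base / rel).is_file()]
```

If `missing_artifacts()` returns a non-empty list, run the training pipeline from section 3 before launching the API.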