# 🛠️ Development & MLOps Guide

This guide provides detailed instructions on setting up the environment, running experiments, and developing the PayShield-ML system.

---

## 🏗️ 1. Environment Setup

The project uses `uv`, a fast Python package and project manager.

### Prerequisites
- [uv](https://github.com/astral-sh/uv) installed
- Docker & Docker Compose
- Redis (can be run via Docker)

### Installation
```bash
# Sync dependencies and create virtual environment
uv sync

# Activate the environment
source .venv/bin/activate
```

---

## 📊 2. MLflow Tracking

We use MLflow to track hyperparameters, metrics, and model artifacts.

### Start MLflow Server
Run this in a separate terminal to view the UI:
```bash
uv run mlflow ui --host 0.0.0.0 --port 5000
```
Then access the dashboard at [http://localhost:5000](http://localhost:5000).

---

## 🚂 3. Model Training Pipeline

The training script handles data ingestion, feature engineering, cross-validation, and MLflow logging.

### Basic Training
```bash
uv run python src/models/train.py --data_path data/fraud_sample.csv
```

### Advanced Training with Custom Params
```bash
uv run python src/models/train.py \
    --data_path data/fraud_sample.csv \
    --experiment_name fraud_v2 \
    --min_recall 0.85 \
    --output_dir models/
```

| Argument | Description | Default |
| :--- | :--- | :--- |
| `--data_path` | Path to CSV/Parquet training data | (Required) |
| `--params_path` | Path to model config YAML | `configs/model_config.yaml` |
| `--experiment_name` | MLflow experiment grouping | `fraud_detection` |
| `--min_recall` | Target recall for threshold optimization | `0.80` |
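The guide does not show how `src/models/train.py` turns `--min_recall` into a decision threshold. One common approach, sketched here purely as an illustration, is to pick the highest score threshold whose recall on validation data still meets the target. The function name `pick_threshold` and the sample labels and scores are hypothetical and not taken from the project.

```python
from typing import Sequence


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    min_recall: float = 0.80,
) -> float:
    """Return the highest threshold whose recall is at least min_recall.

    A sample is flagged as fraud when its score is >= the threshold.
    Hypothetical helper; the real training script may work differently.
    """
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("need at least one positive label to compute recall")

    # Recall can only grow as the threshold drops, so scanning candidates
    # from high to low means the first hit is the highest valid threshold.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(
            1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold
        )
        if true_pos / positives >= min_recall:
            return threshold

    raise ValueError(f"no threshold reaches recall {min_recall}")


if __name__ == "__main__":
    # Made-up validation labels and model scores, for illustration only.
    labels = [0, 0, 1, 1, 1, 0, 1, 1]
    probs = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.60, 0.70]
    print(pick_threshold(labels, probs, min_recall=0.80))
```

Choosing the highest qualifying threshold keeps false positives as low as the recall target allows, which is usually the trade-off wanted in fraud screening.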

---

## 🔌 4. Running Services Locally

Run the services directly on the host for rapid iteration without rebuilding Docker images.

### Start Redis (Required)
```bash
docker run -d --name payshield-redis -p 6379:6379 redis:7-alpine
```

### Start FastAPI Backend
```bash
# Running with hot-reload
uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```
- **Swagger Docs:** [http://localhost:8000/docs](http://localhost:8000/docs)

### Start Streamlit Dashboard
```bash
export API_URL="http://localhost:8000/v1/predict"
uv run streamlit run src/frontend/app.py --server.port 8501
```
- **Dashboard:** [http://localhost:8501](http://localhost:8501)

---

## 🐳 5. Full-Stack Development (Docker Compose)

The easiest way to replicate a production-like environment.

### Build and Launch
```bash
docker-compose up --build
```

### Useful Commands
```bash
# Run in background
docker-compose up -d

# Check all service logs
docker-compose logs -f

# Stop and remove containers
docker-compose down
```

### Service Map
| Service | Port | Endpoint |
| :--- | :--- | :--- |
| **API** | 8000 | `http://localhost:8000` |
| **Dashboard** | 8501 | `http://localhost:8501` |
| **Redis** | 6379 | `localhost:6379` |
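
To confirm that the three services in the table are reachable after startup, a small TCP check can help. This is a minimal sketch that only tests whether each port accepts connections; it does not verify that the service behind the port is healthy. The `SERVICES` mapping and the `is_port_open` helper are illustrative names, not part of the project.

```python
import socket

# Ports copied from the Service Map table.
SERVICES = {"API": 8000, "Dashboard": 8501, "Redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if is_port_open("localhost", port) else "down"
        print(f"{name:<10} localhost:{port}  {status}")
```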

---

## 🧪 6. Testing

We use `pytest` for unit and integration tests.

```bash
# Run all tests
uv run pytest

# Run with coverage report
uv run pytest --cov=src
```

---
**Note:** Ensure the `models/fraud_model.pkl` and `models/threshold.json` artifacts are present before starting the API. The training pipeline generates them.
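
A quick pre-flight check can catch missing artifacts before the API is launched. The sketch below assumes it is run from the repository root and that the two paths in the note are the only required files; `missing_artifacts` is a hypothetical helper, not something the project ships.

```python
from pathlib import Path

# Artifact paths as listed in the note above, relative to the project root.
REQUIRED_ARTIFACTS = ("models/fraud_model.pkl", "models/threshold.json")


def missing_artifacts(project_root: str = ".") -> list[str]:
    """Return the required artifact paths that do not exist as files."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        raise SystemExit(
            f"Missing artifacts, run the training pipeline first: {missing}"
        )
    print("All model artifacts found.")
```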