# 🛠️ Development & MLOps Guide
This guide provides detailed instructions on setting up the environment, running experiments, and developing the PayShield-ML system.
---
## 🏗️ 1. Environment Setup
The project uses `uv` for lightning-fast Python package and project management.
### Prerequisites
- [uv](https://github.com/astral-sh/uv) installed
- Docker & Docker Compose
- Redis (can be run via Docker)
### Installation
```bash
# Sync dependencies and create virtual environment
uv sync
# Activate the environment
source .venv/bin/activate
```
---
## 📊 2. MLflow Tracking
We use MLflow to track hyperparameters, metrics, and model artifacts.
### Start MLflow Server
Run this in a separate terminal to view the UI:
```bash
uv run mlflow ui --host 0.0.0.0 --port 5000
```
Then access the dashboard at [http://localhost:5000](http://localhost:5000).
---
## 🚂 3. Model Training Pipeline
The training script handles data ingestion, feature engineering, cross-validation, and MLflow logging.
### Basic Training
```bash
uv run python src/models/train.py --data_path data/fraud_sample.csv
```
### Advanced Training with Custom Params
```bash
uv run python src/models/train.py \
--data_path data/fraud_sample.csv \
--experiment_name fraud_v2 \
--min_recall 0.85 \
--output_dir models/
```
| Argument | Description | Default |
| :--- | :--- | :--- |
| `--data_path` | Path to CSV/Parquet training data | (Required) |
| `--params_path` | Path to model config YAML | `configs/model_config.yaml` |
| `--experiment_name` | MLflow experiment grouping | `fraud_detection` |
| `--min_recall` | Target recall for threshold optimization | `0.80` |
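
The `--min_recall` option implies that the pipeline picks a decision threshold that meets a recall target. This guide does not show how `train.py` does that, so the following is only a minimal sketch of one common approach: scan candidate thresholds from highest to lowest and keep the first one whose recall reaches the target. The function name `pick_threshold` and the toy data are assumptions for illustration, not part of the project.

```python
from typing import Sequence


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    min_recall: float = 0.80,
) -> float:
    """Return the highest score threshold whose recall is >= min_recall.

    Hypothetical helper: a transaction is flagged as fraud when its
    score is >= the returned threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("y_true contains no positive (fraud) examples")

    # Try the strictest thresholds first; a higher threshold flags fewer
    # transactions, so the first one that satisfies the recall target is
    # the most conservative choice that still meets it.
    for threshold in sorted(set(scores), reverse=True):
        true_positives = sum(
            1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold
        )
        if true_positives / positives >= min_recall:
            return threshold

    # Not reachable in practice: the lowest score always yields recall 1.0.
    return min(scores)


# Toy labels and model scores, invented for this example.
labels = [0, 0, 1, 1, 1, 0, 1, 1]
model_scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.70, 0.60]
chosen = pick_threshold(labels, model_scores, min_recall=0.80)
```

Raising `--min_recall` pushes the threshold down, which catches more fraud at the cost of more false alarms.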
---
## 🔌 4. Running Services Locally
Run each service directly on the host for rapid iteration without rebuilding Docker images.
### Start Redis (Required)
```bash
docker run -d --name payshield-redis -p 6379:6379 redis:7-alpine
```
### Start FastAPI Backend
```bash
# Running with hot-reload
uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```
- **Swagger Docs:** [http://localhost:8000/docs](http://localhost:8000/docs)
### Start Streamlit Dashboard
```bash
export API_URL="http://localhost:8000/v1/predict"
uv run streamlit run src/frontend/app.py --server.port 8501
```
- **Dashboard:** [http://localhost:8501](http://localhost:8501)
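
The dashboard reads the prediction endpoint from `API_URL`. As a rough sketch of how a client could call that endpoint without the dashboard, the snippet below builds a JSON `POST` request using only the standard library. The helper names are hypothetical, and the request body schema is not documented in this guide, so check the Swagger docs for the real field names before sending anything.

```python
import json
import os
import urllib.request

# Same value as the export above; the environment variable wins if set.
API_URL = os.environ.get("API_URL", "http://localhost:8000/v1/predict")


def build_predict_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the predict endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (requires a running API)."""
    request = build_predict_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the API is running. The payload keys below are placeholders,
# NOT the project's real schema:
#   result = predict({"amount": 129.99})
```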
---
## 🐳 5. Full-Stack Development (Docker Compose)
The easiest way to replicate a production-like environment.
### Build and Launch
```bash
docker-compose up --build
```
### Useful Commands
```bash
# Run in background
docker-compose up -d
# Check all service logs
docker-compose logs -f
# Stop and remove containers
docker-compose down
```
### Service Map
| Service | Port | Endpoint |
| :--- | :--- | :--- |
| **API** | 8000 | `http://localhost:8000` |
| **Dashboard** | 8501 | `http://localhost:8501` |
| **Redis** | 6379 | `localhost:6379` |
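
After `docker-compose up`, it can be handy to confirm that each port in the table is accepting connections. The following is a small sketch using only the standard library; the helper names are made up for this example, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from the service map above.
SERVICES = {"API": 8000, "Dashboard": 8501, "Redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def service_status(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


# Usage:
#   for name, up in service_status().items():
#       print(f"{name}: {'up' if up else 'down'}")
```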
---
## 🧪 6. Testing
We use `pytest` for unit and integration tests.
```bash
# Run all tests
uv run pytest
# Run with coverage report
uv run pytest --cov=src
```
---
**Note:** Ensure the `models/fraud_model.pkl` and `models/threshold.json` artifacts are present before starting the API. Both are generated by the training pipeline.
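
One way to catch a missing artifact early is a small pre-flight check run before launching the API. This is a sketch, not something the project ships: `missing_artifacts` is a hypothetical helper, and it assumes the artifacts live in a `models/` directory relative to the working directory, as the note above suggests.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_ARTIFACTS = ("fraud_model.pkl", "threshold.json")


def missing_artifacts(models_dir: str = "models") -> list:
    """Return the names of required artifacts not found in models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]


# Usage:
#   missing = missing_artifacts()
#   if missing:
#       raise SystemExit(f"Run the training pipeline first; missing: {missing}")
```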