Sibi Krishnamoorthy
# 🛠️ Development & MLOps Guide
This guide provides detailed instructions on setting up the environment, running experiments, and developing the PayShield-ML system.
---
## 🏗️ 1. Environment Setup
The project uses `uv` for Python package and project management.
### Prerequisites
- [uv](https://github.com/astral-sh/uv) installed
- Docker & Docker Compose
- Redis (can be run via Docker)
### Installation
```bash
# Sync dependencies and create virtual environment
uv sync
# Activate the environment
source .venv/bin/activate
```
---
## 📊 2. MLflow Tracking
We use MLflow to track hyperparameters, metrics, and model artifacts.
### Start MLflow Server
Run this in a separate terminal to view the UI:
```bash
uv run mlflow ui --host 0.0.0.0 --port 5000
```
Then access the dashboard at [http://localhost:5000](http://localhost:5000).
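To make the tracking flow concrete, here is a minimal sketch of logging one run to the server started above. It is not the project's training code: the `flatten_params` and `log_run` helpers are hypothetical, and it assumes the tracking server is reachable at `http://localhost:5000` and that the `mlflow` package is installed in the environment.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested config dict into dotted keys, e.g. {"model.max_depth": 6}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_run(params, metrics, experiment_name="fraud_detection",
            tracking_uri="http://localhost:5000"):
    """Log one run's params and metrics (hypothetical helper, assumed URI)."""
    import mlflow  # imported lazily so flatten_params works without mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_params(flatten_params(params))
        mlflow.log_metrics(metrics)
```

Calling `log_run({"model": {"max_depth": 6}}, {"recall": 0.84})` with the server running should make a new run appear under the `fraud_detection` experiment in the UI. The parameter and metric names here are illustrative only.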
---
## 🚂 3. Model Training Pipeline
The training script handles data ingestion, feature engineering, cross-validation, and MLflow logging.
### Basic Training
```bash
uv run python src/models/train.py --data_path data/fraud_sample.csv
```
### Advanced Training with Custom Params
```bash
uv run python src/models/train.py \
  --data_path data/fraud_sample.csv \
  --experiment_name fraud_v2 \
  --min_recall 0.85 \
  --output_dir models/
```
| Argument | Description | Default |
| :--- | :--- | :--- |
| `--data_path` | Path to CSV/Parquet training data | (Required) |
| `--params_path` | Path to model config YAML | `configs/model_config.yaml` |
| `--experiment_name` | MLflow experiment grouping | `fraud_detection` |
| `--min_recall` | Target recall for threshold optimization | `0.80` |
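
As an illustration of what "target recall for threshold optimization" can mean, the sketch below picks the highest decision threshold whose recall still meets `--min_recall`. This is one common approach and an assumption about the idea, not a copy of what `src/models/train.py` does. The function name `pick_threshold` is hypothetical.

```python
def pick_threshold(y_true, scores, min_recall=0.80):
    """Return the highest score threshold whose recall is >= min_recall.

    y_true: iterable of 0/1 labels; scores: predicted fraud probabilities.
    A transaction is flagged when its score is >= the threshold.
    """
    y_true, scores = list(y_true), list(scores)
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive label")

    # Walk candidate thresholds from strict to lenient. Recall can only grow
    # as the threshold drops, so the first hit is the highest valid one.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        if true_pos / positives >= min_recall:
            return threshold
    # Reached only if min_recall > 1.0: fall back to flagging everything.
    return min(scores)


labels = [0, 0, 1, 1, 1, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.7, 0.6]
print(pick_threshold(labels, probs, min_recall=0.80))  # 0.6
print(pick_threshold(labels, probs, min_recall=0.85))  # 0.35
```

Raising the recall target lowers the threshold, which catches more fraud at the cost of more false positives.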
---
## 🔌 4. Running Services Locally
Run the services directly on the host for rapid iteration without rebuilding Docker images.
### Start Redis (Required)
```bash
docker run -d --name payshield-redis -p 6379:6379 redis:7-alpine
```
### Start FastAPI Backend
```bash
# Running with hot-reload
uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload
```
- **Swagger Docs:** [http://localhost:8000/docs](http://localhost:8000/docs)
### Start Streamlit Dashboard
```bash
export API_URL="http://localhost:8000/v1/predict"
uv run streamlit run src/frontend/app.py --server.port 8501
```
- **Dashboard:** [http://localhost:8501](http://localhost:8501)
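
For a quick check of the backend without the dashboard, a client could read the same `API_URL` variable and send a request, as sketched below. This is a sketch under assumptions: it supposes the endpoint accepts a JSON body via `POST`, and the payload field `amount` is a made-up placeholder. Consult the Swagger docs for the real request schema.

```python
import json
import os
import urllib.request

DEFAULT_API_URL = "http://localhost:8000/v1/predict"


def build_predict_request(payload, api_url=None):
    """Build (but do not send) a JSON POST request for the prediction endpoint."""
    url = api_url or os.environ.get("API_URL", DEFAULT_API_URL)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(payload, timeout=5.0):
    """Send the request and decode the JSON response (requires a running API)."""
    request = build_predict_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical payload; the real field names come from the API schema.
# print(predict({"amount": 12.5}))
```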
---
## 🐳 5. Full-Stack Development (Docker Compose)
The easiest way to replicate a production-like environment.
### Build and Launch
```bash
docker-compose up --build
```
### Useful Commands
```bash
# Run in background
docker-compose up -d
# Check all service logs
docker-compose logs -f
# Stop and remove containers
docker-compose down
```
### Service Map
| Service | Port | Endpoint |
| :--- | :--- | :--- |
| **API** | 8000 | `http://localhost:8000` |
| **Dashboard** | 8501 | `http://localhost:8501` |
| **Redis** | 6379 | `localhost:6379` |
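
After `docker-compose up`, a small script can confirm that each port in the service map accepts connections. The sketch below uses only the standard library. The helper names `port_open` and `report` are hypothetical, and an open port shows only that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from the service map above.
SERVICES = {"API": 8000, "Dashboard": 8501, "Redis": 6379}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host="localhost"):
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

For example, `print(report())` returns a dictionary keyed by service name, with `True` for every port that is reachable.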
---
## 🧪 6. Testing
We use `pytest` for unit and integration tests.
```bash
# Run all tests
uv run pytest
# Run with coverage report
uv run pytest --cov=src
```
---
**Note:** Ensure the `models/fraud_model.pkl` and `models/threshold.json` artifacts are present before starting the API. These are generated by the training pipeline.
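
A preflight check along these lines can catch missing artifacts before the API is launched. It is a minimal sketch, not part of the project: the `missing_artifacts` helper is hypothetical, and it only verifies that the two files exist relative to the repository root, not that their contents are valid.

```python
from pathlib import Path

# Artifact paths named in the note above, relative to the repository root.
REQUIRED_ARTIFACTS = ("models/fraud_model.pkl", "models/threshold.json")


def missing_artifacts(root="."):
    """Return the required artifact paths that are not present under root."""
    base = Path(root)
    return [rel for rel in REQUIRED_ARTIFACTS if not (base / rel).is_file()]
```

An empty list from `missing_artifacts()` means both files are in place. Otherwise, rerun the training pipeline to regenerate them.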