PayShield-ML / README.md
sibikrish's picture
Update README.md
94c7d2e verified
---
title: PayShield-ML
emoji:
colorFrom: yellow
colorTo: red
sdk: docker
pinned: true
app_port: 7860
license: mit
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/664345bd8b3a005a73e3e430/cRvhV8zNVwjQuHvp42boL.jpeg
---
# 🛡️ PayShield-ML:Real-Time Fraud Engine
![Banner](assets/banner_0.jpeg)
![Python](https://img.shields.io/badge/python-3.12-blue.svg)
![FastAPI](https://img.shields.io/badge/FastAPI-0.100+-009688.svg)
![Redis](https://img.shields.io/badge/redis-7.0+-DC382D.svg)
![XGBoost](https://img.shields.io/badge/XGBoost-2.0+-FF6F00.svg)
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg)
**A production-grade MLOps system providing low-latency fraud detection with stateful feature hydration and human-in-the-loop explainability.**
---
## 🏗️ System Architecture
The system follows a **Lambda Architecture** pattern, decoupling the high-latency model training from the low-latency inference path.
```mermaid
graph TD
User[Payment Gateway] -->|POST /predict| API[FastAPI Inference Service]
subgraph "Online Serving Layer (<50ms)"
API -->|1. Hydrate Velocity| Redis[(Redis Feature Store)]
API -->|2. Preprocess| Pipeline[Scikit-Learn Pipeline]
Pipeline -->|3. Predict| Model[XGBoost Classifier]
Model -->|4. Explain| SHAP[SHAP Engine]
end
subgraph "Offline Training Layer"
Data[(Transactional Data)] -->|Batch Load| Trainer[Training Pipeline]
Trainer -->|CV & Tuning| MLflow[MLflow Tracking]
Trainer -->|Export Artifact| Registry[Model Registry]
end
API -->|Async Log| ShadowLogs[Shadow Mode Logs]
API -->|Visualize| Dashboard[Analyst Dashboard]
```
### 🖥️ Analyst Workbench
![Dashboard Screenshot](assets/dashboard_screenshot.png)
*Real-time interface for fraud analysts to review blocked transactions and SHAP explanations.*
---
## 🚀 Key Features
### 1. Stateful Feature Store (Redis)
Traditional stateless APIs struggle with "Velocity Features" (e.g., *how many times did this user swipe in 24 hours?*). Our engine utilizes **Redis Sorted Sets (ZSET)** to maintain rolling windows, allowing feature hydration in **<2ms** with $O(\log N)$ complexity.
### 2. Shadow Mode (Dark Launch)
To mitigate the risk of model drift or false positives, the system supports a **Shadow Mode** configuration. The model runs in production, receives real traffic, and logs decisions, but never blocks a transaction. This allows for risk-free A/B testing against legacy rule engines.
### 3. Business-Aligned Optimization
Standard accuracy is misleading in fraud detection due to class imbalance. We implement a **Recall-Constraint Strategy** (Target: 80% Recall) ensuring the model captures the vast majority of fraud while maintaining a strict upper bound on False Positive Rates, as required by financial compliance.
### 4. Cold-Start Handling
Engineered logic to handle new users with zero history by defaulting to global medians and "warm-up" priors, preventing the system from unfairly blocking legitimate first-time customers.
---
## 📊 Performance & Metrics
The following metrics were achieved on a hold-out temporal test set (out-of-time validation):
| Metric | Result | Target / Bench |
| :--- | :--- | :--- |
| **PR-AUC** | 0.9245 | Excellent |
| **Precision** | 93.18% | Low False Positives |
| **Recall** | 80.06% | Target: 80% |
| **Inference Latency (p95)** | ~30ms | < 50ms |
---
## 🛠️ Quick Start
### Prerequisites
- Docker & Docker Compose
- (Optional) Python 3.12+ (managed via `uv`)
### Installation & Deployment
1. **Clone the Repository**
```bash
git clone https://github.com/Sibikrish3000/realtime-fraud-engine.git
cd realtime-fraud-engine
```
2. **Launch the Stack**
```bash
docker-compose up --build
```
*This starts the FastAPI Backend, Redis Feature Store, and Streamlit Dashboard.*
3. **Access the Services**
- **Analyst Dashboard:** [http://localhost:8501](http://localhost:8501)
- **API Documentation:** [http://localhost:8000/docs](http://localhost:8000/docs)
- **Redis Instance:** `localhost:6379`
---
## 📂 Project Structure
```text
src/
├── api/ # FastAPI service, schemas (Pydantic), and config
├── features/ # Redis feature store logic and sliding window constants
├── models/ # Training pipelines, metrics calculation, and XGBoost wrappers
├── frontend/ # Streamlit-based analyst workbench
├── data/ # Data ingestion and cleaning utilities
└── explainability.py # SHAP-based waterfall plots and global importance
```
See the [Development & MLOps Guide](docs/DEVELOPMENT.md) for detailed instructions on training and local development.
---
## 📡 API Reference
### Predict Transaction
`POST /v1/predict`
**Request:**
```bash
curl -X 'POST' \
'http://localhost:8000/v1/predict' \
-H 'Content-Type: application/json' \
-d '{
"user_id": "u12345",
"trans_date_trans_time": "2024-01-20 14:30:00",
"amt": 150.75,
"lat": 40.7128,
"long": -74.0060,
"merch_lat": 40.7306,
"merch_long": -73.9352,
"category": "grocery_pos"
}'
```
**Response:**
```json
{
"decision": "APPROVE",
"probability": 0.12,
"risk_score": 12.0,
"latency_ms": 28.5,
"shadow_mode": false,
"shap_values": {
"amt": 0.05,
"dist": 0.02
}
}
```
---
## 🗺️ Future Roadmap
- [ ] **Kafka Integration:** Transition to an asynchronous event-driven architecture for high-throughput stream processing.
- [ ] **KServe Deployment:** Migrate from standalone FastAPI to KServe for automated scaling and model versioning.
- [ ] **Graph Features:** Incorporate Neo4j-based features to detect fraud rings and synthetic identities.
---
**Author:** [Sibi Krishnamoorthy](https://github.com/Sibikrish3000)
*Machine Learning Engineer | Fintech & Risk Analytics*