---
title: GraphRAG vs Vector RAG Fraud Detection Benchmark
emoji: 🕸️
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.0
app_file: app.py
pinned: true
tags:
  - graph-neural-networks
  - fraud-detection
  - neo4j
  - rag
  - llm
  - groq
  - mlops
---

# 🕸️ GraphRAG vs Vector RAG — Live Fraud Detection Benchmark

**By [Daniel Fonseca](https://linkedin.com/in/daniel-fonsecaai) · AI/ML Engineer · Graph Neural Networks · Fraud Detection**

[![Neo4j](https://img.shields.io/badge/Neo4j-Aura-00ED64?style=flat&logo=neo4j)](https://neo4j.com)
[![Groq](https://img.shields.io/badge/LLM-Groq%20%2F%20Llama%203.1-blueviolet)](https://groq.com)
[![Streamlit](https://img.shields.io/badge/Frontend-Streamlit-FF4B4B)](https://streamlit.io)

---

## What this demo shows

A live benchmark comparing two RAG architectures on **fraud detection queries**:

| Aspect | GraphRAG | Vector RAG |
|---|---|---|
| Retrieval | Cypher → Neo4j graph traversal | Embedding → cosine similarity |
| Precision | ~94% on relational queries | ~38% |
| Latency | ~60ms | ~300ms |
| Money mule chains | ✅ Full path | ❌ Cannot traverse |
| Shared device cluster | ✅ Exact | ⚠️ Approximate |

**Core insight**: Fraud lives in *connections*. A device shared by 3 customers, a money mule chain with 3 hops, 6 accounts opened from the same IP: these patterns are hard to recover from embedding similarity alone, but a single Cypher traversal finds them directly.
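
To make the shared-device case concrete, here is a minimal pure-Python sketch that groups customers by device and keeps the devices used by several customers. The edge list and the IDs (`C-1`, `D-9`, and so on) are made up for illustration and are not taken from the app's seed data; in the demo this grouping would be done by a Cypher query in Neo4j rather than in Python.

```python
from collections import defaultdict

# Hypothetical (customer, device) pairs, one per (Customer)-[:USED]->(Device) edge.
USED_EDGES = [
    ("C-1", "D-9"),
    ("C-2", "D-9"),
    ("C-3", "D-9"),
    ("C-4", "D-7"),
]


def shared_device_clusters(edges, min_customers=2):
    """Group customers by device and keep devices used by at least min_customers."""
    customers_by_device = defaultdict(set)
    for customer, device in edges:
        customers_by_device[device].add(customer)
    return {
        device: sorted(customers)
        for device, customers in customers_by_device.items()
        if len(customers) >= min_customers
    }


clusters = shared_device_clusters(USED_EDGES, min_customers=3)
print(clusters)  # {'D-9': ['C-1', 'C-2', 'C-3']}
```

The result is exact because it follows explicit edges; a similarity search over text embeddings has no comparable notion of "same device".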

---

## Architecture

```
User question (natural language)
        │
        ▼
Groq/Llama 3.1 ──► Cypher query generation
        │
        ▼
Neo4j Aura ──► Graph traversal (2-5 hops)
        │
        ▼
Structured records ──► Groq/Llama ──► Fraud analysis answer
```
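
The flow above can be sketched as three injectable stages. This is an illustrative outline, not the code in `app.py`: the function names (`answer_fraud_question`, `fake_generate_cypher`, and the other stand-ins) are hypothetical, and the stubs replace the real Groq and Neo4j calls so the example runs without credentials.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def answer_fraud_question(
    question: str,
    generate_cypher: Callable[[str], str],
    run_cypher: Callable[[str], List[Record]],
    summarize: Callable[[str, List[Record]], str],
) -> Dict[str, object]:
    """Run the three stages: question -> Cypher -> records -> answer."""
    cypher = generate_cypher(question)      # LLM call in the real app
    records = run_cypher(cypher)            # Neo4j traversal in the real app
    answer = summarize(question, records)   # second LLM call in the real app
    return {"cypher": cypher, "records": records, "answer": answer}


# Stand-ins (assumptions) so the sketch runs offline.
def fake_generate_cypher(question: str) -> str:
    return (
        "MATCH (c:Customer)-[:USED]->(d:Device) "
        "WITH d, collect(c) AS customers WHERE size(customers) >= 3 "
        "RETURN d, customers"
    )


def fake_run_cypher(cypher: str) -> List[Record]:
    return [{"device": "D-9", "customers": ["C-1", "C-2", "C-3"]}]


def fake_summarize(question: str, records: List[Record]) -> str:
    return f"{len(records)} suspicious device cluster(s) found."


result = answer_fraud_question(
    "Which devices are shared by 3 or more customers?",
    fake_generate_cypher,
    fake_run_cypher,
    fake_summarize,
)
print(result["answer"])
```

Passing the stages in as callables keeps the pipeline testable and makes it easy to swap the real clients for simulated ones.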

---

## Graph schema

```
(Customer)-[:HAS_ACCOUNT]->(Account)
(Customer)-[:USED]->(Device)
(Account)-[:ACCESSED_FROM]->(IP)
(Account)-[:TRANSFER {amount, date}]->(Account)
(Account)-[:TRANSACTION {amount, type}]->(Merchant)
```
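
One way to keep seed data consistent with this schema is to check each edge against the allowed patterns before loading it. The sketch below is an assumption about how such a check could look, not part of the app; `validate_edge` and the sample property values are hypothetical.

```python
# Allowed (source label, relationship type, target label) triples from the schema.
SCHEMA = {
    ("Customer", "HAS_ACCOUNT", "Account"),
    ("Customer", "USED", "Device"),
    ("Account", "ACCESSED_FROM", "IP"),
    ("Account", "TRANSFER", "Account"),
    ("Account", "TRANSACTION", "Merchant"),
}

# Relationship properties named in the schema.
REL_PROPERTIES = {
    "TRANSFER": {"amount", "date"},
    "TRANSACTION": {"amount", "type"},
}


def validate_edge(source_label, rel_type, target_label, properties=None):
    """Return a list of problems; an empty list means the edge fits the schema."""
    problems = []
    if (source_label, rel_type, target_label) not in SCHEMA:
        problems.append(
            f"unknown pattern: ({source_label})-[:{rel_type}]->({target_label})"
        )
    expected = REL_PROPERTIES.get(rel_type, set())
    missing = expected - set(properties or {})
    if missing:
        problems.append(f"missing properties: {sorted(missing)}")
    return problems


ok = validate_edge(
    "Account", "TRANSFER", "Account", {"amount": 950.0, "date": "2024-01-05"}
)
reversed_edge = validate_edge("Device", "USED", "Customer")
print(ok, reversed_edge)
```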

Detectable fraud patterns:
- 🔴 **Shared device cluster** — emulator farms, identity theft
- 🔴 **IP overlap** — account opening fraud
- 🔴 **Money mule chain** — layering (A-102 → A-445 → A-667 → A-890)
- 🔴 **Card testing** — micro-transactions on merchants
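
As an illustration of the money mule pattern, the following sketch enumerates transfer chains of a given length in plain Python. The amounts and the extra `A-300` account are invented for the example. In Neo4j the same idea would typically be written as a variable-length pattern such as `-[:TRANSFER*3..5]->`, though the exact query the app generates is not shown here.

```python
from collections import defaultdict

# Hypothetical transfers; the first three reproduce the chain named in the list above.
TRANSFERS = [
    ("A-102", "A-445", 9500.0),
    ("A-445", "A-667", 9300.0),
    ("A-667", "A-890", 9100.0),
    ("A-102", "A-300", 40.0),
]


def transfer_chains(transfers, start, min_hops=3, max_hops=5):
    """Enumerate cycle-free transfer paths from start with min_hops..max_hops edges."""
    outgoing = defaultdict(list)
    for src, dst, _amount in transfers:
        outgoing[src].append(dst)

    chains = []

    def walk(path):
        hops = len(path) - 1
        if hops >= min_hops:
            chains.append(list(path))
        if hops == max_hops:
            return
        for nxt in outgoing[path[-1]]:
            if nxt not in path:  # never revisit an account within one path
                path.append(nxt)
                walk(path)
                path.pop()

    walk([start])
    return chains


chains = transfer_chains(TRANSFERS, "A-102")
print(" -> ".join(chains[0]))  # A-102 -> A-445 -> A-667 -> A-890
```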

---

## Setup (add to HF Secrets)

| Secret | Description |
|--------|-------------|
| `NEO4J_URI` | Neo4j Aura connection URI (`neo4j+s://...`) |
| `NEO4J_USER` | Usually `neo4j` |
| `NEO4J_PASSWORD` | Your Aura password |
| `GROQ_API_KEY` | Free at [console.groq.com](https://console.groq.com) |

After adding the secrets, click **"Seed fraud graph"** in the sidebar to populate Neo4j.

> Without credentials the app runs in demo mode with realistic simulated responses.
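
Hugging Face Spaces exposes secrets to the app as environment variables, so the fallback to demo mode can be decided at startup. The helper below is a sketch under that assumption; `load_settings` and its return shape are hypothetical and may differ from what `app.py` does.

```python
import os

REQUIRED_SECRETS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GROQ_API_KEY")


def load_settings(env=None):
    """Collect the four secrets and flag demo mode when any of them is missing."""
    env = os.environ if env is None else env
    values = {name: env.get(name, "").strip() for name in REQUIRED_SECRETS}
    missing = [name for name, value in values.items() if not value]
    return {"values": values, "missing": missing, "demo_mode": bool(missing)}


settings = load_settings()
if settings["demo_mode"]:
    print("Demo mode, missing secrets:", ", ".join(settings["missing"]))
else:
    print("Live mode: all secrets present")
```

Accepting an optional `env` mapping makes the function easy to test without touching the real environment.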

---

## Related projects

- [IBM Safer Payments — AUC-ROC 0.9591](https://huggingface.co/spaces/daniel-fonsecaai)
- [HetGNN Fraud Graph Explorer](https://huggingface.co/spaces/daniel-fonsecaai)
- [Agentic RAG Pipeline on Kubernetes](https://huggingface.co/spaces/daniel-fonsecaai)

---

*Built with Neo4j Aura · Groq · Llama 3.1 · Streamlit · PyVis · Plotly*