---
title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
---

# ClinicalMatch AI: Precision Clinical Trial Matching & Recruitment Agent

**"Agents Assemble: Healthcare AI Endgame Challenge"** (Prompt Opinion platform)  
Standards: **FHIR R4 · MCP · A2A**

> 80% of clinical trials fail to meet enrollment deadlines. 85% of eligible patients are never identified. This agent directly addresses that.

---

## What it does

ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

**Key capabilities:**

| Feature | Description |
|---|---|
| **Eligibility Check** | An individual enters raw clinical data (age, labs in SI units, biomarkers), with no patient ID required, and receives ranked, explainable trial matches |
| **Trial Finder** | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
| **Graph Intelligence** | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
| **A2A Pipeline** | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
| **Recruitment Hub** | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
| **GraphRAG** | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
| **MCP Server** | 6 tools callable by Prompt Opinion directly via stdio transport |
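
The five pipeline states listed above run in a fixed order. A minimal sketch of that ordering follows; the `Stage` enum, `run_pipeline`, and the handler signature are hypothetical names chosen for illustration and are not taken from the project's backend.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    """The five A2A states named in the feature table, in execution order."""
    INGEST = "INGEST"
    PARSE = "PARSE"
    MATCH = "MATCH"
    SCORE = "SCORE"
    RECRUIT = "RECRUIT"


# Enum iteration follows definition order, so this is the pipeline order.
PIPELINE_ORDER: List[Stage] = list(Stage)

# Hypothetical handler type: takes the running context and returns an updated one.
Handler = Callable[[dict], dict]


def run_pipeline(context: dict, handlers: Dict[Stage, Handler]) -> dict:
    """Run every stage in order; stages without a handler are recorded as no-ops."""
    completed: List[str] = []
    for stage in PIPELINE_ORDER:
        handler = handlers.get(stage)
        if handler is not None:
            context = handler(context)
        completed.append(stage.value)
    return {**context, "completed": completed}
```

For example, `run_pipeline({"patient_id": "p1"}, {Stage.SCORE: lambda c: {**c, "score": 80}})` returns the context with a `score` key and all five stages listed under `completed`.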

---

## Architecture

```
Prompt Opinion Platform
        │  MCP Protocol (stdio)
        ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────┬─────────────────────────────┘
                       │ A2A Orchestration
                       ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend  (main.py, port 8000)             │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │  FHIR R4   │  Claude    │  Neo4j Graph  │
│ live API │  adapter   │  LLM       │  RAG + match  │
└──────────┴────────────┴────────────┴───────────────┘
                       │
                       ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend  (port 3000)                  │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                       │  Nginx (port 7860)
                       ▼
              HuggingFace Spaces
```

**Data sources (all free, no auth):**

| Source | Data |
|---|---|
| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
| RxNorm (NIH) | Medication RxCUI codes |
| ICD-10 CM (NLM) | Cancer diagnosis codes |
| PubMed (NCBI) | Supporting literature PMIDs |
| OpenFDA | Drug labels and adverse events |
| Synthetic | 500 realistic patient profiles matched to real trials |

---

## Graph Knowledge Base

After seeding, the Neo4j graph contains:

| Node type | Count | Key properties |
|---|---|---|
| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
| StudySite | ~200 | lat/lon coordinates |
| **ELIGIBLE_FOR edges** | **~9,100** | score, linking patients to trials |

The graph grows passively: every Trial Finder search automatically upserts new Trial and StudySite nodes, and every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.

---

## Clinical Eligibility Check (SI Units)

The `/intake` page accepts raw clinical data; no patient ID or account is required. Fields:

**Demographics:** Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

**Biomarker status (toggles):**
- Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
- Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
- GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
- Hematology: FLT3, IDH1/2, BCR-ABL

**Lab values (SI units):**

| Field | Unit | Conversion |
|---|---|---|
| Haemoglobin | g/dL | none |
| WBC | ×10⁹/L | none |
| ANC | ×10⁹/L | none |
| Platelets | ×10⁹/L | none |
| Creatinine | **μmol/L** | auto-converted ÷88.4 → mg/dL for trial text |
| eGFR | mL/min/1.73m² | none |
| Bilirubin | **μmol/L** | auto-converted ÷17.1 → mg/dL for trial text |
| ALT / AST | U/L | none |
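
The two conversions in the table are simple divisions. A minimal sketch, assuming hypothetical function names (the divisors 88.4 and 17.1 are the ones stated in the table):

```python
# Divisors from the table above: μmol/L per mg/dL.
CREATININE_UMOL_PER_MG_DL = 88.4
BILIRUBIN_UMOL_PER_MG_DL = 17.1


def creatinine_to_mg_dl(umol_per_l: float) -> float:
    """Convert serum creatinine from μmol/L to mg/dL."""
    return umol_per_l / CREATININE_UMOL_PER_MG_DL


def bilirubin_to_mg_dl(umol_per_l: float) -> float:
    """Convert total bilirubin from μmol/L to mg/dL."""
    return umol_per_l / BILIRUBIN_UMOL_PER_MG_DL
```

A creatinine of 132.6 μmol/L, for instance, corresponds to about 1.5 mg/dL, which can then be compared with a threshold such as "creatinine ≤ 1.5 mg/dL" in trial text.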

Matching score breakdown:
- **Age** (25 pts): compared against trial min/max age
- **Sex** (15 pts): compared against trial sex restriction
- **ECOG** (15 pts): extracted via regex from eligibility criteria text
- **Biomarkers** (30 pts): checks whether biomarker terms appear in trial eligibility text
- **Lab values** (15 pts): parses thresholds from text, converts SI units, checks patient values

Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
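
The weights above sum to 100. The sketch below shows one way such a score could be assembled. The regex, the function names, and the rule that only a `pass` earns points (with `fail` and `uncertain` earning none) are assumptions made for illustration; the backend's actual pattern and partial-credit rules may differ.

```python
import re
from typing import Dict, Optional

# Point weights taken from the breakdown above; they sum to 100.
WEIGHTS: Dict[str, int] = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# Hypothetical pattern: matches phrases such as "ECOG 0-2" or "ECOG <= 1".
ECOG_PATTERN = re.compile(
    r"ECOG[^.\n]*?(?:0\s*(?:-|–|to)\s*([0-4])|(?:<=|≤)\s*([0-4]))",
    re.IGNORECASE,
)


def max_ecog(criteria_text: str) -> Optional[int]:
    """Return the highest ECOG status the criteria text allows, if one is found."""
    match = ECOG_PATTERN.search(criteria_text)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


def check_ecog(patient_ecog: int, criteria_text: str) -> str:
    """Classify the ECOG criterion as pass, fail, or uncertain."""
    limit = max_ecog(criteria_text)
    if limit is None:
        return "uncertain"  # no ECOG limit could be parsed from the text
    return "pass" if patient_ecog <= limit else "fail"


def total_score(results: Dict[str, str]) -> int:
    """Sum the weights of criteria marked 'pass' (assumption: others score zero)."""
    return sum(points for name, points in WEIGHTS.items() if results.get(name) == "pass")
```

Under these assumptions, a patient who passes only the age and biomarker checks scores 55 out of 100.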

---

## Running Locally (no Docker)

```bash
# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

# 2. Backend
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local   # fill in credentials
uvicorn main:app --reload --port 8000

# 3. Schema setup (once)
curl -X POST http://localhost:8000/setup

# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed

# 5. Frontend
cd frontend
npm install --legacy-peer-deps
npm run dev        # http://localhost:3000  (uses --webpack, not Turbopack)

# 6. MCP server (for Prompt Opinion integration)
cd backend
python mcp_server.py
```

---

## Running with Docker Compose

```bash
cp .env.example .env.local   # fill in OPENAI_API_KEY etc.
docker compose up -d

# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed
```

Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474

---

## Deploying to HuggingFace Spaces

1. Create a Space → **Docker SDK** → blank template
2. Push repo to the Space:
   ```bash
   git remote add hf https://huggingface.co/spaces/<username>/<space-name>
   git push hf main
   ```
3. Set **Repository Secrets**:
   ```
   OPENAI_API_KEY    = <aimlapi.com key>
   OPENAI_BASE_URL   = https://ai.aimlapi.com/v1
   OPENAI_MODEL      = claude-opus-4-7
   NEO4J_PASSWORD    = clinicalmatch2024
   ```
4. After first boot, seed data:
   ```
   POST https://<space>.hf.space/seed
   ```

---

## MCP Tools (Prompt Opinion integration)

```bash
python backend/mcp_server.py   # stdio transport
```

| Tool | Arguments | Description |
|---|---|---|
| `find_trials` | `condition, phase?` | Real-time trial search |
| `screen_patient` | `patient_id, nct_id` | Eligibility screening |
| `match_patient_to_trials` | `patient_id` | Top-N trial matches |
| `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
| `get_trial_analytics` | (none) | Enrollment funnel + KPIs |
| `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |

---

## Key API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
| GET | `/api/v1/intake/biomarkers` | Biomarker registry |
| GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
| GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
| GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
| POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
| POST | `/api/v1/workflow/run` | Run full A2A pipeline |
| GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
| GET | `/api/v1/map/data` | Site coordinates + patient clusters |
| POST | `/api/v1/graph/query` | GraphRAG natural language query |
| POST | `/seed` | Seed full graph from live APIs |
| GET | `/api/v1/graph/stats` | Node and edge counts |

Full interactive docs: `http://localhost:8000/docs`

---

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
| `NEO4J_DATABASE` | Database name | `neo4j` |
| `OPENAI_API_KEY` | aimlapi.com API key | (none) |
| `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
| `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
| `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |
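
As an illustration of how the backend variables and their defaults fit together, here is a minimal sketch. The `Settings` dataclass and `load_settings` function are hypothetical and not part of the project; the variable names and default values are the ones in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Backend configuration, mirroring the table above (hypothetical container)."""
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]
    openai_base_url: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),  # no default; None when unset
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Passing an explicit mapping, as in `load_settings({"NEO4J_URI": "bolt://db:7687"})`, makes the defaults easy to check without touching the real environment.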

---

## Frontend Pages

| Route | Page | Description |
|---|---|---|
| `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
| `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
| `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
| `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
| `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
| `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
| `/graph` | GraphRAG | Natural language queries over the knowledge graph |