# ClinicalMatch AI: Agent Instructions

> Project memory (build state, completed features, constraints) is also tracked in `.claude/project_memory.md` in this repo.

This is a hackathon submission for **"Agents Assemble: Healthcare AI Endgame Challenge"** on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, feasibility.

## Stack at a glance

| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), uvicorn |
| Graph DB | Neo4j Community 5.x via bolt |
| LLM | claude-opus-4-7 via aimlapi.com (OpenAI-compatible) |
| GraphRAG | LangChain `GraphCypherQAChain` + custom Cypher prompt |
| Frontend | Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet |
| Standards | FHIR R4 · MCP (stdio) · A2A state machine |

## Critical: LLM API

**Never use the Anthropic SDK directly.** All LLM calls go through aimlapi.com or a compatible alternative using the OpenAI-compatible interface:

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
)
model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")
```

See `backend/llm_client.py` for the canonical pattern. Do not add `import anthropic` anywhere.

## Starting the services

```bash
# Backend: always use --reload for hot reload
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000

# Frontend: always use --webpack (Turbopack is broken on this system)
cd frontend && npm run dev   # runs: next dev --webpack

# MCP server (separate process, stdio transport)
cd backend && python mcp_server.py

# Seed graph data (~15 min first run)
curl -X POST http://localhost:8000/seed
```

After changing backend Python files, uvicorn `--reload` should pick them up. If a 404 appears for a newly-added endpoint or old errors persist, the server needs a manual restart β€” kill the process and re-run the uvicorn command.

## Project layout

```
promptop/
├── CLAUDE.md                   ← you are here
├── README.md                   ← user-facing docs
├── backend/
│   ├── main.py                 ← FastAPI app, all routes
│   ├── clinicaltrials_api.py   ← ClinicalTrials.gov v2 API (async + sync)
│   ├── intake_matching.py      ← SI-unit clinical intake → trial scoring
│   ├── trial_enrichment.py     ← passive graph enrichment on search
│   ├── matching_engine.py      ← FHIR patient → trial scoring (LLM-assisted)
│   ├── a2a_workflow.py         ← A2A state machine (INGEST→PARSE→MATCH→SCORE→RECRUIT)
│   ├── graphrag.py             ← LangChain GraphCypherQAChain with custom prompt
│   ├── graph_seeder.py         ← seeds 500 patients + real NCT trials from APIs
│   ├── fhir_adapter.py         ← FHIR R4 patient models (P001–P005 mock patients)
│   ├── neo4j_setup.py          ← Neo4j connection + schema setup
│   ├── analytics.py            ← dashboard KPIs, funnel, demographics, map data
│   ├── recruitment_pipeline.py ← kanban board, outreach generation
│   ├── llm_client.py           ← all LLM calls (aimlapi.com / claude-opus-4-7)
│   ├── mcp_server.py           ← MCP stdio server (6 tools)
│   └── requirements.txt
├── frontend/
│   ├── src/app/
│   │   ├── page.tsx            ← Trial Finder (real-time CT.gov, recency sort)
│   │   ├── intake/page.tsx     ← Eligibility Check (SI-unit clinical intake form)
│   │   ├── screening/page.tsx  ← Patient Screening (A2A pipeline, FHIR patients)
│   │   ├── recruitment/page.tsx← Recruitment Hub (kanban + outreach generation)
│   │   ├── dashboard/page.tsx  ← Analytics dashboard (Recharts)
│   │   ├── map/page.tsx        ← Leaflet site map
│   │   ├── graph/page.tsx      ← GraphRAG natural language query
│   │   └── layout.tsx          ← App shell with Sidebar
│   ├── src/components/
│   │   ├── Sidebar.tsx         ← Navigation sidebar
│   │   └── MapComponent.tsx    ← Raw Leaflet map (no react-leaflet SSR issues)
│   ├── src/lib/api.ts          ← Typed API client for all backend endpoints
│   └── next.config.ts          ← webpack mode, filesystem cache, optimizePackageImports
└── docker/                     ← Docker + Nginx for HuggingFace Spaces deployment
```

## Neo4j graph schema

```
(Patient)         id, name, age, sex, ecog, condition, city, state, ethnicity,
                  biomarkers[], medications[], source, stage
(Trial)           id (NCT), title, condition, phase, status, sponsor,
                  eligibility_criteria, min_age, max_age, sex, enrollment,
                  start_date, completion_date, last_updated, ctgov_url
(Diagnosis)       id, name, icd10
(Biomarker)       id (e.g. HER2_POS), name (e.g. "HER2 Positive")
(Medication)      id (e.g. TAMOXIFEN), name
(StudySite)       id, name, city, state, lat, lon, trials, enrolled, capacity

Relationships:
  (Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
  (Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
  (Patient)-[:HAS_BIOMARKER]->(Biomarker)
  (Patient)-[:TAKES_MEDICATION]->(Medication)
  (Trial)-[:LOCATED_AT]->(StudySite)
```

**Graph scale after seeding:** ~500 patients, ~250 trials, ~9,100 ELIGIBLE_FOR edges.

Patient IDs from seeder: `P_C50_0001` (breast), `P_C61_0001` (prostate), etc.
Mock FHIR patients: `P001`–`P005` (used by screening/workflow pages).
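
Because the two ID families live in different stores (Neo4j vs. `fhir_adapter.py`), it can help to tell them apart before choosing a lookup path. The sketch below is illustrative only: the regular expressions are inferred from the example IDs above, and `patient_source` is a hypothetical helper that does not exist in the repo. It checks the shape of an ID, not whether the patient exists.

```python
import re

# Assumed ID shapes, inferred from the examples in this document.
# The real seeder may produce other variants.
SEEDED_ID = re.compile(r"^P_([A-Z]\d{2})_(\d{4})$")  # e.g. P_C50_0001
MOCK_ID = re.compile(r"^P\d{3}$")                     # e.g. P001


def patient_source(patient_id: str) -> str:
    """Classify a patient ID by shape (hypothetical helper)."""
    if SEEDED_ID.match(patient_id):
        return "graph"       # seeded patient, stored in Neo4j only
    if MOCK_ID.match(patient_id):
        return "mock_fhir"   # mock FHIR patient from fhir_adapter.py
    return "unknown"


if __name__ == "__main__":
    for pid in ("P_C50_0001", "P003", "abc"):
        print(pid, "->", patient_source(pid))
```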

## Key backend modules

### `clinicaltrials_api.py`
- `search_trials()`: async, `sort=LastUpdatePostDate:desc`
- `get_trial_details()`: async
- `search_trials_sync()` / `get_trial_details_sync()`: sync using `httpx.Client` (NOT `asyncio.run()`). Safe to call from both sync functions and FastAPI async handlers.
- `_normalize_study()`: extracts `last_updated`, `ctgov_url` in addition to core fields.

**Do not** use `asyncio.run()` inside these sync wrappers, because it breaks when called from a running FastAPI event loop. The sync wrappers use `httpx.Client` directly.
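
The failure mode can be reproduced with the standard library alone. This is a minimal sketch, not code from the repo: `fetch_async`, `bad_sync_wrapper`, and `handler` are hypothetical stand-ins for an async API call, a sync wrapper built on `asyncio.run()`, and an async FastAPI handler.

```python
import asyncio


async def fetch_async() -> str:
    """Stand-in for an async HTTP call (hypothetical)."""
    await asyncio.sleep(0)
    return "ok"


def bad_sync_wrapper() -> str:
    """Sync wrapper built on asyncio.run(): the pattern to avoid."""
    coro = fetch_async()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


async def handler() -> str:
    """Stand-in for an async route handler: a loop is already running here."""
    return bad_sync_wrapper()


if __name__ == "__main__":
    print(bad_sync_wrapper())      # no loop running: the wrapper works
    print(asyncio.run(handler()))  # loop running: asyncio.run() raises
```

A wrapper that uses a blocking client such as `httpx.Client` never touches the event loop, so it does not hit this error.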

### `intake_matching.py`
Implements SI-unit clinical intake → trial eligibility matching without requiring a patient ID:
- `BIOMARKER_REGISTRY`: maps graph node IDs to labels and eligibility text search terms
- `score_intake_against_trial()`: weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
- `_check_labs()`: parses thresholds from eligibility criteria text, converts SI units (creatinine μmol/L ↔ mg/dL, bilirubin μmol/L ↔ mg/dL)
- `save_intake_as_patient()`: persists intake as `Patient` node for long-term graph enrichment
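
The weights and unit conversions above can be sketched as follows. This is an illustration under assumptions, not the module's actual code: it treats every criterion as all-or-nothing (the real scorer may award partial credit), and the names `WEIGHTS`, `UMOL_L_PER_MG_DL`, `umol_l_to_mg_dl`, and `weighted_score` are hypothetical. The conversion factors are the standard ones (1 mg/dL creatinine ≈ 88.4 μmol/L, 1 mg/dL bilirubin ≈ 17.1 μmol/L).

```python
# Weights taken from the list above; they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# Standard factors: 1 mg/dL equals this many μmol/L.
UMOL_L_PER_MG_DL = {"creatinine": 88.4, "bilirubin": 17.1}


def umol_l_to_mg_dl(analyte: str, value: float) -> float:
    """Convert an SI lab value (μmol/L) to conventional units (mg/dL)."""
    return value / UMOL_L_PER_MG_DL[analyte]


def weighted_score(passed: dict[str, bool]) -> int:
    """Sum the weights of the criteria that passed (assumed all-or-nothing)."""
    return sum(w for name, w in WEIGHTS.items() if passed.get(name, False))


if __name__ == "__main__":
    print(round(umol_l_to_mg_dl("creatinine", 88.0), 2))  # roughly 1 mg/dL
    print(weighted_score({"age": True, "sex": True, "biomarkers": True}))
```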

### `trial_enrichment.py`
- `enrich_trials_from_search()`: called as a `BackgroundTask` on every `/api/v1/trials/search` response; upserts Trial + StudySite nodes
- `get_eligible_patient_counts()`: batch graph query, returns `{nct_id: count}`
- `get_graph_intelligence()`: per-trial: eligible count + top biomarkers + similar trials

### `graphrag.py`
Uses a custom `_CYPHER_PROMPT` with explicit schema examples. Critical rules in the prompt:
- Biomarker lookups use `id` property (`{id: 'HER2_POS'}`), never `{name: 'HER2', status: 'positive'}`
- Condition lookups use lowercase on Trial nodes
- Patient eligibility always via `(Patient)-[:ELIGIBLE_FOR]->(Trial)`
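
A query that follows all three rules might look like the sketch below. It is an illustration built from the schema in this document, not a copy of the examples inside `_CYPHER_PROMPT`; the function `build_eligibility_query` and its parameter names are hypothetical.

```python
def build_eligibility_query(
    biomarker_id: str, condition: str, limit: int = 10
) -> tuple[str, dict]:
    """Return a parameterized Cypher query and its parameters (hypothetical helper)."""
    query = (
        # Rule 1: match biomarkers by their `id` property.
        "MATCH (p:Patient)-[:HAS_BIOMARKER]->(:Biomarker {id: $biomarker_id})\n"
        # Rule 3: eligibility always goes through ELIGIBLE_FOR.
        "MATCH (p)-[e:ELIGIBLE_FOR]->(t:Trial)\n"
        # Rule 2: Trial.condition is stored lowercase, so compare lowercase.
        "WHERE t.condition CONTAINS $condition\n"
        "RETURN p.id AS patient_id, t.id AS nct_id, e.score AS score\n"
        "ORDER BY score DESC LIMIT $limit"
    )
    params = {
        "biomarker_id": biomarker_id,
        "condition": condition.strip().lower(),
        "limit": limit,
    }
    return query, params


if __name__ == "__main__":
    q, p = build_eligibility_query("HER2_POS", "Breast Cancer")
    print(q)
    print(p)
```

Passing values as parameters rather than formatting them into the string keeps the query safe from injection and lets Neo4j reuse the query plan.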

### `a2a_workflow.py`
Five-state machine: `INGESTING → PARSING_PROTOCOL → MATCHING → SCORING → RECRUITING`
- Calls `search_trials_sync()` / `get_trial_details_sync()`, which are safe (they use `httpx.Client`)
- `run_pipeline()` is synchronous; called from async FastAPI endpoint without `await`
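
The linear state progression can be sketched as below. Only the five state names come from this document; `WorkflowState`, `next_state`, and `run_states` are hypothetical names, and the real `run_pipeline()` presumably does work at each step rather than just advancing.

```python
from enum import Enum
from typing import Optional


class WorkflowState(Enum):
    INGESTING = "INGESTING"
    PARSING_PROTOCOL = "PARSING_PROTOCOL"
    MATCHING = "MATCHING"
    SCORING = "SCORING"
    RECRUITING = "RECRUITING"


# Enum members iterate in definition order, which is the pipeline order.
ORDER = list(WorkflowState)


def next_state(state: WorkflowState) -> Optional[WorkflowState]:
    """Return the following state, or None after the final state."""
    idx = ORDER.index(state)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def run_states() -> list[str]:
    """Walk the machine from start to finish and record each state visited."""
    visited = []
    state: Optional[WorkflowState] = ORDER[0]
    while state is not None:
        visited.append(state.value)
        state = next_state(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(run_states()))
```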

## Key frontend pages

### `/intake`: Eligibility Check
The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.
- Six sections, including Diagnosis & Demographics, Biomarkers, Lab Values, and Treatment History
- Biomarker registry loaded from `GET /api/v1/intake/biomarkers`
- Submits to `POST /api/v1/intake/match`
- Optional "Save to graph" checkbox persists profile as Patient node

### `/`: Trial Finder
- Sorted by `LastUpdatePostDate:desc` (most recently updated first)
- Each search result triggers background graph enrichment
- Expanded cards show Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
- Direct ClinicalTrials.gov link per trial

### `/screening`: Patient Screening
- Patient ID field is an `<input list="...">` combobox loading from `GET /api/v1/graph/patients`
- NCT ID field is a combobox with quick-pick suggestions
- Validates non-empty inputs before submitting
- Two modes: Single Trial Screen and A2A Full Pipeline

## API endpoints (key ones)

```
GET  /api/v1/trials/search          - real-time CT.gov search, sorted by recency, graph-enriched
POST /api/v1/intake/match           - SI-unit clinical intake → ranked trial matches
GET  /api/v1/intake/biomarkers      - biomarker registry for the intake form
GET  /api/v1/trials/{nct_id}/intelligence - graph-derived insights per trial
GET  /api/v1/graph/patients         - query Neo4j for seeded patient IDs
POST /api/v1/patients/{id}/screen/{nct_id} - screen FHIR patient against trial
POST /api/v1/workflow/run           - run full A2A pipeline
GET  /api/v1/analytics/kpi          - dashboard KPIs
GET  /api/v1/map/data               - site coordinates + patient clusters
POST /api/v1/graph/query            - GraphRAG natural language
POST /seed                          - trigger full graph seeding
GET  /api/v1/graph/stats            - node/edge counts
```

Full interactive docs at `http://localhost:8000/docs`.

## Environment variables

```env
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=clinicalmatch2024
NEO4J_DATABASE=neo4j

OPENAI_API_KEY=<aimlapi.com key>
OPENAI_BASE_URL=https://ai.aimlapi.com/v1
OPENAI_MODEL=claude-opus-4-7

NEXT_PUBLIC_API_URL=http://localhost:8000   # dev only; empty string in Docker
```
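
One way to read these variables in a single place is sketched below. It is an assumption-laden illustration, not the repo's actual configuration code: the `Settings` dataclass and `load_settings` are hypothetical, and the sketch deliberately treats the two secrets as required rather than giving them defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: str
    openai_base_url: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment (hypothetical helper)."""
    # Secrets have no fallback: fail fast if they are missing.
    missing = [k for k in ("NEO4J_PASSWORD", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env["OPENAI_API_KEY"],
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )


if __name__ == "__main__":
    demo = load_settings({"NEO4J_PASSWORD": "example", "OPENAI_API_KEY": "example"})
    print(demo.neo4j_uri, demo.openai_base_url, demo.openai_model)
```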

## Known issues and constraints

- **Turbopack is broken** on this machine: always use `next dev --webpack`. Never suggest `next dev` without `--webpack`.
- **`next/font/google`** causes compilation to hang (network request during bundling). Geist font is installed as a package but the `next/font/google` import is removed. Use plain Tailwind `font-sans`.
- **`asyncio.run()` from async context**: the sync CT.gov wrappers use `httpx.Client` to avoid this. Never re-introduce `asyncio.run()` into the sync wrappers; it will fail when called from FastAPI's running event loop.
- **Leaflet SSR**: `MapComponent.tsx` uses raw Leaflet (not react-leaflet) via `useEffect`. The `MapComponent` dynamic import has `ssr: false`. Do not switch to react-leaflet's `MapContainer`.
- **`suppressHydrationWarning`** on `<body>` in `layout.tsx`: required for Grammarly browser extension compatibility.
- **Mock FHIR patients** (P001–P005) live in `fhir_adapter.py`. The 500 seeded graph patients (`P_C50_0001` etc.) are in Neo4j only. The screening page loads graph patients from `GET /api/v1/graph/patients` for the combobox.

## Adding new features

1. **New backend route**: add to `main.py`, import the module at the top, add a Pydantic request model if needed
2. **New API function**: add a typed function to `frontend/src/lib/api.ts`
3. **New page**: create `frontend/src/app/<name>/page.tsx`, add to `nav` array in `Sidebar.tsx`
4. **Graph schema change**: update `neo4j_setup.py` constraints/indexes, update `_CYPHER_PROMPT` in `graphrag.py` with the new node/property examples
5. **New biomarker**: add to `BIOMARKER_REGISTRY` in `intake_matching.py` and to `BM_GROUPS` in `frontend/src/app/intake/page.tsx`

## Demo script (for judges)

1. `GET /api/v1/graph/stats`: confirm 500+ patients and 9,100+ edges
2. `/`: search "breast cancer" → observe recency sort, graph-matched patient count badges
3. Expand a trial → Graph Intelligence panel shows eligible patients, top biomarkers, similar trials
4. `/intake`: enter Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 μmol/L → ranked trials with pass/fail breakdown
5. `/screening`: select P_C50_0001 from combobox → run A2A Pipeline → observe 5-state machine
6. `/recruitment`: kanban board, generate PCP letter outreach
7. `/dashboard`: KPI cards, enrollment funnel, demographics
8. `/graph`: ask "which patients are eligible for breast cancer trials?"
9. In Prompt Opinion: call MCP tool `find_trials(condition="breast cancer")`