TheQuantEd committed on
Commit 59abb4f · 1 Parent(s): f022dec

Initial deployment: ClinicalMatch AI v2.0 — FHIR R4 · MCP (9 tools) · A2A workflow · SHARP compliance · 100k synthetic patients · Neo4j graph · GraphRAG chatbot

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .claude/project_memory.md +33 -0
  2. .env.example +10 -0
  3. .gitignore +23 -0
  4. CLAUDE.md +234 -0
  5. README.md +262 -7
  6. backend/a2a_workflow.py +315 -0
  7. backend/analytics.py +111 -0
  8. backend/clinicaltrials_api.py +170 -0
  9. backend/consent_agent.py +207 -0
  10. backend/data_ingestion.py +144 -0
  11. backend/fhir_adapter.py +163 -0
  12. backend/fhir_server.py +327 -0
  13. backend/graph_seeder.py +1109 -0
  14. backend/graphrag.py +125 -0
  15. backend/intake_matching.py +374 -0
  16. backend/llm_client.py +209 -0
  17. backend/main.py +705 -0
  18. backend/matching_engine.py +209 -0
  19. backend/mcp_mocks.py +34 -0
  20. backend/mcp_server.py +460 -0
  21. backend/neo4j_setup.py +53 -0
  22. backend/recruitment_pipeline.py +122 -0
  23. backend/requirements.txt +11 -0
  24. backend/rl_enrichment.py +62 -0
  25. backend/trial_enrichment.py +233 -0
  26. docker-compose.yml +84 -0
  27. docker/Dockerfile +128 -0
  28. docker/Dockerfile.backend +15 -0
  29. docker/Dockerfile.frontend +29 -0
  30. docker/entrypoint.sh +61 -0
  31. docker/nginx.conf +80 -0
  32. docker/supervisord.conf +72 -0
  33. frontend/.gitignore +41 -0
  34. frontend/README.md +36 -0
  35. frontend/eslint.config.mjs +18 -0
  36. frontend/next.config.ts +30 -0
  37. frontend/package-lock.json +0 -0
  38. frontend/package.json +34 -0
  39. frontend/postcss.config.mjs +8 -0
  40. frontend/public/file.svg +1 -0
  41. frontend/public/globe.svg +1 -0
  42. frontend/public/next.svg +1 -0
  43. frontend/public/vercel.svg +1 -0
  44. frontend/public/window.svg +1 -0
  45. frontend/scripts/prewarm.mjs +34 -0
  46. frontend/src/app/consent/page.tsx +214 -0
  47. frontend/src/app/dashboard/page.tsx +182 -0
  48. frontend/src/app/favicon.ico +0 -0
  49. frontend/src/app/globals.css +15 -0
  50. frontend/src/app/graph/page.tsx +110 -0
.claude/project_memory.md ADDED
@@ -0,0 +1,33 @@
+ # ClinicalMatch AI — Project Memory
+
+ Full-stack clinical trial matching agent for "Agents Assemble: Healthcare AI Endgame Challenge" on Prompt Opinion. Submission uses FHIR R4, MCP, and A2A standards.
+
+ **Stack:** FastAPI + Neo4j + LangChain GraphRAG + Next.js 16 + Recharts + Leaflet
+
+ **LLM:** aimlapi.com (OpenAI-compatible) with claude-opus-4-7. Never use Anthropic SDK directly.
+
+ ## Completed features
+
+ - `/intake` — SI-unit clinical intake form (no patient ID), scores against graph trials, optional graph save
+ - Trial Finder (`/`) — real-time CT.gov sorted by recency, passive graph enrichment on every search, Graph Intelligence panel per trial
+ - `/screening` — FHIR patient combobox loading 500 graph patients, A2A pipeline (5 states)
+ - `/recruitment` — kanban board, AI outreach generation (3 channels)
+ - `/dashboard` — KPI cards, enrollment funnel, demographics pie chart
+ - `/map` — Leaflet site map with patient density clusters
+ - `/graph` — GraphRAG with custom Cypher prompt
+ - 500 synthetic patients seeded, ~250 real NCT trials, ~9,100 ELIGIBLE_FOR edges
+ - MCP server (6 tools, stdio transport)
+ - `trial_enrichment.py` — passive upsert on search, batch eligible-patient counts, similar-trials graph walk
+ - `intake_matching.py` — BIOMARKER_REGISTRY, SI unit conversion, regex ECOG + lab threshold parsing
+
+ ## Known constraints
+
+ - Turbopack broken — always `next dev --webpack`
+ - `next/font/google` removed (hangs compilation) — use Tailwind `font-sans`
+ - Sync CT.gov wrappers use `httpx.Client` not `asyncio.run()` (breaks in FastAPI event loop)
+ - Leaflet uses raw API via useEffect, not react-leaflet (SSR issues)
+ - Mock FHIR patients: P001–P005 (fhir_adapter.py). Graph patients: P_C50_0001 etc.
+ - `suppressHydrationWarning` on `<body>` in layout.tsx for Grammarly extension
+ - After Python file changes, uvicorn may need manual restart if --reload doesn't trigger
+
+ See `CLAUDE.md` at repo root for full agent instructions.
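Reviewer note on the `asyncio.run()` constraint listed under "Known constraints" above: the sketch below reproduces the failure mode using only the standard library. `fetch_status`, `sync_wrapper`, and `handler` are hypothetical names invented for illustration and are not taken from `clinicaltrials_api.py`; `handler` merely stands in for an async FastAPI endpoint.

```python
import asyncio


async def fetch_status() -> str:
    """Stand-in for an async HTTP call (hypothetical; no network involved)."""
    await asyncio.sleep(0)
    return "ok"


def sync_wrapper() -> str:
    """Sync wrapper built on asyncio.run(), the pattern the constraint warns against."""
    coro = fetch_status()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def handler() -> str:
    """Plays the role of an async endpoint that calls the sync wrapper."""
    try:
        return sync_wrapper()
    except RuntimeError as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    print(sync_wrapper())          # succeeds: no event loop is running yet
    print(asyncio.run(handler()))  # reports the RuntimeError raised inside the running loop
```

A wrapper that uses a blocking client such as `httpx.Client` never touches the event loop, which is presumably why the commit prefers it for the sync code paths.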
.env.example ADDED
@@ -0,0 +1,10 @@
+ # Neo4j — local Docker (docker-compose.yml) or Aura
+ NEO4J_URI=bolt://localhost:7687
+ NEO4J_USERNAME=neo4j
+ NEO4J_PASSWORD=clinicalmatch2024
+ NEO4J_DATABASE=neo4j
+
+ # LLM — OpenAI-compatible (aimlapi.com → claude-opus-4-7)
+ OPENAI_API_KEY=your-key-here
+ OPENAI_BASE_URL=https://ai.aimlapi.com/v1
+ OPENAI_MODEL=claude-opus-4-7
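Reviewer note: the variables in `.env.example` could be read into a typed settings object roughly as sketched below. This is a minimal illustration, not code from the commit: `Settings` and `load_settings` are hypothetical names, the fallback values mirror the defaults shown in `.env.example`, and rejecting the `your-key-here` placeholder is an assumption about desirable behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: str
    openai_base_url: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    source = os.environ if env is None else env
    api_key = source.get("OPENAI_API_KEY", "")
    # Assumption: an empty key or the example placeholder should fail fast.
    if not api_key or api_key == "your-key-here":
        raise ValueError("OPENAI_API_KEY is missing or still set to the placeholder")
    return Settings(
        neo4j_uri=source.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=source.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=source.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=source.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=api_key,
        openai_base_url=source.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=source.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.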
.gitignore ADDED
@@ -0,0 +1,23 @@
+ # Secrets
+ .env
+ .env.local
+ backend/.env
+
+ # Python
+ backend/venv/
+ backend/__pycache__/
+ backend/*.pyc
+ **/__pycache__/
+ *.pyc
+
+ # Node
+ frontend/node_modules/
+ frontend/.next/
+ frontend/out/
+
+ # Docker volumes (local)
+ neo4j_data/
+
+ # OS
+ .DS_Store
+ Thumbs.db
CLAUDE.md ADDED
@@ -0,0 +1,234 @@
+ # ClinicalMatch AI — Agent Instructions
+
+ > Project memory (build state, completed features, constraints) is also tracked in `.claude/project_memory.md` in this repo.
+
+ This is a hackathon submission for **"Agents Assemble: Healthcare AI Endgame Challenge"** on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, feasibility.
+
+ ## Stack at a glance
+
+ | Layer | Technology |
+ |---|---|
+ | Backend | FastAPI (Python 3.12), uvicorn |
+ | Graph DB | Neo4j Community 5.x via bolt |
+ | LLM | claude-opus-4-7 via aimlapi.com (OpenAI-compatible) |
+ | GraphRAG | LangChain `GraphCypherQAChain` + custom Cypher prompt |
+ | Frontend | Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet |
+ | Standards | FHIR R4 · MCP (stdio) · A2A state machine |
+
+ ## Critical: LLM API
+
+ **Never use the Anthropic SDK directly.** All LLM calls go through aimlapi.com or a compatible alternative using the OpenAI-compatible interface:
+
+ ```python
+ import os
+ from openai import OpenAI
+
+ client = OpenAI(
+     api_key=os.getenv("OPENAI_API_KEY"),
+     base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
+ )
+ model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")
+ ```
+
+ See `backend/llm_client.py` for the canonical pattern. Do not add `import anthropic` anywhere.
+
+ ## Starting the services
+
+ ```bash
+ # Backend — always use --reload for hot reload
+ cd backend && source venv/bin/activate
+ uvicorn main:app --reload --port 8000
+
+ # Frontend — always use --webpack (Turbopack is broken on this system)
+ cd frontend && npm run dev # runs: next dev --webpack
+
+ # MCP server (separate process, stdio transport)
+ cd backend && python mcp_server.py
+
+ # Seed graph data (~15 min first run)
+ curl -X POST http://localhost:8000/seed
+ ```
+
+ After changing backend Python files, uvicorn `--reload` should pick them up. If a 404 appears for a newly-added endpoint or old errors persist, the server needs a manual restart — kill the process and re-run the uvicorn command.
+
+ ## Project layout
+
+ ```
+ promptop/
+ ├── CLAUDE.md                       ← you are here
+ ├── README.md                       ← user-facing docs
+ ├── backend/
+ │   ├── main.py                     ← FastAPI app, all routes
+ │   ├── clinicaltrials_api.py       ← ClinicalTrials.gov v2 API (async + sync)
+ │   ├── intake_matching.py          ← SI-unit clinical intake → trial scoring
+ │   ├── trial_enrichment.py         ← passive graph enrichment on search
+ │   ├── matching_engine.py          ← FHIR patient → trial scoring (LLM-assisted)
+ │   ├── a2a_workflow.py             ← A2A state machine (INGEST→PARSE→MATCH→SCORE→RECRUIT)
+ │   ├── graphrag.py                 ← LangChain GraphCypherQAChain with custom prompt
+ │   ├── graph_seeder.py             ← seeds 500 patients + real NCT trials from APIs
+ │   ├── fhir_adapter.py             ← FHIR R4 patient models (P001–P005 mock patients)
+ │   ├── neo4j_setup.py              ← Neo4j connection + schema setup
+ │   ├── analytics.py                ← dashboard KPIs, funnel, demographics, map data
+ │   ├── recruitment_pipeline.py     ← kanban board, outreach generation
+ │   ├── llm_client.py               ← all LLM calls (aimlapi.com / claude-opus-4-7)
+ │   ├── mcp_server.py               ← MCP stdio server (6 tools)
+ │   └── requirements.txt
+ ├── frontend/
+ │   ├── src/app/
+ │   │   ├── page.tsx                ← Trial Finder (real-time CT.gov, recency sort)
+ │   │   ├── intake/page.tsx         ← Eligibility Check (SI-unit clinical intake form)
+ │   │   ├── screening/page.tsx      ← Patient Screening (A2A pipeline, FHIR patients)
+ │   │   ├── recruitment/page.tsx    ← Recruitment Hub (kanban + outreach generation)
+ │   │   ├── dashboard/page.tsx      ← Analytics dashboard (Recharts)
+ │   │   ├── map/page.tsx            ← Leaflet site map
+ │   │   ├── graph/page.tsx          ← GraphRAG natural language query
+ │   │   └── layout.tsx              ← App shell with Sidebar
+ │   ├── src/components/
+ │   │   ├── Sidebar.tsx             ← Navigation sidebar
+ │   │   └── MapComponent.tsx        ← Raw Leaflet map (no react-leaflet SSR issues)
+ │   ├── src/lib/api.ts              ← Typed API client for all backend endpoints
+ │   └── next.config.ts              ← webpack mode, filesystem cache, optimizePackageImports
+ └── docker/                         ← Docker + Nginx for HuggingFace Spaces deployment
+ ```
+
+ ## Neo4j graph schema
+
+ ```
+ (Patient)    id, name, age, sex, ecog, condition, city, state, ethnicity,
+              biomarkers[], medications[], source, stage
+ (Trial)      id (NCT), title, condition, phase, status, sponsor,
+              eligibility_criteria, min_age, max_age, sex, enrollment,
+              start_date, completion_date, last_updated, ctgov_url
+ (Diagnosis)  id, name, icd10
+ (Biomarker)  id (e.g. HER2_POS), name (e.g. "HER2 Positive")
+ (Medication) id (e.g. TAMOXIFEN), name
+ (StudySite)  id, name, city, state, lat, lon, trials, enrolled, capacity
+
+ Relationships:
+   (Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
+   (Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
+   (Patient)-[:HAS_BIOMARKER]->(Biomarker)
+   (Patient)-[:TAKES_MEDICATION]->(Medication)
+   (Trial)-[:LOCATED_AT]->(StudySite)
+ ```
+
+ **Graph scale after seeding:** ~500 patients, ~250 trials, ~9,100 ELIGIBLE_FOR edges.
+
+ Patient IDs from seeder: `P_C50_0001` (breast), `P_C61_0001` (prostate), etc.
+ Mock FHIR patients: `P001`–`P005` (used by screening/workflow pages).
+
+ ## Key backend modules
+
+ ### `clinicaltrials_api.py`
+ - `search_trials()` — async, `sort=LastUpdatePostDate:desc`
+ - `get_trial_details()` — async
+ - `search_trials_sync()` / `get_trial_details_sync()` — sync using `httpx.Client` (NOT `asyncio.run()`). Safe to call from both sync functions and FastAPI async handlers.
+ - `_normalize_study()` — extracts `last_updated`, `ctgov_url` in addition to core fields.
+
+ **Do not** use `asyncio.run()` inside these sync wrappers — it breaks when called from a running FastAPI event loop. The sync wrappers use `httpx.Client` directly.
+
+ ### `intake_matching.py`
+ Implements SI-unit clinical intake → trial eligibility matching without requiring a patient ID:
+ - `BIOMARKER_REGISTRY` — maps graph node IDs to labels and eligibility text search terms
+ - `score_intake_against_trial()` — weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
+ - `_check_labs()` — parses thresholds from eligibility criteria text, converts SI units (creatinine μmol/L ↔ mg/dL, bilirubin μmol/L ↔ mg/dL)
+ - `save_intake_as_patient()` — persists intake as `Patient` node for long-term graph enrichment
+
+ ### `trial_enrichment.py`
+ - `enrich_trials_from_search()` — called as a `BackgroundTask` on every `/api/v1/trials/search` response; upserts Trial + StudySite nodes
+ - `get_eligible_patient_counts()` — batch graph query, returns `{nct_id: count}`
+ - `get_graph_intelligence()` — per-trial: eligible count + top biomarkers + similar trials
+
+ ### `graphrag.py`
+ Uses a custom `_CYPHER_PROMPT` with explicit schema examples. Critical rules in the prompt:
+ - Biomarker lookups use `id` property (`{id: 'HER2_POS'}`), never `{name: 'HER2', status: 'positive'}`
+ - Condition lookups use lowercase on Trial nodes
+ - Patient eligibility always via `(Patient)-[:ELIGIBLE_FOR]->(Trial)`
+
+ ### `a2a_workflow.py`
+ Five-state machine: `INGESTING → PARSING_PROTOCOL → MATCHING → SCORING → RECRUITING`
+ - Calls `search_trials_sync()` / `get_trial_details_sync()` — these are safe (use httpx.Client)
+ - `run_pipeline()` is synchronous; called from async FastAPI endpoint without `await`
+
+ ## Key frontend pages
+
+ ### `/intake` — Eligibility Check
+ The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.
+ - Six sections: Diagnosis & Demographics, Biomarkers, Lab Values, Treatment History
+ - Biomarker registry loaded from `GET /api/v1/intake/biomarkers`
+ - Submits to `POST /api/v1/intake/match`
+ - Optional "Save to graph" checkbox persists profile as Patient node
+
+ ### `/` — Trial Finder
+ - Sorted by `LastUpdatePostDate:desc` (most recently updated first)
+ - Each search result triggers background graph enrichment
+ - Expanded cards show Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
+ - Direct ClinicalTrials.gov link per trial
+
+ ### `/screening` — Patient Screening
+ - Patient ID field is a `<input list="...">` combobox loading from `GET /api/v1/graph/patients`
+ - NCT ID field is a combobox with quick-pick suggestions
+ - Validates non-empty inputs before submitting
+ - Two modes: Single Trial Screen and A2A Full Pipeline
+
+ ## API endpoints (key ones)
+
+ ```
+ GET  /api/v1/trials/search                   — real-time CT.gov search, sorted by recency, graph-enriched
+ POST /api/v1/intake/match                    — SI-unit clinical intake → ranked trial matches
+ GET  /api/v1/intake/biomarkers               — biomarker registry for the intake form
+ GET  /api/v1/trials/{nct_id}/intelligence    — graph-derived insights per trial
+ GET  /api/v1/graph/patients                  — query Neo4j for seeded patient IDs
+ POST /api/v1/patients/{id}/screen/{nct_id}   — screen FHIR patient against trial
+ POST /api/v1/workflow/run                    — run full A2A pipeline
+ GET  /api/v1/analytics/kpi                   — dashboard KPIs
+ GET  /api/v1/map/data                        — site coordinates + patient clusters
+ POST /api/v1/graph/query                     — GraphRAG natural language
+ POST /seed                                   — trigger full graph seeding
+ GET  /api/v1/graph/stats                     — node/edge counts
+ ```
+
+ Full interactive docs at `http://localhost:8000/docs`.
+
+ ## Environment variables
+
+ ```env
+ NEO4J_URI=bolt://localhost:7687
+ NEO4J_USERNAME=neo4j
+ NEO4J_PASSWORD=clinicalmatch2024
+ NEO4J_DATABASE=neo4j
+
+ OPENAI_API_KEY=<aimlapi.com key>
+ OPENAI_BASE_URL=https://ai.aimlapi.com/v1
+ OPENAI_MODEL=claude-opus-4-7
+
+ NEXT_PUBLIC_API_URL=http://localhost:8000 # dev only; empty string in Docker
+ ```
+
+ ## Known issues and constraints
+
+ - **Turbopack is broken** on this machine — always use `next dev --webpack`. Never suggest `next dev` without `--webpack`.
+ - **`next/font/google`** causes compilation to hang (network request during bundling). Geist font is installed as a package but the `next/font/google` import is removed. Use plain Tailwind `font-sans`.
+ - **`asyncio.run()` from async context** — the sync CT.gov wrappers use `httpx.Client` to avoid this. Never re-introduce `asyncio.run()` into the sync wrappers; it will fail when called from FastAPI's running event loop.
+ - **Leaflet SSR** — `MapComponent.tsx` uses raw Leaflet (not react-leaflet) via `useEffect`. The `MapComponent` dynamic import has `ssr: false`. Do not switch to react-leaflet's `MapContainer`.
+ - **`suppressHydrationWarning`** on `<body>` in `layout.tsx` — required for Grammarly browser extension compatibility.
+ - **Mock FHIR patients** (P001–P005) live in `fhir_adapter.py`. The 500 seeded graph patients (`P_C50_0001` etc.) are in Neo4j only. The screening page loads graph patients from `GET /api/v1/graph/patients` for the combobox.
+
+ ## Adding new features
+
+ 1. **New backend route**: add to `main.py`, import the module at the top, add a Pydantic request model if needed
+ 2. **New API function**: add a typed function to `frontend/src/lib/api.ts`
+ 3. **New page**: create `frontend/src/app/<name>/page.tsx`, add to `nav` array in `Sidebar.tsx`
+ 4. **Graph schema change**: update `neo4j_setup.py` constraints/indexes, update `_CYPHER_PROMPT` in `graphrag.py` with the new node/property examples
+ 5. **New biomarker**: add to `BIOMARKER_REGISTRY` in `intake_matching.py` and to `BM_GROUPS` in `frontend/src/app/intake/page.tsx`
+
+ ## Demo script (for judges)
+
+ 1. `GET /api/v1/graph/stats` — confirm 500+ patients and 9,100+ edges
+ 2. `/` — search "breast cancer" → observe recency sort, graph-matched patient count badges
+ 3. Expand a trial → Graph Intelligence panel shows eligible patients, top biomarkers, similar trials
+ 4. `/intake` — enter: Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 μmol/L → ranked trials with pass/fail breakdown
+ 5. `/screening` — select P_C50_0001 from combobox → run A2A Pipeline → observe 5-state machine
+ 6. `/recruitment` — kanban board, generate PCP letter outreach
+ 7. `/dashboard` — KPI cards, enrollment funnel, demographics
+ 8. `/graph` — ask "which patients are eligible for breast cancer trials?"
+ 9. In Prompt Opinion: call MCP tool `find_trials(condition="breast cancer")`
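Reviewer note: the weighted scoring that CLAUDE.md attributes to `score_intake_against_trial()` (age 25, sex 15, ECOG 15, biomarkers 30, labs 15) can be pictured with the sketch below. It is not the commit's implementation: `WEIGHTS` restates the documented weights, while `score_checks` and the choice to award zero points for an "uncertain" criterion are assumptions made for illustration.

```python
from typing import Optional

# Weights as documented for score_intake_against_trial(); they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}


def score_checks(checks: dict[str, Optional[bool]]) -> dict:
    """Combine per-criterion results into a 0-100 score with a breakdown.

    True means pass, False means fail, None or a missing key means uncertain.
    Assumption: uncertain criteria earn no points.
    """
    breakdown = {}
    total = 0
    for criterion, weight in WEIGHTS.items():
        result = checks.get(criterion)
        if result is True:
            status, points = "pass", weight
        elif result is False:
            status, points = "fail", 0
        else:
            status, points = "uncertain", 0
        breakdown[criterion] = {"status": status, "points": points, "max": weight}
        total += points
    return {"score": total, "breakdown": breakdown}


if __name__ == "__main__":
    example = score_checks({"age": True, "sex": True, "ecog": None,
                            "biomarkers": True, "labs": False})
    print(example["score"], example["breakdown"]["ecog"]["status"])
```

Keeping the per-criterion status next to the points makes it straightforward to render the pass/fail/uncertain breakdown the demo script mentions.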
README.md CHANGED
@@ -1,12 +1,267 @@
  ---
- title: CTA
- emoji: 🏢
- colorFrom: pink
+ title: ClinicalMatch AI
+ emoji: 🧬
+ colorFrom: indigo
  colorTo: purple
  sdk: docker
- pinned: false
- license: apache-2.0
- short_description: Clinical trial matching agent with MCP , APA
+ app_port: 7860
+ pinned: true
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # ClinicalMatch AI: Precision Clinical Trial Matching & Recruitment Agent
+
+ **"Agents Assemble: Healthcare AI Endgame Challenge"** — Prompt Opinion platform
+ Standards: **FHIR R4 · MCP · A2A**
+
+ > 80% of clinical trials fail to meet enrollment deadlines. 85% of eligible patients are never identified. This agent directly addresses that.
+
+ ---
+
+ ## What it does
+
+ ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.
+
+ **Key capabilities:**
+
+ | Feature | Description |
+ |---|---|
+ | **Eligibility Check** | Individual enters raw clinical data (age, labs in SI units, biomarkers) — no patient ID required — and receives ranked, explainable trial matches |
+ | **Trial Finder** | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
+ | **Graph Intelligence** | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
+ | **A2A Pipeline** | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
+ | **Recruitment Hub** | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
+ | **GraphRAG** | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
+ | **MCP Server** | 6 tools callable by Prompt Opinion directly via stdio transport |
+
+ ---
+
+ ## Architecture
+
+ ```
+ Prompt Opinion Platform
+                        │ MCP Protocol (stdio)
+
+ ┌────────────────────────────────────────────────────┐
+ │  MCP Server (mcp_server.py)                        │
+ │  find_trials · screen_patient · match_patient      │
+ │  generate_outreach · get_analytics · summarize     │
+ └──────────────────────┬─────────────────────────────┘
+                        │ A2A Orchestration
+
+ ┌────────────────────────────────────────────────────┐
+ │  FastAPI Backend (main.py, port 8000)              │
+ │  30+ REST endpoints                                │
+ ├──────────┬────────────┬────────────┬───────────────┤
+ │ CT.gov   │ FHIR R4    │ Claude     │ Neo4j Graph   │
+ │ live API │ adapter    │ LLM        │ RAG + match   │
+ └──────────┴────────────┴────────────┴───────────────┘
+
+
+ ┌────────────────────────────────────────────────────┐
+ │  Next.js 16 Frontend (port 3000)                   │
+ │  Trial Finder · Eligibility Check · Screening      │
+ │  Recruitment Hub · Dashboard · Map · GraphRAG      │
+ └────────────────────────────────────────────────────┘
+                        │ Nginx (port 7860)
+
+ HuggingFace Spaces
+ ```
+
+ **Data sources (all free, no auth):**
+
+ | Source | Data |
+ |---|---|
+ | ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
+ | RxNorm (NIH) | Medication RxCUI codes |
+ | ICD-10 CM (NLM) | Cancer diagnosis codes |
+ | PubMed (NCBI) | Supporting literature PMIDs |
+ | OpenFDA | Drug labels and adverse events |
+ | Synthetic | 500 realistic patient profiles matched to real trials |
+
+ ---
+
+ ## Graph Knowledge Base
+
+ After seeding, the Neo4j graph contains:
+
+ | Node type | Count | Key properties |
+ |---|---|---|
+ | Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
+ | Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
+ | Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
+ | Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
+ | Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
+ | StudySite | ~200 | lat/lon coordinates |
+ | **ELIGIBLE_FOR edges** | **~9,100** | score, linking patients to trials |
+
+ The graph grows passively — every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
+
+ ---
+
+ ## Clinical Eligibility Check (SI Units)
+
+ The `/intake` page accepts raw clinical data — no patient ID or account required. Fields:
+
+ **Demographics:** Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)
+
+ **Biomarker status (toggles):**
+ - Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
+ - Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
+ - GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
+ - Hematology: FLT3, IDH1/2, BCR-ABL
+
+ **Lab values (SI units):**
+
+ | Field | Unit | Conversion |
+ |---|---|---|
+ | Haemoglobin | g/dL | — |
+ | WBC | ×10⁹/L | — |
+ | ANC | ×10⁹/L | — |
+ | Platelets | ×10⁹/L | — |
+ | Creatinine | **μmol/L** | auto-converted ÷88.4 → mg/dL for trial text |
+ | eGFR | mL/min/1.73m² | — |
+ | Bilirubin | **μmol/L** | auto-converted ÷17.1 → mg/dL for trial text |
+ | ALT / AST | U/L | — |
+
+ Matching score breakdown:
+ - **Age** 25 pts — compared against trial min/max age
+ - **Sex** 15 pts — compared against trial sex restriction
+ - **ECOG** 15 pts — extracted via regex from eligibility criteria text
+ - **Biomarkers** 30 pts — checks whether biomarker terms appear in trial eligibility text
+ - **Lab values** 15 pts — parses thresholds from text, converts SI units, checks patient values
+
+ Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
+
+ ---
+
+ ## Running Locally (no Docker)
+
+ ```bash
+ # 1. Start Neo4j
+ docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community
+
+ # 2. Backend
+ cd backend
+ python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
+ cp ../.env.example ../.env.local # fill in credentials
+ uvicorn main:app --reload --port 8000
+
+ # 3. Schema setup (once)
+ curl -X POST http://localhost:8000/setup
+
+ # 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
+ curl -X POST http://localhost:8000/seed
+
+ # 5. Frontend
+ cd frontend
+ npm install --legacy-peer-deps
+ npm run dev # http://localhost:3000 (uses --webpack, not Turbopack)
+
+ # 6. MCP server (for Prompt Opinion integration)
+ cd backend
+ python mcp_server.py
+ ```
+
+ ---
+
+ ## Running with Docker Compose
+
+ ```bash
+ cp .env.example .env.local # fill in OPENAI_API_KEY etc.
+ docker compose up -d
+
+ # Wait ~60s for Neo4j to be healthy, then:
+ curl -X POST http://localhost:7860/setup
+ curl -X POST http://localhost:7860/seed
+ ```
+
+ Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474
+
+ ---
+
+ ## Deploying to HuggingFace Spaces
+
+ 1. Create a Space → **Docker SDK** → blank template
+ 2. Push repo to the Space:
+    ```bash
+    git remote add hf https://huggingface.co/spaces/<username>/<space-name>
+    git push hf main
+    ```
+ 3. Set **Repository Secrets**:
+    ```
+    OPENAI_API_KEY = <aimlapi.com key>
+    OPENAI_BASE_URL = https://ai.aimlapi.com/v1
+    OPENAI_MODEL = claude-opus-4-7
+    NEO4J_PASSWORD = clinicalmatch2024
+    ```
+ 4. After first boot, seed data:
+    ```
+    POST https://<space>.hf.space/seed
+    ```
+
+ ---
+
+ ## MCP Tools (Prompt Opinion integration)
+
+ ```bash
+ python backend/mcp_server.py # stdio transport
+ ```
+
+ | Tool | Arguments | Description |
+ |---|---|---|
+ | `find_trials` | `condition, phase?` | Real-time trial search |
+ | `screen_patient` | `patient_id, nct_id` | Eligibility screening |
+ | `match_patient_to_trials` | `patient_id` | Top-N trial matches |
+ | `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
+ | `get_trial_analytics` | — | Enrollment funnel + KPIs |
+ | `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |
+
+ ---
+
+ ## Key API Endpoints
+
+ | Method | Path | Description |
+ |---|---|---|
+ | POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
+ | GET | `/api/v1/intake/biomarkers` | Biomarker registry |
+ | GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
+ | GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
+ | GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
+ | POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
+ | POST | `/api/v1/workflow/run` | Run full A2A pipeline |
+ | GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
+ | GET | `/api/v1/map/data` | Site coordinates + patient clusters |
+ | POST | `/api/v1/graph/query` | GraphRAG natural language query |
+ | POST | `/seed` | Seed full graph from live APIs |
+ | GET | `/api/v1/graph/stats` | Node and edge counts |
+
+ Full interactive docs: `http://localhost:8000/docs`
+
+ ---
+
+ ## Environment Variables
+
+ | Variable | Description | Default |
+ |---|---|---|
+ | `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
+ | `NEO4J_USERNAME` | Neo4j username | `neo4j` |
+ | `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
+ | `NEO4J_DATABASE` | Database name | `neo4j` |
+ | `OPENAI_API_KEY` | aimlapi.com API key | — |
+ | `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
+ | `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
+ | `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |
+
+ ---
+
+ ## Frontend Pages
+
+ | Route | Page | Description |
+ |---|---|---|
+ | `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
+ | `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
+ | `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
+ | `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
+ | `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
+ | `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
+ | `/graph` | GraphRAG | Natural language queries over the knowledge graph |
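Reviewer note: the lab table in the README states that creatinine and bilirubin are divided by 88.4 and 17.1 respectively to convert μmol/L to mg/dL before comparison with trial text. The sketch below illustrates that arithmetic only. The function names are hypothetical, and treating a threshold as an inclusive upper limit in mg/dL is an assumption, not a description of `_check_labs()`.

```python
# Conversion factors quoted in the README lab table (umol/L per mg/dL).
CREATININE_UMOL_PER_MG_DL = 88.4
BILIRUBIN_UMOL_PER_MG_DL = 17.1


def creatinine_umol_to_mg_dl(value_umol_l: float) -> float:
    """Convert serum creatinine from umol/L to mg/dL."""
    return value_umol_l / CREATININE_UMOL_PER_MG_DL


def bilirubin_umol_to_mg_dl(value_umol_l: float) -> float:
    """Convert total bilirubin from umol/L to mg/dL."""
    return value_umol_l / BILIRUBIN_UMOL_PER_MG_DL


def within_upper_limit(value_mg_dl: float, limit_mg_dl: float) -> bool:
    """Assumed rule: a value passes when it does not exceed the stated limit."""
    return value_mg_dl <= limit_mg_dl


if __name__ == "__main__":
    # The demo script's creatinine of 88 umol/L, checked against a
    # hypothetical criterion of "creatinine <= 1.5 mg/dL".
    creatinine = creatinine_umol_to_mg_dl(88.0)
    print(round(creatinine, 2), within_upper_limit(creatinine, 1.5))
```

Converting the patient value into the unit used by the trial text, rather than rewriting the trial threshold, keeps the parsed criterion unchanged for display.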
backend/a2a_workflow.py ADDED
@@ -0,0 +1,315 @@
+ """A2A (Agent-to-Agent) orchestration workflow — state machine for the recruitment pipeline.
+
+ Every inter-agent message carries a SHARP Extension Spec context envelope:
+ sharp_version, patient_context (id, fhir_ref, fhir_base, tenant_id, session_id),
+ data_classification, baa_in_scope, consent_status
+ """
+ import uuid
+ import time
+ from datetime import datetime
+ from enum import Enum
+ from typing import Any
+ from fhir_adapter import get_patient_profile, get_mock_fhir_patient, build_patient_profile
+ from clinicaltrials_api import search_trials_sync, get_trial_details_sync
+ from matching_engine import get_criteria_for_trial, score_patient_for_trial, match_patient_to_trials
+ from llm_client import generate_outreach_message, summarize_trial
+ from fhir_server import build_sharp_context, get_live_patient_profile
+ import consent_agent
+
+
+ class WorkflowState(str, Enum):
+     PENDING = "PENDING"
+     INGESTING = "INGESTING"
+     PARSING_PROTOCOL = "PARSING_PROTOCOL"
+     MATCHING = "MATCHING"
+     SCORING = "SCORING"
+     RECRUITING = "RECRUITING"
+     COMPLETED = "COMPLETED"
+     FAILED = "FAILED"
+
+
+ # In-memory workflow store (production: use Redis or Neo4j)
+ _workflows: dict[str, dict] = {}
+
34
+
35
+ def _emit_event(workflow_id: str, state: WorkflowState, message: str, data: Any = None):
36
+ workflow = _workflows[workflow_id]
37
+ event = {
38
+ "state": state,
39
+ "message": message,
40
+ "timestamp": datetime.utcnow().isoformat(),
41
+ "data": data,
42
+ # SHARP envelope on every event so downstream agents have full context
43
+ "sharp_context": workflow.get("sharp_context", {}),
44
+ }
45
+ workflow["events"].append(event)
46
+ workflow["current_state"] = state
47
+ workflow["updated_at"] = datetime.utcnow().isoformat()
48
+ print(f"[A2A:{workflow_id[:8]}] {state} — {message}")
49
+
50
+
51
+ # ── Sub-agents ────────────────────────────────────────────────────────────────
52
+
53
+ def _agent_ingest_patient(workflow_id: str, patient_id: str) -> dict:
54
+ """Sub-agent: Ingest and validate patient FHIR data."""
55
+ _emit_event(workflow_id, WorkflowState.INGESTING, f"Ingesting FHIR R4 data for patient {patient_id}")
56
+ time.sleep(0.3) # Simulate async data fetch
57
+
58
+ fhir_patient = get_mock_fhir_patient(patient_id)
59
+ if not fhir_patient:
60
+ raise ValueError(f"Patient {patient_id} not found in FHIR registry")
61
+
62
+ profile = build_patient_profile(fhir_patient)
63
+ _emit_event(workflow_id, WorkflowState.INGESTING,
64
+ f"FHIR data loaded: {len(fhir_patient.conditions)} conditions, {len(fhir_patient.medications)} medications",
65
+ {"profile": profile})
66
+ return profile
67
+
68
+
69
+ def _agent_parse_protocol(workflow_id: str, nct_id: str | None, condition: str) -> tuple[list[dict], dict]:
70
+ """Sub-agent: Parse trial protocol and extract criteria."""
71
+ _emit_event(workflow_id, WorkflowState.PARSING_PROTOCOL,
72
+ f"Parsing trial protocols for condition: {condition}")
73
+ time.sleep(0.5)
74
+
75
+ if nct_id:
76
+ trials = [get_trial_details_sync(nct_id)]
77
+ trials = [t for t in trials if t]
78
+ else:
79
+ trials = search_trials_sync(condition, page_size=8)
80
+
81
+ if not trials:
82
+ raise ValueError(f"No trials found for condition: {condition}")
83
+
84
+ # Parse criteria for each trial using LLM
85
+ parsed_trials = []
86
+ for trial in trials[:5]: # Limit to avoid timeout
87
+ criteria = get_criteria_for_trial(trial)
88
+ parsed_trials.append({**trial, "parsed_criteria": criteria})
89
+
90
+ summary = summarize_trial(trials[0]) if trials else ""
91
+ _emit_event(workflow_id, WorkflowState.PARSING_PROTOCOL,
92
+ f"Parsed {len(parsed_trials)} trial protocols",
93
+ {"trial_count": len(parsed_trials), "protocol_summary": summary})
94
+ return parsed_trials, {"summary": summary}
95
+
96
+
97
+ def _agent_match(workflow_id: str, patient_profile: dict, trials: list[dict]) -> list[dict]:
98
+ """Sub-agent: Semantic matching of patient to trials."""
99
+ _emit_event(workflow_id, WorkflowState.MATCHING,
100
+ f"Running semantic matching for patient {patient_profile['patient_id']} against {len(trials)} trials")
101
+ time.sleep(0.3)
102
+
103
+ candidates = []
104
+ for trial in trials:
105
+ score_result = score_patient_for_trial(patient_profile["patient_id"], trial)
106
+ candidates.append({
107
+ **trial,
108
+ "match_score": score_result.get("overall_score", 0.0),
109
+ "eligible": score_result.get("eligible", False),
110
+ "inclusion_results": score_result.get("inclusion_results", []),
111
+ "exclusion_results": score_result.get("exclusion_results", []),
112
+ "match_summary": score_result.get("summary", ""),
113
+ "risk_flags": score_result.get("risk_flags", []),
114
+ })
115
+
116
+ candidates.sort(key=lambda x: x["match_score"], reverse=True)
117
+ eligible = [c for c in candidates if c["eligible"]]
118
+ _emit_event(workflow_id, WorkflowState.MATCHING,
119
+ f"Matching complete: {len(eligible)}/{len(candidates)} trials eligible",
120
+ {"eligible_count": len(eligible), "top_score": candidates[0]["match_score"] if candidates else 0})
121
+ return candidates
122
+
123
+
124
+ def _agent_score(workflow_id: str, candidates: list[dict], patient_profile: dict) -> list[dict]:
125
+ """Sub-agent: Predictive screening scoring with risk flags."""
126
+ _emit_event(workflow_id, WorkflowState.SCORING, "Running predictive screening analysis")
127
+ time.sleep(0.2)
128
+
129
+ for candidate in candidates:
130
+ flags = candidate.get("risk_flags", [])
131
+ # Flag trials with no site location data (distance to a site cannot be assessed)
132
+ locs = candidate.get("locations", [])
133
+ if not locs:
134
+ flags.append("No site location data available")
135
+ # Add data completeness flag
136
+ if not patient_profile.get("biomarkers"):
137
+ flags.append("Biomarker data incomplete — may affect screening")
138
+ candidate["risk_flags"] = flags
139
+ candidate["screening_priority"] = (
140
+ "HIGH" if candidate["match_score"] >= 0.8
141
+ else "MEDIUM" if candidate["match_score"] >= 0.5
142
+ else "LOW"
143
+ )
144
+
145
+ _emit_event(workflow_id, WorkflowState.SCORING,
146
+ "Screening scoring complete",
147
+ {"high_priority": sum(1 for c in candidates if c.get("screening_priority") == "HIGH")})
148
+ return candidates
149
+
150
+
151
+ def _agent_recruit(workflow_id: str, candidates: list[dict], patient_profile: dict) -> list[dict]:
152
+ """Sub-agent: Generate recruitment outreach for eligible candidates."""
153
+ _emit_event(workflow_id, WorkflowState.RECRUITING, "Generating personalized recruitment communications")
154
+
155
+ eligible = [c for c in candidates if c.get("eligible")][:3]
156
+ recruitment_records = []
157
+
158
+ for trial in eligible:
159
+ try:
160
+ outreach = generate_outreach_message(patient_profile, trial, "patient_email")
161
+ pcp_letter = generate_outreach_message(patient_profile, trial, "pcp_letter")
162
+
163
+ # A2A handoff → consent agent (SHARP envelope attached)
164
+ consent_task = {
165
+ "task_id": f"consent_{workflow_id}_{trial.get('nct_id','')}",
166
+ "type": "CONSENT_REQUEST",
167
+ "payload": {
168
+ "patient_id": patient_profile.get("patient_id", ""),
169
+ "nct_id": trial.get("nct_id", ""),
170
+ "trial_title": trial.get("title", ""),
171
+ "match_score": trial.get("match_score", 0.0),
172
+ },
173
+ "sharp_context": _workflows[workflow_id].get("sharp_context", {}),
174
+ }
175
+ consent_result = consent_agent.receive_a2a_task(consent_task)
176
+
177
+ recruitment_records.append({
178
+ "nct_id": trial.get("nct_id", ""),
179
+ "trial_title": trial.get("title", ""),
180
+ "match_score": trial.get("match_score", 0.0),
181
+ "patient_email": outreach,
182
+ "pcp_letter": pcp_letter,
183
+ "status": "PENDING",
184
+ "consent_id": consent_result.get("consent_id"),
185
+ "consent_status": consent_result.get("status", "PENDING"),
186
+ "created_at": datetime.utcnow().isoformat(),
187
+ })
188
+ except Exception as e:
189
+ recruitment_records.append({
190
+ "nct_id": trial.get("nct_id", ""),
191
+ "trial_title": trial.get("title", ""),
192
+ "error": str(e),
193
+ "status": "ERROR",
194
+ })
195
+
196
+ _emit_event(workflow_id, WorkflowState.RECRUITING,
197
+ f"Generated outreach for {len(recruitment_records)} trials",
198
+ {"record_count": len(recruitment_records)})
199
+ return recruitment_records
200
+
201
+
202
+ # ── Public API ─────────────────────────────────────────────────────────────────
203
+
204
+ def start_pipeline(
205
+ patient_id: str,
206
+ nct_id: str | None = None,
207
+ condition: str | None = None,
208
+ fhir_token: str | None = None,
209
+ fhir_base_url: str | None = None,
210
+ session_id: str | None = None,
211
+ ) -> str:
212
+ """Start the A2A pipeline and return a workflow_id."""
213
+ workflow_id = str(uuid.uuid4())
214
+ sharp_ctx = build_sharp_context(
215
+ patient_id=patient_id,
216
+ fhir_ref=f"Patient/{patient_id}",
217
+ session_id=session_id or workflow_id,
218
+ )
219
+ if fhir_token:
220
+ sharp_ctx["fhir_token"] = fhir_token
221
+ if fhir_base_url:
222
+ sharp_ctx["patient_context"]["fhir_base"] = fhir_base_url
223
+
224
+ _workflows[workflow_id] = {
225
+ "workflow_id": workflow_id,
226
+ "patient_id": patient_id,
227
+ "nct_id": nct_id,
228
+ "condition": condition,
229
+ "current_state": WorkflowState.PENDING,
230
+ "events": [],
231
+ "result": None,
232
+ "sharp_context": sharp_ctx,
233
+ "created_at": datetime.utcnow().isoformat(),
234
+ "updated_at": datetime.utcnow().isoformat(),
235
+ }
236
+ return workflow_id
237
+
238
+
239
+ def run_pipeline(workflow_id: str) -> dict:
240
+ """Execute the full A2A pipeline synchronously."""
241
+ workflow = _workflows.get(workflow_id)
242
+ if not workflow:
243
+ raise ValueError(f"Workflow {workflow_id} not found")
244
+
245
+ patient_id = workflow["patient_id"]
246
+ nct_id = workflow.get("nct_id")
247
+ condition = workflow.get("condition")
248
+
249
+ try:
250
+ # Agent 1: Ingest FHIR patient data
251
+ patient_profile = _agent_ingest_patient(workflow_id, patient_id)
252
+
253
+ # Infer condition
254
+ if not condition and patient_profile.get("diagnosis_names"):
255
+ condition = patient_profile["diagnosis_names"][0]
256
+ elif not condition:
257
+ condition = "cancer"
258
+
259
+ # Agent 2: Parse trial protocols
260
+ trials, protocol_meta = _agent_parse_protocol(workflow_id, nct_id, condition)
261
+
262
+ # Agent 3: Semantic matching
263
+ candidates = _agent_match(workflow_id, patient_profile, trials)
264
+
265
+ # Agent 4: Predictive scoring
266
+ candidates = _agent_score(workflow_id, candidates, patient_profile)
267
+
268
+ # Agent 5: Recruitment communication
269
+ recruitment_records = _agent_recruit(workflow_id, candidates, patient_profile)
270
+
271
+ result = {
272
+ "patient_profile": patient_profile,
273
+ "matched_trials": candidates,
274
+ "recruitment_records": recruitment_records,
275
+ "protocol_summary": protocol_meta.get("summary", ""),
276
+ "total_trials_evaluated": len(trials),
277
+ "eligible_trials": sum(1 for c in candidates if c.get("eligible")),
278
+ }
279
+
280
+ workflow["result"] = result
281
+ _emit_event(workflow_id, WorkflowState.COMPLETED,
282
+ f"Pipeline complete: {result['eligible_trials']} eligible trials found", result)
283
+
284
+ except Exception as e:
285
+ _emit_event(workflow_id, WorkflowState.FAILED, f"Pipeline failed: {str(e)}")
286
+ workflow["error"] = str(e)
287
+
288
+ return _workflows[workflow_id]
289
+
290
+
291
+ def get_workflow_status(workflow_id: str) -> dict:
292
+ workflow = _workflows.get(workflow_id)
293
+ if not workflow:
294
+ return {"error": "Workflow not found"}
295
+ return {
296
+ "workflow_id": workflow_id,
297
+ "current_state": workflow["current_state"],
298
+ "events": workflow["events"][-10:], # Last 10 events
299
+ "result": workflow.get("result"),
300
+ "error": workflow.get("error"),
301
+ "created_at": workflow["created_at"],
302
+ "updated_at": workflow["updated_at"],
303
+ }
304
+
305
+
306
+ def list_workflows() -> list[dict]:
307
+ return [
308
+ {
309
+ "workflow_id": wf["workflow_id"],
310
+ "patient_id": wf["patient_id"],
311
+ "current_state": wf["current_state"],
312
+ "created_at": wf["created_at"],
313
+ }
314
+ for wf in _workflows.values()
315
+ ]
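The state-machine pattern used by `run_pipeline` can be sketched in isolation. The version below is a minimal, self-contained illustration and not the module's actual code: it keeps only a subset of the states, and `new_workflow`, `emit_event`, and `run_steps` are hypothetical names introduced for this example. It assumes each sub-agent can be modelled as a zero-argument callable, and it uses timezone-aware UTC timestamps as one possible choice.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Callable


class WorkflowState(str, Enum):
    PENDING = "PENDING"
    INGESTING = "INGESTING"
    MATCHING = "MATCHING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


def new_workflow(workflow_id: str) -> dict:
    """Create a workflow record in the PENDING state with an empty event log."""
    return {
        "workflow_id": workflow_id,
        "current_state": WorkflowState.PENDING,
        "events": [],
        "error": None,
    }


def emit_event(workflow: dict, state: WorkflowState, message: str) -> None:
    """Append an event to the log and advance the workflow's current state."""
    workflow["events"].append({
        "state": state.value,
        "message": message,
        # Timezone-aware UTC timestamp (an assumption of this sketch).
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    workflow["current_state"] = state


def run_steps(workflow: dict, steps: list[tuple[WorkflowState, Callable[[], None]]]) -> dict:
    """Run each step in order; any exception moves the workflow to FAILED."""
    try:
        for state, step in steps:
            emit_event(workflow, state, f"Entering {state.value}")
            step()
        emit_event(workflow, WorkflowState.COMPLETED, "Pipeline complete")
    except Exception as exc:
        emit_event(workflow, WorkflowState.FAILED, f"Pipeline failed: {exc}")
        workflow["error"] = str(exc)
    return workflow


def _missing_patient() -> None:
    # Stand-in for a sub-agent that cannot find its input.
    raise ValueError("Patient not found")


ok = run_steps(new_workflow("wf-ok"), [
    (WorkflowState.INGESTING, lambda: None),
    (WorkflowState.MATCHING, lambda: None),
])
failed = run_steps(new_workflow("wf-bad"), [
    (WorkflowState.INGESTING, _missing_patient),
    (WorkflowState.MATCHING, lambda: None),
])
print(ok["current_state"].value, len(ok["events"]))
print(failed["current_state"].value, failed["error"])
```

Because a failure is recorded as a state rather than raised, a caller polling the workflow sees `FAILED` together with the error text, and the event log shows the last stage that was entered.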
backend/analytics.py ADDED
@@ -0,0 +1,111 @@
1
+ """Analytics and dashboard data aggregation."""
2
+ import random
3
+ from datetime import datetime, timedelta
4
+ from fhir_adapter import get_all_patient_ids, get_patient_profile
5
+ from clinicaltrials_api import search_trials_sync
6
+
7
+
8
+ STUDY_SITES = [
9
+ {"name": "Dana-Farber Cancer Institute", "city": "Boston", "state": "MA", "lat": 42.3376, "lon": -71.1083, "trials": 4, "enrolled": 87, "capacity": 120},
10
+ {"name": "MD Anderson Cancer Center", "city": "Houston", "state": "TX", "lat": 29.7066, "lon": -95.3990, "trials": 6, "enrolled": 142, "capacity": 200},
11
+ {"name": "Memorial Sloan Kettering", "city": "New York", "state": "NY", "lat": 40.7644, "lon": -73.9581, "trials": 5, "enrolled": 113, "capacity": 150},
12
+ {"name": "UCSF Medical Center", "city": "San Francisco", "state": "CA", "lat": 37.7631, "lon": -122.4578, "trials": 3, "enrolled": 67, "capacity": 90},
13
+ {"name": "Northwestern Medicine", "city": "Chicago", "state": "IL", "lat": 41.8827, "lon": -87.6233, "trials": 4, "enrolled": 94, "capacity": 130},
14
+ {"name": "Mayo Clinic", "city": "Rochester", "state": "MN", "lat": 44.0225, "lon": -92.4664, "trials": 7, "enrolled": 178, "capacity": 220},
15
+ {"name": "Johns Hopkins Hospital", "city": "Baltimore", "state": "MD", "lat": 39.2963, "lon": -76.5927, "trials": 5, "enrolled": 105, "capacity": 160},
16
+ {"name": "Cleveland Clinic", "city": "Cleveland", "state": "OH", "lat": 41.5022, "lon": -81.6220, "trials": 3, "enrolled": 72, "capacity": 100},
17
+ ]
18
+
19
+
20
+ def get_kpi_summary() -> dict:
21
+ patient_ids = get_all_patient_ids()
22
+ return {
23
+ "active_trials": 23,
24
+ "patients_identified": len(patient_ids) * 12,
25
+ "patients_screened": len(patient_ids) * 8,
26
+ "patients_enrolled": len(patient_ids) * 3,
27
+ "enrollment_rate": 0.37,
28
+ "avg_days_to_match": 4.2,
29
+ "sites_active": len(STUDY_SITES),
30
+ "cost_saved_usd": 284000,
31
+ }
32
+
33
+
34
+ def get_enrollment_funnel(trial_id: str | None = None) -> list[dict]:
35
+ """Return enrollment funnel data for Recharts BarChart."""
36
+ base = random.randint(80, 150) if trial_id else 500
37
+ return [
38
+ {"stage": "Identified", "count": base, "fill": "#6366f1"},
39
+ {"stage": "Pre-Screened", "count": int(base * 0.72), "fill": "#8b5cf6"},
40
+ {"stage": "Contacted", "count": int(base * 0.55), "fill": "#a78bfa"},
41
+ {"stage": "Consented", "count": int(base * 0.38), "fill": "#c4b5fd"},
42
+ {"stage": "Enrolled", "count": int(base * 0.22), "fill": "#ddd6fe"},
43
+ ]
44
+
45
+
46
+ def get_site_performance() -> list[dict]:
47
+ return [
48
+ {
49
+ **site,
50
+ "enrollment_rate": round(site["enrolled"] / site["capacity"], 2),
51
+ "fill_percentage": round(site["enrolled"] / site["capacity"] * 100, 1),
52
+ }
53
+ for site in STUDY_SITES
54
+ ]
55
+
56
+
57
+ def get_patient_demographics(trial_id: str | None = None) -> dict:
58
+ return {
59
+ "age_distribution": [
60
+ {"range": "18-30", "count": 12, "percentage": 8},
61
+ {"range": "31-45", "count": 28, "percentage": 19},
62
+ {"range": "46-60", "count": 54, "percentage": 36},
63
+ {"range": "61-75", "count": 42, "percentage": 28},
64
+ {"range": "75+", "count": 14, "percentage": 9},
65
+ ],
66
+ "gender_distribution": [
67
+ {"name": "Female", "value": 58, "fill": "#f472b6"},
68
+ {"name": "Male", "value": 39, "fill": "#60a5fa"},
69
+ {"name": "Other", "value": 3, "fill": "#a3e635"},
70
+ ],
71
+ "ethnicity_distribution": [
72
+ {"name": "White", "value": 52, "fill": "#6366f1"},
73
+ {"name": "Black/African American", "value": 18, "fill": "#8b5cf6"},
74
+ {"name": "Hispanic/Latino", "value": 15, "fill": "#ec4899"},
75
+ {"name": "Asian", "value": 11, "fill": "#14b8a6"},
76
+ {"name": "Other/Unknown", "value": 4, "fill": "#f59e0b"},
77
+ ],
78
+ }
79
+
80
+
81
+ def get_recruitment_timeline(days: int = 30) -> list[dict]:
82
+ """Daily enrollment progress for timeline chart."""
83
+ base_date = datetime.now() - timedelta(days=days)
84
+ timeline = []
85
+ cumulative = 0
86
+ for i in range(days):
87
+ daily = random.randint(1, 8)
88
+ cumulative += daily
89
+ timeline.append({
90
+ "date": (base_date + timedelta(days=i)).strftime("%Y-%m-%d"),
91
+ "daily_enrolled": daily,
92
+ "cumulative_enrolled": cumulative,
93
+ "target": int((i + 1) / days * 150),
94
+ })
95
+ return timeline
96
+
97
+
98
+ def get_map_data() -> dict:
99
+ return {
100
+ "sites": STUDY_SITES,
101
+ "patient_clusters": [
102
+ {"lat": 42.36, "lon": -71.06, "count": 24, "city": "Boston Metro"},
103
+ {"lat": 40.71, "lon": -74.01, "count": 38, "city": "New York Metro"},
104
+ {"lat": 29.76, "lon": -95.37, "count": 19, "city": "Houston Metro"},
105
+ {"lat": 37.77, "lon": -122.42, "count": 16, "city": "San Francisco Bay"},
106
+ {"lat": 41.88, "lon": -87.63, "count": 27, "city": "Chicago Metro"},
107
+ {"lat": 34.05, "lon": -118.24, "count": 31, "city": "Los Angeles Metro"},
108
+ {"lat": 33.45, "lon": -112.07, "count": 13, "city": "Phoenix Metro"},
109
+ {"lat": 47.61, "lon": -122.33, "count": 11, "city": "Seattle Metro"},
110
+ ],
111
+ }
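The funnel in `get_enrollment_funnel` applies fixed stage ratios to a base count. A minimal sketch of the same arithmetic follows, assuming the ratios shown in that function (72%, 55%, 38%, 22%). `FUNNEL_STAGES` and `build_funnel` are hypothetical names for this example, and the use of integer percentages is a design choice of the sketch, not something the original does.

```python
# Stage name and share of the identified population, as integer percentages.
FUNNEL_STAGES = [
    ("Identified", 100),
    ("Pre-Screened", 72),
    ("Contacted", 55),
    ("Consented", 38),
    ("Enrolled", 22),
]


def build_funnel(base: int) -> list[dict]:
    """Return one row per stage with its count and conversion from the previous stage."""
    funnel = []
    previous = None
    for stage, percent in FUNNEL_STAGES:
        # Integer arithmetic avoids float truncation surprises from int(base * 0.72).
        count = base * percent // 100
        conversion = 1.0 if not previous else round(count / previous, 2)
        funnel.append({"stage": stage, "count": count, "conversion": conversion})
        previous = count
    return funnel


funnel = build_funnel(500)
for row in funnel:
    print(row["stage"], row["count"], row["conversion"])
```

Passing the base count in explicitly, rather than drawing it from `random.randint`, keeps the output reproducible for a dashboard test.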
backend/clinicaltrials_api.py ADDED
@@ -0,0 +1,170 @@
1
+ import httpx
2
+ import asyncio
3
+ from typing import Optional
4
+ import os
5
+
6
+ CTGOV_BASE = "https://clinicaltrials.gov/api/v2/studies"
7
+
8
+ async def search_trials(condition: str, phase: Optional[str] = None, status: str = "RECRUITING", page_size: int = 20) -> list[dict]:
9
+ params = {
10
+ "query.cond": condition,
11
+ "filter.overallStatus": status,
12
+ "pageSize": page_size,
13
+ "format": "json",
14
+ "sort": "LastUpdatePostDate:desc",
15
+ }
16
+ if phase:
17
+ params["filter.phase"] = f"PHASE{phase.replace('Phase ', '').replace('III', '3').replace('II', '2').replace('IV', '4').replace('I', '1')}"
18
+
19
+ async with httpx.AsyncClient(timeout=30.0) as client:
20
+ try:
21
+ resp = await client.get(CTGOV_BASE, params=params)
22
+ resp.raise_for_status()
23
+ data = resp.json()
24
+ studies = data.get("studies", [])
25
+ return [_normalize_study(s) for s in studies]
26
+ except Exception as e:
27
+ print(f"ClinicalTrials.gov API error: {e}")
28
+ return _fallback_trials(condition)
29
+
30
+ async def get_trial_details(nct_id: str) -> dict:
31
+ params = {"query.id": nct_id, "format": "json"}
32
+ async with httpx.AsyncClient(timeout=30.0) as client:
33
+ try:
34
+ resp = await client.get(CTGOV_BASE, params=params)
35
+ resp.raise_for_status()
36
+ data = resp.json()
37
+ studies = data.get("studies", [])
38
+ if studies:
39
+ return _normalize_study(studies[0])
40
+ except Exception as e:
41
+ print(f"ClinicalTrials.gov detail error: {e}")
42
+ return {}
43
+
44
+ def _normalize_study(study: dict) -> dict:
45
+ proto = study.get("protocolSection", {})
46
+ ident = proto.get("identificationModule", {})
47
+ status_module = proto.get("statusModule", {})
48
+ desc = proto.get("descriptionModule", {})
49
+ eligibility = proto.get("eligibilityModule", {})
50
+ design = proto.get("designModule", {})
51
+ contacts = proto.get("contactsLocationsModule", {})
52
+ sponsor = proto.get("sponsorCollaboratorsModule", {})
53
+ outcomes = proto.get("outcomesModule", {})
54
+
55
+ locations = []
56
+ for loc in contacts.get("locations", [])[:5]:
57
+ locations.append({
58
+ "city": loc.get("city", ""),
59
+ "state": loc.get("state", ""),
60
+ "country": loc.get("country", "US"),
61
+ "facility": loc.get("facility", ""),
62
+ "lat": loc.get("geoPoint", {}).get("lat"),
63
+ "lon": loc.get("geoPoint", {}).get("lon"),
64
+ })
65
+
66
+ phases = design.get("phases", [])
67
+ return {
68
+ "nct_id": ident.get("nctId", ""),
69
+ "title": ident.get("briefTitle", ""),
70
+ "status": status_module.get("overallStatus", ""),
71
+ "phase": phases[0] if phases else "N/A",
72
+ "brief_summary": desc.get("briefSummary", ""),
73
+ "eligibility_criteria": eligibility.get("eligibilityCriteria", ""),
74
+ "min_age": eligibility.get("minimumAge", ""),
75
+ "max_age": eligibility.get("maximumAge", ""),
76
+ "sex": eligibility.get("sex", "ALL"),
77
+ "enrollment": design.get("enrollmentInfo", {}).get("count", 0),
78
+ "start_date": status_module.get("startDateStruct", {}).get("date", ""),
79
+ "completion_date": status_module.get("completionDateStruct", {}).get("date", ""),
80
+ "last_updated": status_module.get("lastUpdatePostDateStruct", {}).get("date", ""),
81
+ "sponsor": sponsor.get("leadSponsor", {}).get("name", ""),
82
+ "primary_outcomes": [o.get("measure", "") for o in outcomes.get("primaryOutcomes", [])[:3]],
83
+ "locations": locations,
84
+ "location_count": len(contacts.get("locations", [])),
85
+ "ctgov_url": f"https://clinicaltrials.gov/study/{ident.get('nctId', '')}",
86
+ }
87
+
88
+ def _fallback_trials(condition: str) -> list[dict]:
89
+ """Realistic fallback when API is unavailable."""
90
+ return [
91
+ {
92
+ "nct_id": "NCT04889131",
93
+ "title": f"Precision Medicine Study for {condition}",
94
+ "status": "RECRUITING",
95
+ "phase": "PHASE2",
96
+ "brief_summary": f"A randomized controlled trial evaluating targeted therapy for {condition} in adult patients.",
97
+ "eligibility_criteria": "Inclusion Criteria:\n- Age 18-75\n- Confirmed diagnosis\n- ECOG performance status 0-2\nExclusion Criteria:\n- Prior treatment failure\n- Active autoimmune disease",
98
+ "min_age": "18 Years",
99
+ "max_age": "75 Years",
100
+ "sex": "ALL",
101
+ "enrollment": 150,
102
+ "start_date": "2024-01",
103
+ "completion_date": "2026-06",
104
+ "sponsor": "Academic Medical Center",
105
+ "primary_outcomes": ["Overall Survival", "Progression-Free Survival"],
106
+ "locations": [
107
+ {"city": "Boston", "state": "MA", "country": "US", "facility": "Dana-Farber Cancer Institute", "lat": 42.3376, "lon": -71.1083},
108
+ {"city": "Houston", "state": "TX", "country": "US", "facility": "MD Anderson Cancer Center", "lat": 29.7066, "lon": -95.3990},
109
+ ],
110
+ "location_count": 2,
111
+ },
112
+ {
113
+ "nct_id": "NCT05123456",
114
+ "title": f"Immunotherapy Combination for Advanced {condition}",
115
+ "status": "RECRUITING",
116
+ "phase": "PHASE3",
117
+ "brief_summary": f"Phase III trial of combination immunotherapy in patients with advanced {condition}.",
118
+ "eligibility_criteria": "Inclusion Criteria:\n- Age ≥ 18\n- Histologically confirmed diagnosis\n- Measurable disease per RECIST 1.1\nExclusion Criteria:\n- Brain metastases\n- Prior PD-1/PD-L1 therapy",
119
+ "min_age": "18 Years",
120
+ "max_age": "N/A",
121
+ "sex": "ALL",
122
+ "enrollment": 400,
123
+ "start_date": "2023-06",
124
+ "completion_date": "2027-12",
125
+ "sponsor": "Pharma Innovations Inc",
126
+ "primary_outcomes": ["Overall Survival at 24 months"],
127
+ "locations": [
128
+ {"city": "New York", "state": "NY", "country": "US", "facility": "Memorial Sloan Kettering", "lat": 40.7644, "lon": -73.9581},
129
+ {"city": "San Francisco", "state": "CA", "country": "US", "facility": "UCSF Medical Center", "lat": 37.7631, "lon": -122.4578},
130
+ {"city": "Chicago", "state": "IL", "country": "US", "facility": "Northwestern Medicine", "lat": 41.8827, "lon": -87.6233},
131
+ ],
132
+ "location_count": 3,
133
+ },
134
+ ]
135
+
136
+ def search_trials_sync(condition: str, phase: Optional[str] = None, status: str = "RECRUITING", page_size: int = 20) -> list[dict]:
137
+ """Synchronous version using httpx.Client — safe to call from any context."""
138
+ params = {
139
+ "query.cond": condition,
140
+ "filter.overallStatus": status,
141
+ "pageSize": page_size,
142
+ "format": "json",
143
+ "sort": "LastUpdatePostDate:desc",
144
+ }
145
+ if phase:
146
+ params["filter.phase"] = f"PHASE{phase.replace('Phase ', '').replace('III', '3').replace('II', '2').replace('IV', '4').replace('I', '1')}"
147
+ with httpx.Client(timeout=30.0) as client:
148
+ try:
149
+ resp = client.get(CTGOV_BASE, params=params)
150
+ resp.raise_for_status()
151
+ data = resp.json()
152
+ return [_normalize_study(s) for s in data.get("studies", [])]
153
+ except Exception as e:
154
+ print(f"ClinicalTrials.gov API error (sync): {e}")
155
+ return _fallback_trials(condition)
156
+
157
+ def get_trial_details_sync(nct_id: str) -> dict:
158
+ """Synchronous version using httpx.Client — safe to call from any context."""
159
+ params = {"query.id": nct_id, "format": "json"}
160
+ with httpx.Client(timeout=30.0) as client:
161
+ try:
162
+ resp = client.get(CTGOV_BASE, params=params)
163
+ resp.raise_for_status()
164
+ data = resp.json()
165
+ studies = data.get("studies", [])
166
+ if studies:
167
+ return _normalize_study(studies[0])
168
+ except Exception as e:
169
+ print(f"ClinicalTrials.gov detail error (sync): {e}")
170
+ return {}
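Both search functions turn a label such as `Phase II` into a `PHASE<n>` token. Chained `str.replace` calls are order-sensitive: replacing `'I'` before `'III'` would turn `III` into `111`. A lookup table sidesteps the ordering problem. The sketch below is one possible helper, not part of the module; `normalize_phase` and `_ROMAN_TO_DIGIT` are hypothetical names, and it only builds the token string without asserting which filter values the ClinicalTrials.gov API accepts.

```python
# Roman numeral phase labels mapped to the digit used in the PHASE<n> token.
_ROMAN_TO_DIGIT = {"I": "1", "II": "2", "III": "3", "IV": "4"}


def normalize_phase(phase: str) -> str:
    """Convert 'Phase III', 'III', or 'Phase 3' into 'PHASE3'."""
    label = phase.strip()
    # Drop an optional leading "Phase" prefix, case-insensitively.
    if label.lower().startswith("phase"):
        label = label[len("phase"):].strip()
    label = label.upper()
    # Whole-label lookup: 'III' can never be mistaken for three separate 'I's.
    digit = _ROMAN_TO_DIGIT.get(label, label)
    return f"PHASE{digit}"


for example in ("Phase I", "Phase II", "Phase III", "Phase IV", "phase 2"):
    print(example, "->", normalize_phase(example))
```

Labels that are not in the table (for example an already numeric `2`) pass through unchanged, so the helper accepts either spelling.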
backend/consent_agent.py ADDED
@@ -0,0 +1,207 @@
1
+ """
2
+ Consent & Scheduling Agent — A2A sub-agent that handles post-recruitment consent
3
+ workflow and appointment scheduling. Triggered as a handoff from the Recruitment Agent.
4
+
5
+ A2A task message format (a simplified envelope modeled on the Google A2A spec):
6
+ {"task_id": str, "type": "CONSENT_REQUEST" | "SCHEDULE_REQUEST", "payload": {...}}
7
+ """
8
+ import uuid
9
+ from datetime import datetime, timedelta
10
+ from typing import Optional
11
+ from llm_client import chat
12
+
13
+ # In-memory consent + scheduling store (production: Neo4j or Redis)
14
+ _consent_records: dict[str, dict] = {}
15
+ _schedule_records: dict[str, dict] = {}
16
+
17
+
18
+ # ── Consent status values ──────────────────────────────────────────────────────
19
+
20
+ CONSENT_PENDING = "PENDING"
21
+ CONSENT_SENT = "SENT"
22
+ CONSENT_SIGNED = "SIGNED"
23
+ CONSENT_DECLINED = "DECLINED"
24
+ CONSENT_EXPIRED = "EXPIRED"
25
+
26
+
27
+ # ── A2A task receiver ──────────────────────────────────────────────────────────
28
+
29
+ def receive_a2a_task(task: dict) -> dict:
30
+ """
31
+ Entry point for A2A inter-agent handoffs.
32
+ Accepts tasks from the Recruitment Agent and routes to consent or scheduling flows.
33
+ """
34
+ task_type = task.get("type", "")
35
+ payload = task.get("payload", {})
36
+ task_id = task.get("task_id", str(uuid.uuid4()))
37
+
38
+ if task_type == "CONSENT_REQUEST":
39
+ return initiate_consent(
40
+ patient_id=payload["patient_id"],
41
+ nct_id=payload["nct_id"],
42
+ trial_title=payload.get("trial_title", ""),
43
+ match_score=payload.get("match_score", 0.0),
44
+ task_id=task_id,
45
+ )
46
+ elif task_type == "SCHEDULE_REQUEST":
47
+ return schedule_screening(
48
+ patient_id=payload["patient_id"],
49
+ nct_id=payload["nct_id"],
50
+ site_city=payload.get("site_city", ""),
51
+ site_state=payload.get("site_state", ""),
52
+ task_id=task_id,
53
+ )
54
+ else:
55
+ return {"error": "UNKNOWN_TASK_TYPE", "task_id": task_id, "received_type": task_type}
56
+
57
+
58
+ # ── Consent flow ───────────────────────────────────────────────────────────────
59
+
60
+ def initiate_consent(
61
+ patient_id: str,
62
+ nct_id: str,
63
+ trial_title: str,
64
+ match_score: float = 0.0,
65
+ task_id: str | None = None,
66
+ ) -> dict:
67
+ """Create a consent record and generate the consent document."""
68
+ record_id = task_id or str(uuid.uuid4())
69
+ expires_at = (datetime.utcnow() + timedelta(days=30)).isoformat()
70
+
71
+ consent_doc = _generate_consent_document(patient_id, nct_id, trial_title)
72
+
73
+ record = {
74
+ "consent_id": record_id,
75
+ "patient_id": patient_id,
76
+ "nct_id": nct_id,
77
+ "trial_title": trial_title,
78
+ "match_score": match_score,
79
+ "status": CONSENT_SENT,
80
+ "consent_document": consent_doc,
81
+ "created_at": datetime.utcnow().isoformat(),
82
+ "expires_at": expires_at,
83
+ "signed_at": None,
84
+ "a2a_source": "recruitment_agent",
85
+ }
86
+ _consent_records[record_id] = record
87
+ return {"consent_id": record_id, "status": CONSENT_SENT, "expires_at": expires_at}
88
+
89
+
90
+ def update_consent_status(consent_id: str, status: str, notes: str = "") -> dict:
91
+ record = _consent_records.get(consent_id)
92
+ if not record:
93
+ return {"error": "CONSENT_NOT_FOUND", "consent_id": consent_id}
94
+ record["status"] = status
95
+ if status == CONSENT_SIGNED:
96
+ record["signed_at"] = datetime.utcnow().isoformat()
97
+ if notes:
98
+ record["notes"] = notes
99
+ # If consent signed, auto-trigger scheduling handoff
100
+ if status == CONSENT_SIGNED:
101
+ _trigger_scheduling_handoff(record)
102
+ return record
103
+
104
+
105
+ def get_consent_record(consent_id: str) -> dict | None:
106
+ return _consent_records.get(consent_id)
107
+
108
+
109
+ def list_consent_records(patient_id: str | None = None) -> list[dict]:
110
+ records = list(_consent_records.values())
111
+ if patient_id:
112
+ records = [r for r in records if r["patient_id"] == patient_id]
113
+ return sorted(records, key=lambda r: r["created_at"], reverse=True)
114
+
115
+
116
+ # ── Scheduling flow ────────────────────────────────────────────────────────────
117
+
118
+ def schedule_screening(
119
+ patient_id: str,
120
+ nct_id: str,
121
+ site_city: str = "",
122
+ site_state: str = "",
123
+ task_id: str | None = None,
124
+ ) -> dict:
125
+ """Create a screening appointment slot."""
126
+ appt_id = task_id or str(uuid.uuid4())
127
+ # Default slot: first weekday at least 3 days out, at 10:00 UTC
128
+ proposed_dt = _next_business_day()
129
+
130
+ appt = {
131
+ "appointment_id": appt_id,
132
+ "patient_id": patient_id,
133
+ "nct_id": nct_id,
134
+ "site_city": site_city,
135
+ "site_state": site_state,
136
+ "proposed_datetime": proposed_dt,
137
+ "status": "PROPOSED",
138
+ "created_at": datetime.utcnow().isoformat(),
139
+ "a2a_source": "consent_agent",
140
+ }
141
+ _schedule_records[appt_id] = appt
142
+ return {"appointment_id": appt_id, "proposed_datetime": proposed_dt, "status": "PROPOSED"}
143
+
144
+
145
+ def confirm_appointment(appt_id: str) -> dict:
146
+ appt = _schedule_records.get(appt_id)
147
+ if not appt:
148
+ return {"error": "APPOINTMENT_NOT_FOUND"}
149
+ appt["status"] = "CONFIRMED"
150
+ appt["confirmed_at"] = datetime.utcnow().isoformat()
151
+ return appt
152
+
153
+
154
+ def list_appointments(patient_id: str | None = None) -> list[dict]:
155
+ appts = list(_schedule_records.values())
156
+ if patient_id:
157
+ appts = [a for a in appts if a["patient_id"] == patient_id]
158
+ return sorted(appts, key=lambda a: a["created_at"], reverse=True)
159
+
160
+
161
+ # ── Helpers ────────────────────────────────────────────────────────────────────
162
+
163
+ def _trigger_scheduling_handoff(consent_record: dict):
164
+ """Auto-schedule after consent signed — A2A internal handoff."""
165
+ schedule_screening(
166
+ patient_id=consent_record["patient_id"],
167
+ nct_id=consent_record["nct_id"],
168
+ task_id=f"sched_{consent_record['consent_id']}",
169
+ )
170
+
171
+
172
+ def _next_business_day() -> str:
173
+ dt = datetime.utcnow() + timedelta(days=3)
174
+ while dt.weekday() >= 5: # skip Sat/Sun
175
+ dt += timedelta(days=1)
176
+ return dt.replace(hour=10, minute=0, second=0, microsecond=0).isoformat() + "Z"
177
+
178
+
179
+ def _generate_consent_document(patient_id: str, nct_id: str, trial_title: str) -> str:
180
+ prompt = f"""Generate a concise, plain-language informed consent document (ICF) for clinical trial participation.
181
+
182
+ Trial: {trial_title}
183
+ NCT ID: {nct_id}
184
+ Patient ID: {patient_id}
185
+
186
+ The document should cover in 4 short sections:
187
+ 1. What this study is about (2-3 sentences)
188
+ 2. What you will be asked to do (bullet points)
189
+ 3. Possible risks and benefits (bullet points)
190
+ 4. Your rights as a participant (2-3 sentences)
191
+
192
+ Use plain language (8th grade reading level). End with a signature block."""
193
+ try:
194
+ return chat([{"role": "user", "content": prompt}], temperature=0.3, max_tokens=600)
195
+ except Exception:
196
+ return f"Informed Consent Document\nTrial: {trial_title} ({nct_id})\n\nPlease review this document carefully before signing."
197
+
198
+
199
+ def get_consent_stats() -> dict:
200
+ all_records = list(_consent_records.values())
201
+ return {
202
+ "total": len(all_records),
203
+ "sent": sum(1 for r in all_records if r["status"] == CONSENT_SENT),
204
+ "signed": sum(1 for r in all_records if r["status"] == CONSENT_SIGNED),
205
+ "declined": sum(1 for r in all_records if r["status"] == CONSENT_DECLINED),
206
+ "appointments_scheduled": len(_schedule_records),
207
+ }
backend/data_ingestion.py ADDED
@@ -0,0 +1,144 @@
1
+ from neo4j_setup import neo4j_conn
2
+
3
+
4
+ def ingest_sample_data():
5
+ """Ingest rich sample data into Neo4j knowledge graph."""
6
+ # Clear existing sample data
7
+ neo4j_conn.run_query("MATCH (n) WHERE n.sample = true DETACH DELETE n")
8
+
9
+ queries = [
10
+ # Patients with rich profiles
11
+ """
12
+ MERGE (p1:Patient {id: 'P001'})
13
+ SET p1 += {age: 45, gender: 'female', ethnicity: 'White', sample: true,
14
+ zip_code: '02115', diagnosis_date: '2022-06-01'}
15
+ """,
16
+ """
17
+ MERGE (p2:Patient {id: 'P002'})
18
+ SET p2 += {age: 60, gender: 'male', ethnicity: 'Black/African American', sample: true,
19
+ zip_code: '77030', diagnosis_date: '2021-11-15'}
20
+ """,
21
+ """
22
+ MERGE (p3:Patient {id: 'P003'})
23
+ SET p3 += {age: 38, gender: 'female', ethnicity: 'Hispanic/Latino', sample: true,
24
+ zip_code: '94102', diagnosis_date: '2023-02-10'}
25
+ """,
26
+ """
27
+ MERGE (p4:Patient {id: 'P004'})
28
+ SET p4 += {age: 67, gender: 'male', ethnicity: 'Asian', sample: true,
29
+ zip_code: '10001', diagnosis_date: '2022-09-20'}
30
+ """,
31
+ """
32
+ MERGE (p5:Patient {id: 'P005'})
33
+ SET p5 += {age: 34, gender: 'female', ethnicity: 'White', sample: true,
34
+ zip_code: '60601', diagnosis_date: '2023-07-01'}
35
+ """,
36
+
37
+ # Diagnoses
38
+ """MERGE (d1:Diagnosis {code: 'C50'}) SET d1.name = 'Breast Cancer', d1.snomed = '254837009'""",
39
+ """MERGE (d2:Diagnosis {code: 'C61'}) SET d2.name = 'Prostate Cancer', d2.snomed = '399068003'""",
40
+ """MERGE (d3:Diagnosis {code: 'C34'}) SET d3.name = 'Non-Small Cell Lung Cancer', d3.snomed = '363346000'""",
41
+ """MERGE (d4:Diagnosis {code: 'C18'}) SET d4.name = 'Colorectal Cancer', d4.snomed = '93761005'""",
42
+
43
+ # Biomarkers
44
+ """MERGE (b1:Biomarker {id: 'HER2_POS'}) SET b1.name = 'HER2 Positive', b1.loinc = '85319-2'""",
45
+ """MERGE (b2:Biomarker {id: 'EGFR_L858R'}) SET b2.name = 'EGFR L858R Mutation', b2.loinc = '81704-9'""",
46
+ """MERGE (b3:Biomarker {id: 'BRCA2_POS'}) SET b3.name = 'BRCA2 Mutation', b3.loinc = '85319-2'""",
47
+ """MERGE (b4:Biomarker {id: 'MSI_H'}) SET b4.name = 'MSI-High', b4.loinc = '85077-6'""",
48
+ """MERGE (b5:Biomarker {id: 'PDL1_HIGH'}) SET b5.name = 'PD-L1 High (>50%)', b5.loinc = '73977-1'""",
49
+
50
+ # Trials
51
+ """
52
+ MERGE (t1:Trial {id: 'NCT04889131'})
53
+ SET t1 += {phase: 'PHASE2', condition: 'Breast Cancer', status: 'RECRUITING',
54
+ title: 'Precision HER2+ Breast Cancer Study', min_age: 18, max_age: 75,
55
+ enrollment_target: 150, enrolled: 87, sponsor: 'Dana-Farber'}
56
+ """,
57
+ """
58
+ MERGE (t2:Trial {id: 'NCT05123456'})
59
+ SET t2 += {phase: 'PHASE3', condition: 'Breast Cancer', status: 'RECRUITING',
60
+ title: 'Immunotherapy Combination for Advanced Breast Cancer', min_age: 18,
61
+ enrollment_target: 400, enrolled: 142, sponsor: 'Pharma Innovations Inc'}
62
+ """,
63
+ """
64
+ MERGE (t3:Trial {id: 'NCT05456789'})
65
+ SET t3 += {phase: 'PHASE2', condition: 'Prostate Cancer', status: 'RECRUITING',
66
+ title: 'BRCA2 Prostate Cancer PARP Inhibitor Trial', min_age: 18,
67
+ enrollment_target: 120, enrolled: 54, sponsor: 'Oncology Research Group'}
68
+ """,
69
+ """
70
+ MERGE (t4:Trial {id: 'NCT06112233'})
71
+ SET t4 += {phase: 'PHASE3', condition: 'Non-Small Cell Lung Cancer', status: 'RECRUITING',
72
+ title: 'EGFR-Mutant NSCLC Targeted Therapy Study', min_age: 18,
73
+ enrollment_target: 300, enrolled: 178, sponsor: 'Global Cancer Institute'}
74
+ """,
75
+ """
76
+ MERGE (t5:Trial {id: 'NCT05334455'})
77
+ SET t5 += {phase: 'PHASE2', condition: 'Colorectal Cancer', status: 'RECRUITING',
78
+ title: 'MSI-H Colorectal Cancer Immunotherapy Study', min_age: 18,
79
+ enrollment_target: 100, enrolled: 45, sponsor: 'NCI'}
80
+ """,
81
+
82
+ # Study Sites
83
+ """
84
+ MERGE (s1:StudySite {id: 'DFCI'})
85
+ SET s1 += {name: 'Dana-Farber Cancer Institute', city: 'Boston', state: 'MA',
86
+ lat: 42.3376, lon: -71.1083, active_trials: 4}
87
+ """,
88
+ """
89
+ MERGE (s2:StudySite {id: 'MDACC'})
90
+ SET s2 += {name: 'MD Anderson Cancer Center', city: 'Houston', state: 'TX',
91
+ lat: 29.7066, lon: -95.3990, active_trials: 6}
92
+ """,
93
+ """
94
+ MERGE (s3:StudySite {id: 'MSK'})
95
+ SET s3 += {name: 'Memorial Sloan Kettering', city: 'New York', state: 'NY',
96
+ lat: 40.7644, lon: -73.9581, active_trials: 5}
97
+ """,
98
+
99
+ # Patient-Diagnosis relationships
100
+ """MATCH (p:Patient {id: 'P001'}), (d:Diagnosis {code: 'C50'}) MERGE (p)-[:HAS_DIAGNOSIS]->(d)""",
101
+ """MATCH (p:Patient {id: 'P002'}), (d:Diagnosis {code: 'C61'}) MERGE (p)-[:HAS_DIAGNOSIS]->(d)""",
102
+ """MATCH (p:Patient {id: 'P003'}), (d:Diagnosis {code: 'C50'}) MERGE (p)-[:HAS_DIAGNOSIS]->(d)""",
103
+ """MATCH (p:Patient {id: 'P004'}), (d:Diagnosis {code: 'C34'}) MERGE (p)-[:HAS_DIAGNOSIS]->(d)""",
104
+ """MATCH (p:Patient {id: 'P005'}), (d:Diagnosis {code: 'C18'}) MERGE (p)-[:HAS_DIAGNOSIS]->(d)""",
105
+
106
+ # Patient-Biomarker relationships
107
+ """MATCH (p:Patient {id: 'P001'}), (b:Biomarker {id: 'HER2_POS'}) MERGE (p)-[:HAS_BIOMARKER]->(b)""",
108
+ """MATCH (p:Patient {id: 'P002'}), (b:Biomarker {id: 'BRCA2_POS'}) MERGE (p)-[:HAS_BIOMARKER]->(b)""",
109
+ """MATCH (p:Patient {id: 'P004'}), (b:Biomarker {id: 'EGFR_L858R'}) MERGE (p)-[:HAS_BIOMARKER]->(b)""",
110
+ """MATCH (p:Patient {id: 'P004'}), (b:Biomarker {id: 'PDL1_HIGH'}) MERGE (p)-[:HAS_BIOMARKER]->(b)""",
111
+ """MATCH (p:Patient {id: 'P005'}), (b:Biomarker {id: 'MSI_H'}) MERGE (p)-[:HAS_BIOMARKER]->(b)""",
112
+
113
+ # Diagnosis-Trial eligibility
114
+ """MATCH (d:Diagnosis {code: 'C50'}), (t:Trial {id: 'NCT04889131'}) MERGE (d)-[:ELIGIBLE_FOR]->(t)""",
115
+ """MATCH (d:Diagnosis {code: 'C50'}), (t:Trial {id: 'NCT05123456'}) MERGE (d)-[:ELIGIBLE_FOR]->(t)""",
116
+ """MATCH (d:Diagnosis {code: 'C61'}), (t:Trial {id: 'NCT05456789'}) MERGE (d)-[:ELIGIBLE_FOR]->(t)""",
117
+ """MATCH (d:Diagnosis {code: 'C34'}), (t:Trial {id: 'NCT06112233'}) MERGE (d)-[:ELIGIBLE_FOR]->(t)""",
118
+ """MATCH (d:Diagnosis {code: 'C18'}), (t:Trial {id: 'NCT05334455'}) MERGE (d)-[:ELIGIBLE_FOR]->(t)""",
119
+
120
+ # Trial-Site relationships
121
+ """MATCH (t:Trial {id: 'NCT04889131'}), (s:StudySite {id: 'DFCI'}) MERGE (t)-[:CONDUCTED_AT]->(s)""",
122
+ """MATCH (t:Trial {id: 'NCT04889131'}), (s:StudySite {id: 'MSK'}) MERGE (t)-[:CONDUCTED_AT]->(s)""",
123
+ """MATCH (t:Trial {id: 'NCT05123456'}), (s:StudySite {id: 'MDACC'}) MERGE (t)-[:CONDUCTED_AT]->(s)""",
124
+ """MATCH (t:Trial {id: 'NCT05123456'}), (s:StudySite {id: 'MSK'}) MERGE (t)-[:CONDUCTED_AT]->(s)""",
125
+ """MATCH (t:Trial {id: 'NCT05456789'}), (s:StudySite {id: 'MDACC'}) MERGE (t)-[:CONDUCTED_AT]->(s)""",
126
+
127
+ # Biomarker-Trial requirements
128
+ """MATCH (b:Biomarker {id: 'HER2_POS'}), (t:Trial {id: 'NCT04889131'}) MERGE (b)-[:REQUIRED_FOR]->(t)""",
129
+ """MATCH (b:Biomarker {id: 'EGFR_L858R'}), (t:Trial {id: 'NCT06112233'}) MERGE (b)-[:REQUIRED_FOR]->(t)""",
130
+ """MATCH (b:Biomarker {id: 'MSI_H'}), (t:Trial {id: 'NCT05334455'}) MERGE (b)-[:REQUIRED_FOR]->(t)""",
131
+ ]
132
+
+     failures = 0
+     for query in queries:
+         try:
+             neo4j_conn.run_query(query)
+         except Exception as e:
+             failures += 1
+             print(f"Ingestion warning: {e}")
+
+     print(f"Sample data ingestion finished: {len(queries) - failures}/{len(queries)} queries succeeded.")
140
+
141
+
142
+ if __name__ == "__main__":
143
+ ingest_sample_data()
144
+ neo4j_conn.close()
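Note on `backend/data_ingestion.py`: the relationship queries are written out as one string literal per patient and diagnosis pair. A possible alternative, sketched below, generates them from data rows with Cypher parameters. It assumes `neo4j_conn.run_query` accepts a `(query, params)` pair, which is how `backend/graph_seeder.py` in this same commit calls it. `LINK_QUERY`, `build_diagnosis_links`, and `run_all` are hypothetical names introduced for this sketch.

```python
from typing import Callable

# One parameterized statement instead of five near-identical literals.
LINK_QUERY = (
    "MATCH (p:Patient {id: $patient_id}), (d:Diagnosis {code: $code}) "
    "MERGE (p)-[:HAS_DIAGNOSIS]->(d)"
)

# Same pairs as the committed sample data.
PATIENT_DIAGNOSES = [
    ("P001", "C50"), ("P002", "C61"), ("P003", "C50"),
    ("P004", "C34"), ("P005", "C18"),
]


def build_diagnosis_links(pairs: list[tuple[str, str]]) -> list[tuple[str, dict]]:
    """Pair the shared query with one parameter dict per (patient, diagnosis)."""
    return [(LINK_QUERY, {"patient_id": pid, "code": code}) for pid, code in pairs]


def run_all(queries: list[tuple[str, dict]], run_query: Callable[[str, dict], object]) -> int:
    """Run every query, log failures, and return how many failed."""
    failures = 0
    for query, params in queries:
        try:
            run_query(query, params)
        except Exception as exc:
            failures += 1
            print(f"Ingestion warning: {exc}")
    return failures
```

In the module itself, the call would look like `run_all(build_diagnosis_links(PATIENT_DIAGNOSES), neo4j_conn.run_query)`, and the returned count lets the caller decide whether to report success.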
backend/fhir_adapter.py ADDED
@@ -0,0 +1,163 @@
1
+ from pydantic import BaseModel
2
+ from typing import Optional
3
+ from datetime import date
4
+
5
+
6
+ class FHIRCoding(BaseModel):
7
+ system: str
8
+ code: str
9
+ display: str
10
+
11
+
12
+ class FHIRCondition(BaseModel):
13
+ resourceType: str = "Condition"
14
+ id: str
15
+ code: FHIRCoding
16
+ clinicalStatus: str = "active"
17
+ onsetDate: Optional[str] = None
18
+
19
+
20
+ class FHIRObservation(BaseModel):
21
+ resourceType: str = "Observation"
22
+ id: str
23
+ code: FHIRCoding
24
+ valueQuantity: Optional[dict] = None
25
+ valueString: Optional[str] = None
26
+ valueBoolean: Optional[bool] = None
27
+ status: str = "final"
28
+
29
+
30
+ class FHIRMedication(BaseModel):
31
+ resourceType: str = "MedicationStatement"
32
+ id: str
33
+ medication: FHIRCoding
34
+ status: str = "active"
35
+
36
+
37
+ class FHIRPatient(BaseModel):
38
+ resourceType: str = "Patient"
39
+ id: str
40
+ gender: str
41
+ birthDate: str
42
+ conditions: list[FHIRCondition] = []
43
+ observations: list[FHIRObservation] = []
44
+ medications: list[FHIRMedication] = []
45
+
46
+
47
+ def build_patient_profile(fhir_patient: FHIRPatient) -> dict:
48
+ """Convert FHIR R4 patient bundle to normalized matching profile."""
+     from datetime import datetime
+     birth_year = int(fhir_patient.birthDate[:4])
+     # Year-difference approximation: overstates age by one before this year's birthday.
+     age = datetime.now().year - birth_year
52
+
53
+ diagnoses = [c.code.code for c in fhir_patient.conditions]
54
+ diagnosis_names = [c.code.display for c in fhir_patient.conditions]
55
+ medications = [m.medication.display for m in fhir_patient.medications]
56
+
57
+ biomarkers = {}
58
+ lab_values = {}
59
+ for obs in fhir_patient.observations:
60
+ key = obs.code.display.lower().replace(" ", "_")
61
+ if obs.valueBoolean is not None:
62
+ biomarkers[key] = obs.valueBoolean
63
+ elif obs.valueQuantity:
64
+ lab_values[key] = obs.valueQuantity
65
+ elif obs.valueString:
66
+ biomarkers[key] = obs.valueString
67
+
68
+ return {
69
+ "patient_id": fhir_patient.id,
70
+ "age": age,
71
+ "gender": fhir_patient.gender,
72
+ "diagnosis_codes": diagnoses,
73
+ "diagnosis_names": diagnosis_names,
74
+ "medications": medications,
75
+ "biomarkers": biomarkers,
76
+ "lab_values": lab_values,
77
+ "fhir_bundle_ref": f"Patient/{fhir_patient.id}",
78
+ }
79
+
80
+
81
+ # Realistic mock FHIR R4 patients for demo
82
+ MOCK_FHIR_PATIENTS: dict[str, FHIRPatient] = {
83
+ "P001": FHIRPatient(
84
+ id="P001", gender="female", birthDate="1979-03-15",
85
+ conditions=[
86
+ FHIRCondition(id="c1", code=FHIRCoding(system="http://snomed.info/sct", code="254837009", display="Breast cancer"), onsetDate="2022-06-01"),
87
+ ],
88
+ observations=[
89
+ FHIRObservation(id="o1", code=FHIRCoding(system="http://loinc.org", code="85319-2", display="HER2"), valueBoolean=True),
90
+ FHIRObservation(id="o2", code=FHIRCoding(system="http://loinc.org", code="2857-1", display="PSA"), valueQuantity={"value": 0.5, "unit": "ng/mL"}),
91
+ FHIRObservation(id="o3", code=FHIRCoding(system="http://loinc.org", code="718-7", display="Hemoglobin"), valueQuantity={"value": 12.5, "unit": "g/dL"}),
92
+ ],
93
+ medications=[
94
+ FHIRMedication(id="m1", medication=FHIRCoding(system="http://www.nlm.nih.gov/research/umls/rxnorm", code="583214", display="Trastuzumab")),
95
+ ],
96
+ ),
97
+ "P002": FHIRPatient(
98
+ id="P002", gender="male", birthDate="1964-08-22",
99
+ conditions=[
100
+ FHIRCondition(id="c2", code=FHIRCoding(system="http://snomed.info/sct", code="399068003", display="Prostate cancer"), onsetDate="2021-11-15"),
101
+ ],
102
+ observations=[
103
+ FHIRObservation(id="o4", code=FHIRCoding(system="http://loinc.org", code="2857-1", display="PSA"), valueQuantity={"value": 8.3, "unit": "ng/mL"}),
104
+ FHIRObservation(id="o5", code=FHIRCoding(system="http://loinc.org", code="85319-2", display="BRCA2"), valueBoolean=True),
105
+ ],
106
+ medications=[
107
+ FHIRMedication(id="m2", medication=FHIRCoding(system="http://www.nlm.nih.gov/research/umls/rxnorm", code="1946819", display="Enzalutamide")),
108
+ ],
109
+ ),
110
+ "P003": FHIRPatient(
111
+ id="P003", gender="female", birthDate="1985-11-30",
112
+ conditions=[
113
+ FHIRCondition(id="c3", code=FHIRCoding(system="http://snomed.info/sct", code="254837009", display="Breast cancer"), onsetDate="2023-02-10"),
114
+ FHIRCondition(id="c4", code=FHIRCoding(system="http://snomed.info/sct", code="44054006", display="Type 2 diabetes"), onsetDate="2019-05-01"),
115
+ ],
116
+ observations=[
117
+ FHIRObservation(id="o6", code=FHIRCoding(system="http://loinc.org", code="85319-2", display="HER2"), valueBoolean=False),
118
+ FHIRObservation(id="o7", code=FHIRCoding(system="http://loinc.org", code="4548-4", display="HbA1c"), valueQuantity={"value": 7.2, "unit": "%"}),
119
+ ],
120
+ medications=[
121
+ FHIRMedication(id="m3", medication=FHIRCoding(system="http://www.nlm.nih.gov/research/umls/rxnorm", code="860975", display="Metformin")),
122
+ ],
123
+ ),
124
+ "P004": FHIRPatient(
125
+ id="P004", gender="male", birthDate="1957-04-07",
126
+ conditions=[
127
+ FHIRCondition(id="c5", code=FHIRCoding(system="http://snomed.info/sct", code="363346000", display="Non-small cell lung cancer"), onsetDate="2022-09-20"),
128
+ ],
129
+ observations=[
130
+ FHIRObservation(id="o8", code=FHIRCoding(system="http://loinc.org", code="81704-9", display="EGFR mutation"), valueString="L858R"),
131
+ FHIRObservation(id="o9", code=FHIRCoding(system="http://loinc.org", code="73977-1", display="PD-L1 expression"), valueQuantity={"value": 60, "unit": "%"}),
132
+ ],
133
+ medications=[
134
+ FHIRMedication(id="m4", medication=FHIRCoding(system="http://www.nlm.nih.gov/research/umls/rxnorm", code="1860492", display="Osimertinib")),
135
+ ],
136
+ ),
137
+ "P005": FHIRPatient(
138
+ id="P005", gender="female", birthDate="1990-07-19",
139
+ conditions=[
140
+ FHIRCondition(id="c6", code=FHIRCoding(system="http://snomed.info/sct", code="93761005", display="Primary malignant neoplasm of colon"), onsetDate="2023-07-01"),
141
+ ],
142
+ observations=[
143
+ FHIRObservation(id="o10", code=FHIRCoding(system="http://loinc.org", code="85077-6", display="MSI status"), valueString="MSI-H"),
144
+ FHIRObservation(id="o11", code=FHIRCoding(system="http://loinc.org", code="85319-2", display="KRAS"), valueBoolean=False),
145
+ ],
146
+ medications=[],
147
+ ),
148
+ }
149
+
150
+
151
+ def get_mock_fhir_patient(patient_id: str) -> Optional[FHIRPatient]:
152
+ return MOCK_FHIR_PATIENTS.get(patient_id)
153
+
154
+
155
+ def get_all_patient_ids() -> list[str]:
156
+ return list(MOCK_FHIR_PATIENTS.keys())
157
+
158
+
159
+ def get_patient_profile(patient_id: str) -> Optional[dict]:
160
+ patient = get_mock_fhir_patient(patient_id)
161
+ if patient:
162
+ return build_patient_profile(patient)
163
+ return None
backend/fhir_server.py ADDED
@@ -0,0 +1,327 @@
1
+ """
2
+ FHIR R4 Server Client — connects to any FHIR R4 endpoint.
3
+
4
+ Default: HAPI FHIR public sandbox (hapi.fhir.org/baseR4)
5
+ Production: any EHR FHIR endpoint secured with SMART on FHIR OAuth2.
6
+
7
+ SMART on FHIR token flow:
8
+ 1. Client credentials grant → POST to FHIR_TOKEN_ENDPOINT
9
+ 2. Bearer token attached to every FHIR API request
10
+ 3. Token cached until expiry, then refreshed automatically
11
+ """
12
+ import os
13
+ import time
14
+ import httpx
15
+ from typing import Optional
16
+ from dotenv import load_dotenv
17
+ from fhir_adapter import (
18
+ FHIRPatient, FHIRCondition, FHIRObservation, FHIRMedication,
19
+ FHIRCoding, build_patient_profile,
20
+ )
21
+
22
+ load_dotenv()
23
+
24
+ FHIR_BASE_URL = os.getenv("FHIR_BASE_URL", "https://hapi.fhir.org/baseR4")
25
+ FHIR_TOKEN_ENDPOINT = os.getenv("FHIR_TOKEN_ENDPOINT", "")
26
+ FHIR_CLIENT_ID = os.getenv("FHIR_CLIENT_ID", "")
27
+ FHIR_CLIENT_SECRET = os.getenv("FHIR_CLIENT_SECRET", "")
28
+ FHIR_STATIC_TOKEN = os.getenv("FHIR_TOKEN", "") # pre-issued bearer token
29
+
30
+ _token_cache: dict = {"token": "", "expires_at": 0.0}
31
+
32
+
33
+ # ── SMART on FHIR token acquisition ──────────────────────────────────────────
34
+
35
+ def _get_smart_token() -> str:
36
+ """
37
+ Obtain a SMART on FHIR bearer token via client credentials grant.
38
+ Returns cached token if still valid.
39
+ """
40
+ if FHIR_STATIC_TOKEN:
41
+ return FHIR_STATIC_TOKEN
42
+
43
+ if not FHIR_TOKEN_ENDPOINT:
44
+ return ""
45
+
46
+ if time.time() < _token_cache["expires_at"] - 30:
47
+ return _token_cache["token"]
48
+
49
+ try:
50
+ resp = httpx.post(
51
+ FHIR_TOKEN_ENDPOINT,
52
+ data={
53
+ "grant_type": "client_credentials",
54
+ "client_id": FHIR_CLIENT_ID,
55
+ "client_secret": FHIR_CLIENT_SECRET,
56
+ "scope": "system/Patient.read system/Observation.read system/Condition.read system/MedicationStatement.read",
57
+ },
58
+ timeout=10,
59
+ )
60
+ resp.raise_for_status()
61
+ data = resp.json()
62
+ _token_cache["token"] = data["access_token"]
63
+ _token_cache["expires_at"] = time.time() + int(data.get("expires_in", 3600))
64
+ return _token_cache["token"]
65
+ except Exception as e:
66
+ print(f"[fhir_server] SMART token error: {e}")
67
+ return ""
68
+
69
+
70
+ def _headers() -> dict:
71
+ token = _get_smart_token()
72
+ h = {"Accept": "application/fhir+json", "Content-Type": "application/fhir+json"}
73
+ if token:
74
+ h["Authorization"] = f"Bearer {token}"
75
+ return h
76
+
77
+
78
+ # ── SHARP context envelope ────────────────────────────────────────────────────
79
+
80
+ def build_sharp_context(
81
+ patient_id: str,
82
+ fhir_ref: str | None = None,
83
+ session_id: str | None = None,
84
+ tenant_id: str | None = None,
85
+ ) -> dict:
86
+ """
87
+ SHARP Extension Spec — patient context envelope.
88
+ Carried on every inter-agent message and MCP tool call.
89
+ """
90
+ import uuid
91
+ return {
92
+ "sharp_version": "1.0",
93
+ "patient_context": {
94
+ "id": patient_id,
95
+ "fhir_ref": fhir_ref or f"Patient/{patient_id}",
96
+ "fhir_base": FHIR_BASE_URL,
97
+ "tenant_id": tenant_id or "clinicalmatch-demo",
98
+ "session_id": session_id or str(uuid.uuid4()),
99
+ },
100
+ "data_classification": "synthetic-demo",
101
+ "baa_in_scope": False,
102
+ "consent_status": "unknown",
103
+ }
104
+
105
+
106
+ # ── FHIR resource fetchers ────────────────────────────────────────────────────
107
+
108
+ def fetch_fhir_patient(patient_fhir_id: str) -> dict | None:
109
+ """Fetch a Patient resource from the FHIR server by FHIR ID."""
110
+ try:
111
+ resp = httpx.get(
112
+ f"{FHIR_BASE_URL}/Patient/{patient_fhir_id}",
113
+ headers=_headers(), timeout=10,
114
+ )
115
+ resp.raise_for_status()
116
+ return resp.json()
117
+ except Exception as e:
118
+ print(f"[fhir_server] Patient fetch error ({patient_fhir_id}): {e}")
119
+ return None
120
+
121
+
122
+ def search_fhir_patients(count: int = 10, condition_code: str | None = None) -> list[dict]:
123
+ """Search for Patient resources on the FHIR server."""
124
+ params: dict = {"_count": count, "_format": "json"}
125
+ if condition_code:
126
+ params["_has:Condition:patient:code"] = condition_code
127
+ try:
128
+ resp = httpx.get(f"{FHIR_BASE_URL}/Patient", headers=_headers(),
129
+ params=params, timeout=15)
130
+ resp.raise_for_status()
131
+ bundle = resp.json()
132
+ return [e["resource"] for e in bundle.get("entry", []) if e.get("resource")]
133
+ except Exception as e:
134
+ print(f"[fhir_server] Patient search error: {e}")
135
+ return []
136
+
137
+
138
+ def fetch_patient_conditions(patient_fhir_id: str) -> list[dict]:
139
+ try:
140
+ resp = httpx.get(
141
+ f"{FHIR_BASE_URL}/Condition",
142
+ headers=_headers(),
143
+ params={"patient": patient_fhir_id, "_format": "json"},
144
+ timeout=10,
145
+ )
146
+ resp.raise_for_status()
147
+ bundle = resp.json()
148
+ return [e["resource"] for e in bundle.get("entry", []) if e.get("resource")]
149
+ except Exception as e:
150
+ print(f"[fhir_server] Condition fetch error: {e}")
151
+ return []
152
+
153
+
154
+ def fetch_patient_observations(patient_fhir_id: str) -> list[dict]:
155
+ try:
156
+ resp = httpx.get(
157
+ f"{FHIR_BASE_URL}/Observation",
158
+ headers=_headers(),
159
+ params={"patient": patient_fhir_id, "_format": "json", "_count": 50},
160
+ timeout=10,
161
+ )
162
+ resp.raise_for_status()
163
+ bundle = resp.json()
164
+ return [e["resource"] for e in bundle.get("entry", []) if e.get("resource")]
165
+ except Exception as e:
166
+ print(f"[fhir_server] Observation fetch error: {e}")
167
+ return []
168
+
169
+
170
+ def fetch_patient_medications(patient_fhir_id: str) -> list[dict]:
171
+ try:
172
+ resp = httpx.get(
173
+ f"{FHIR_BASE_URL}/MedicationStatement",
174
+ headers=_headers(),
175
+ params={"patient": patient_fhir_id, "_format": "json"},
176
+ timeout=10,
177
+ )
178
+ resp.raise_for_status()
179
+ bundle = resp.json()
180
+ return [e["resource"] for e in bundle.get("entry", []) if e.get("resource")]
181
+ except Exception as e:
182
+ print(f"[fhir_server] Medication fetch error: {e}")
183
+ return []
184
+
185
+
186
+ # ── FHIR → internal model conversion ─────────────────────────────────────────
187
+
188
+ def _safe_coding(codings: list[dict], fallback: str = "unknown") -> FHIRCoding:
189
+ for c in codings:
190
+ if c.get("code"):
191
+ return FHIRCoding(
192
+ system=c.get("system", ""),
193
+ code=c.get("code", fallback),
194
+ display=c.get("display", c.get("code", fallback)),
195
+ )
196
+ return FHIRCoding(system="", code=fallback, display=fallback)
197
+
198
+
199
+ def _parse_fhir_patient_resource(resource: dict) -> FHIRPatient | None:
200
+ try:
201
+ pid = resource.get("id", "")
202
+ gender = resource.get("gender", "unknown")
203
+ birth_date = resource.get("birthDate", "1970-01-01")
204
+ return FHIRPatient(id=pid, gender=gender, birthDate=birth_date)
205
+ except Exception as e:
206
+ print(f"[fhir_server] Patient parse error: {e}")
207
+ return None
208
+
209
+
+ def _parse_conditions(resources: list[dict]) -> list[FHIRCondition]:
+     conditions = []
+     for r in resources:
+         try:
+             coding_list = r.get("code", {}).get("coding", [])
+             coding = _safe_coding(coding_list)
+             # An empty clinicalStatus.coding list falls back to "active" instead of raising IndexError.
+             status_codings = r.get("clinicalStatus", {}).get("coding") or [{}]
+             conditions.append(FHIRCondition(
+                 id=r.get("id", ""),
+                 code=coding,
+                 clinicalStatus=status_codings[0].get("code", "active"),
+                 onsetDate=r.get("onsetDateTime") or r.get("onsetDate"),
+             ))
+         except Exception:
+             continue
+     return conditions
225
+
226
+
227
+ def _parse_observations(resources: list[dict]) -> list[FHIRObservation]:
228
+ observations = []
229
+ for r in resources:
230
+ try:
231
+ coding_list = r.get("code", {}).get("coding", [])
232
+ coding = _safe_coding(coding_list)
233
+ vq = r.get("valueQuantity")
234
+ vs = r.get("valueString")
235
+ vb = r.get("valueBoolean")
236
+ observations.append(FHIRObservation(
237
+ id=r.get("id", ""),
238
+ code=coding,
239
+ valueQuantity={"value": vq["value"], "unit": vq.get("unit", "")} if vq and "value" in vq else None,
240
+ valueString=str(vs) if vs is not None else None,
241
+ valueBoolean=bool(vb) if vb is not None else None,
242
+ status=r.get("status", "final"),
243
+ ))
244
+ except Exception:
245
+ continue
246
+ return observations
247
+
248
+
249
+ def _parse_medications(resources: list[dict]) -> list[FHIRMedication]:
250
+ medications = []
251
+ for r in resources:
252
+ try:
253
+ coding_list = (
254
+ r.get("medicationCodeableConcept", {}).get("coding", []) or
255
+ r.get("medication", {}).get("concept", {}).get("coding", [])
256
+ )
257
+ coding = _safe_coding(coding_list)
258
+ medications.append(FHIRMedication(
259
+ id=r.get("id", ""),
260
+ medication=coding,
261
+ status=r.get("status", "active"),
262
+ ))
263
+ except Exception:
264
+ continue
265
+ return medications
266
+
267
+
268
+ # ── Public API ────────────────────────────────────────────────────────────────
269
+
270
+ def get_live_patient_profile(
271
+ patient_fhir_id: str,
272
+ sharp_context: dict | None = None,
273
+ ) -> dict | None:
274
+ """
275
+ Fetch a full patient profile from the live FHIR server.
276
+ Assembles Patient + Condition + Observation + MedicationStatement
277
+ into the same internal profile dict used everywhere in the system.
278
+ Attaches SHARP context envelope.
279
+ """
280
+ resource = fetch_fhir_patient(patient_fhir_id)
281
+ if not resource:
282
+ return None
283
+
284
+ patient = _parse_fhir_patient_resource(resource)
285
+ if not patient:
286
+ return None
287
+
288
+ patient.conditions = _parse_conditions(fetch_patient_conditions(patient_fhir_id))
289
+ patient.observations = _parse_observations(fetch_patient_observations(patient_fhir_id))
290
+ patient.medications = _parse_medications(fetch_patient_medications(patient_fhir_id))
291
+
292
+ profile = build_patient_profile(patient)
293
+ profile["fhir_source"] = "live"
294
+ profile["fhir_base_url"] = FHIR_BASE_URL
295
+ profile["fhir_ref"] = f"Patient/{patient_fhir_id}"
296
+ profile["sharp_context"] = sharp_context or build_sharp_context(
297
+ patient_id=patient_fhir_id,
298
+ fhir_ref=f"Patient/{patient_fhir_id}",
299
+ )
300
+ return profile
301
+
302
+
303
+ def get_fhir_server_status() -> dict:
304
+ """Probe the configured FHIR server and return capability statement summary."""
305
+ try:
306
+ resp = httpx.get(
307
+ f"{FHIR_BASE_URL}/metadata",
308
+ headers=_headers(), timeout=8,
309
+ )
310
+ resp.raise_for_status()
311
+ cap = resp.json()
312
+ return {
313
+ "reachable": True,
314
+ "fhir_version": cap.get("fhirVersion", "unknown"),
315
+ "server_name": cap.get("software", {}).get("name", "unknown"),
316
+ "base_url": FHIR_BASE_URL,
317
+ "auth_method": "SMART/Bearer" if (FHIR_TOKEN_ENDPOINT or FHIR_STATIC_TOKEN) else "none (public sandbox)",
318
+ "smart_token_configured": bool(FHIR_TOKEN_ENDPOINT or FHIR_STATIC_TOKEN),
319
+ }
320
+ except Exception as e:
321
+ return {
322
+ "reachable": False,
323
+ "base_url": FHIR_BASE_URL,
324
+ "error": str(e),
325
+ "auth_method": "SMART/Bearer" if (FHIR_TOKEN_ENDPOINT or FHIR_STATIC_TOKEN) else "none",
326
+ "smart_token_configured": bool(FHIR_TOKEN_ENDPOINT or FHIR_STATIC_TOKEN),
327
+ }
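Note on the fetchers in `backend/fhir_server.py`: `fetch_patient_conditions`, `fetch_patient_observations`, and `fetch_patient_medications` read only the first page of each search Bundle. FHIR servers page search results and expose the following page through a Bundle `link` entry with `relation` set to `next`. A minimal sketch of following those links is below. The function name `iter_bundle_resources`, the injected `get_json` callable, and the `max_pages` cap are assumptions introduced for illustration.

```python
from typing import Callable, Iterator


def iter_bundle_resources(
    first_url: str,
    get_json: Callable[[str], dict],
    max_pages: int = 20,
) -> Iterator[dict]:
    """Hypothetical helper: yield resources from a paged FHIR search Bundle.

    `get_json` takes a URL and returns the parsed Bundle; `max_pages`
    bounds the walk so a misbehaving server cannot loop forever.
    """
    url: str | None = first_url
    pages = 0
    while url and pages < max_pages:
        bundle = get_json(url)
        pages += 1
        for entry in bundle.get("entry", []):
            resource = entry.get("resource")
            if resource:
                yield resource
        # Follow the "next" link if the server provided one.
        url = next(
            (link.get("url") for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )
```

In this module, `get_json` could be a small wrapper that calls `httpx.get(url, headers=_headers(), timeout=10)`, then `raise_for_status()`, and returns `resp.json()`. Injecting it keeps the paging logic testable without a network.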
backend/graph_seeder.py ADDED
@@ -0,0 +1,1109 @@
1
+ """
2
+ Graph seeder — fetches REAL data from live public APIs and populates Neo4j.
3
+
4
+ Data sources (all free, no auth):
5
+ - ClinicalTrials.gov v2 API (NCT trial records)
6
+ - RxNorm (NIH) (medication RxCUI codes)
7
+ - ICD-10 CM (NLM) (diagnosis codes)
8
+ - PubMed (NCBI) (supporting literature PMIDs)
9
+ - Synthetic patients (500 realistic profiles matched to real trials)
10
+
11
+ Run once to seed, or schedule periodically to stay current.
12
+ """
13
+ import httpx
14
+ import asyncio
15
+ import time
16
+ import random
17
+ from neo4j_setup import neo4j_conn
18
+
19
+ CTGOV_BASE = "https://clinicaltrials.gov/api/v2/studies"
20
+ RXNORM_BASE = "https://rxnav.nlm.nih.gov/REST"
21
+ ICD10_BASE = "https://clinicaltables.nlm.nih.gov/api/icd10cm/v3/search"
22
+ PUBMED_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
23
+ FDA_BASE = "https://api.fda.gov/drug"
24
+
25
+ # Conditions to seed — expand as needed
26
+ SEED_CONDITIONS = [
27
+ "breast cancer",
28
+ "prostate cancer",
29
+ "non-small cell lung cancer",
30
+ "colorectal cancer",
31
+ "ovarian cancer",
32
+ "melanoma",
33
+ "leukemia",
34
+ "lymphoma",
35
+ "glioblastoma",
36
+ "pancreatic cancer",
37
+ ]
38
+
39
+ # Key oncology medications to pre-load
40
+ SEED_MEDICATIONS = [
41
+ "trastuzumab", "pembrolizumab", "nivolumab", "osimertinib",
42
+ "olaparib", "enzalutamide", "bevacizumab", "rituximab",
43
+ "imatinib", "dabrafenib", "vemurafenib", "atezolizumab",
44
+ "durvalumab", "cetuximab", "erlotinib", "capecitabine",
45
+ ]
46
+
47
+ # ICD-10 prefixes for oncology
48
+ SEED_ICD10_PREFIXES = [
49
+ "C50", "C61", "C34", "C18", "C56", "C43", "C91", "C85", "C71", "C25",
50
+ ]
51
+
52
+
53
+ # ── Neo4j helpers ─────────────────────────────────────────────────────────────
54
+
55
+ def upsert(query: str, params: dict | None = None):
56
+ try:
57
+ neo4j_conn.run_query(query, params or {})
58
+ except Exception as e:
59
+ print(f" [neo4j] warn: {e}")
60
+
61
+
62
+ def batch_upsert(queries: list[tuple[str, dict]]):
63
+ for q, p in queries:
64
+ upsert(q, p)
65
+
66
+
67
+ # ── ClinicalTrials.gov ────────────────────────────────────────────────────────
68
+
69
+ async def fetch_trials_for_condition(client: httpx.AsyncClient, condition: str, page_size: int = 50) -> list[dict]:
70
+ try:
71
+ resp = await client.get(CTGOV_BASE, params={
72
+ "query.cond": condition,
73
+ "filter.overallStatus": "RECRUITING",
74
+ "pageSize": page_size,
75
+ "format": "json",
76
+ }, timeout=30)
77
+ resp.raise_for_status()
78
+ return resp.json().get("studies", [])
79
+ except Exception as e:
80
+ print(f" [ctgov] error for '{condition}': {e}")
81
+ return []
82
+
83
+
84
+ def _extract_trial(study: dict, condition: str) -> dict | None:
85
+ try:
86
+ proto = study["protocolSection"]
87
+ ident = proto["identificationModule"]
88
+ status = proto.get("statusModule", {})
89
+ design = proto.get("designModule", {})
90
+ eligibility = proto.get("eligibilityModule", {})
91
+ desc = proto.get("descriptionModule", {})
92
+ sponsor = proto.get("sponsorCollaboratorsModule", {})
93
+ contacts = proto.get("contactsLocationsModule", {})
94
+ outcomes = proto.get("outcomesModule", {})
95
+
96
+ phases = design.get("phases", ["N/A"])
97
+ locations = contacts.get("locations", [])
98
+
99
+ return {
100
+ "nct_id": ident["nctId"],
101
+ "title": ident.get("briefTitle", "")[:200],
102
+ "status": status.get("overallStatus", "UNKNOWN"),
103
+ "phase": phases[0] if phases else "N/A",
104
+ "condition": condition,
105
+ "brief_summary": desc.get("briefSummary", "")[:1000],
106
+ "eligibility_criteria": eligibility.get("eligibilityCriteria", "")[:2000],
107
+ "min_age": eligibility.get("minimumAge", ""),
108
+ "max_age": eligibility.get("maximumAge", ""),
109
+ "sex": eligibility.get("sex", "ALL"),
110
+ "enrollment": design.get("enrollmentInfo", {}).get("count", 0),
111
+ "start_date": status.get("startDateStruct", {}).get("date", ""),
112
+ "completion_date": status.get("completionDateStruct", {}).get("date", ""),
113
+ "sponsor": sponsor.get("leadSponsor", {}).get("name", "")[:100],
114
+ "primary_outcomes": [o.get("measure", "")[:100] for o in outcomes.get("primaryOutcomes", [])[:3]],
115
+ "location_count": len(locations),
116
+ "locations": [
117
+ {
118
+ "facility": loc.get("facility", "")[:100],
119
+ "city": loc.get("city", ""),
120
+ "state": loc.get("state", ""),
121
+ "country": loc.get("country", "US"),
122
+ "lat": loc.get("geoPoint", {}).get("lat"),
123
+ "lon": loc.get("geoPoint", {}).get("lon"),
124
+ }
125
+ for loc in locations[:10]
126
+ ],
127
+ }
128
+ except Exception:  # malformed study record; returning None makes the caller skip it
129
+ return None
130
+
131
+
132
+ async def seed_trials(client: httpx.AsyncClient) -> int:
133
+ print("\n[1/5] Seeding clinical trials from ClinicalTrials.gov...")
134
+ total = 0
135
+ for condition in SEED_CONDITIONS:
136
+ studies = await fetch_trials_for_condition(client, condition)
137
+ print(f" {condition}: {len(studies)} trials fetched")
138
+ for study in studies:
139
+ trial = _extract_trial(study, condition)
140
+ if not trial:
141
+ continue
142
+ # Upsert trial node
143
+ upsert("""
144
+ MERGE (t:Trial {id: $nct_id})
145
+ SET t += {
146
+ title: $title, status: $status, phase: $phase,
147
+ condition: $condition, brief_summary: $brief_summary,
148
+ eligibility_criteria: $eligibility_criteria,
149
+ min_age: $min_age, max_age: $max_age, sex: $sex,
150
+ enrollment: $enrollment, start_date: $start_date,
151
+ completion_date: $completion_date, sponsor: $sponsor,
152
+ location_count: $location_count, source: 'clinicaltrials.gov',
153
+ updated_at: datetime()
154
+ }
155
+ """, trial)
156
+ # Upsert Condition → Trial relationship
157
+ upsert("""
158
+ MERGE (c:ConditionNode {name: $condition})
159
+ WITH c
160
+ MATCH (t:Trial {id: $nct_id})
161
+ MERGE (c)-[:HAS_TRIAL]->(t)
162
+ """, {"condition": condition, "nct_id": trial["nct_id"]})
163
+ # Upsert study sites
164
+ for loc in trial["locations"]:
165
+ if loc.get("lat") is not None and loc.get("lon") is not None:  # keep sites whose coordinate is exactly 0.0
166
+ upsert("""
167
+ MERGE (s:StudySite {facility: $facility, city: $city, state: $state})
168
+ SET s += {country: $country, lat: $lat, lon: $lon, source: 'clinicaltrials.gov'}
169
+ WITH s
170
+ MATCH (t:Trial {id: $nct_id})
171
+ MERGE (t)-[:CONDUCTED_AT]->(s)
172
+ """, {**loc, "nct_id": trial["nct_id"]})
173
+ total += 1
174
+ await asyncio.sleep(0.5) # Rate limit courtesy
175
+ print(f" Total trials seeded: {total}")
176
+ return total
177
+
178
+
179
+ # ── RxNorm (NIH) — Medications ────────────────────────────────────────────────
180
+
181
+ async def fetch_rxcui(client: httpx.AsyncClient, drug_name: str) -> list[dict]:
182
+ try:
183
+ resp = await client.get(f"{RXNORM_BASE}/drugs.json", params={"name": drug_name}, timeout=15)
184
+ resp.raise_for_status()
185
+ d = resp.json()
186
+ groups = d.get("drugGroup", {}).get("conceptGroup", [])
187
+ results = []
188
+ for grp in groups:
189
+ tty = grp.get("tty", "")
190
+ for concept in grp.get("conceptProperties", [])[:3]:
191
+ results.append({
192
+ "rxcui": concept.get("rxcui", ""),
193
+ "name": concept.get("name", ""),
194
+ "tty": tty,
195
+ "search_name": drug_name,
196
+ })
197
+ return results[:5] # Top 5
198
+ except Exception as e:
199
+ print(f" [rxnorm] error for '{drug_name}': {e}")
200
+ return []
201
+
202
+
203
+ async def seed_medications(client: httpx.AsyncClient) -> int:
204
+ print("\n[2/5] Seeding medications from RxNorm...")
205
+ total = 0
206
+ for drug_name in SEED_MEDICATIONS:
207
+ concepts = await fetch_rxcui(client, drug_name)
208
+ for concept in concepts[:1]: # Primary concept only
209
+ upsert("""
210
+ MERGE (m:Medication {rxcui: $rxcui})
211
+ SET m += {
212
+ name: $name, tty: $tty, generic_name: $search_name,
213
+ source: 'rxnorm', updated_at: datetime()
214
+ }
215
+ """, concept)
216
+ total += 1
217
+ print(f" {drug_name}: {len(concepts)} RxCUI concepts")
218
+ await asyncio.sleep(0.2)
219
+ print(f" Total medications seeded: {total}")
220
+ return total
221
+
222
+
223
+ # ── ICD-10 CM (NLM) — Diagnoses ──────────────────────────────────────────────
224
+
225
+ async def fetch_icd10(client: httpx.AsyncClient, prefix: str) -> list[dict]:
226
+ try:
227
+ resp = await client.get(ICD10_BASE, params={
228
+ "sf": "code,name",
229
+ "terms": prefix,
230
+ "maxList": 20,
231
+ }, timeout=15)
232
+ resp.raise_for_status()
233
+ data = resp.json()
234
+ if not data or len(data) < 4:
235
+ return []
236
+ return [{"code": item[0], "name": item[1]} for item in data[3]]
237
+ except Exception as e:
238
+ print(f" [icd10] error for '{prefix}': {e}")
239
+ return []
240
+
241
+
242
+ async def seed_diagnoses(client: httpx.AsyncClient) -> int:
243
+ print("\n[3/5] Seeding diagnoses from ICD-10 CM...")
244
+ total = 0
245
+ for prefix in SEED_ICD10_PREFIXES:
246
+ codes = await fetch_icd10(client, prefix)
247
+ for item in codes:
248
+ upsert("""
249
+ MERGE (d:Diagnosis {code: $code})
250
+ SET d += {name: $name, source: 'icd10cm', updated_at: datetime()}
251
+ """, item)
252
+ total += 1
253
+ # Link ICD prefix → condition names for matching
254
+ condition_map = {
255
+ "C50": "breast cancer", "C61": "prostate cancer", "C34": "non-small cell lung cancer",
256
+ "C18": "colorectal cancer", "C56": "ovarian cancer", "C43": "melanoma",
257
+ "C91": "leukemia", "C85": "lymphoma", "C71": "glioblastoma", "C25": "pancreatic cancer",
258
+ }
259
+ if prefix in condition_map:
260
+ upsert("""
261
+ MATCH (d:Diagnosis) WHERE d.code STARTS WITH $prefix
262
+ MATCH (c:ConditionNode {name: $condition})
263
+ MERGE (d)-[:MAPS_TO_CONDITION]->(c)
264
+ """, {"prefix": prefix, "condition": condition_map[prefix]})
265
+ print(f" ICD-10 {prefix}: {len(codes)} codes")
266
+ await asyncio.sleep(0.2)
267
+ print(f" Total diagnoses seeded: {total}")
268
+ return total
269
+
270
+
271
+ # ── PubMed (NCBI) — Supporting Literature ────────────────────────────────────
272
+
273
+ async def fetch_pubmed_ids(client: httpx.AsyncClient, condition: str, count: int = 5) -> list[str]:
274
+ try:
275
+ resp = await client.get(f"{PUBMED_BASE}/esearch.fcgi", params={
276
+ "db": "pubmed",
277
+ "term": f"clinical trial {condition} treatment[Title/Abstract]",
278
+ "retmax": count,
279
+ "retmode": "json",
280
+ "sort": "relevance",
281
+ }, timeout=15)
282
+ resp.raise_for_status()
283
+ return resp.json()["esearchresult"]["idlist"]
284
+ except Exception as e:
285
+ print(f" [pubmed] error for '{condition}': {e}")
286
+ return []
287
+
288
+
289
+ async def fetch_pubmed_summary(client: httpx.AsyncClient, pmid: str) -> dict | None:
290
+ try:
291
+ resp = await client.get(f"{PUBMED_BASE}/esummary.fcgi", params={
292
+ "db": "pubmed", "id": pmid, "retmode": "json",
293
+ }, timeout=15)
294
+ resp.raise_for_status()
295
+ result = resp.json()["result"]
296
+ if pmid not in result:
297
+ return None
298
+ r = result[pmid]
299
+ return {
300
+ "pmid": pmid,
301
+ "title": r.get("title", "")[:200],
302
+ "source": r.get("source", ""),
303
+ "pub_date": r.get("pubdate", ""),
304
+ "authors": ", ".join(a.get("name", "") for a in r.get("authors", [])[:3]),
305
+ }
306
+ except Exception:  # request or parsing failure; returning None makes the caller skip this PMID
307
+ return None
308
+
309
+
310
+ async def seed_literature(client: httpx.AsyncClient) -> int:
311
+ print("\n[4/5] Seeding supporting literature from PubMed...")
312
+ total = 0
313
+ for condition in SEED_CONDITIONS[:5]: # Top 5 conditions to keep fast
314
+ pmids = await fetch_pubmed_ids(client, condition)
315
+ for pmid in pmids:
316
+ summary = await fetch_pubmed_summary(client, pmid)
317
+ if not summary:
318
+ continue
319
+ upsert("""
320
+ MERGE (p:Publication {pmid: $pmid})
321
+ SET p += {
322
+ title: $title, journal: $source, pub_date: $pub_date,
323
+ authors: $authors, source: 'pubmed', updated_at: datetime()
324
+ }
325
+ WITH p
326
+ MATCH (c:ConditionNode {name: $condition})
327
+ MERGE (p)-[:SUPPORTS_RESEARCH_ON]->(c)
328
+ """, {**summary, "condition": condition})
329
+ total += 1
330
+ print(f" {condition}: {len(pmids)} publications linked")
331
+ await asyncio.sleep(0.3)
332
+ print(f" Total publications seeded: {total}")
333
+ return total
334
+
335
+
336
+ # ── Biomarkers (static — curated from COSMIC / NCIT) ─────────────────────────
337
+
338
+
339
+ # Reassign SEED_CONDITIONS to 20 oncology types; this replaces the 10-entry list defined near the top of the file
340
+ SEED_CONDITIONS = [
341
+ "breast cancer", "prostate cancer", "non-small cell lung cancer", "colorectal cancer",
342
+ "ovarian cancer", "melanoma", "leukemia", "lymphoma", "glioblastoma", "pancreatic cancer",
343
+ "bladder cancer", "renal cell carcinoma", "thyroid cancer", "multiple myeloma",
344
+ "endometrial cancer", "cervical cancer", "gastric cancer", "hepatocellular carcinoma",
345
+ "head and neck cancer", "sarcoma",
346
+ ]
347
+
348
+ CURATED_BIOMARKERS = [
349
+ # Breast cancer
350
+ {"id": "HER2_POS", "name": "HER2 Positive", "gene": "ERBB2", "loinc": "85319-2", "condition": "breast cancer"},
351
+ {"id": "HER2_NEG", "name": "HER2 Negative", "gene": "ERBB2", "loinc": "85319-2", "condition": "breast cancer"},
352
+ {"id": "BRCA1_MUT", "name": "BRCA1 Pathogenic Variant", "gene": "BRCA1", "loinc": "21636-6", "condition": "breast cancer"},
353
+ {"id": "BRCA2_MUT", "name": "BRCA2 Pathogenic Variant", "gene": "BRCA2", "loinc": "21637-4", "condition": "breast cancer"},
354
+ {"id": "PIK3CA_MUT", "name": "PIK3CA Mutation", "gene": "PIK3CA", "loinc": "82457-4", "condition": "breast cancer"},
355
+ {"id": "TP53_MUT", "name": "TP53 Mutation", "gene": "TP53", "loinc": "21637-4", "condition": "breast cancer"},
356
+ {"id": "ER_POS", "name": "Estrogen Receptor Positive", "gene": "ESR1", "loinc": "85310-1", "condition": "breast cancer"},
357
+ {"id": "PR_POS", "name": "Progesterone Receptor Positive", "gene": "PGR", "loinc": "85321-8", "condition": "breast cancer"},
358
+ {"id": "TNBC", "name": "Triple Negative Breast Cancer", "gene": "ERBB2/ESR1/PGR", "loinc": "85319-2", "condition": "breast cancer"},
359
+ # Lung
360
+ {"id": "EGFR_L858R", "name": "EGFR L858R Mutation", "gene": "EGFR", "loinc": "81704-9", "condition": "non-small cell lung cancer"},
361
+ {"id": "EGFR_DEL19", "name": "EGFR Exon 19 Deletion", "gene": "EGFR", "loinc": "81704-9", "condition": "non-small cell lung cancer"},
362
+ {"id": "EGFR_T790M", "name": "EGFR T790M Resistance Mutation", "gene": "EGFR", "loinc": "81704-9", "condition": "non-small cell lung cancer"},
363
+ {"id": "ALK_FUSION", "name": "ALK Gene Fusion", "gene": "ALK", "loinc": "81695-9", "condition": "non-small cell lung cancer"},
364
+ {"id": "ROS1_FUSION", "name": "ROS1 Gene Fusion", "gene": "ROS1", "loinc": "81696-7", "condition": "non-small cell lung cancer"},
365
+ {"id": "MET_EX14", "name": "MET Exon 14 Skipping", "gene": "MET", "loinc": "82139-8", "condition": "non-small cell lung cancer"},
366
+ {"id": "KRAS_G12C", "name": "KRAS G12C Mutation", "gene": "KRAS", "loinc": "81434-5", "condition": "non-small cell lung cancer"},
367
+ {"id": "PDL1_HIGH", "name": "PD-L1 TPS ≥50%", "gene": "CD274", "loinc": "73977-1", "condition": "non-small cell lung cancer"},
368
+ {"id": "PDL1_LOW", "name": "PD-L1 TPS 1-49%", "gene": "CD274", "loinc": "73977-1", "condition": "non-small cell lung cancer"},
369
+ {"id": "PDL1_NEG", "name": "PD-L1 TPS <1%", "gene": "CD274", "loinc": "73977-1", "condition": "non-small cell lung cancer"},
370
+ # Prostate
371
+ {"id": "PSA_ELEVATED","name": "PSA Elevated (>4 ng/mL)", "gene": "KLK3", "loinc": "2857-1", "condition": "prostate cancer"},
372
+ {"id": "PTEN_LOSS", "name": "PTEN Loss", "gene": "PTEN", "loinc": "21637-4", "condition": "prostate cancer"},
373
+ {"id": "AR_V7", "name": "Androgen Receptor Splice Variant 7", "gene": "AR", "loinc": "82145-5", "condition": "prostate cancer"},
374
+ # Colorectal
375
+ {"id": "MSI_H", "name": "Microsatellite Instability-High", "gene": "MLH1/MSH2", "loinc": "85077-6", "condition": "colorectal cancer"},
376
+ {"id": "MSS", "name": "Microsatellite Stable", "gene": "MLH1/MSH2", "loinc": "85077-6", "condition": "colorectal cancer"},
377
+ {"id": "KRAS_WT", "name": "KRAS Wild-Type", "gene": "KRAS", "loinc": "21637-4", "condition": "colorectal cancer"},
378
+ {"id": "BRAF_V600E", "name": "BRAF V600E Mutation", "gene": "BRAF", "loinc": "81287-7", "condition": "colorectal cancer"},
379
+ {"id": "NRAS_MUT", "name": "NRAS Mutation", "gene": "NRAS", "loinc": "82143-0", "condition": "colorectal cancer"},
380
+ # Melanoma
381
+ {"id": "BRAF_V600K", "name": "BRAF V600K Mutation", "gene": "BRAF", "loinc": "81287-7", "condition": "melanoma"},
382
+ {"id": "TMB_HIGH", "name": "Tumor Mutational Burden High (≥10)", "gene": "TMB", "loinc": "94076-7", "condition": "melanoma"},
383
+ {"id": "NRAS_MEL", "name": "NRAS Mutation (Melanoma)", "gene": "NRAS", "loinc": "82143-0", "condition": "melanoma"},
384
+ # GBM
385
+ {"id": "IDH1_R132H", "name": "IDH1 R132H Mutation", "gene": "IDH1", "loinc": "82140-6", "condition": "glioblastoma"},
386
+ {"id": "IDH1_WT", "name": "IDH1 Wild-Type", "gene": "IDH1", "loinc": "82140-6", "condition": "glioblastoma"},
387
+ {"id": "MGMT_METH", "name": "MGMT Promoter Methylation", "gene": "MGMT", "loinc": "85319-2", "condition": "glioblastoma"},
388
+ {"id": "EGFR_AMP", "name": "EGFR Amplification", "gene": "EGFR", "loinc": "81704-9", "condition": "glioblastoma"},
389
+ # Leukemia / Lymphoma
390
+ {"id": "BCR_ABL1", "name": "BCR-ABL1 Fusion (Philadelphia Chr)", "gene": "BCR/ABL1", "loinc": "33899-6", "condition": "leukemia"},
391
+ {"id": "FLT3_ITD", "name": "FLT3 Internal Tandem Duplication", "gene": "FLT3", "loinc": "82144-8", "condition": "leukemia"},
392
+ {"id": "NPM1_MUT", "name": "NPM1 Mutation", "gene": "NPM1", "loinc": "82147-1", "condition": "leukemia"},
393
+ {"id": "CD20_POS", "name": "CD20 Positive", "gene": "MS4A1", "loinc": "85080-0", "condition": "lymphoma"},
394
+ {"id": "EZH2_MUT", "name": "EZH2 Mutation", "gene": "EZH2", "loinc": "82148-9", "condition": "lymphoma"},
395
+ # New conditions
396
+ {"id": "FGFR3_MUT", "name": "FGFR3 Mutation", "gene": "FGFR3", "loinc": "82150-5", "condition": "bladder cancer"},
397
+ {"id": "VHL_LOSS", "name": "VHL Gene Loss", "gene": "VHL", "loinc": "82151-3", "condition": "renal cell carcinoma"},
398
+ {"id": "MTOR_MUT", "name": "mTOR Pathway Mutation", "gene": "MTOR", "loinc": "82152-1", "condition": "renal cell carcinoma"},
399
+ {"id": "BRAF_THYROID","name": "BRAF V600E (Thyroid)", "gene": "BRAF", "loinc": "81287-7", "condition": "thyroid cancer"},
400
+ {"id": "RET_FUSION", "name": "RET Gene Fusion", "gene": "RET", "loinc": "82153-9", "condition": "thyroid cancer"},
401
+ {"id": "NTRK_FUSION", "name": "NTRK Gene Fusion", "gene": "NTRK1/2/3", "loinc": "82154-7", "condition": "thyroid cancer"},
402
+ {"id": "WHSC1_MUT", "name": "MMSET/WHSC1 Mutation", "gene": "NSD2", "loinc": "82155-4", "condition": "multiple myeloma"},
403
+ {"id": "CDKN2A_LOSS", "name": "CDKN2A Loss", "gene": "CDKN2A", "loinc": "82156-2", "condition": "multiple myeloma"},
404
+ {"id": "POLE_MUT", "name": "POLE Mutation", "gene": "POLE", "loinc": "82157-0", "condition": "endometrial cancer"},
405
+ {"id": "CTNNB1_MUT", "name": "CTNNB1 Mutation", "gene": "CTNNB1", "loinc": "82158-8", "condition": "endometrial cancer"},
406
+ {"id": "HPV_POS", "name": "HPV Positive", "gene": "HPV", "loinc": "21440-3", "condition": "cervical cancer"},
407
+ {"id": "ERBB2_GC", "name": "HER2 Amplification (Gastric)", "gene": "ERBB2", "loinc": "85319-2", "condition": "gastric cancer"},
408
+ {"id": "HBV_POS", "name": "Hepatitis B Virus Positive", "gene": "HBV", "loinc": "16933-4", "condition": "hepatocellular carcinoma"},
409
+ {"id": "TERT_MUT", "name": "TERT Promoter Mutation", "gene": "TERT", "loinc": "82159-6", "condition": "hepatocellular carcinoma"},
410
+ {"id": "PIK3CA_HNC", "name": "PIK3CA Mutation (H&N)", "gene": "PIK3CA", "loinc": "82457-4", "condition": "head and neck cancer"},
411
+ {"id": "HPV_HNSC", "name": "HPV-Positive HNSCC", "gene": "HPV", "loinc": "21440-3", "condition": "head and neck cancer"},
412
+ {"id": "CDK4_AMP", "name": "CDK4 Amplification", "gene": "CDK4", "loinc": "82160-4", "condition": "sarcoma"},
413
+ {"id": "MDM2_AMP", "name": "MDM2 Amplification", "gene": "MDM2", "loinc": "82161-2", "condition": "sarcoma"},
414
+ ]
415
+
416
+
417
+ def seed_biomarkers() -> int:
418
+ print("\n[5/5] Seeding biomarkers (curated from COSMIC/NCIT)...")
419
+ for bm in CURATED_BIOMARKERS:
420
+ upsert("""
421
+ MERGE (b:Biomarker {id: $id})
422
+ SET b += {name: $name, gene: $gene, loinc: $loinc, source: 'curated', updated_at: datetime()}
423
+ WITH b
424
+ MERGE (c:ConditionNode {name: $condition})
425
+ MERGE (b)-[:RELEVANT_TO]->(c)
426
+ """, bm)
427
+ print(f" {len(CURATED_BIOMARKERS)} biomarkers seeded and linked to conditions")
428
+ return len(CURATED_BIOMARKERS)
429
+
430
+
431
+ # ── Eligibility relationships ─────────────────────────────────────────────────
432
+
433
+ def derive_eligibility_relationships():
434
+ print("\n[+] Deriving eligibility relationships...")
435
+ upsert("MATCH (d:Diagnosis)-[:MAPS_TO_CONDITION]->(c:ConditionNode)-[:HAS_TRIAL]->(t:Trial) MERGE (d)-[:ELIGIBLE_FOR]->(t)")
436
+ upsert("MATCH (b:Biomarker)-[:RELEVANT_TO]->(c:ConditionNode)-[:HAS_TRIAL]->(t:Trial) MERGE (b)-[:MAY_QUALIFY_FOR]->(t)")
437
+ print(" Eligibility relationships derived.")
438
+
439
+
440
+ # ══════════════════════════════════════════════════════════════════════════════
441
+ # Synthetic Patient Engine — 100 k clinically-informed personas
442
+ # Distributions based on: SEER 2023, TCGA biomarker atlas, ASCO guidelines,
443
+ # US Census 2020 demographics, ACS Cancer Facts & Figures 2024.
444
+ # ══════════════════════════════════════════════════════════════════════════════
445
+
446
+ # ── Name pools (US Census racial/ethnic proportions) ─────────────────────────
447
+
448
+ _NAMES_F_WHITE = ["Emma","Olivia","Ava","Isabella","Sophia","Charlotte","Amelia","Mia","Harper",
449
+ "Evelyn","Abigail","Emily","Elizabeth","Avery","Ella","Madison","Scarlett",
450
+ "Victoria","Grace","Chloe","Penelope","Riley","Lily","Eleanor","Hannah",
451
+ "Lillian","Addison","Aubrey","Ellie","Stella","Natalie","Leah","Hazel",
452
+ "Violet","Audrey","Claire","Lucy","Anna","Samantha","Katherine"]
453
+ _NAMES_F_BLACK = ["Aaliyah","Amara","Destiny","Imani","Jasmine","Keisha","Layla","Maya","Naomi",
454
+ "Nia","Raven","Serena","Tamara","Unique","Zora","Aisha","Brianna","Crystal",
455
+ "Diamond","Essence","Faith","Genesis","Heaven","India","Jade","Kiara","Lashonda",
456
+ "Monique","Nadia","Precious","Quiana","Regina","Shanice","Tiffany","Whitney"]
457
+ _NAMES_F_HISPANIC = ["Sofia","Camila","Valentina","Isabella","Daniela","Fernanda","Gabriela","Lucia",
458
+ "Maria","Ana","Carmen","Diana","Elena","Gloria","Iris","Jessica","Laura",
459
+ "Linda","Margarita","Natalia","Paola","Rosa","Sandra","Teresa","Veronica",
460
+ "Ximena","Yolanda","Adriana","Beatriz","Carolina","Esperanza","Francisca"]
461
+ _NAMES_F_ASIAN = ["Aiko","Mei","Yuki","Sakura","Hana","Yuna","Ji-Young","Soo-Jin","Lan","Linh",
462
+ "Nguyen","Phuong","Priya","Divya","Ananya","Kavya","Shreya","Sanjana",
463
+ "Hui","Xin","Ying","Fang","Jing","Li","Min","Qian","Wei","Xue","Yan","Zhen"]
464
+ _NAMES_M_WHITE = ["Liam","Noah","William","James","Oliver","Benjamin","Elijah","Lucas","Mason",
465
+ "Logan","Alexander","Ethan","Jacob","Michael","Daniel","Henry","Jackson",
466
+ "Sebastian","Aiden","Matthew","Samuel","David","Joseph","Carter","Owen",
467
+ "Wyatt","John","Jack","Luke","Dylan","Grayson","Levi","Isaac","Gabriel"]
468
+ _NAMES_M_BLACK = ["Andre","DeShawn","Darius","Elijah","Isaiah","Jamal","Jaylen","Jordan","Kendrick",
469
+ "Malik","Marcus","Marquise","Nathaniel","Omari","Quincy","Rashad","Roderick",
470
+ "Terrence","Trevon","Xavier","Zion","Aaron","Calvin","Damon","Ernest","Frederick",
471
+ "Gerald","Harold","Ivan","Jerome","Kenneth","Leonard","Maurice","Nelson"]
472
+ _NAMES_M_HISPANIC = ["Santiago","Mateo","Alejandro","Sebastian","Diego","Carlos","Miguel","Andres",
473
+ "Fernando","Jose","Luis","Manuel","Marco","Mario","Pablo","Rafael","Ricardo",
474
+ "Roberto","Rodrigo","Victor","Alberto","Arturo","Cesar","Eduardo","Ernesto",
475
+ "Francisco","Guillermo","Hector","Ignacio","Javier","Juan","Lorenzo","Oscar"]
476
+ _NAMES_M_ASIAN = ["Wei","Ming","Jian","Yang","Hao","Lei","Tao","Xiao","Yong","Jun","Ryu","Kenji",
477
+ "Hiroshi","Takashi","Yuto","Min-Jun","Seo-Jun","Ji-Ho","Arjun","Rahul","Vikram",
478
+ "Suresh","Rajesh","Anil","Vijay","Amit","Nikhil","Rohan","Kiran","Sanjay"]
479
+ _LAST_NAMES_WHITE = ["Smith","Johnson","Williams","Brown","Jones","Miller","Davis","Wilson","Anderson",
480
+ "Thomas","Taylor","Moore","Jackson","Martin","Lee","Thompson","White","Harris",
481
+ "Clark","Lewis","Robinson","Walker","Young","Allen","King","Wright","Scott",
482
+ "Green","Adams","Nelson","Baker","Hall","Campbell","Mitchell","Carter","Roberts"]
483
+ _LAST_NAMES_BLACK = ["Williams","Johnson","Jones","Brown","Davis","Wilson","Thomas","Taylor","Moore",
484
+ "Jackson","Harris","Thompson","White","Robinson","Walker","King","Green","Adams",
485
+ "Baker","Hall","Carter","Mitchell","Peele","Banks","Bell","Boyd","Brooks","Bryant",
486
+ "Byrd","Chambers","Coleman","Collins","Cooper","Crawford","Dixon","Edwards"]
487
+ _LAST_NAMES_HISPANIC = ["Garcia","Rodriguez","Martinez","Hernandez","Lopez","Gonzalez","Perez","Sanchez",
488
+ "Ramirez","Torres","Flores","Rivera","Gomez","Diaz","Reyes","Morales","Cruz",
489
+ "Gutierrez","Ortiz","Chavez","Ramos","Romero","Vargas","Castillo","Jimenez",
490
+ "Moreno","Alvarez","Mendoza","Ruiz","Aguilar","Vega","Castro","Medina"]
491
+ _LAST_NAMES_ASIAN = ["Wang","Li","Zhang","Liu","Chen","Yang","Huang","Zhao","Wu","Zhou","Kim","Park",
492
+ "Lee","Choi","Jung","Nguyen","Tran","Le","Pham","Hoang","Patel","Shah","Kumar",
493
+ "Singh","Sharma","Gupta","Mehta","Kapoor","Nair","Reddy","Iyer","Rao","Joshi"]
494
+
495
+ # Ethnic distribution approximating US cancer patient demographics (ACS 2024)
496
+ _ETHNICITY_GROUPS = [
497
+ ("White", 0.60, _NAMES_F_WHITE, _NAMES_M_WHITE, _LAST_NAMES_WHITE),
498
+ ("Black or African American", 0.13, _NAMES_F_BLACK, _NAMES_M_BLACK, _LAST_NAMES_BLACK),
499
+ ("Hispanic or Latino", 0.14, _NAMES_F_HISPANIC, _NAMES_M_HISPANIC, _LAST_NAMES_HISPANIC),
500
+ ("Asian", 0.07, _NAMES_F_ASIAN, _NAMES_M_ASIAN, _LAST_NAMES_ASIAN),
501
+ ("American Indian or Alaska Native", 0.03, _NAMES_F_WHITE, _NAMES_M_WHITE, _LAST_NAMES_WHITE),
502
+ ("Native Hawaiian or Pacific Islander", 0.01, _NAMES_F_ASIAN, _NAMES_M_ASIAN, _LAST_NAMES_ASIAN),
503
+ ("Other / Multiracial", 0.02, _NAMES_F_WHITE, _NAMES_M_WHITE, _LAST_NAMES_WHITE),
504
+ ]
505
+ _ETH_NAMES = [(e[0], e[2], e[3], e[4]) for e in _ETHNICITY_GROUPS]
506
+ _ETH_WEIGHTS = [e[1] for e in _ETHNICITY_GROUPS]
507
+
508
+ # City pool weighted by US metropolitan population (2020 Census)
509
+ _CITIES = [
510
+ ("New York","NY",0.060),("Los Angeles","CA",0.045),("Chicago","IL",0.033),
511
+ ("Houston","TX",0.027),("Phoenix","AZ",0.020),("Philadelphia","PA",0.018),
512
+ ("San Antonio","TX",0.016),("San Diego","CA",0.016),("Dallas","TX",0.015),
513
+ ("San Jose","CA",0.013),("Austin","TX",0.013),("Jacksonville","FL",0.011),
514
+ ("Fort Worth","TX",0.010),("Columbus","OH",0.010),("Charlotte","NC",0.010),
515
+ ("Indianapolis","IN",0.009),("San Francisco","CA",0.009),("Seattle","WA",0.009),
516
+ ("Denver","CO",0.009),("Nashville","TN",0.009),("Boston","MA",0.009),
517
+ ("Baltimore","MD",0.008),("Louisville","KY",0.007),("Portland","OR",0.007),
518
+ ("Las Vegas","NV",0.007),("Milwaukee","WI",0.006),("Albuquerque","NM",0.006),
519
+ ("Tucson","AZ",0.006),("Fresno","CA",0.005),("Sacramento","CA",0.005),
520
+ ("Atlanta","GA",0.009),("Kansas City","MO",0.005),("Omaha","NE",0.004),
521
+ ("Raleigh","NC",0.005),("Cleveland","OH",0.005),("Minneapolis","MN",0.006),
522
+ ("Miami","FL",0.008),("Tampa","FL",0.007),("New Orleans","LA",0.005),
523
+ ("Pittsburgh","PA",0.006),("Memphis","TN",0.005),("Richmond","VA",0.004),
524
+ ("Birmingham","AL",0.004),("Salt Lake City","UT",0.004),("Hartford","CT",0.004),
525
+ ("Buffalo","NY",0.004),("Rochester","NY",0.003),("Providence","RI",0.003),
526
+ ("Des Moines","IA",0.003),("Little Rock","AR",0.003),("Madison","WI",0.003),
527
+ ]
528
+ _CITY_NAMES = [(c[0], c[1]) for c in _CITIES]
529
+ _CITY_WEIGHTS = [c[2] for c in _CITIES]
530
+
531
+ # Comorbidity prevalence in US oncology patients (literature-based)
532
+ _COMORBIDITY_POOL = [
533
+ ("Type 2 Diabetes", 0.18),
534
+ ("Hypertension", 0.42),
535
+ ("Coronary Artery Disease",0.09),
536
+ ("COPD", 0.08),
537
+ ("Chronic Kidney Disease", 0.12),
538
+ ("Obesity (BMI>30)", 0.36),
539
+ ("Depression/Anxiety", 0.22),
540
+ ("Hypothyroidism", 0.07),
541
+ ("Atrial Fibrillation", 0.05),
542
+ ("Osteoporosis", 0.06),
543
+ ]
544
+
545
+ # Insurance status (US cancer patient distribution, KFF 2023)
546
+ _INSURANCE = [
547
+ ("Private/Employer", 0.48),
548
+ ("Medicare", 0.30),
549
+ ("Medicaid", 0.14),
550
+ ("Uninsured", 0.05),
551
+ ("VA/Military", 0.03),
552
+ ]
553
+ _INS_LABELS = [i[0] for i in _INSURANCE]
554
+ _INS_WEIGHTS = [i[1] for i in _INSURANCE]
555
+
556
+ # ECOG score distribution varies by condition severity
557
+ _ECOG_BY_CONDITION: dict[str, list[float]] = {
558
+ # [P(ECOG 0), P(ECOG 1), P(ECOG 2), P(ECOG 3)]; each row sums to 1.0
559
+ "breast cancer": [0.35, 0.40, 0.18, 0.07],
560
+ "prostate cancer": [0.30, 0.40, 0.20, 0.10],
561
+ "non-small cell lung cancer": [0.20, 0.38, 0.28, 0.14],
562
+ "colorectal cancer": [0.28, 0.40, 0.22, 0.10],
563
+ "ovarian cancer": [0.25, 0.40, 0.25, 0.10],
564
+ "melanoma": [0.40, 0.38, 0.15, 0.07],
565
+ "leukemia": [0.25, 0.38, 0.25, 0.12],
566
+ "lymphoma": [0.28, 0.40, 0.22, 0.10],
567
+ "glioblastoma": [0.15, 0.35, 0.30, 0.20],
568
+ "pancreatic cancer": [0.15, 0.32, 0.33, 0.20],
569
+ "bladder cancer": [0.28, 0.40, 0.22, 0.10],
570
+ "renal cell carcinoma": [0.32, 0.40, 0.20, 0.08],
571
+ "thyroid cancer": [0.50, 0.35, 0.12, 0.03],
572
+ "multiple myeloma": [0.22, 0.38, 0.28, 0.12],
573
+ "endometrial cancer": [0.30, 0.40, 0.22, 0.08],
574
+ "cervical cancer": [0.25, 0.40, 0.25, 0.10],
575
+ "gastric cancer": [0.18, 0.35, 0.30, 0.17],
576
+ "hepatocellular carcinoma": [0.15, 0.32, 0.33, 0.20],
577
+ "head and neck cancer": [0.20, 0.38, 0.28, 0.14],
578
+ "sarcoma": [0.30, 0.40, 0.22, 0.08],
579
+ }
580
+
581
+
582
+ # ── Condition profiles (SEER-weighted) ───────────────────────────────────────
583
+ # count_weight → fraction of the 100 k total patients drawn from this condition
584
+ # biomarker_prevalences → {biomarker_id: probability} (TCGA / literature)
585
+
586
+ _CONDITION_PROFILES: dict[str, dict] = {
587
+ "breast cancer": {
588
+ "icd10_prefix": "C50", "sex": "FEMALE", "count_weight": 0.155,
589
+ "age_range": (25, 82), "age_mode": 62,
590
+ "stages": ["I","II","III","IV"], "stage_weights": [0.28, 0.32, 0.25, 0.15],
591
+ "biomarker_prevalences": {
592
+ "ER_POS":0.75,"PR_POS":0.65,"HER2_POS":0.17,"HER2_NEG":0.83,
593
+ "TNBC":0.12,"BRCA1_MUT":0.05,"BRCA2_MUT":0.04,
594
+ "PIK3CA_MUT":0.35,"TP53_MUT":0.28,
595
+ },
596
+ "med_pool": ["trastuzumab","bevacizumab","capecitabine","olaparib","pembrolizumab"],
597
+ "prior_chemo_rate": 0.65,
598
+ },
599
+ "non-small cell lung cancer": {
600
+ "icd10_prefix": "C34", "sex": "ALL", "count_weight": 0.130,
601
+ "age_range": (40, 84), "age_mode": 68,
602
+ "stages": ["I","II","III","IV"], "stage_weights": [0.09, 0.12, 0.28, 0.51],
603
+ "biomarker_prevalences": {
604
+ "EGFR_L858R":0.08,"EGFR_DEL19":0.09,"EGFR_T790M":0.05,
605
+ "ALK_FUSION":0.04,"ROS1_FUSION":0.02,"MET_EX14":0.03,
606
+ "KRAS_G12C":0.13,"PDL1_HIGH":0.28,"PDL1_LOW":0.30,"PDL1_NEG":0.42,
607
+ },
608
+ "med_pool": ["osimertinib","pembrolizumab","nivolumab","erlotinib","atezolizumab","durvalumab"],
609
+ "prior_chemo_rate": 0.55,
610
+ },
611
+ "prostate cancer": {
612
+ "icd10_prefix": "C61", "sex": "MALE", "count_weight": 0.095,
613
+ "age_range": (45, 86), "age_mode": 67,
614
+ "stages": ["I","II","III","IV"], "stage_weights": [0.18, 0.28, 0.28, 0.26],
615
+ "biomarker_prevalences": {
616
+ "PSA_ELEVATED":0.90,"BRCA2_MUT":0.05,"PTEN_LOSS":0.25,"AR_V7":0.20,
617
+ },
618
+ "med_pool": ["enzalutamide","bevacizumab","olaparib","pembrolizumab"],
619
+ "prior_chemo_rate": 0.40,
620
+ },
621
+ "colorectal cancer": {
622
+ "icd10_prefix": "C18", "sex": "ALL", "count_weight": 0.085,
623
+ "age_range": (35, 82), "age_mode": 65,
624
+ "stages": ["I","II","III","IV"], "stage_weights": [0.18, 0.26, 0.30, 0.26],
625
+ "biomarker_prevalences": {
626
+ "MSI_H":0.10,"MSS":0.90,"KRAS_WT":0.42,
627
+ "BRAF_V600E":0.08,"NRAS_MUT":0.05,"KRAS_G12C":0.04,
628
+ },
629
+ "med_pool": ["bevacizumab","cetuximab","capecitabine","pembrolizumab"],
630
+ "prior_chemo_rate": 0.60,
631
+ },
632
+ "melanoma": {
633
+ "icd10_prefix": "C43", "sex": "ALL", "count_weight": 0.055,
634
+ "age_range": (20, 80), "age_mode": 57,
635
+ "stages": ["I","II","III","IV"], "stage_weights": [0.30, 0.28, 0.22, 0.20],
636
+ "biomarker_prevalences": {
637
+ "BRAF_V600E":0.45,"BRAF_V600K":0.06,"TMB_HIGH":0.35,"NRAS_MEL":0.20,
638
+ },
639
+ "med_pool": ["pembrolizumab","nivolumab","dabrafenib","vemurafenib","ipilimumab"],
640
+ "prior_chemo_rate": 0.30,
641
+ },
642
+ "bladder cancer": {
643
+ "icd10_prefix": "C67", "sex": "ALL", "count_weight": 0.045,
644
+ "age_range": (45, 85), "age_mode": 69,
645
+ "stages": ["I","II","III","IV"], "stage_weights": [0.28, 0.24, 0.26, 0.22],
646
+ "biomarker_prevalences": {
647
+ "FGFR3_MUT":0.20,"PDL1_HIGH":0.22,"TMB_HIGH":0.15,"TP53_MUT":0.30,
648
+ },
649
+ "med_pool": ["pembrolizumab","atezolizumab","nivolumab","erdafitinib"],
650
+ "prior_chemo_rate": 0.45,
651
+ },
652
+ "renal cell carcinoma": {
653
+ "icd10_prefix": "C64", "sex": "ALL", "count_weight": 0.042,
654
+ "age_range": (40, 82), "age_mode": 64,
655
+ "stages": ["I","II","III","IV"], "stage_weights": [0.25, 0.20, 0.25, 0.30],
656
+ "biomarker_prevalences": {
657
+ "VHL_LOSS":0.55,"MTOR_MUT":0.15,"PDL1_HIGH":0.18,
658
+ },
659
+ "med_pool": ["pembrolizumab","nivolumab","bevacizumab","sunitinib"],
660
+ "prior_chemo_rate": 0.25,
661
+ },
662
+ "lymphoma": {
663
+ "icd10_prefix": "C85", "sex": "ALL", "count_weight": 0.042,
664
+ "age_range": (20, 80), "age_mode": 58,
665
+ "stages": ["I","II","III","IV"], "stage_weights": [0.20, 0.25, 0.30, 0.25],
666
+ "biomarker_prevalences": {
667
+ "CD20_POS":0.85,"EZH2_MUT":0.22,"TMB_HIGH":0.12,"PDL1_HIGH":0.15,
668
+ },
669
+ "med_pool": ["rituximab","pembrolizumab","nivolumab"],
670
+ "prior_chemo_rate": 0.55,
671
+ },
672
+ "endometrial cancer": {
673
+ "icd10_prefix": "C54", "sex": "FEMALE", "count_weight": 0.038,
674
+ "age_range": (40, 82), "age_mode": 63,
675
+ "stages": ["I","II","III","IV"], "stage_weights": [0.50, 0.15, 0.20, 0.15],
676
+ "biomarker_prevalences": {
677
+ "MSI_H":0.25,"POLE_MUT":0.07,"CTNNB1_MUT":0.30,"TP53_MUT":0.25,"PIK3CA_MUT":0.35,
678
+ },
679
+ "med_pool": ["pembrolizumab","bevacizumab","olaparib","capecitabine"],
680
+ "prior_chemo_rate": 0.40,
681
+ },
682
+ "leukemia": {
683
+ "icd10_prefix": "C91", "sex": "ALL", "count_weight": 0.035,
684
+ "age_range": (18, 82), "age_mode": 55,
685
+ "stages": ["I","II","III","IV"], "stage_weights": [0.25, 0.25, 0.28, 0.22],
686
+ "biomarker_prevalences": {
687
+ "BCR_ABL1":0.30,"FLT3_ITD":0.25,"NPM1_MUT":0.30,"TP53_MUT":0.15,
688
+ },
689
+ "med_pool": ["imatinib","rituximab","pembrolizumab"],
690
+ "prior_chemo_rate": 0.60,
691
+ },
692
+ "pancreatic cancer": {
693
+ "icd10_prefix": "C25", "sex": "ALL", "count_weight": 0.033,
694
+ "age_range": (40, 82), "age_mode": 68,
695
+ "stages": ["I","II","III","IV"], "stage_weights": [0.05, 0.12, 0.28, 0.55],
696
+ "biomarker_prevalences": {
697
+ "KRAS_G12C":0.07,"BRCA2_MUT":0.06,"TP53_MUT":0.55,"MSI_H":0.02,
698
+ },
699
+ "med_pool": ["capecitabine","erlotinib","olaparib"],
700
+ "prior_chemo_rate": 0.50,
701
+ },
702
+ "thyroid cancer": {
703
+ "icd10_prefix": "C73", "sex": "FEMALE", "count_weight": 0.030,
704
+ "age_range": (20, 75), "age_mode": 47,
705
+ "stages": ["I","II","III","IV"], "stage_weights": [0.55, 0.20, 0.15, 0.10],
706
+ "biomarker_prevalences": {
707
+ "BRAF_THYROID":0.45,"RET_FUSION":0.08,"NTRK_FUSION":0.05,
708
+ },
709
+ "med_pool": ["pembrolizumab","dabrafenib","vemurafenib"],
710
+ "prior_chemo_rate": 0.15,
711
+ },
712
+ "multiple myeloma": {
713
+ "icd10_prefix": "C90", "sex": "ALL", "count_weight": 0.025,
714
+ "age_range": (45, 84), "age_mode": 67,
715
+ "stages": ["I","II","III","IV"], "stage_weights": [0.20, 0.28, 0.30, 0.22],
716
+ "biomarker_prevalences": {
717
+ "WHSC1_MUT":0.20,"CDKN2A_LOSS":0.30,"TP53_MUT":0.15,
718
+ },
719
+ "med_pool": ["pembrolizumab","rituximab","bevacizumab"],
720
+ "prior_chemo_rate": 0.65,
721
+ },
722
+ "gastric cancer": {
723
+ "icd10_prefix": "C16", "sex": "ALL", "count_weight": 0.018,
724
+ "age_range": (35, 82), "age_mode": 65,
725
+ "stages": ["I","II","III","IV"], "stage_weights": [0.10, 0.20, 0.35, 0.35],
726
+ "biomarker_prevalences": {
727
+ "ERBB2_GC":0.15,"MSI_H":0.10,"PDL1_HIGH":0.20,"TP53_MUT":0.40,
728
+ },
729
+ "med_pool": ["trastuzumab","pembrolizumab","nivolumab","capecitabine"],
730
+ "prior_chemo_rate": 0.55,
731
+ },
732
+ "ovarian cancer": {
733
+ "icd10_prefix": "C56", "sex": "FEMALE", "count_weight": 0.018,
734
+ "age_range": (35, 80), "age_mode": 62,
735
+ "stages": ["I","II","III","IV"], "stage_weights": [0.12, 0.14, 0.40, 0.34],
736
+ "biomarker_prevalences": {
737
+ "BRCA1_MUT":0.12,"BRCA2_MUT":0.08,"TP53_MUT":0.60,"PIK3CA_MUT":0.08,
738
+ },
739
+ "med_pool": ["olaparib","bevacizumab","pembrolizumab"],
740
+ "prior_chemo_rate": 0.75,
741
+ },
742
+ "hepatocellular carcinoma": {
743
+ "icd10_prefix": "C22", "sex": "ALL", "count_weight": 0.015,
744
+ "age_range": (35, 80), "age_mode": 62,
745
+ "stages": ["I","II","III","IV"], "stage_weights": [0.10, 0.18, 0.32, 0.40],
746
+ "biomarker_prevalences": {
747
+ "HBV_POS":0.25,"TERT_MUT":0.55,"TP53_MUT":0.20,"CTNNB1_MUT":0.25,
748
+ },
749
+ "med_pool": ["pembrolizumab","nivolumab","bevacizumab","atezolizumab"],
750
+ "prior_chemo_rate": 0.35,
751
+ },
752
+ "glioblastoma": {
753
+ "icd10_prefix": "C71", "sex": "ALL", "count_weight": 0.012,
754
+ "age_range": (30, 76), "age_mode": 62,
755
+ "stages": ["III","IV"], "stage_weights": [0.28, 0.72],
756
+ "biomarker_prevalences": {
757
+ "IDH1_WT":0.90,"IDH1_R132H":0.10,"MGMT_METH":0.45,
758
+ "EGFR_AMP":0.40,"TP53_MUT":0.25,
759
+ },
760
+ "med_pool": ["bevacizumab","pembrolizumab"],
761
+ "prior_chemo_rate": 0.70,
762
+ },
763
+ "head and neck cancer": {
764
+ "icd10_prefix": "C10", "sex": "ALL", "count_weight": 0.012,
765
+ "age_range": (30, 80), "age_mode": 60,
766
+ "stages": ["I","II","III","IV"], "stage_weights": [0.10, 0.15, 0.30, 0.45],
767
+ "biomarker_prevalences": {
768
+ "HPV_HNSC":0.60,"PIK3CA_HNC":0.25,"PDL1_HIGH":0.20,"TP53_MUT":0.45,
769
+ },
770
+ "med_pool": ["pembrolizumab","nivolumab","cetuximab"],
771
+ "prior_chemo_rate": 0.55,
772
+ },
773
+ "cervical cancer": {
774
+ "icd10_prefix": "C53", "sex": "FEMALE", "count_weight": 0.008,
775
+ "age_range": (20, 72), "age_mode": 48,
776
+ "stages": ["I","II","III","IV"], "stage_weights": [0.28, 0.25, 0.25, 0.22],
777
+ "biomarker_prevalences": {
778
+ "HPV_POS":0.99,"PDL1_HIGH":0.25,"PIK3CA_MUT":0.25,
779
+ },
780
+ "med_pool": ["pembrolizumab","bevacizumab","nivolumab"],
781
+ "prior_chemo_rate": 0.50,
782
+ },
783
+ "sarcoma": {
784
+ "icd10_prefix": "C49", "sex": "ALL", "count_weight": 0.007,
785
+ "age_range": (15, 75), "age_mode": 45,
786
+ "stages": ["I","II","III","IV"], "stage_weights": [0.20, 0.25, 0.30, 0.25],
787
+ "biomarker_prevalences": {
788
+ "CDK4_AMP":0.20,"MDM2_AMP":0.18,"TP53_MUT":0.25,
789
+ },
790
+ "med_pool": ["pembrolizumab","nivolumab","bevacizumab"],
791
+ "prior_chemo_rate": 0.45,
792
+ },
793
+ }
794
+
795
+ random.seed(42) # reproducible synthetic data
796
+
797
+
798
+ def _parse_age(age_str: str) -> int | None:
799
+ if not age_str:
800
+ return None
801
+ try:
802
+ return int(age_str.split()[0])
803
+ except Exception:
804
+ return None
805
+
806
+
807
+ def _skewed_age(age_range: tuple[int, int], mode: int) -> int:
808
+ """Triangle-distributed age reflecting real incidence peak."""
809
+ lo, hi = age_range
810
+ mode = max(lo, min(hi, mode))
811
+ return int(random.triangular(lo, hi, mode))
812
+
813
+
814
+ def _pick_biomarkers(prevalences: dict[str, float], rng: random.Random) -> list[str]:
815
+ """Independent Bernoulli draw per biomarker based on literature prevalence."""
816
+ return [bm for bm, p in prevalences.items() if rng.random() < p]
817
+
818
+
819
+ def _pick_comorbidities(rng: random.Random, age: int) -> list[str]:
820
+ """Age-scaled comorbidity draw."""
821
+ scale = 1.0 + max(0, (age - 50)) * 0.015 # comorbidities rise ~1.5% per year after 50
822
+ return [c for c, p in _COMORBIDITY_POOL if rng.random() < min(p * scale, 0.95)]
823
+
824
+
825
+ def _generate_patient(pid: str, condition: str, profile: dict, seq: int, rng: random.Random) -> dict:
826
+ sex_raw = profile["sex"]
827
+ sex = rng.choice(["MALE","FEMALE"]) if sex_raw == "ALL" else sex_raw
828
+
829
+ age = _skewed_age(profile["age_range"], profile["age_mode"])
830
+ stage = rng.choices(profile["stages"], weights=profile["stage_weights"])[0]
831
+ ecog_weights = _ECOG_BY_CONDITION.get(condition, [0.28, 0.40, 0.22, 0.10])
832
+ ecog = rng.choices([0, 1, 2, 3], weights=ecog_weights)[0]
833
+
834
+ eth_group = rng.choices(_ETH_NAMES, weights=_ETH_WEIGHTS)[0]
835
+ ethnicity, names_f, names_m, last_names = eth_group
836
+ first = rng.choice(names_f if sex == "FEMALE" else names_m)
837
+ last = rng.choice(last_names)
838
+
839
+ city, state = rng.choices(_CITY_NAMES, weights=_CITY_WEIGHTS)[0]
840
+ insurance = rng.choices(_INS_LABELS, weights=_INS_WEIGHTS)[0]
841
+
842
+ biomarkers = _pick_biomarkers(profile["biomarker_prevalences"], rng)
843
+ comorbidities = _pick_comorbidities(rng, age)
844
+
845
+ med_pool = profile["med_pool"]
846
+ n_med = min(rng.randint(1, 2), len(med_pool))
847
+ medications = rng.sample(med_pool, n_med)
848
+
849
+ prior_chemo = rng.random() < profile.get("prior_chemo_rate", 0.5)
850
+ prior_radiation = rng.random() < 0.35
851
+ prior_surgery = rng.random() < 0.50
852
+ prior_lines = rng.randint(0, 3) if prior_chemo else 0
853
+
854
+ return {
855
+ "id": pid,
856
+ "name": f"{first} {last}",
857
+ "age": age,
858
+ "sex": sex,
859
+ "stage": stage,
860
+ "ecog": ecog,
861
+ "condition": condition,
862
+ "icd10_prefix": profile["icd10_prefix"],
863
+ "city": city,
864
+ "state": state,
865
+ "ethnicity": ethnicity,
866
+ "insurance": insurance,
867
+ "biomarkers": biomarkers,
868
+ "medications": medications,
869
+ "comorbidities": comorbidities,
870
+ "prior_chemo": prior_chemo,
871
+ "prior_radiation": prior_radiation,
872
+ "prior_surgery": prior_surgery,
873
+ "prior_lines_of_therapy": prior_lines,
874
+ "source": "synthetic_v2",
875
+ }
876
+
877
+
878
+ # ── Batch write helpers ───────────────────────────────────────────────────────
879
+
880
+ _BATCH_SIZE = 500
881
+
882
+
883
+ def _batch_write_patients(patients: list[dict]) -> None:
884
+ neo4j_conn.run_query("""
885
+ UNWIND $patients AS p
886
+ MERGE (n:Patient {id: p.id})
887
+ SET n += {
888
+ name: p.name, age: p.age, sex: p.sex, stage: p.stage,
889
+ ecog: p.ecog, condition: p.condition, icd10_prefix: p.icd10_prefix,
890
+ city: p.city, state: p.state, ethnicity: p.ethnicity,
891
+ insurance: p.insurance, biomarkers: p.biomarkers,
892
+ medications: p.medications, comorbidities: p.comorbidities,
893
+ prior_chemo: p.prior_chemo, prior_radiation: p.prior_radiation,
894
+ prior_surgery: p.prior_surgery,
895
+ prior_lines_of_therapy: p.prior_lines_of_therapy,
896
+ source: p.source, updated_at: datetime()
897
+ }
898
+ """, {"patients": patients})
899
+
900
+
901
+ def _batch_write_biomarker_links(links: list[dict]) -> None:
902
+ neo4j_conn.run_query("""
903
+ UNWIND $links AS l
904
+ MATCH (p:Patient {id: l.pid})
905
+ MATCH (b:Biomarker {id: l.bm_id})
906
+ MERGE (p)-[:HAS_BIOMARKER]->(b)
907
+ """, {"links": links})
908
+
909
+
910
+ def _batch_write_diagnosis_links(links: list[dict]) -> None:
911
+ # links already have resolved diagnosis_code (exact match, no scan needed)
912
+ neo4j_conn.run_query("""
913
+ UNWIND $links AS l
914
+ MATCH (p:Patient {id: l.pid})
915
+ MATCH (d:Diagnosis {code: l.diagnosis_code})
916
+ MERGE (p)-[:HAS_DIAGNOSIS]->(d)
917
+ """, {"links": links})
918
+
919
+
920
+ def _batch_write_eligibility(edges: list[dict]) -> None:
921
+ neo4j_conn.run_query("""
922
+ UNWIND $edges AS e
923
+ MATCH (p:Patient {id: e.pid})
924
+ MATCH (t:Trial {id: e.tid})
925
+ MERGE (p)-[r:ELIGIBLE_FOR]->(t)
926
+ SET r.score = e.score, r.matched_at = datetime()
927
+ """, {"edges": edges})
928
+
929
+
930
+ # ── Main patient seeder ───────────────────────────────────────────────────────
931
+
932
+ def seed_patients_and_eligibility(total_patients: int = 100_000) -> int:
933
+ print(f"\n[6/6] Generating {total_patients:,} clinically-informed synthetic patients...")
934
+ print(" (SEER incidence weights · TCGA biomarker prevalence · US Census demographics)")
935
+
936
+ # Pre-load trials grouped by condition
937
+ trial_rows = neo4j_conn.run_query("""
938
+ MATCH (t:Trial {status: 'RECRUITING'})
939
+ RETURN t.id AS id, t.condition AS condition, t.sex AS sex,
940
+ t.min_age AS min_age, t.max_age AS max_age
941
+ """)
942
+ trials_by_condition: dict[str, list[dict]] = {}
943
+ for row in (trial_rows or []):
944
+ cond = (row.get("condition") or "").lower().strip()
945
+ trials_by_condition.setdefault(cond, []).append(row)
946
+
947
+ # Calculate per-condition counts from SEER weights
948
+ total_weight = sum(p["count_weight"] for p in _CONDITION_PROFILES.values())
949
+ condition_counts = {
950
+ cond: max(1, round(total_patients * prof["count_weight"] / total_weight))
951
+ for cond, prof in _CONDITION_PROFILES.items()
952
+ }
953
+ # Adjust rounding error so we hit exactly total_patients
954
+ allocated = sum(condition_counts.values())
955
+ diff = total_patients - allocated
956
+ largest = max(condition_counts, key=lambda c: condition_counts[c])
957
+ condition_counts[largest] += diff
958
+
959
+ # Pre-load one canonical Diagnosis code per ICD-10 prefix
960
+ all_prefixes = list({p["icd10_prefix"] for p in _CONDITION_PROFILES.values()})
961
+ dx_canon: dict[str, str] = {}
962
+ for prefix in all_prefixes:
963
+ rows = neo4j_conn.run_query(
964
+ "MATCH (d:Diagnosis) WHERE d.code STARTS WITH $p RETURN d.code AS code ORDER BY d.code LIMIT 1",
965
+ {"p": prefix}
966
+ )
967
+ if rows:
968
+ dx_canon[prefix] = rows[0]["code"]
969
+
970
+ # Check existing patients per condition to allow resume
971
+ existing_rows = neo4j_conn.run_query("""
972
+ MATCH (p:Patient) WHERE p.source = 'synthetic_v2'
973
+ RETURN p.condition AS condition, count(p) AS cnt
974
+ """)
975
+ existing_by_condition: dict[str, int] = {
976
+ r["condition"]: r["cnt"] for r in (existing_rows or []) if r.get("condition")
977
+ }
978
+
979
+ rng = random.Random(42)
980
+ grand_total = 0
981
+ grand_edges = 0
982
+
983
+ for condition, profile in _CONDITION_PROFILES.items():
984
+ icd_prefix = profile["icd10_prefix"]
985
+ n = condition_counts[condition]
986
+ already = existing_by_condition.get(condition, 0)
987
+ condition_trials = trials_by_condition.get(condition, [])
988
+
989
+ if already >= n:
990
+ print(f" {condition}: {n:,} patients — already done, skipping")
991
+ grand_total += n
992
+ # advance RNG to stay deterministic
993
+ for _ in range(n):
994
+ rng.random()
995
+ continue
996
+
997
+ skip = already
998
+ todo = n - skip
999
+ print(f" {condition}: {n:,} patients ({len(condition_trials)} trials)"
1000
+ + (f" [resuming from {skip:,}]" if skip else ""))
1001
+
1002
+ patient_batch: list[dict] = []
1003
+ bm_links: list[dict] = []
1004
+ dx_links: list[dict] = []
1005
+ elig_edges: list[dict] = []
1006
+
1007
+ # Advance RNG past already-written patients so IDs/values stay consistent
1008
+ for _ in range(skip):
1009
+ rng.random()
1010
+
1011
+ condition_written = 0
1012
+ for i in range(skip, n):
1013
+ pid = f"P_{icd_prefix}_{grand_total + i + 1:06d}"
1014
+ p = _generate_patient(pid, condition, profile, i, rng)
1015
+
1016
+ patient_batch.append(p)
1017
+ if icd_prefix in dx_canon:
1018
+ dx_links.append({"pid": pid, "diagnosis_code": dx_canon[icd_prefix]})
1019
+ for bm in p["biomarkers"]:
1020
+ bm_links.append({"pid": pid, "bm_id": bm})
1021
+
1022
+ # Eligibility edges — apply sex/age/ECOG filters
1023
+ for trial in condition_trials:
1024
+ t_sex = (trial.get("sex") or "ALL").upper()
1025
+ t_min = _parse_age(trial.get("min_age") or "")
1026
+ t_max = _parse_age(trial.get("max_age") or "")
1027
+ if t_sex not in ("ALL", "BOTH", p["sex"]):
1028
+ continue
1029
+ if t_min is not None and p["age"] < t_min:
1030
+ continue
1031
+ if t_max is not None and p["age"] > t_max:
1032
+ continue
1033
+ if p["ecog"] > 2:
1034
+ continue
1035
+ base = rng.uniform(0.55, 0.90)
1036
+ bm_bonus = 0.08 if p["biomarkers"] else 0.0
1037
+ score = round(min(base + bm_bonus, 0.99), 2)
1038
+ elig_edges.append({"pid": pid, "tid": trial["id"], "score": score})
1039
+
1040
+ condition_written += 1
1041
+
1042
+ # Flush batches
1043
+ if len(patient_batch) >= _BATCH_SIZE:
1044
+ _batch_write_patients(patient_batch)
1045
+ _batch_write_diagnosis_links(dx_links)
1046
+ if bm_links:
1047
+ _batch_write_biomarker_links(bm_links)
1048
+ if elig_edges:
1049
+ _batch_write_eligibility(elig_edges)
1050
+ grand_edges += len(elig_edges)
1051
+ patient_batch, dx_links, bm_links, elig_edges = [], [], [], []
1052
+
1053
+ # Flush remainder
1054
+ if patient_batch:
1055
+ _batch_write_patients(patient_batch)
1056
+ _batch_write_diagnosis_links(dx_links)
1057
+ if bm_links:
1058
+ _batch_write_biomarker_links(bm_links)
1059
+ if elig_edges:
1060
+ _batch_write_eligibility(elig_edges)
1061
+ grand_edges += len(elig_edges)
1062
+
1063
+ grand_total += n
1064
+ print(f" ↳ wrote {condition_written:,} patients | total so far: {grand_total:,}/{total_patients:,} | edges: {grand_edges:,}")
1065
+
1066
+ print(f"\n ✓ Total patients: {grand_total:,}")
1067
+ print(f" ✓ Total ELIGIBLE_FOR edges: {grand_edges:,}")
1068
+ return grand_total
1069
+
1070
+
1071
+ # ── Main entry point ──────────────────────────────────────────────────────────
1072
+
1073
+ async def run_seeder(conditions: list[str] | None = None):
1074
+ start = time.time()
1075
+ print("=" * 60)
1076
+ print("ClinicalMatch AI — Graph Seeder v2")
1077
+ print("100 k synthetic patients · 20 oncology conditions")
1078
+ print("=" * 60)
1079
+
1080
+ async with httpx.AsyncClient(headers={"User-Agent": "ClinicalMatchAI/2.0 (hackathon@research.org)"}) as client:
1081
+ n_trials = await seed_trials(client)
1082
+ n_meds = await seed_medications(client)
1083
+ n_dx = await seed_diagnoses(client)
1084
+ n_pubs = await seed_literature(client)
1085
+
1086
+ n_bm = seed_biomarkers()
1087
+ derive_eligibility_relationships()
1088
+ n_patients = seed_patients_and_eligibility(total_patients=100_000)
1089
+
1090
+ elapsed = time.time() - start
1091
+ print(f"\n{'=' * 60}")
1092
+ print(f"Seeding complete in {elapsed / 60:.1f} min")
1093
+ print(f" Trials: {n_trials}")
1094
+ print(f" Medications: {n_meds}")
1095
+ print(f" Diagnoses: {n_dx}")
1096
+ print(f" Publications: {n_pubs}")
1097
+ print(f" Biomarkers: {n_bm}")
1098
+ print(f" Patients: {n_patients:,}")
1099
+ print("=" * 60)
1100
+
1101
+
1102
+ def seed_sync():
1103
+ asyncio.run(run_seeder())
1104
+
1105
+
1106
+ if __name__ == "__main__":
1107
+ import sys
1108
+ conditions = sys.argv[1:] if len(sys.argv) > 1 else None
1109
+ asyncio.run(run_seeder(conditions))
backend/graphrag.py ADDED
@@ -0,0 +1,125 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from langchain_community.graphs import Neo4jGraph
2
+ from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
3
+ from langchain_openai import ChatOpenAI
4
+ from langchain_core.prompts import PromptTemplate
5
+ from langchain_core.messages import BaseMessage, AIMessage
6
+ from langchain_core.outputs import ChatResult, ChatGeneration
7
+ import re
8
+ import os
9
+ from dotenv import load_dotenv
10
+
11
+ load_dotenv()
12
+
13
+ graph = Neo4jGraph(
14
+ url=os.getenv("NEO4J_URI"),
15
+ username=os.getenv("NEO4J_USERNAME"),
16
+ password=os.getenv("NEO4J_PASSWORD"),
17
+ database=os.getenv("NEO4J_DATABASE", "neo4j"),
18
+ )
19
+
20
+
21
+ def _strip_thinking(text: str) -> str:
22
+ """Remove <think>...</think> blocks that reasoning models emit before the actual answer."""
23
+ # Strip block tags (including variations like <thinking>)
24
+ text = re.sub(r"<think(?:ing)?>.*?</think(?:ing)?>", "", text, flags=re.DOTALL | re.IGNORECASE)
25
+ return text.strip()
26
+
27
+
28
+ class _ThinkStrippedLLM(ChatOpenAI):
29
+ """ChatOpenAI wrapper that strips <think> reasoning tokens from every response."""
30
+
31
+ def _create_chat_result(self, response, generation_info=None) -> ChatResult:
32
+ result: ChatResult = super()._create_chat_result(response, generation_info)
33
+ cleaned = []
34
+ for gen in result.generations:
35
+ raw = gen.message.content or ""
36
+ clean = _strip_thinking(raw)
37
+ cleaned.append(ChatGeneration(message=AIMessage(content=clean), generation_info=gen.generation_info))
38
+ return ChatResult(generations=cleaned, llm_output=result.llm_output)
39
+
40
+
41
+ llm = _ThinkStrippedLLM(
42
+ model=os.getenv("OPENAI_MODEL", "qwen/qwen3-32b"),
43
+ openai_api_key=os.getenv("OPENAI_API_KEY"),
44
+ openai_api_base=os.getenv("OPENAI_BASE_URL"),
45
+ temperature=0,
46
+ )
47
+
48
+ _CYPHER_GENERATION_TEMPLATE = """You are an expert Neo4j Cypher query writer for a clinical trial matching system.
49
+
50
+ Schema:
51
+ {schema}
52
+
53
+ Node property conventions (IMPORTANT — use these exact property names and value formats):
54
+ - Patient: id (e.g. "P-001"), name, age (integer), sex ("M"/"F"), ethnicity, city, state, ecog_score (integer)
55
+ - Trial: id (NCT id), title, condition (lowercase, e.g. "breast cancer"), phase, status, sponsor
56
+ - Diagnosis: id, name (e.g. "Breast Cancer"), icd10 (e.g. "C50")
57
+ - Biomarker: id (e.g. "HER2_POS", "EGFR_MUT", "BRCA1_MUT", "PD_L1_POS"), name (e.g. "HER2 Positive", "EGFR Mutation")
58
+ - Medication: id (e.g. "TAMOXIFEN"), name (e.g. "Tamoxifen")
59
+ - StudySite: id, name, city, state, lat, lon, trials (integer), enrolled (integer), capacity (integer)
60
+
61
+ Relationships:
62
+ - (Patient)-[:ELIGIBLE_FOR {{score: float}}]->(Trial)
63
+ - (Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
64
+ - (Patient)-[:HAS_BIOMARKER]->(Biomarker)
65
+ - (Patient)-[:TAKES_MEDICATION]->(Medication)
66
+ - (Trial)-[:LOCATED_AT]->(StudySite)
67
+
68
+ Rules:
69
+ - For biomarker lookups, use the `id` property with uppercase underscore format, e.g. `{{id: 'HER2_POS'}}` NOT `{{name: 'HER2', status: 'positive'}}`
70
+ - For condition lookups on Trial nodes, use lowercase: `t.condition = 'breast cancer'`
71
+ - Always use relationship pattern (Patient)-[:ELIGIBLE_FOR]->(Trial) to find eligible patients
72
+ - Limit results to 25 unless asked for more
73
+
74
+ Question: {question}
75
+ Cypher query:"""
76
+
77
+ _CYPHER_PROMPT = PromptTemplate(
78
+ input_variables=["schema", "question"],
79
+ template=_CYPHER_GENERATION_TEMPLATE,
80
+ )
81
+
82
+ graph_chain = GraphCypherQAChain.from_llm(
83
+ llm=llm,
84
+ graph=graph,
85
+ verbose=True,
86
+ allow_dangerous_requests=True,
87
+ cypher_prompt=_CYPHER_PROMPT,
88
+ )
89
+
90
+
91
+ def retrieve_patient_trial_matches(patient_id: str) -> list:
92
+ query = f"""
93
+ MATCH (p:Patient {{id: '{patient_id}'}})-[:HAS_DIAGNOSIS]->(d:Diagnosis)-[:ELIGIBLE_FOR]->(t:Trial)
94
+ RETURN p.id as patient, d.name as diagnosis, t.id as trial, t.phase as phase, t.condition as condition
95
+ """
96
+ try:
97
+ return graph.query(query)
98
+ except Exception as e:
99
+ print(f"[graphrag] query error: {e}")
100
+ return []
101
+
102
+
103
+ def rag_query(question: str) -> str:
104
+ try:
105
+ result = graph_chain.run(question)
106
+ return _strip_thinking(result) if result else "No results found."
107
+ except Exception as e:
108
+ err = str(e)
109
+ # Surface a clean message instead of the raw Neo4j stack trace
110
+ if "<think>" in err or "SyntaxError" in err:
111
+ return "The query model returned unexpected output. Please rephrase your question (e.g. 'List patients eligible for breast cancer trials')."
112
+ return f"Graph query error: {err}"
113
+
114
+
115
+ def get_graph_stats() -> dict:
116
+ try:
117
+ result = graph.query("""
118
+ MATCH (p:Patient) WITH count(p) as patients
119
+ MATCH (t:Trial) WITH patients, count(t) as trials
120
+ MATCH (d:Diagnosis) WITH patients, trials, count(d) as diagnoses
121
+ RETURN patients, trials, diagnoses
122
+ """)
123
+ return {**(result[0] if result else {}), "status": "connected"}
124
+ except Exception as e:
125
+ return {"patients": 0, "trials": 0, "diagnoses": 0, "status": str(e)}
backend/intake_matching.py ADDED
@@ -0,0 +1,374 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Intake-based trial matching — accepts raw clinical data (SI units) and scores
3
+ it against Trial nodes in the graph. No patient ID required.
4
+
5
+ SI unit reference:
6
+ Hemoglobin: g/dL (×10 → g/L)
7
+ WBC: ×10⁹/L
8
+ ANC: ×10⁹/L
9
+ Platelets: ×10⁹/L
10
+ Creatinine: μmol/L (÷88.4 → mg/dL)
11
+ eGFR: mL/min/1.73m²
12
+ Bilirubin: μmol/L (÷17.1 → mg/dL)
13
+ ALT/AST: U/L
14
+ Albumin: g/dL
15
+ """
16
+ import re
17
+ import uuid
18
+ from typing import Optional
19
+ from neo4j_setup import neo4j_conn
20
+
21
+
22
+ # ── Biomarker registry ────────────────────────────────────────────────────────
23
+ # Maps graph node id → human label → search terms found in eligibility text
24
+ BIOMARKER_REGISTRY = {
25
+ "HER2_POS": ("HER2 Positive", ["HER2-positive", "HER2+", "HER2 amplified", "HER2/neu positive"]),
26
+ "HER2_NEG": ("HER2 Negative", ["HER2-negative", "HER2-"]),
27
+ "ER_POS": ("ER Positive", ["ER-positive", "ER+", "estrogen receptor positive"]),
28
+ "PR_POS": ("PR Positive", ["PR-positive", "PR+", "progesterone receptor positive"]),
29
+ "BRCA1_MUT": ("BRCA1 Mutation", ["BRCA1", "BRCA1 mutation", "BRCA1-mutated"]),
30
+ "BRCA2_MUT": ("BRCA2 Mutation", ["BRCA2", "BRCA2 mutation", "BRCA2-mutated"]),
31
+ "EGFR_MUT": ("EGFR Mutation", ["EGFR mutation", "EGFR-mutated", "EGFR exon 19", "EGFR exon 21"]),
32
+ "ALK_POS": ("ALK Rearrangement",["ALK rearrangement", "ALK-positive", "ALK fusion"]),
33
+ "ROS1_POS": ("ROS1 Rearrangement",["ROS1 rearrangement", "ROS1-positive", "ROS1 fusion"]),
34
+ "PD_L1_POS": ("PD-L1 Positive", ["PD-L1", "PD-L1 positive", "PDL1"]),
35
+ "KRAS_WT": ("KRAS Wild-type", ["KRAS wild-type", "KRAS WT", "KRAS-wildtype"]),
36
+ "BRAF_MUT": ("BRAF V600E", ["BRAF V600E", "BRAF mutation", "BRAF-mutated"]),
37
+ "MSI_H": ("MSI-High", ["MSI-H", "microsatellite instability-high", "MSI high", "dMMR"]),
38
+ "NRAS_MUT": ("NRAS Mutation", ["NRAS mutation", "NRAS-mutated"]),
39
+ "FLT3_MUT": ("FLT3 Mutation", ["FLT3 mutation", "FLT3-mutated", "FLT3-ITD"]),
40
+ "IDH1_MUT": ("IDH1 Mutation", ["IDH1 mutation", "IDH1-mutated"]),
41
+ "IDH2_MUT": ("IDH2 Mutation", ["IDH2 mutation", "IDH2-mutated"]),
42
+ "BCR_ABL": ("BCR-ABL", ["BCR-ABL", "Philadelphia chromosome", "Ph-positive"]),
43
+ "TRIPLE_NEG":("Triple Negative", ["triple-negative", "TNBC", "triple negative breast"]),
44
+ }
45
+
46
+
47
+ # ── Age parsing ───────────────────────────────────────────────────────────────
48
+
49
+ def _parse_age_years(age_str: str) -> Optional[int]:
50
+ """'45 Years' → 45, '6 Months' → 0, '' → None"""
51
+ if not age_str:
52
+ return None
53
+ m = re.search(r"(\d+)\s*year", age_str, re.I)
54
+ if m:
55
+ return int(m.group(1))
56
+ m = re.search(r"(\d+)\s*month", age_str, re.I)
57
+ if m:
58
+ return 0
59
+ m = re.search(r"(\d+)", age_str)
60
+ if m:
61
+ return int(m.group(1))
62
+ return None
63
+
64
+
65
+ # ── ECOG parsing from eligibility text ────────────────────────────────────────
66
+
67
+ def _max_ecog_from_text(text: str) -> Optional[int]:
68
+ """Extract maximum allowed ECOG from eligibility criteria text."""
69
+ patterns = [
70
+ r"ECOG\s+(?:performance\s+status\s+)?(?:of\s+)?(?:0\s*(?:or|-)\s*)?([0-4])",
71
+ r"performance\s+status\s+(?:of\s+)?(?:0\s*(?:or|-)\s*)?([0-4])",
72
+ r"Karnofsky\s+.*?(\d{2,3})\s*%", # convert KPS to ECOG approximately
73
+ ]
74
+ for pat in patterns:
75
+ m = re.search(pat, text, re.I)
76
+ if m:
77
+ val = int(m.group(1))
78
+ if "Karnofsky" in pat:
79
+ # KPS 80-100 ≈ ECOG 0-1, 60-70 ≈ 2, 40-50 ≈ 3
80
+ kps = val
81
+ val = 0 if kps >= 80 else 1 if kps >= 70 else 2 if kps >= 60 else 3
82
+ return val
83
+ return None
84
+
85
+
86
+ # ── Lab value checking against eligibility text ───────────────────────────────
87
+
88
+ def _check_labs(labs: dict, eligibility_text: str) -> list[dict]:
89
+ """
90
+ Parse common lab thresholds from eligibility text and check patient values.
91
+ Returns list of {criterion, patient_value, threshold, met}.
92
+ """
93
+ results = []
94
+ text = eligibility_text or ""
95
+
96
+ def _find_threshold(patterns):
97
+ for pat in patterns:
98
+ m = re.search(pat, text, re.I)
99
+ if m:
100
+ return float(m.group(1))
101
+ return None
102
+
103
+ # Hemoglobin ≥ threshold (g/dL in text; patient value in g/dL)
104
+ hgb = labs.get("hemoglobin")
105
+ if hgb is not None:
106
+ # Try to find "hemoglobin >= X" or "Hgb >= X g/dL"
107
+ thresh = _find_threshold([
108
+ r"hemoglobin\s*[≥>=]+\s*([\d.]+)\s*g/dL",
109
+ r"Hgb\s*[≥>=]+\s*([\d.]+)",
110
+ r"hemoglobin\s+of\s+at\s+least\s+([\d.]+)",
111
+ ])
112
+ if thresh:
113
+ results.append({"criterion": f"Hemoglobin ≥ {thresh} g/dL", "patient_value": f"{hgb} g/dL", "met": hgb >= thresh})
114
+
115
+ # Platelets ≥ threshold (×10⁹/L)
116
+ plt = labs.get("platelets")
117
+ if plt is not None:
118
+ thresh = _find_threshold([
119
+ r"platelet[s]?\s*[≥>=]+\s*([\d,]+)\s*[×x]?\s*10[⁹9]/L",
120
+ r"platelet[s]?\s+count\s*[≥>=]+\s*([\d,]+)",
121
+ r"platelet[s]?\s+of\s+at\s+least\s+([\d,]+)",
122
+ ])
123
+ if thresh:
124
+ thresh_val = thresh / 1000 if thresh > 1000 else thresh # normalise if stored as /µL
125
+ results.append({"criterion": f"Platelets ≥ {thresh_val} ×10⁹/L", "patient_value": f"{plt} ×10⁹/L", "met": plt >= thresh_val})
126
+
127
+ # Creatinine ≤ threshold (μmol/L patient; text may be mg/dL or μmol/L)
128
+ cr = labs.get("creatinine") # patient value in μmol/L
129
+ if cr is not None:
130
+ # Most trial text uses mg/dL; convert patient value for comparison
131
+ cr_mgdl = cr / 88.4
132
+ thresh = _find_threshold([
133
+ r"creatinine\s*[≤<=]+\s*([\d.]+)\s*mg/dL",
134
+ r"serum\s+creatinine\s*[≤<=]+\s*([\d.]+)",
135
+ ])
136
+ if thresh:
137
+ results.append({"criterion": f"Creatinine ≤ {thresh} mg/dL ({round(thresh*88.4)} μmol/L)", "patient_value": f"{cr} μmol/L ({round(cr_mgdl, 2)} mg/dL)", "met": cr_mgdl <= thresh})
138
+
139
+ # eGFR ≥ threshold
140
+ egfr = labs.get("egfr")
141
+ if egfr is not None:
142
+ thresh = _find_threshold([
143
+ r"(?:eGFR|GFR|creatinine\s+clearance)\s*[≥>=]+\s*([\d.]+)",
144
+ r"glomerular\s+filtration\s+rate\s*[≥>=]+\s*([\d.]+)",
145
+ ])
146
+ if thresh:
147
+ results.append({"criterion": f"eGFR ≥ {thresh} mL/min/1.73m²", "patient_value": f"{egfr} mL/min", "met": egfr >= thresh})
148
+
149
+ # Bilirubin ≤ threshold (μmol/L patient; text usually mg/dL)
150
+ bili = labs.get("bilirubin")
151
+ if bili is not None:
152
+ bili_mgdl = bili / 17.1
153
+ thresh = _find_threshold([
154
+ r"(?:total\s+)?bilirubin\s*[≤<=]+\s*([\d.]+)\s*(?:×\s*)?ULN",
155
+ r"(?:total\s+)?bilirubin\s*[≤<=]+\s*([\d.]+)\s*mg/dL",
156
+ ])
157
+ if thresh:
158
+ # If "× ULN", ULN for bilirubin ≈ 1.0 mg/dL
159
+ results.append({"criterion": f"Bilirubin ≤ {thresh} mg/dL ({round(thresh*17.1)} μmol/L)", "patient_value": f"{bili} μmol/L ({round(bili_mgdl, 2)} mg/dL)", "met": bili_mgdl <= thresh})
160
+
161
+ # ANC ≥ threshold (×10⁹/L)
162
+ anc = labs.get("anc")
163
+ if anc is not None:
164
+ thresh = _find_threshold([
165
+ r"(?:ANC|absolute\s+neutrophil\s+count)\s*[≥>=]+\s*([\d.]+)\s*[×x]?\s*10[⁹9]/L",
166
+ r"neutrophil[s]?\s*[≥>=]+\s*([\d.]+)",
167
+ ])
168
+ if thresh:
169
+ results.append({"criterion": f"ANC ≥ {thresh} ×10⁹/L", "patient_value": f"{anc} ×10⁹/L", "met": anc >= thresh})
170
+
171
+ return results
172
+
173
+
174
+ # ── Main scoring function ─────────────────────────────────────────────────────
175
+
176
+ def score_intake_against_trial(intake: dict, trial: dict) -> dict:
177
+ """
178
+ Score a clinical intake profile against a single trial.
179
+ Returns {score, eligible, criteria_breakdown, risk_flags}.
180
+ """
181
+ breakdown = []
182
+ risk_flags = []
183
+ points = 0
184
+ max_points = 0
185
+
186
+ age = intake.get("age")
187
+ sex = intake.get("sex", "").upper()
188
+ ecog = intake.get("ecog")
189
+ biomarkers = set(intake.get("biomarkers", []))
190
+ labs = intake.get("labs", {})
191
+ prior_chemo = intake.get("prior_chemo", False)
192
+ eligibility_text = trial.get("eligibility_criteria", "")
193
+
194
+ # ── Age (25 pts) ──────────────────────────────────────────────────────────
195
+ max_points += 25
196
+ min_age = _parse_age_years(trial.get("min_age", ""))
197
+ max_age = _parse_age_years(trial.get("max_age", ""))
198
+ if age is not None:
199
+ age_ok = True
200
+ note = ""
201
+ if min_age and age < min_age:
202
+ age_ok = False
203
+ note = f"Trial requires ≥{min_age} years"
204
+ risk_flags.append(f"Below minimum age ({age} < {min_age})")
205
+ if max_age and age > max_age:
206
+ age_ok = False
207
+ note = f"Trial requires ≤{max_age} years"
208
+ risk_flags.append(f"Above maximum age ({age} > {max_age})")
209
+ if age_ok:
210
+ points += 25
211
+ note = f"Within range ({min_age or '≥18'}–{max_age or 'no max'})"
212
+ breakdown.append({"criterion": "Age", "met": age_ok, "patient_value": f"{age} years", "note": note, "category": "demographics"})
213
+
214
+ # ── Sex (15 pts) ──────────────────────────────────────────────────────────
215
+ max_points += 15
216
+ trial_sex = (trial.get("sex") or "ALL").upper()
217
+ sex_ok = trial_sex in ("ALL", sex, "")
218
+ if not sex_ok:
219
+ risk_flags.append(f"Sex mismatch (trial requires {trial_sex})")
220
+ else:
221
+ points += 15
222
+ breakdown.append({"criterion": "Sex", "met": sex_ok, "patient_value": sex or "Not specified", "note": f"Trial: {trial_sex}", "category": "demographics"})
223
+
224
+ # ── ECOG (15 pts) ─────────────────────────────────────────────────────────
225
+ max_points += 15
226
+ max_ecog = _max_ecog_from_text(eligibility_text)
227
+ if ecog is not None and max_ecog is not None:
228
+ ecog_ok = ecog <= max_ecog
229
+ if not ecog_ok:
230
+ risk_flags.append(f"ECOG {ecog} exceeds trial max ({max_ecog})")
231
+ else:
232
+ points += 15
233
+ breakdown.append({"criterion": "ECOG Performance Status", "met": ecog_ok, "patient_value": f"ECOG {ecog}", "note": f"Trial requires ≤{max_ecog}", "category": "performance"})
234
+ elif ecog is not None:
235
+ points += 10 # partial credit — can't verify from text
236
+ breakdown.append({"criterion": "ECOG Performance Status", "met": None, "patient_value": f"ECOG {ecog}", "note": "Could not parse limit from trial text", "category": "performance"})
237
+
238
+ # ── Biomarkers (30 pts) ───────────────────────────────────────────────────
239
+ max_points += 30
240
+ if biomarkers:
241
+ matched_bm = []
242
+ for bm_id in biomarkers:
243
+ info = BIOMARKER_REGISTRY.get(bm_id)
244
+ if not info:
245
+ continue
246
+ label, search_terms = info
247
+ found_in_text = any(term.lower() in eligibility_text.lower() for term in search_terms)
248
+ matched_bm.append((label, found_in_text))
249
+
250
+ relevant = [m for m in matched_bm if m[1]]
251
+ if relevant:
252
+ points += 30
253
+ breakdown.append({
254
+ "criterion": "Biomarker Profile",
255
+ "met": True,
256
+ "patient_value": ", ".join(l for l, _ in relevant),
257
+ "note": f"{len(relevant)} of your biomarkers appear in trial criteria",
258
+ "category": "molecular",
259
+ })
260
+ elif matched_bm:
261
+ points += 5
262
+ breakdown.append({
263
+ "criterion": "Biomarker Profile",
264
+ "met": None,
265
+ "patient_value": ", ".join(l for l, _ in matched_bm),
266
+ "note": "None of your biomarkers explicitly appear in criteria",
267
+ "category": "molecular",
268
+ })
269
+
270
+ # ── Lab values (15 pts) ───────────────────────────────────────────────────
271
+ if labs:
272
+ max_points += 15
273
+ lab_results = _check_labs(labs, eligibility_text)
274
+ if lab_results:
275
+ all_ok = all(r["met"] for r in lab_results)
276
+ any_fail = any(not r["met"] for r in lab_results)
277
+ if all_ok:
278
+ points += 15
279
+ elif not any_fail:
280
+ points += 8
281
+ for r in lab_results:
282
+ if not r["met"]:
283
+ risk_flags.append(f"Lab out of range: {r['criterion']}")
284
+ for r in lab_results:
285
+ breakdown.append({
286
+ "criterion": r["criterion"],
287
+ "met": r["met"],
288
+ "patient_value": r["patient_value"],
289
+ "note": "",
290
+ "category": "labs",
291
+ })
292
+ else:
293
+ points += 8 # no parseable lab criteria — give partial credit
294
+
295
+ score = points / max_points if max_points > 0 else 0
296
+ eligible = score >= 0.65 and not any("mismatch" in f or "exceeds" in f for f in risk_flags)
297
+
298
+ return {
299
+ "score": round(score, 3),
300
+ "eligible": eligible,
301
+ "criteria_breakdown": breakdown,
302
+ "risk_flags": risk_flags,
303
+ "points": points,
304
+ "max_points": max_points,
305
+ }
306
+
307
+
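To make the arithmetic in `score_intake_against_trial` concrete, the sketch below recomputes the normalized score and the eligibility gate for two made-up patients. The point weights, the 0.65 cutoff, and the flag wording are taken from the function above; the helper name `normalized_score` and the sample awards are assumptions for illustration.

```python
def normalized_score(awards: dict[str, tuple[int, int]], risk_flags: list[str]) -> dict:
    """Hypothetical helper; awards maps category -> (points earned, points available)."""
    points = sum(earned for earned, _ in awards.values())
    max_points = sum(available for _, available in awards.values())
    score = points / max_points if max_points > 0 else 0
    # Same gate as the source: only "mismatch"/"exceeds" flags block eligibility.
    blocked = any("mismatch" in f or "exceeds" in f for f in risk_flags)
    return {"score": round(score, 3), "eligible": score >= 0.65 and not blocked}


# Patient A: age and sex match, ECOG limit not parseable (10/15),
# biomarkers known but absent from the criteria text (5/30), no labs supplied.
result = normalized_score(
    {"age": (25, 25), "sex": (15, 15), "ecog": (10, 15), "biomarkers": (5, 30)},
    risk_flags=[],
)
print(result)  # {'score': 0.647, 'eligible': False}

# Patient B: below the minimum age (0/25) but full marks elsewhere.
underage = normalized_score(
    {"age": (0, 25), "sex": (15, 15), "ecog": (15, 15), "biomarkers": (30, 30)},
    risk_flags=["Below minimum age (16 < 18)"],
)
print(underage)  # {'score': 0.706, 'eligible': True}
```

Under these assumptions, patient A fails no hard check yet lands just under the cutoff, while patient B clears it despite the age flag, since the age flags contain neither "mismatch" nor "exceeds". Whether age should be a hard exclusion is a design decision the diff leaves open.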
308
+ # ── Graph query + batch scoring ───────────────────────────────────────────────
309
+
310
+ def match_intake_to_trials(intake: dict, condition: str, limit: int = 10) -> list[dict]:
311
+ """
312
+ Query trials from the graph matching the condition, score each against intake,
313
+ return ranked list.
314
+ """
315
+ rows = neo4j_conn.run_query(
316
+ """
317
+ MATCH (t:Trial)
318
+ WHERE toLower(t.condition) CONTAINS toLower($condition)
319
+ AND t.status IN ['RECRUITING', 'NOT_YET_RECRUITING']
320
+ RETURN t.id AS nct_id, t.title AS title, t.phase AS phase,
321
+ t.condition AS condition, t.min_age AS min_age, t.max_age AS max_age,
322
+ t.sex AS sex, t.eligibility_criteria AS eligibility_criteria,
323
+ t.sponsor AS sponsor, t.location_count AS location_count,
324
+ t.last_updated AS last_updated, t.ctgov_url AS ctgov_url
325
+ LIMIT $limit
326
+ """,
327
+ {"condition": condition, "limit": limit * 3}, # over-fetch, then rank
328
+ )
329
+
330
+ if not rows:
331
+ return []
332
+
333
+ scored = []
334
+ for trial in rows:
335
+ result = score_intake_against_trial(intake, trial)
336
+ scored.append({
337
+ **trial,
338
+ **result,
339
+ })
340
+
341
+ scored.sort(key=lambda x: x["score"], reverse=True)
342
+ return scored[:limit]
343
+
344
+
345
+ def save_intake_as_patient(intake: dict) -> str:
346
+ """Optionally persist the intake as a Patient node for long-term graph enrichment."""
347
+ pid = f"P_INTAKE_{uuid.uuid4().hex[:8].upper()}"
348
+ neo4j_conn.run_query(
349
+ """
350
+ MERGE (p:Patient {id: $id})
351
+ SET p += {
352
+ age: $age, sex: $sex, ecog: $ecog, condition: $condition,
353
+ source: 'intake_form', created_at: datetime()
354
+ }
355
+ """,
356
+ {
357
+ "id": pid,
358
+ "age": intake.get("age"),
359
+ "sex": intake.get("sex", ""),
360
+ "ecog": intake.get("ecog"),
361
+ "condition": intake.get("condition", ""),
362
+ },
363
+ )
364
+ for bm_id in intake.get("biomarkers", []):
365
+ neo4j_conn.run_query(
366
+ """
367
+ MATCH (p:Patient {id: $pid})
368
+ MERGE (b:Biomarker {id: $bm_id})
369
+ ON CREATE SET b.name = $name
370
+ MERGE (p)-[:HAS_BIOMARKER]->(b)
371
+ """,
372
+ {"pid": pid, "bm_id": bm_id, "name": BIOMARKER_REGISTRY.get(bm_id, (bm_id,))[0]},
373
+ )
374
+ return pid
backend/llm_client.py ADDED
@@ -0,0 +1,209 @@
1
+ """
2
+ LLM client — provider-configurable, OpenAI-compatible interface.
3
+
4
+ Set LLM_PROVIDER in .env to switch between:
5
+ groq, openai, azure, aimlapi, bedrock, custom
6
+
7
+ In HIPAA/production contexts use azure or bedrock — both offer BAAs.
8
+ Never use the Anthropic SDK directly; all calls go through the
9
+ OpenAI-compatible interface regardless of underlying model.
10
+ """
11
+ import os
12
+ import json
13
+ import re
14
+ from openai import OpenAI
15
+ from dotenv import load_dotenv
16
+
17
+ load_dotenv()
18
+
19
+ # ── Provider registry ─────────────────────────────────────────────────────────
20
+
21
+ _PROVIDER_DEFAULTS: dict[str, dict] = {
22
+ "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o"},
23
+ "groq": {"base_url": "https://api.groq.com/openai/v1", "model": "llama3-70b-8192"},
24
+ "aimlapi": {"base_url": "https://ai.aimlapi.com/v1", "model": "claude-opus-4-7"},
25
+ "azure": {"base_url": os.getenv("OPENAI_BASE_URL", ""), "model": "gpt-4o"},
26
+ "bedrock": {"base_url": os.getenv("OPENAI_BASE_URL", ""), "model": "anthropic.claude-3-5-sonnet"},
27
+ "custom": {"base_url": os.getenv("OPENAI_BASE_URL", ""), "model": os.getenv("OPENAI_MODEL", "gpt-4o")},
28
+ }
29
+
30
+ _HIPAA_ELIGIBLE = {"azure", "bedrock"}
31
+
32
+ def _build_client() -> tuple[OpenAI, str]:
33
+ provider = os.getenv("LLM_PROVIDER", "custom").lower()
34
+ defaults = _PROVIDER_DEFAULTS.get(provider, _PROVIDER_DEFAULTS["custom"])
35
+
36
+ base_url = os.getenv("OPENAI_BASE_URL") or defaults["base_url"]
37
+ model = os.getenv("OPENAI_MODEL") or defaults["model"]
38
+ api_key = os.getenv("OPENAI_API_KEY", "placeholder")
39
+
40
+ if not base_url:
41
+ raise RuntimeError(
42
+ f"LLM_PROVIDER='{provider}' requires OPENAI_BASE_URL to be set. "
43
+ "Check your .env file."
44
+ )
45
+
46
+ client = OpenAI(api_key=api_key, base_url=base_url)
47
+ return client, model
48
+
49
+
50
+ _client: OpenAI | None = None
51
+ _model: str = ""
52
+
53
+
54
+ def get_client() -> tuple[OpenAI, str]:
55
+ global _client, _model
56
+ if _client is None:
57
+ _client, _model = _build_client()
58
+ return _client, _model
59
+
60
+
61
+ def get_provider_status() -> dict:
62
+ """Return current LLM provider config — exposed via /api/v1/config/llm."""
63
+ provider = os.getenv("LLM_PROVIDER", "custom").lower()
64
+ model = os.getenv("OPENAI_MODEL") or _PROVIDER_DEFAULTS.get(provider, {}).get("model", "unknown")
65
+ base_url = os.getenv("OPENAI_BASE_URL") or _PROVIDER_DEFAULTS.get(provider, {}).get("base_url", "")
66
+ key_set = bool(os.getenv("OPENAI_API_KEY"))
67
+ return {
68
+ "provider": provider,
69
+ "model": model,
70
+ "base_url": base_url,
71
+ "api_key_set": key_set,
72
+ "hipaa_eligible": provider in _HIPAA_ELIGIBLE,
73
+ "baa_note": (
74
+ "This provider offers a BAA — suitable for PHI in production."
75
+ if provider in _HIPAA_ELIGIBLE
76
+ else "Not HIPAA BAA eligible. Use 'azure' or 'bedrock' for production PHI workloads."
77
+ ),
78
+ }
79
+
80
+
81
+ # ── Core chat wrapper ─────────────────────────────────────────────────────────
82
+
83
+ def chat(messages: list[dict], temperature: float = 0.3, max_tokens: int = 2048) -> str:
84
+ client, model = get_client()
85
+ resp = client.chat.completions.create(
86
+ model=model,
87
+ messages=messages,
88
+ temperature=temperature,
89
+ max_tokens=max_tokens,
90
+ )
91
+ return resp.choices[0].message.content or ""
92
+
93
+
94
+ def _parse_json_response(raw: str) -> dict:
95
+ """Strip markdown fences and <think> blocks, then parse JSON."""
96
+ raw = re.sub(r"<think(?:ing)?>.*?</think(?:ing)?>", "", raw, flags=re.DOTALL | re.IGNORECASE)
97
+ raw = re.sub(r"```(?:json)?", "", raw).replace("```", "").strip()
98
+ return json.loads(raw)
99
+
100
+
101
+ # ── Clinical functions ────────────────────────────────────────────────────────
102
+
103
+ def parse_trial_protocol(protocol_text: str) -> dict:
104
+ """Extract structured eligibility criteria from unstructured protocol text."""
105
+ prompt = f"""You are a clinical research expert. Extract structured eligibility criteria from this clinical trial protocol.
106
+
107
+ Return a JSON object with exactly these keys:
108
+ - inclusion_criteria: list of strings
109
+ - exclusion_criteria: list of strings
110
+ - age_range: {{"min": int_or_null, "max": int_or_null}}
111
+ - required_diagnoses: list of strings
112
+ - required_biomarkers: list of strings (e.g. "HER2+", "EGFR mutation")
113
+ - excluded_medications: list of strings
114
+ - performance_status: string or null (e.g. "ECOG 0-2")
115
+
116
+ Protocol text:
117
+ {protocol_text[:4000]}
118
+
119
+ Return ONLY valid JSON, no markdown, no explanation."""
120
+
121
+ try:
122
+ return _parse_json_response(chat([{"role": "user", "content": prompt}], temperature=0))
123
+ except Exception:
124
+ return {
125
+ "inclusion_criteria": [], "exclusion_criteria": [],
126
+ "age_range": {"min": 18, "max": None}, "required_diagnoses": [],
127
+ "required_biomarkers": [], "excluded_medications": [],
128
+ "performance_status": None,
129
+ }
130
+
131
+
132
+ def score_patient_against_criteria(patient_profile: dict, criteria: dict, trial_title: str) -> dict:
133
+ """Semantically score a patient against trial criteria using LLM."""
134
+ prompt = f"""You are a clinical trial eligibility expert. Assess this patient's eligibility.
135
+
136
+ TRIAL: {trial_title}
137
+
138
+ INCLUSION CRITERIA:
139
+ {chr(10).join(f"- {c}" for c in criteria.get("inclusion_criteria", []))}
140
+
141
+ EXCLUSION CRITERIA:
142
+ {chr(10).join(f"- {c}" for c in criteria.get("exclusion_criteria", []))}
143
+
144
+ PATIENT PROFILE:
145
+ - Age: {patient_profile.get("age")}
146
+ - Gender: {patient_profile.get("gender")}
147
+ - Diagnoses: {", ".join(patient_profile.get("diagnosis_names", []))}
148
+ - Medications: {", ".join(patient_profile.get("medications", []))}
149
+ - Biomarkers: {patient_profile.get("biomarkers", {})}
150
+ - Lab Values: {patient_profile.get("lab_values", {})}
151
+ - Comorbidities: {", ".join(patient_profile.get("comorbidities", []))}
152
+ - Prior therapy lines: {patient_profile.get("prior_lines_of_therapy", "unknown")}
153
+
154
+ Return a JSON object with:
155
+ - overall_score: float 0.0-1.0
156
+ - eligible: boolean
157
+ - inclusion_results: list of {{"criterion": str, "met": bool, "confidence": "high"|"medium"|"low", "note": str}}
158
+ - exclusion_results: list of {{"criterion": str, "triggered": bool, "confidence": "high"|"medium"|"low", "note": str}}
159
+ - summary: string (2-3 sentence clinical reasoning)
160
+ - risk_flags: list of strings
161
+
162
+ Return ONLY valid JSON."""
163
+
164
+ try:
165
+ return _parse_json_response(
166
+ chat([{"role": "user", "content": prompt}], temperature=0, max_tokens=1500)
167
+ )
168
+ except Exception:
169
+ return {
170
+ "overall_score": 0.0, "eligible": False,
171
+ "inclusion_results": [], "exclusion_results": [],
172
+ "summary": "Automated assessment unavailable: the LLM call failed or returned unparseable JSON.",
173
+ "risk_flags": ["Manual review recommended"],
174
+ }
175
+
176
+
177
+ def generate_outreach_message(patient_profile: dict, trial: dict, channel: str) -> str:
178
+ channel_instructions = {
179
+ "pcp_letter": "Write a formal referral letter from a clinical research coordinator to the patient's PCP. Include trial name, NCT number, eligibility criteria met, and next steps.",
180
+ "patient_email": "Write a warm, empathetic email to the patient in plain language (8th grade reading level). Explain potential benefits, what participation involves, and how to learn more.",
181
+ "social_post": "Write a concise social media post (max 280 characters for Twitter, 500 for Facebook) for patient recruitment. No personal identifiers.",
182
+ }
183
+ instruction = channel_instructions.get(channel, channel_instructions["patient_email"])
184
+ prompt = f"""{instruction}
185
+
186
+ Trial: {trial.get("title")} ({trial.get("nct_id")})
187
+ Phase: {trial.get("phase")} | Sponsor: {trial.get("sponsor")}
188
+ Summary: {trial.get("brief_summary", "")[:500]}
189
+ Locations: {", ".join(f"{l['city']}, {l['state']}" for l in trial.get("locations", [])[:3])}
190
+
191
+ Patient context (no identifying details):
192
+ - Age range: {patient_profile.get("age")} years
193
+ - Diagnosis: {", ".join(patient_profile.get("diagnosis_names", ["the relevant condition"]))}
194
+
195
+ Write the message now:"""
196
+ return chat([{"role": "user", "content": prompt}], temperature=0.7, max_tokens=800)
197
+
198
+
199
+ def summarize_trial(trial: dict) -> str:
200
+ prompt = f"""Summarize this clinical trial in 3-4 bullet points for a clinical coordinator:
201
+ what's tested, who qualifies, what patients do, potential benefit.
202
+
203
+ Trial: {trial.get("title")}
204
+ Summary: {trial.get("brief_summary", "")[:1000]}
205
+ Eligibility: {trial.get("eligibility_criteria", "")[:800]}
206
+ Phase: {trial.get("phase")} | Enrollment: {trial.get("enrollment")}
207
+
208
+ Bullet points only:"""
209
+ return chat([{"role": "user", "content": prompt}], temperature=0.3, max_tokens=500)
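The cleanup step in `_parse_json_response` is easy to exercise on its own. The sketch below is a standalone copy of that logic under a different, hypothetical name (`parse_llm_json`), run against a made-up model reply. The fence pattern is written as `` `{3} `` so the example does not contain a literal run of three backticks, and it matches the same text.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly to keep this example fence-safe


def parse_llm_json(raw: str) -> dict:
    """Strip <think>/<thinking> blocks and markdown fences, then parse JSON."""
    raw = re.sub(r"<think(?:ing)?>.*?</think(?:ing)?>", "", raw, flags=re.DOTALL | re.IGNORECASE)
    raw = re.sub(r"`{3}(?:json)?", "", raw).strip()
    return json.loads(raw)


# Made-up reply: reasoning block followed by a fenced JSON object.
sample = (
    "<think>check age first</think>\n"
    + FENCE + 'json\n{"eligible": false, "overall_score": 0.2}\n' + FENCE
)
parsed = parse_llm_json(sample)
print(parsed)  # {'eligible': False, 'overall_score': 0.2}

# Prose around the JSON object is not removed, so json.loads raises
# json.JSONDecodeError (a ValueError subclass) and the caller must handle it.
try:
    parse_llm_json('Here is the result: {"eligible": true}')
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```

The second case is why the callers above wrap the parse in `try`/`except`: any reply that mixes prose with JSON falls through to their fallback dictionaries.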
backend/main.py ADDED
@@ -0,0 +1,705 @@
1
+ from fastapi import FastAPI, HTTPException, BackgroundTasks, Request
2
+ from fastapi.middleware.cors import CORSMiddleware
3
+ from fastapi.responses import StreamingResponse
4
+ from pydantic import BaseModel
5
+ from typing import Optional
6
+ import os
7
+ import asyncio
8
+ import threading
9
+ import json
10
+ import time
11
+ import httpx
12
+ from dotenv import load_dotenv
13
+
14
+ load_dotenv()
15
+
16
+ from neo4j_setup import neo4j_conn, setup_schema
17
+ from graphrag import retrieve_patient_trial_matches, rag_query, get_graph_stats
18
+ from data_ingestion import ingest_sample_data
19
+ from fhir_adapter import get_patient_profile, get_mock_fhir_patient, get_all_patient_ids, MOCK_FHIR_PATIENTS
20
+ from clinicaltrials_api import search_trials_sync, get_trial_details_sync, get_trial_details
21
+ from matching_engine import match_patient_to_trials, score_patient_for_trial, find_eligible_patients_for_trial
22
+ from a2a_workflow import start_pipeline, run_pipeline, get_workflow_status, list_workflows, _workflows
23
+ from analytics import get_kpi_summary, get_enrollment_funnel, get_site_performance, get_patient_demographics, get_recruitment_timeline, get_map_data
24
+ from recruitment_pipeline import get_kanban_board, get_all_records, create_record, update_status, generate_and_store_outreach, RecruitmentStatus
25
+ from llm_client import summarize_trial
26
+ from graph_seeder import run_seeder, seed_sync
27
+ from trial_enrichment import enrich_trials_from_search, get_eligible_patient_counts, get_graph_intelligence
28
+ from intake_matching import match_intake_to_trials, save_intake_as_patient, BIOMARKER_REGISTRY
29
+ from llm_client import get_provider_status
30
+ from fhir_server import (
31
+ get_fhir_server_status, get_live_patient_profile,
32
+ search_fhir_patients, build_sharp_context,
33
+ )
34
+ import consent_agent
35
+
36
+ app = FastAPI(
37
+ title="Precision Clinical Trial Matching & Recruitment Agent",
38
+ version="2.0.0",
39
+ description="A2A-powered agent for precision clinical trial matching using FHIR R4 standards and GraphRAG",
40
+ )
41
+
42
+ app.add_middleware(
43
+ CORSMiddleware,
44
+ allow_origins=["*"],
45
+ allow_credentials=True,
46
+ allow_methods=["*"],
47
+ allow_headers=["*"],
48
+ )
49
+
50
+
51
+ # ── Request Models ─────────────────────────────────────────────────────────────
52
+
53
+ class PatientIngestRequest(BaseModel):
54
+ id: str
55
+ age: int
56
+ gender: str
57
+ diagnosis_code: str
58
+
59
+ class WorkflowRequest(BaseModel):
60
+ patient_id: str
61
+ nct_id: Optional[str] = None
62
+ condition: Optional[str] = None
63
+ # SHARP / SMART on FHIR fields
64
+ fhir_token: Optional[str] = None # Bearer token for FHIR server access
65
+ fhir_base_url: Optional[str] = None # Override FHIR base for this session
66
+ session_id: Optional[str] = None # Caller-supplied session ID for tracing
67
+
68
+ class OutreachRequest(BaseModel):
69
+ patient_id: str
70
+ nct_id: str
71
+ trial_title: str
72
+ channel: str = "patient_email"
73
+
74
+ class StatusUpdateRequest(BaseModel):
75
+ status: RecruitmentStatus
76
+
77
+ class RAGRequest(BaseModel):
78
+ question: str
79
+
80
+ class IntakeLabs(BaseModel):
81
+ hemoglobin: Optional[float] = None # g/dL
82
+ wbc: Optional[float] = None # ×10⁹/L
83
+ anc: Optional[float] = None # ×10⁹/L
84
+ platelets: Optional[float] = None # ×10⁹/L
85
+ creatinine: Optional[float] = None # μmol/L
86
+ egfr: Optional[float] = None # mL/min/1.73m²
87
+ bilirubin: Optional[float] = None # μmol/L
88
+ alt: Optional[float] = None # U/L
89
+ ast: Optional[float] = None # U/L
90
+ albumin: Optional[float] = None # g/dL
91
+
92
+ class IntakeRequest(BaseModel):
93
+ condition: str # free text: "breast cancer"
94
+ age: Optional[int] = None # years
95
+ sex: Optional[str] = None # MALE / FEMALE
96
+ ecog: Optional[int] = None # 0–4
97
+ stage: Optional[str] = None # I / II / III / IV
98
+ biomarkers: list[str] = [] # list of BIOMARKER_REGISTRY keys
99
+ labs: Optional[IntakeLabs] = None
100
+ prior_chemo: bool = False
101
+ prior_radiation: bool = False
102
+ prior_surgery: bool = False
103
+ medications: list[str] = []
104
+ save_to_graph: bool = False # persist as Patient node
105
+
106
+ class ConsentStatusRequest(BaseModel):
107
+ status: str # SIGNED | DECLINED | EXPIRED
108
+ notes: Optional[str] = None
109
+
110
+ class A2ATaskRequest(BaseModel):
111
+ task_id: Optional[str] = None
112
+ type: str
113
+ payload: dict
114
+
115
+ class RecruitmentRecordRequest(BaseModel):
116
+ patient_id: str
117
+ nct_id: str
118
+ trial_title: str
119
+ match_score: float = 0.75
120
+
121
+
122
+ # ── Core / Health ──────────────────────────────────────────────────────────────
123
+
124
+ @app.get("/")
125
+ async def root():
126
+ return {
127
+ "name": "Precision Clinical Trial Matching Agent",
128
+ "version": "2.0.0",
129
+ "status": "operational",
130
+ "standards": ["FHIR R4", "MCP", "A2A"],
131
+ }
132
+
133
+ # ── Configuration & Provider Status ──────────────────────────────────────────
134
+
135
+ @app.get("/api/v1/config/llm")
136
+ async def llm_config():
137
+ """Current LLM provider configuration and HIPAA BAA eligibility status."""
138
+ return get_provider_status()
139
+
140
+ @app.get("/api/v1/config/fhir")
141
+ async def fhir_config():
142
+ """Current FHIR server connection status and SMART token configuration."""
143
+ return get_fhir_server_status()
144
+
145
+ @app.get("/api/v1/config")
146
+ async def full_config():
147
+ """Full system configuration — LLM provider + FHIR server status."""
148
+ return {
149
+ "llm": get_provider_status(),
150
+ "fhir": get_fhir_server_status(),
151
+ }
152
+
153
+
154
+ # ── Live FHIR Patient Endpoints ───────────────────────────────────────────────
155
+
156
+ @app.get("/api/v1/fhir/patients")
157
+ async def list_live_fhir_patients(count: int = 10):
158
+ """Fetch real Patient resources from the configured FHIR R4 server."""
159
+ patients = search_fhir_patients(count=min(count, 50))
160
+ return {"patients": patients, "total": len(patients), "source": "fhir_server"}
161
+
162
+ @app.get("/api/v1/fhir/patients/{fhir_id}")
163
+ async def get_live_fhir_patient(fhir_id: str, fhir_token: Optional[str] = None):
164
+ """
165
+ Fetch a patient from the live FHIR server, build a matching profile,
166
+ and attach a SHARP context envelope.
167
+ """
168
+ sharp_ctx = build_sharp_context(
169
+ patient_id=fhir_id,
170
+ fhir_ref=f"Patient/{fhir_id}",
171
+ )
172
+ profile = get_live_patient_profile(fhir_id, sharp_context=sharp_ctx)
173
+ if not profile:
174
+ raise HTTPException(status_code=404, detail=f"FHIR Patient {fhir_id} not found on server")
175
+ return profile
176
+
177
+ @app.post("/api/v1/fhir/patients/{fhir_id}/match-trials")
178
+ async def match_live_fhir_patient(fhir_id: str, fhir_token: Optional[str] = None, top_n: int = 5):
179
+ """
180
+ Full pipeline: fetch patient from live FHIR server → match against trials.
181
+ SHARP context envelope included in response.
182
+ """
183
+ sharp_ctx = build_sharp_context(patient_id=fhir_id, fhir_ref=f"Patient/{fhir_id}")
184
+ profile = get_live_patient_profile(fhir_id, sharp_context=sharp_ctx)
185
+ if not profile:
186
+ raise HTTPException(status_code=404, detail=f"FHIR Patient {fhir_id} not found")
187
+ from matching_engine import match_patient_to_trials as _match
188
+ condition = (profile.get("diagnosis_names") or ["cancer"])[0]
189
+ matches = _match(fhir_id, condition, top_n)
190
+ return {
191
+ "fhir_id": fhir_id,
192
+ "profile": profile,
193
+ "matches": matches,
194
+ "total": len(matches),
195
+ "sharp_context": sharp_ctx,
196
+ }
197
+
198
+
199
+ @app.get("/health")
200
+ async def health():
201
+ stats = get_graph_stats()
202
+
203
+ # Neo4j connectivity check
204
+ neo4j_ok = False
205
+ try:
206
+ neo4j_conn.run_query("RETURN 1")
207
+ neo4j_ok = True
208
+ except Exception:
209
+ pass
210
+
211
+ # CT.gov reachability
212
+ ctgov_ok = False
213
+ try:
214
+ async with httpx.AsyncClient(timeout=4) as client:
215
+ r = await client.get(
216
+ "https://clinicaltrials.gov/api/v2/studies",
217
+ params={"query.term": "cancer", "pageSize": 1},
218
+ )
219
+ ctgov_ok = r.status_code == 200
220
+ except Exception:
221
+ pass
222
+
223
+ patient_count = stats.get("patients", 0)
224
+ trial_count = stats.get("trials", 0)
225
+ edge_count = stats.get("eligible_for_relationships", 0)
226
+ seeded = patient_count >= 100 and trial_count >= 50
227
+
228
+ llm_status = get_provider_status()
229
+ fhir_status = get_fhir_server_status()
230
+
231
+ overall = "healthy" if (neo4j_ok and ctgov_ok and seeded) else ("degraded" if neo4j_ok else "unhealthy")
232
+ return {
233
+ "status": overall,
234
+ "neo4j": "connected" if neo4j_ok else "unavailable",
235
+ "ctgov_api": "reachable" if ctgov_ok else "unreachable",
236
+ "fhir_server": "reachable" if fhir_status.get("reachable") else "unreachable",
237
+ "fhir_base_url": fhir_status.get("base_url"),
238
+ "smart_auth": fhir_status.get("auth_method"),
239
+ "graph_seeded": seeded,
240
+ "graph_stats": stats,
241
+ "patient_count": patient_count,
242
+ "trial_count": trial_count,
243
+ "eligible_edges": edge_count,
244
+ "llm_provider": llm_status.get("provider"),
245
+ "llm_model": llm_status.get("model"),
246
+ "llm_hipaa_eligible": llm_status.get("hipaa_eligible"),
247
+ "version": "2.0.0",
248
+ "standards": ["FHIR R4", "MCP", "A2A", "SHARP"],
249
+ }
250
+
251
+
252
+ # ── FHIR Patient Endpoints ─────────────────────────────────────────────────────
253
+
254
+ @app.get("/api/v1/patients")
255
+ async def list_patients():
256
+ patients = []
257
+ for pid in get_all_patient_ids():
258
+ profile = get_patient_profile(pid)
259
+ if profile:
260
+ patients.append(profile)
261
+ return {"patients": patients, "total": len(patients)}
262
+
263
+ @app.get("/api/v1/patients/{patient_id}")
264
+ async def get_patient(patient_id: str):
265
+ profile = get_patient_profile(patient_id)
266
+ if not profile:
267
+ raise HTTPException(status_code=404, detail=f"Patient {patient_id} not found")
268
+ fhir = get_mock_fhir_patient(patient_id)
269
+ return {"profile": profile, "fhir_bundle": fhir.model_dump() if fhir else None}
270
+
271
+ @app.get("/api/v1/patients/{patient_id}/fhir")
272
+ async def get_patient_fhir(patient_id: str):
273
+ fhir = get_mock_fhir_patient(patient_id)
274
+ if not fhir:
275
+ raise HTTPException(status_code=404, detail="Patient not found")
276
+ return fhir.model_dump()
277
+
278
+ # Legacy endpoint
279
+ @app.post("/ingest_patient")
280
+ async def ingest_patient(patient: PatientIngestRequest):
281
+ query = """
282
+ MERGE (p:Patient {id: $id})
283
+ SET p += {age: $age, gender: $gender}
284
+ MERGE (d:Diagnosis {code: $code})
285
+ MERGE (p)-[:HAS_DIAGNOSIS]->(d)
286
+ """
287
+ try:
288
+ neo4j_conn.run_query(query, {"id": patient.id, "age": patient.age, "gender": patient.gender, "code": patient.diagnosis_code})
289
+ return {"status": "Patient data ingested"}
290
+ except Exception as e:
291
+ raise HTTPException(status_code=500, detail=str(e))
292
+
293
+
294
+ # ── Trial Search & Details ─────────────────────────────────────────────────────
295
+
296
+ @app.get("/api/v1/trials/search")
297
+ async def search_trials_endpoint(
298
+ condition: str,
299
+ phase: Optional[str] = None,
300
+ status: str = "RECRUITING",
301
+ page_size: int = 20,
302
+ background_tasks: BackgroundTasks = None,
303
+ ):
304
+ trials = search_trials_sync(condition, phase, status, page_size)
305
+ # Passive graph enrichment — fire-and-forget in background
306
+ if background_tasks and trials:
307
+ background_tasks.add_task(enrich_trials_from_search, trials, condition)
308
+ # Attach graph-derived eligible patient counts
309
+ nct_ids = [t["nct_id"] for t in trials if t.get("nct_id")]
310
+ counts = get_eligible_patient_counts(nct_ids)
311
+ for t in trials:
312
+ t["eligible_patients_in_graph"] = counts.get(t.get("nct_id", ""), 0)
313
+ return {"trials": trials, "total": len(trials), "condition": condition, "sorted_by": "last_updated"}
314
+
315
+ @app.get("/api/v1/trials/{nct_id}")
316
+ async def get_trial(nct_id: str):
317
+ trial = get_trial_details_sync(nct_id)
318
+ if not trial:
319
+ raise HTTPException(status_code=404, detail=f"Trial {nct_id} not found")
320
+ summary = summarize_trial(trial)
321
+ return {**trial, "ai_summary": summary}
322
+
323
+ @app.get("/api/v1/trials/{nct_id}/eligible-patients")
324
+ async def get_eligible_patients(nct_id: str):
325
+ results = find_eligible_patients_for_trial(nct_id)
326
+ return {"nct_id": nct_id, "eligible_patients": results, "total": len(results)}
327
+
328
+ @app.get("/api/v1/trials/{nct_id}/intelligence")
329
+ async def trial_graph_intelligence(nct_id: str):
330
+ """Graph-derived intelligence: eligible count, similar trials, biomarker distribution, sites."""
331
+ return get_graph_intelligence(nct_id)
332
+
333
+
334
+ # ── Clinical Data Intake ───────────────────────────────────────────────────────
335
+
336
+ @app.post("/api/v1/intake/match")
337
+ async def intake_match(request: IntakeRequest):
338
+ """
339
+ Accept raw clinical data (lab values in the units noted on IntakeLabs) and return ranked trial matches.
340
+ No patient ID required — useful for individuals, clinicians, and researchers.
341
+ """
342
+ intake = {
343
+ "condition": request.condition,
344
+ "age": request.age,
345
+ "sex": (request.sex or "").upper() or None,
346
+ "ecog": request.ecog,
347
+ "stage": request.stage,
348
+ "biomarkers": request.biomarkers,
349
+ "labs": request.labs.model_dump(exclude_none=True) if request.labs else {},
350
+ "prior_chemo": request.prior_chemo,
351
+ "prior_radiation": request.prior_radiation,
352
+ "prior_surgery": request.prior_surgery,
353
+ "medications": request.medications,
354
+ }
355
+ matches = match_intake_to_trials(intake, request.condition, limit=10)
356
+ patient_id = None
357
+ if request.save_to_graph:
358
+ patient_id = save_intake_as_patient(intake)
359
+ return {
360
+ "condition": request.condition,
361
+ "matches": matches,
362
+ "total": len(matches),
363
+ "patient_id": patient_id,
364
+ }
365
+
366
+ @app.get("/api/v1/intake/biomarkers")
367
+ async def list_biomarkers():
368
+ """Return the full biomarker registry for populating the intake form."""
369
+ return {
370
+ "biomarkers": [
371
+ {"id": bid, "label": info[0]}
372
+ for bid, info in BIOMARKER_REGISTRY.items()
373
+ ]
374
+ }
375
+
376
+ # Legacy endpoint
377
+ @app.get("/match_trials/{patient_id}")
378
+ async def match_trials_legacy(patient_id: str):
379
+ matches = retrieve_patient_trial_matches(patient_id)
380
+ return {"matches": matches}
381
+
382
+
383
+ # ── Matching Engine ─────────────────────────────────────────────────────────────
384
+
385
+ @app.get("/api/v1/patients/{patient_id}/match-trials")
386
+ async def match_patient_trials(patient_id: str, condition: Optional[str] = None, top_n: int = 5):
387
+ matches = match_patient_to_trials(patient_id, condition, top_n)
388
+ return {"patient_id": patient_id, "matches": matches, "total": len(matches)}
389
+
390
+ @app.post("/api/v1/patients/{patient_id}/screen/{nct_id}")
391
+ async def screen_patient_for_trial(patient_id: str, nct_id: str):
392
+ trial = await get_trial_details(nct_id)
393
+ if not trial:
394
+ raise HTTPException(status_code=404, detail=f"Trial {nct_id} not found")
395
+ result = score_patient_for_trial(patient_id, trial)
396
+ if "error" in result:
397
+ raise HTTPException(status_code=404, detail=result["error"])
398
+ return result
399
+
400
+
401
+ # ── A2A Workflow ───────────────────────────────────────────────────────────────
402
+
403
+ @app.post("/api/v1/workflow/run")
404
+ async def run_workflow(request: WorkflowRequest, background_tasks: BackgroundTasks):
405
+ workflow_id = start_pipeline(request.patient_id, request.nct_id, request.condition)
406
+ result = run_pipeline(workflow_id)
407
+ return {
408
+ "workflow_id": workflow_id,
409
+ "status": result["current_state"],
410
+ "result": result.get("result"),
411
+ "events": result.get("events", []),
412
+ }
413
+
414
+
415
+ @app.post("/api/v1/workflow/start")
416
+ async def start_workflow(request: WorkflowRequest, background_tasks: BackgroundTasks):
417
+ """Start a pipeline and return workflow_id immediately; stream progress via /workflow/{id}/stream."""
418
+ workflow_id = start_pipeline(
419
+ request.patient_id, request.nct_id, request.condition,
420
+ fhir_token=request.fhir_token,
421
+ fhir_base_url=request.fhir_base_url,
422
+ session_id=request.session_id,
423
+ )
424
+ background_tasks.add_task(_run_pipeline_background, workflow_id)
425
+ sharp_ctx = _workflows[workflow_id].get("sharp_context", {})
426
+ return {
427
+ "workflow_id": workflow_id,
428
+ "status": "PENDING",
429
+ "stream_url": f"/api/v1/workflow/{workflow_id}/stream",
430
+ "sharp_context": sharp_ctx,
431
+ }
432
+
433
+
434
+ def _run_pipeline_background(workflow_id: str):
435
+ run_pipeline(workflow_id)
436
+
437
+
438
+ @app.get("/api/v1/workflow/{workflow_id}/stream")
439
+ async def stream_workflow(workflow_id: str, request: Request):
440
+ """SSE endpoint — streams A2A state transitions as they happen."""
441
+ async def event_generator():
442
+ seen = 0
443
+ timeout = 120 # max seconds to stream
444
+ deadline = time.time() + timeout
445
+ while time.time() < deadline:
446
+ if await request.is_disconnected():
447
+ break
448
+ wf = _workflows.get(workflow_id)
449
+ if not wf:
450
+ yield f"data: {json.dumps({'error': 'workflow_not_found'})}\n\n"
451
+ break
452
+ events = wf.get("events", [])
453
+ # Emit any new events since last check
454
+ for evt in events[seen:]:
455
+ payload = {
456
+ "state": evt["state"],
457
+ "message": evt["message"],
458
+ "timestamp": evt["timestamp"],
459
+ }
460
+ if evt.get("data"):
461
+ try:
462
+ # Only include lightweight summary data, not full result blobs
463
+ d = evt.get("data") or {}
464
+ if isinstance(d, dict):
465
+ safe = {k: v for k, v in d.items() if k not in ("matched_trials", "recruitment_records", "patient_profile")}
466
+ if safe:
467
+ payload["data"] = safe
468
+ except Exception:
469
+ pass
470
+ yield f"data: {json.dumps(payload)}\n\n"
471
+ seen += 1
472
+ current = wf.get("current_state", "")
473
+ if current in ("COMPLETED", "FAILED"):
474
+ # Send final event with result summary
475
+ result = wf.get("result") or {}
476
+ final = {
477
+ "state": current,
478
+ "eligible_trials": result.get("eligible_trials", 0),
479
+ "total_evaluated": result.get("total_trials_evaluated", 0),
480
+ "recruitment_records": len(result.get("recruitment_records", [])),
481
+ "error": wf.get("error"),
482
+ }
483
+ yield f"data: {json.dumps(final)}\n\n"
484
+ yield "data: [DONE]\n\n"
485
+ break
486
+ await asyncio.sleep(0.5)
487
+
488
+ return StreamingResponse(
489
+ event_generator(),
490
+ media_type="text/event-stream",
491
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
492
+ )
493
+
494
+
495
+ @app.get("/api/v1/workflow/{workflow_id}/status")
496
+ async def workflow_status(workflow_id: str):
497
+ status = get_workflow_status(workflow_id)
498
+ if "error" in status:
499
+ raise HTTPException(status_code=404, detail=status["error"])
500
+ return status
501
+
502
+ @app.get("/api/v1/workflows")
503
+ async def list_all_workflows():
504
+ return {"workflows": list_workflows()}
505
+
506
+
507
+ # ── Consent & Scheduling Agent ────────────────────────────────────────────────
508
+
509
+ @app.post("/api/v1/a2a/task")
510
+ async def a2a_task(request: A2ATaskRequest):
511
+ """A2A inter-agent task endpoint — routes CONSENT_REQUEST and SCHEDULE_REQUEST tasks."""
512
+ result = consent_agent.receive_a2a_task(request.model_dump())
513
+ return result
514
+
515
+ @app.get("/api/v1/consent")
516
+ async def list_consents(patient_id: Optional[str] = None):
517
+ return {"consents": consent_agent.list_consent_records(patient_id)}
518
+
519
+ @app.get("/api/v1/consent/stats")
520
+ async def consent_stats():
521
+ return consent_agent.get_consent_stats()
522
+
523
+ @app.get("/api/v1/consent/{consent_id}")
524
+ async def get_consent(consent_id: str):
525
+ record = consent_agent.get_consent_record(consent_id)
526
+ if not record:
527
+ raise HTTPException(status_code=404, detail="Consent record not found")
528
+ return record
529
+
530
+ @app.patch("/api/v1/consent/{consent_id}/status")
531
+ async def update_consent(consent_id: str, request: ConsentStatusRequest):
532
+ valid = {"SIGNED", "DECLINED", "EXPIRED"}
533
+ if request.status not in valid:
534
+ raise HTTPException(status_code=400, detail=f"status must be one of {valid}")
535
+ result = consent_agent.update_consent_status(consent_id, request.status, request.notes or "")
536
+ if "error" in result:
537
+ raise HTTPException(status_code=404, detail=result["error"])
538
+ return result
539
+
540
+ @app.get("/api/v1/appointments")
541
+ async def list_appointments(patient_id: Optional[str] = None):
542
+ return {"appointments": consent_agent.list_appointments(patient_id)}
543
+
544
+ @app.patch("/api/v1/appointments/{appt_id}/confirm")
545
+ async def confirm_appointment(appt_id: str):
546
+ result = consent_agent.confirm_appointment(appt_id)
547
+ if "error" in result:
548
+ raise HTTPException(status_code=404, detail=result["error"])
549
+ return result
550
+
551
+
552
+ # ── Recruitment Pipeline ───────────────────────────────────────────────────────
553
+
554
+ @app.get("/api/v1/recruitment/board")
555
+ async def kanban_board():
556
+ return get_kanban_board()
557
+
558
+ @app.get("/api/v1/recruitment/records")
559
+ async def all_recruitment_records():
560
+ return {"records": get_all_records()}
561
+
562
+ @app.post("/api/v1/recruitment/records")
563
+ async def create_recruitment_record(request: RecruitmentRecordRequest):
564
+ record = create_record(request.patient_id, request.nct_id, request.trial_title, request.match_score)
565
+ return record
566
+
567
+ @app.patch("/api/v1/recruitment/records/{record_id}/status")
568
+ async def update_record_status(record_id: str, request: StatusUpdateRequest):
569
+ try:
570
+ return update_status(record_id, request.status)
571
+ except ValueError as e:
572
+ raise HTTPException(status_code=404, detail=str(e))
573
+
574
+ @app.post("/api/v1/recruitment/outreach")
575
+ async def generate_outreach(request: OutreachRequest):
576
+ trial = get_trial_details_sync(request.nct_id) or {
577
+ "nct_id": request.nct_id,
578
+ "title": request.trial_title,
579
+ "brief_summary": "",
580
+ "phase": "N/A",
581
+ "sponsor": "N/A",
582
+ "locations": [],
583
+ }
584
+ try:
585
+ result = generate_and_store_outreach(
586
+ request.patient_id, request.nct_id, request.trial_title, trial, request.channel
587
+ )
588
+ return result
589
+ except ValueError as e:
590
+ raise HTTPException(status_code=404, detail=str(e))
591
+
592
+
593
+ # ── Analytics & Dashboard ──────────────────────────────────────────────────────
594
+
595
+ @app.get("/api/v1/analytics/kpi")
596
+ async def kpi_summary():
597
+ return get_kpi_summary()
598
+
599
+ @app.get("/api/v1/analytics/funnel")
600
+ async def enrollment_funnel(trial_id: Optional[str] = None):
601
+ return {"funnel": get_enrollment_funnel(trial_id)}
602
+
603
+ @app.get("/api/v1/analytics/sites")
604
+ async def site_performance():
605
+ return {"sites": get_site_performance()}
606
+
607
+ @app.get("/api/v1/analytics/demographics")
608
+ async def patient_demographics(trial_id: Optional[str] = None):
609
+ return get_patient_demographics(trial_id)
610
+
611
+ @app.get("/api/v1/analytics/timeline")
612
+ async def recruitment_timeline(days: int = 30):
613
+ return {"timeline": get_recruitment_timeline(days)}
614
+
615
+ @app.get("/api/v1/map/data")
616
+ async def map_data():
617
+ return get_map_data()
618
+
619
+
620
+ # ── GraphRAG ───────────────────────────────────────────────────────────────────
621
+
622
+ @app.get("/api/v1/graph/query")
623
+ async def graph_query(question: str):
624
+ response = rag_query(question)
625
+ return {"response": response}
626
+
627
+ @app.post("/api/v1/graph/query")
628
+ async def graph_query_post(request: RAGRequest):
629
+ response = rag_query(request.question)
630
+ return {"response": response}
631
+
632
+ @app.get("/api/v1/graph/stats")
633
+ async def graph_stats():
634
+ return get_graph_stats()
635
+
636
+ @app.get("/api/v1/graph/patients")
637
+ async def list_graph_patients(condition: Optional[str] = None, limit: int = 200):
638
+ """Query Neo4j for seeded patient records."""
639
+ if condition:
640
+ rows = neo4j_conn.run_query(
641
+ "MATCH (p:Patient) WHERE toLower(p.condition) CONTAINS toLower($cond) "
642
+ "RETURN p.id AS id, p.name AS name, p.age AS age, p.condition AS condition, "
643
+ "p.city AS city, p.state AS state ORDER BY p.id LIMIT $limit",
644
+ {"cond": condition, "limit": limit},
645
+ )
646
+ else:
647
+ rows = neo4j_conn.run_query(
648
+ "MATCH (p:Patient) RETURN p.id AS id, p.name AS name, p.age AS age, "
649
+ "p.condition AS condition, p.city AS city, p.state AS state "
650
+ "ORDER BY p.id LIMIT $limit",
651
+ {"limit": limit},
652
+ )
653
+ return {"patients": rows, "total": len(rows)}
654
+
655
+ # Legacy
656
+ @app.get("/rag_query")
657
+ async def rag_query_legacy(question: str):
658
+ return {"response": rag_query(question)}
659
+
660
+ @app.post("/enrich_graph")
661
+ async def enrich_legacy():
662
+ return {"reward": 0.75, "message": "Graph enrichment via RL (see rl_enrichment.py)"}
663
+
664
+
665
+ # ── Setup ──────────────────────────────────────────────────────────────────────
666
+
667
+ @app.post("/setup")
668
+ async def full_setup(background_tasks: BackgroundTasks):
669
+ setup_schema()
670
+ ingest_sample_data()
671
+ # Seed real data from live APIs in the background
672
+ background_tasks.add_task(_run_seeder_thread)
673
+ return {"status": "Setup started — schema initialized, sample data ingested, real-data seeding running in background"}
674
+
675
+ @app.post("/setup_sample_data")
676
+ async def setup_sample():
677
+ ingest_sample_data()
678
+ return {"status": "Sample data ingested"}
679
+
680
+ @app.post("/seed")
681
+ async def seed_graph(background_tasks: BackgroundTasks, conditions: list[str] | None = None):
682
+ """Trigger real-data seeding from ClinicalTrials.gov, RxNorm, ICD-10, PubMed."""
683
+ background_tasks.add_task(_run_seeder_thread, conditions)
684
+ return {
685
+ "status": "Seeding started in background",
686
+ "sources": ["clinicaltrials.gov", "rxnorm.nlm.nih.gov", "icd10cm nlm", "pubmed ncbi"],
687
+ "conditions": conditions or "all default oncology conditions",
688
+ }
689
+
690
+ @app.get("/seed/status")
691
+ async def seed_status():
692
+ stats = get_graph_stats()
693
+ return {"graph_stats": stats, "note": "Check /api/v1/graph/stats for node counts"}
694
+
695
+ def _run_seeder_thread(conditions: list[str] | None = None):
696
+ """Run the async seeder in a new thread (avoids event loop conflict with FastAPI)."""
697
+ try:
698
+ asyncio.run(run_seeder(conditions))
699
+ except Exception as e:
700
+ print(f"[seeder] error: {e}")
701
+
702
+
703
+ if __name__ == "__main__":
704
+ import uvicorn
705
+ uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
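The `/api/v1/workflow/{workflow_id}/stream` handler above polls an in-memory event list with a `seen` cursor and emits each unseen event as a Server-Sent Events frame, dropping heavy result keys first. A minimal sketch of that framing step, isolated from FastAPI so it can be exercised on its own. The names `HEAVY_KEYS`, `sse_frame`, and `drain_new_events` are hypothetical, and the event shape is assumed to match what the handler reads (`state`, `message`, `timestamp`, optional `data`).

```python
import json

# Keys the streaming handler strips from event data (copied from the diff above).
HEAVY_KEYS = ("matched_trials", "recruitment_records", "patient_profile")


def sse_frame(payload: dict) -> str:
    """Serialize one payload as a Server-Sent Events data frame."""
    return f"data: {json.dumps(payload)}\n\n"


def drain_new_events(events: list, seen: int) -> tuple:
    """Return (frames, new_cursor) for every event after position `seen`."""
    frames = []
    for evt in events[seen:]:
        payload = {
            "state": evt["state"],
            "message": evt["message"],
            "timestamp": evt["timestamp"],
        }
        data = evt.get("data")
        if isinstance(data, dict):
            # Forward only lightweight summary fields, never full result blobs.
            safe = {k: v for k, v in data.items() if k not in HEAVY_KEYS}
            if safe:
                payload["data"] = safe
        frames.append(sse_frame(payload))
    return frames, len(events)
```

Keeping the cursor outside the function means a caller can invoke it once per polling tick and yield whatever frames come back, which is the same shape as the `while` loop in the handler.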
backend/matching_engine.py ADDED
@@ -0,0 +1,209 @@
1
+ from fhir_adapter import get_patient_profile, get_all_patient_ids
2
+ from clinicaltrials_api import search_trials_sync, get_trial_details_sync
3
+ from llm_client import parse_trial_protocol, score_patient_against_criteria
4
+ import re
5
+
6
+ try:
7
+ from neo4j_setup import neo4j_conn as _neo4j
8
+ except Exception:
9
+ _neo4j = None
10
+
11
+ # In-memory cache for parsed criteria and scores
12
+ _criteria_cache: dict[str, dict] = {}
13
+ _score_cache: dict[str, dict] = {}
14
+
15
+
16
+ def _parse_age_string(age_str: str) -> int | None:
17
+ if not age_str:
18
+ return None
19
+ match = re.search(r"(\d+)", age_str)
20
+ return int(match.group(1)) if match else None
21
+
22
+
23
+ def _quick_eligibility_check(patient_profile: dict, trial: dict) -> tuple[bool, list[str]]:
24
+ """Rule-based pre-filter before expensive LLM scoring."""
25
+ flags = []
26
+ age = patient_profile.get("age", 0)
27
+
28
+ min_age = _parse_age_string(trial.get("min_age", ""))
29
+ max_age = _parse_age_string(trial.get("max_age", ""))
30
+
31
+ if min_age and age < min_age:
32
+ flags.append(f"Age {age} below minimum {min_age}")
33
+ if max_age and age > max_age:
34
+ flags.append(f"Age {age} above maximum {max_age}")
35
+
36
+ trial_sex = (trial.get("sex") or "ALL").upper()
37
+ patient_sex = (patient_profile.get("gender") or "").upper()
38
+ if trial_sex not in ("ALL", "BOTH") and patient_sex and patient_sex[0] != trial_sex[0]:
39
+ flags.append(f"Sex mismatch: trial requires {trial_sex}")
40
+
41
+ return len(flags) == 0, flags
42
+
43
+
44
+ def get_criteria_for_trial(trial: dict) -> dict:
45
+ nct_id = trial.get("nct_id", "")
46
+ if nct_id in _criteria_cache:
47
+ return _criteria_cache[nct_id]
48
+
49
+ eligibility_text = trial.get("eligibility_criteria", "")
50
+ if eligibility_text:
51
+ criteria = parse_trial_protocol(eligibility_text)
52
+ else:
53
+ criteria = {
54
+ "inclusion_criteria": [f"Confirmed diagnosis of {trial.get('brief_summary', 'target condition')[:50]}"],
55
+ "exclusion_criteria": ["Prior participation in conflicting trials"],
56
+ "age_range": {"min": 18, "max": None},
57
+ "required_diagnoses": [],
58
+ "required_biomarkers": [],
59
+ "excluded_medications": [],
60
+ "performance_status": None,
61
+ }
62
+
63
+ _criteria_cache[nct_id] = criteria
64
+ return criteria
65
+
66
+
67
+ def score_patient_for_trial(patient_id: str, trial: dict) -> dict:
68
+ cache_key = f"{patient_id}:{trial.get('nct_id', '')}"
69
+ if cache_key in _score_cache:
70
+ return _score_cache[cache_key]
71
+
72
+ patient_profile = get_patient_profile(patient_id)
73
+ if not patient_profile:
74
+ return {"error": "Patient not found", "overall_score": 0.0, "eligible": False}
75
+
76
+ # Quick rule-based pre-filter
77
+ passes_rules, rule_flags = _quick_eligibility_check(patient_profile, trial)
78
+
79
+ criteria = get_criteria_for_trial(trial)
80
+ result = score_patient_against_criteria(patient_profile, criteria, trial.get("title", "Clinical Trial"))
81
+
82
+ if not passes_rules:
83
+ result["overall_score"] = max(0.0, result.get("overall_score", 0.5) - 0.3)
84
+ result["eligible"] = False
85
+ result.setdefault("risk_flags", []).extend(rule_flags)
86
+
87
+ result["patient_id"] = patient_id
88
+ result["nct_id"] = trial.get("nct_id", "")
89
+ result["trial_title"] = trial.get("title", "")
90
+ result["match_path"] = _build_match_path(patient_profile, trial, criteria)
91
+
92
+ _score_cache[cache_key] = result
93
+ return result
94
+
95
+
96
+ def _build_match_path(patient_profile: dict, trial: dict, criteria: dict) -> list[dict]:
97
+ """
98
+ Build a human-readable graph explainability path showing WHY a patient was matched.
99
+ Returns a list of path nodes: Patient → biomarker/diagnosis/lab → Trial
100
+ """
101
+ path = []
102
+ patient_id = patient_profile.get("patient_id", "")
103
+ nct_id = trial.get("nct_id", "")
104
+ trial_title = trial.get("title", "")[:60]
105
+
106
+ # Check graph for shared biomarker edges
107
+ if _neo4j:
108
+ try:
109
+ rows = _neo4j.run_query(
110
+ """
111
+ MATCH (p:Patient {id: $pid})-[:HAS_BIOMARKER]->(b:Biomarker)
112
+ MATCH (t:Trial {id: $nct_id})
113
+ WHERE t.parsed_biomarkers CONTAINS b.name OR t.eligibility_criteria CONTAINS b.name
114
+ RETURN b.name AS biomarker LIMIT 3
115
+ """,
116
+ {"pid": patient_id, "nct_id": nct_id},
117
+ )
118
+ for row in rows:
119
+ path.append({
120
+ "from": f"Patient:{patient_id}",
121
+ "rel": "HAS_BIOMARKER",
122
+ "to": f"Biomarker:{row['biomarker']}",
123
+ "note": "required by trial",
124
+ })
125
+ except Exception:
126
+ pass
127
+
128
+ # Add FHIR-based reasoning nodes from the criteria match
129
+ for item in (criteria.get("required_biomarkers") or [])[:2]:
130
+ biomarkers = patient_profile.get("biomarkers", {})
131
+ if any(item.lower() in str(k).lower() or item.lower() in str(v).lower()
132
+ for k, v in biomarkers.items()):
133
+ path.append({
134
+ "from": f"Patient:{patient_id}",
135
+ "rel": "HAS_BIOMARKER",
136
+ "to": f"Biomarker:{item}",
137
+ "note": "matches trial requirement",
138
+ })
139
+
140
+ for dx in (criteria.get("required_diagnoses") or [])[:2]:
141
+ for patient_dx in patient_profile.get("diagnosis_names", []):
142
+ if any(word in patient_dx.lower() for word in dx.lower().split()):
143
+ path.append({
144
+ "from": f"Patient:{patient_id}",
145
+ "rel": "HAS_DIAGNOSIS",
146
+ "to": f"Diagnosis:{patient_dx}",
147
+ "note": f"matches required: {dx}",
148
+ })
149
+ break
150
+
151
+ # Terminal node
152
+ path.append({
153
+ "from": f"Patient:{patient_id}",
154
+ "rel": "ELIGIBLE_FOR",
155
+ "to": f"Trial:{nct_id}",
156
+ "note": trial_title,
157
+ })
158
+ return path
159
+
160
+
161
+ def match_patient_to_trials(patient_id: str, condition: str | None = None, top_n: int = 5) -> list[dict]:
162
+ """Find best-matching trials for a patient."""
163
+ patient_profile = get_patient_profile(patient_id)
164
+ if not patient_profile:
165
+ return []
166
+
167
+ # Infer condition from patient diagnoses if not provided
168
+ if not condition and patient_profile.get("diagnosis_names"):
169
+ condition = patient_profile["diagnosis_names"][0]
170
+ elif not condition:
171
+ condition = "cancer"
172
+
173
+ trials = search_trials_sync(condition, page_size=10)
174
+
175
+ scored = []
176
+ for trial in trials:
177
+ score_result = score_patient_for_trial(patient_id, trial)
178
+ scored.append({
179
+ **trial,
180
+ "match_score": score_result.get("overall_score", 0.0),
181
+ "eligible": score_result.get("eligible", False),
182
+ "match_summary": score_result.get("summary", ""),
183
+ "risk_flags": score_result.get("risk_flags", []),
184
+ })
185
+
186
+ scored.sort(key=lambda x: x["match_score"], reverse=True)
187
+ return scored[:top_n]
188
+
189
+
190
+ def find_eligible_patients_for_trial(nct_id: str) -> list[dict]:
191
+ """Screen all known patients against a specific trial."""
192
+ trial = get_trial_details_sync(nct_id)
193
+ if not trial:
194
+ return []
195
+
196
+ results = []
197
+ for patient_id in get_all_patient_ids():
198
+ score_result = score_patient_for_trial(patient_id, trial)
199
+ if score_result.get("overall_score", 0) > 0.4:
200
+ results.append({
201
+ "patient_id": patient_id,
202
+ "match_score": score_result.get("overall_score", 0.0),
203
+ "eligible": score_result.get("eligible", False),
204
+ "summary": score_result.get("summary", ""),
205
+ "risk_flags": score_result.get("risk_flags", []),
206
+ })
207
+
208
+ results.sort(key=lambda x: x["match_score"], reverse=True)
209
+ return results
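`_quick_eligibility_check` in the file above is a cheap rule-based gate that runs before the LLM scoring call. A self-contained sketch of the same age and sex rules follows, assuming `min_age` and `max_age` arrive as strings such as `"18 Years"`. The names `parse_age` and `quick_check` are hypothetical. One deliberate difference from the diff: a missing patient age is skipped here rather than treated as 0, which is an assumption about the desired behaviour and not something the source states.

```python
import re
from typing import Optional


def parse_age(age_str: Optional[str]) -> Optional[int]:
    """Extract the first integer from a string such as '18 Years'."""
    match = re.search(r"(\d+)", age_str or "")
    return int(match.group(1)) if match else None


def quick_check(patient: dict, trial: dict) -> tuple:
    """Return (passes, flags) using only age bounds and sex restriction."""
    flags = []
    age = patient.get("age")
    min_age = parse_age(trial.get("min_age"))
    max_age = parse_age(trial.get("max_age"))

    if age is not None:
        if min_age is not None and age < min_age:
            flags.append(f"Age {age} below minimum {min_age}")
        if max_age is not None and age > max_age:
            flags.append(f"Age {age} above maximum {max_age}")

    # Treat a missing or empty sex field as unrestricted.
    trial_sex = (trial.get("sex") or "ALL").upper()
    patient_sex = (patient.get("gender") or "").upper()
    if trial_sex not in ("ALL", "BOTH") and patient_sex and patient_sex[0] != trial_sex[0]:
        flags.append(f"Sex mismatch: trial requires {trial_sex}")

    return len(flags) == 0, flags
```

Comparing only the first letter lets `"F"` and `"FEMALE"` match each other, which mirrors the comparison in the diff.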
backend/mcp_mocks.py ADDED
@@ -0,0 +1,34 @@
1
+ # Mock MCP Superpowers for hackathon demo
2
+
3
+ def parse_trial_protocol(protocol_text: str):
4
+ # Mock: Extract inclusion criteria, etc.
5
+ return {
6
+ "inclusion_criteria": ["Age > 18", "Diagnosis: Breast Cancer"],
7
+ "exclusion_criteria": ["Prior treatment X"],
8
+ "phase": "II"
9
+ }
10
+
11
+ def access_fhir_patient_data(patient_id: str):
12
+ # Mock: Return de-identified patient data
13
+ return {
14
+ "age": 45,
15
+ "gender": "F",
16
+ "diagnoses": ["C50"],
17
+ "medications": ["Drug A"]
18
+ }
19
+
20
+ def generate_recruitment_message(patient_id: str, trial_id: str):
21
+ # Mock: Generate personalized message
22
+ return f"Dear Patient {patient_id}, you may be eligible for Trial {trial_id}. Please contact your doctor."
23
+
24
+ def orchestrate_a2a_workflow(patient_id: str, trial_id: str):
25
+ # Mock A2A: Coordinate the superpowers
26
+ protocol = parse_trial_protocol("Mock protocol text")
27
+ patient_data = access_fhir_patient_data(patient_id)
28
+ message = generate_recruitment_message(patient_id, trial_id)
29
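+ # Check eligibility (simple mock). NOTE: an ICD-10 code such as "C50" never equals a free-text criterion, so this is always False as written.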
30
+ eligible = patient_data["diagnoses"][0] in protocol["inclusion_criteria"]
31
+ return {
32
+ "eligible": eligible,
33
+ "message": message if eligible else None
34
+ }
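The mock eligibility check in `orchestrate_a2a_workflow` tests whether an ICD-10 code (`"C50"`) is a member of a list of free-text criteria, which cannot succeed with the mock data shown. One way the comparison could be made meaningful is sketched below. The `ICD10_LABELS` table and the `diagnosis_matches` helper are hypothetical and exist only for illustration; a real system would rely on a terminology service instead of a hand-written dictionary.

```python
# Hypothetical lookup from ICD-10 category codes to condition labels.
# C50 (malignant neoplasm of breast) and C34 (bronchus and lung) are real
# ICD-10 categories; the table itself is illustrative, not from the source.
ICD10_LABELS = {
    "C50": "breast cancer",
    "C34": "lung cancer",
}


def diagnosis_matches(diagnoses: list, inclusion_criteria: list) -> bool:
    """True if any diagnosis code maps to a label mentioned in the criteria text."""
    criteria_text = " ".join(inclusion_criteria).lower()
    for code in diagnoses:
        # Use the three-character category so 'C50.9' resolves like 'C50'.
        label = ICD10_LABELS.get(code[:3].upper())
        if label and label in criteria_text:
            return True
    return False
```

With the mock values from the diff, `diagnosis_matches(["C50"], ["Age > 18", "Diagnosis: Breast Cancer"])` evaluates to `True`, whereas the original membership test evaluates to `False`.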
backend/mcp_server.py ADDED
@@ -0,0 +1,460 @@
1
+ """
2
+ MCP Server for Precision Clinical Trial Matching Agent.
3
+ Exposes 9 tools accessible via Prompt Opinion and other MCP-compatible clients.
4
+
5
+ Run: python mcp_server.py
6
+ Or via SSE: uvicorn mcp_server:sse_app --port 8001
7
+ """
8
+ import asyncio
9
+ import json
10
+ import os
11
+ import sys
12
+ import httpx
13
+ from dotenv import load_dotenv
14
+
15
+ load_dotenv()
16
+
17
+ from mcp.server import Server
18
+ from mcp.server.stdio import stdio_server
19
+ from mcp import types
20
+
21
+ from fhir_adapter import get_patient_profile, get_all_patient_ids
22
+ from clinicaltrials_api import search_trials_sync, get_trial_details_sync
23
+ from matching_engine import match_patient_to_trials, score_patient_for_trial
24
+ from llm_client import generate_outreach_message, summarize_trial, get_provider_status
25
+ from analytics import get_kpi_summary, get_enrollment_funnel
26
+ from neo4j_setup import neo4j_conn
27
+ from fhir_server import get_fhir_server_status, get_live_patient_profile, build_sharp_context
28
+
29
+
30
+ server = Server("clinical-trial-matching-agent")
31
+
32
+
33
+ # US state abbreviation → full name (CT.gov returns full names)
34
+ _STATE_ABBR = {
35
+ "AL":"Alabama","AK":"Alaska","AZ":"Arizona","AR":"Arkansas","CA":"California",
36
+ "CO":"Colorado","CT":"Connecticut","DE":"Delaware","FL":"Florida","GA":"Georgia",
37
+ "HI":"Hawaii","ID":"Idaho","IL":"Illinois","IN":"Indiana","IA":"Iowa",
38
+ "KS":"Kansas","KY":"Kentucky","LA":"Louisiana","ME":"Maine","MD":"Maryland",
39
+ "MA":"Massachusetts","MI":"Michigan","MN":"Minnesota","MS":"Mississippi","MO":"Missouri",
40
+ "MT":"Montana","NE":"Nebraska","NV":"Nevada","NH":"New Hampshire","NJ":"New Jersey",
41
+ "NM":"New Mexico","NY":"New York","NC":"North Carolina","ND":"North Dakota","OH":"Ohio",
42
+ "OK":"Oklahoma","OR":"Oregon","PA":"Pennsylvania","RI":"Rhode Island","SC":"South Carolina",
43
+ "SD":"South Dakota","TN":"Tennessee","TX":"Texas","UT":"Utah","VT":"Vermont",
44
+ "VA":"Virginia","WA":"Washington","WV":"West Virginia","WI":"Wisconsin","WY":"Wyoming",
45
+ "DC":"District of Columbia",
46
+ }
47
+
48
+
49
+ def _error(code: str, message: str, retry_after: int | None = None) -> list[types.TextContent]:
50
+ """Structured error response for MCP callers."""
51
+ payload: dict = {"error": code, "message": message}
52
+ if retry_after is not None:
53
+ payload["retry_after"] = retry_after
54
+ return [types.TextContent(type="text", text=json.dumps(payload))]
55
+
56
+
57
+ @server.list_tools()
58
+ async def list_tools() -> list[types.Tool]:
59
+ return [
60
+ types.Tool(
61
+ name="ping",
62
+ description="Health check for the ClinicalMatch AI agent. Returns Neo4j graph status, CT.gov API reachability, seed status, and system readiness. Call this first to confirm the agent is ready before running any workflow.",
63
+ inputSchema={
64
+ "type": "object",
65
+ "properties": {},
66
+ "required": [],
67
+ },
68
+ ),
69
+ types.Tool(
70
+ name="get_patient_matches",
71
+ description="Get the top clinical trial matches for a specific patient with full eligibility score breakdown. Returns ranked trials with inclusion/exclusion criterion analysis, risk flags, and clinical reasoning. Ideal for a one-call eligibility summary before scheduling.",
72
+ inputSchema={
73
+ "type": "object",
74
+ "properties": {
75
+ "patient_id": {"type": "string", "description": "Patient ID (P001–P005 for FHIR mock patients)"},
76
+ "top_n": {"type": "integer", "description": "Number of top matches to return (default 5, max 10)", "default": 5},
77
+ "condition": {"type": "string", "description": "Override condition for trial search (optional — inferred from patient FHIR data if omitted)"},
78
+ },
79
+ "required": ["patient_id"],
80
+ },
81
+ ),
82
+ types.Tool(
83
+ name="list_recruiting_trials",
84
+ description="Search for actively recruiting clinical trials by condition with optional geographic filtering. Returns trials sorted by recency with site locations, enrollment targets, and phase details. Use for geographic-aware trial discovery.",
85
+ inputSchema={
86
+ "type": "object",
87
+ "properties": {
88
+ "condition": {"type": "string", "description": "Medical condition (e.g., 'breast cancer', 'NSCLC', 'prostate cancer')"},
89
+ "city": {"type": "string", "description": "Filter to trials with sites near this city (optional)"},
90
+ "state": {"type": "string", "description": "Filter to trials with sites in this US state abbreviation, e.g. 'CA' (optional)"},
91
+ "phase": {"type": "string", "description": "Trial phase filter: '1', '2', '3', or '4'", "enum": ["1", "2", "3", "4"]},
92
+ "max_results": {"type": "integer", "description": "Maximum results to return (default 10, max 20)", "default": 10},
93
+ },
94
+ "required": ["condition"],
95
+ },
96
+ ),
97
+ types.Tool(
98
+ name="find_trials",
99
+ description="Search ClinicalTrials.gov for recruiting clinical trials matching a medical condition. Returns ranked list of trials with eligibility criteria, locations, and enrollment info.",
100
+ inputSchema={
101
+ "type": "object",
102
+ "properties": {
103
+ "condition": {"type": "string", "description": "Medical condition (e.g., 'breast cancer', 'NSCLC', 'Alzheimer's disease')"},
104
+ "phase": {"type": "string", "description": "Trial phase: '1', '2', '3', or '4'", "enum": ["1", "2", "3", "4"]},
105
+ "page_size": {"type": "integer", "description": "Number of results (max 20)", "default": 10},
106
+ },
107
+ "required": ["condition"],
108
+ },
109
+ ),
110
+ types.Tool(
111
+ name="screen_patient",
112
+ description="Screen a patient against a specific clinical trial using AI-powered FHIR-based analysis. Accepts either a local patient ID or a live FHIR server patient ID with optional SMART bearer token. Returns eligibility score, inclusion/exclusion criterion assessment, clinical reasoning, and SHARP context envelope.",
113
+ inputSchema={
114
+ "type": "object",
115
+ "properties": {
116
+ "patient_id": {"type": "string", "description": "Local patient ID (e.g. P001) OR FHIR server patient ID"},
117
+ "nct_id": {"type": "string", "description": "ClinicalTrials.gov NCT number (e.g. NCT04889131)"},
118
+ "fhir_token": {"type": "string", "description": "SMART on FHIR bearer token for live FHIR server access (optional)"},
119
+ "use_live_fhir": {"type": "boolean", "description": "If true, fetch patient data from the live FHIR server instead of local registry", "default": False},
120
+ },
121
+ "required": ["patient_id", "nct_id"],
122
+ },
123
+ ),
124
+ types.Tool(
125
+ name="match_patient_to_trials",
126
+ description="Find the best-matching clinical trials for a patient using semantic AI matching. Accepts local or live FHIR patient ID. Returns ranked matches with SHARP context envelope for downstream agent consumption.",
127
+ inputSchema={
128
+ "type": "object",
129
+ "properties": {
130
+ "patient_id": {"type": "string", "description": "Patient ID (local: P001–P005, or live FHIR ID)"},
131
+ "condition": {"type": "string", "description": "Override condition for search (optional — inferred from FHIR data if omitted)"},
132
+ "top_n": {"type": "integer", "description": "Number of top matches to return", "default": 5},
133
+ "fhir_token": {"type": "string", "description": "SMART on FHIR bearer token (optional)"},
134
+ "use_live_fhir": {"type": "boolean", "description": "Fetch patient from live FHIR server", "default": False},
135
+ },
136
+ "required": ["patient_id"],
137
+ },
138
+ ),
139
+ types.Tool(
140
+ name="generate_recruitment_outreach",
141
+ description="Generate personalized recruitment communication for a patient-trial pair. Supports PCP referral letters, patient emails, and social media posts.",
142
+ inputSchema={
143
+ "type": "object",
144
+ "properties": {
145
+ "patient_id": {"type": "string", "description": "Patient ID"},
146
+ "nct_id": {"type": "string", "description": "Trial NCT ID"},
147
+ "channel": {
148
+ "type": "string",
149
+ "description": "Communication channel",
150
+ "enum": ["patient_email", "pcp_letter", "social_post"],
151
+ "default": "patient_email",
152
+ },
153
+ },
154
+ "required": ["patient_id", "nct_id"],
155
+ },
156
+ ),
157
+ types.Tool(
158
+ name="get_trial_analytics",
159
+ description="Get enrollment analytics and recruitment funnel data for a clinical trial or across all active trials.",
160
+ inputSchema={
161
+ "type": "object",
162
+ "properties": {
163
+ "trial_id": {"type": "string", "description": "NCT ID for trial-specific analytics (omit for aggregate)"},
164
+ },
165
+ "required": [],
166
+ },
167
+ ),
168
+ types.Tool(
169
+ name="summarize_trial_protocol",
170
+ description="Fetch a clinical trial from ClinicalTrials.gov and generate a plain-language AI summary for clinical coordinators.",
171
+ inputSchema={
172
+ "type": "object",
173
+ "properties": {
174
+ "nct_id": {"type": "string", "description": "ClinicalTrials.gov NCT number"},
175
+ },
176
+ "required": ["nct_id"],
177
+ },
178
+ ),
179
+ ]
180
+
181
+
182
+ @server.call_tool()
183
+ async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
184
+ try:
185
+ if name == "ping":
186
+ # Neo4j check
187
+ neo4j_ok = False
188
+ node_counts = {}
189
+ try:
190
+ rows = neo4j_conn.run_query(
191
+ "MATCH (n) RETURN labels(n)[0] AS label, count(n) AS cnt"
192
+ )
193
+ node_counts = {r["label"]: r["cnt"] for r in rows if r.get("label")}
194
+ neo4j_ok = True
195
+ except Exception:
196
+ neo4j_ok = False
197
+
198
+ # CT.gov reachability
199
+ ctgov_ok = False
200
+ try:
201
+ r = httpx.get(
202
+ "https://clinicaltrials.gov/api/v2/studies",
203
+ params={"query.term": "cancer", "pageSize": 1},
204
+ timeout=5,
205
+ )
206
+ ctgov_ok = r.status_code == 200
207
+ except Exception:
208
+ ctgov_ok = False
209
+
210
+ seeded = node_counts.get("Patient", 0) >= 100
211
+
212
+ fhir_status = get_fhir_server_status()
213
+ llm_status = get_provider_status()
214
+
215
+ status = {
216
+ "status": "ready" if (neo4j_ok and ctgov_ok and seeded) else "degraded",
217
+ "neo4j": "connected" if neo4j_ok else "unavailable",
218
+ "ctgov_api": "reachable" if ctgov_ok else "unreachable",
219
+ "fhir_server": "reachable" if fhir_status.get("reachable") else "unreachable",
220
+ "fhir_base_url": fhir_status.get("base_url"),
221
+ "smart_auth": fhir_status.get("auth_method"),
222
+ "graph_seeded": seeded,
223
+ "node_counts": node_counts,
224
+ "llm_provider": llm_status.get("provider"),
225
+ "llm_model": llm_status.get("model"),
226
+ "llm_hipaa_eligible": llm_status.get("hipaa_eligible"),
227
+ "standards": ["FHIR R4", "MCP", "A2A", "SHARP"],
228
+ "agent": "ClinicalMatch AI v2.0 — FHIR R4 · MCP · A2A · SHARP",
229
+ }
230
+ return [types.TextContent(type="text", text=json.dumps(status, indent=2))]
231
+
232
+ elif name == "get_patient_matches":
233
+ patient_id = arguments["patient_id"]
234
+ top_n = min(int(arguments.get("top_n", 5)), 10)
235
+ condition = arguments.get("condition")
236
+
237
+ profile = get_patient_profile(patient_id)
238
+ if not profile:
239
+ return _error("PATIENT_NOT_FOUND", f"Patient '{patient_id}' not found. Available: P001–P005.")
240
+
241
+ matches = match_patient_to_trials(patient_id, condition, top_n)
242
+ if not matches:
243
+ return _error("NO_TRIALS_FOUND", f"No trials found for patient {patient_id}.", retry_after=30)
244
+
245
+ output = f"## Top {len(matches)} Trial Matches — {patient_id}\n"
246
+ output += f"Patient: {profile['age']}y {profile['gender']} | Dx: {', '.join(profile['diagnosis_names'])}\n\n"
247
+ for i, m in enumerate(matches, 1):
248
+ output += f"### {i}. {m['title']} ({m['nct_id']})\n"
249
+ output += f"**Score:** {m['match_score']:.0%} | **Eligible:** {'✓ YES' if m['eligible'] else '✗ NO'} | **Phase:** {m.get('phase', 'N/A')}\n"
250
+ if m.get("match_summary"):
251
+ output += f"**Reasoning:** {m['match_summary'][:200]}\n"
252
+ if m.get("risk_flags"):
253
+ output += f"**Risk Flags:** {'; '.join(m['risk_flags'][:3])}\n"
254
+ locs = ", ".join(f"{l['city']}, {l['state']}" for l in m.get("locations", [])[:2])
255
+ if locs:
256
+ output += f"**Sites:** {locs}\n"
257
+ output += "\n"
258
+ return [types.TextContent(type="text", text=output)]
259
+
260
+ elif name == "list_recruiting_trials":
261
+ condition = arguments["condition"]
262
+ city = arguments.get("city", "").lower()
263
+ state = arguments.get("state", "").upper()
264
+ phase = arguments.get("phase")
265
+ max_results = min(int(arguments.get("max_results", 10)), 20)
266
+
267
+ trials = search_trials_sync(condition, phase, page_size=max_results)
268
+ if not trials:
269
+ return _error("NO_TRIALS_FOUND", f"No recruiting trials found for '{condition}'.", retry_after=10)
270
+
271
+ # Apply geo filter — CT.gov returns full state names, so expand abbreviation
272
+ if city or state:
273
+ state_full = _STATE_ABBR.get(state.upper(), state).lower() if state else ""
274
+ state_abbr = state.upper() if state else ""
275
+ filtered = []
276
+ for t in trials:
277
+ locs = t.get("locations", [])
278
+ match = any(
279
+ (city and city in (l.get("city", "") or "").lower()) or
280
+ (state and (
281
+ state_abbr == (l.get("state", "") or "").upper() or
282
+ state_full in (l.get("state", "") or "").lower()
283
+ ))
284
+ for l in locs
285
+ )
286
+ if match or not locs:
287
+ filtered.append(t)
288
+ geo_note = " near " + ", ".join(p for p in (city, state) if p)
289
+ trials = filtered or trials # fallback to all if filter too narrow
290
+ else:
291
+ geo_note = ""
292
+
293
+ output = f"## Recruiting Trials: {condition}{geo_note}\n"
294
+ output += f"Found {len(trials)} trials (sorted by most recently updated)\n\n"
295
+ for i, t in enumerate(trials, 1):
296
+ locs = ", ".join(f"{l['city']}, {l['state']}" for l in t.get("locations", [])[:3])
297
+ output += f"{i}. **{t['title']}** ({t['nct_id']})\n"
298
+ output += f" Phase: {t.get('phase','N/A')} | Sites: {t.get('location_count',0)} | Enrollment: {t.get('enrollment','N/A')}\n"
299
+ output += f" Sponsor: {t.get('sponsor','N/A')} | Updated: {t.get('last_updated','N/A')}\n"
300
+ if locs:
301
+ output += f" Locations: {locs}\n"
302
+ output += f" URL: {t.get('ctgov_url','')}\n\n"
303
+ return [types.TextContent(type="text", text=output)]
304
+
305
+ elif name == "find_trials":
306
+ condition = arguments["condition"]
307
+ phase = arguments.get("phase")
308
+ page_size = min(int(arguments.get("page_size", 10)), 20)
309
+ trials = search_trials_sync(condition, phase, page_size=page_size)
310
+ output = f"Found {len(trials)} recruiting trials for '{condition}':\n\n"
311
+ for i, trial in enumerate(trials, 1):
312
+ locs = ", ".join(f"{l['city']}, {l['state']}" for l in trial.get("locations", [])[:2])
313
+ output += f"{i}. **{trial['title']}** ({trial['nct_id']})\n"
314
+ output += f" Phase: {trial['phase']} | Status: {trial['status']} | Sites: {trial['location_count']}\n"
315
+ output += f" Enrollment: {trial['enrollment']} | Sponsor: {trial['sponsor']}\n"
316
+ if locs:
317
+ output += f" Locations: {locs}\n"
318
+ output += "\n"
319
+ return [types.TextContent(type="text", text=output)]
320
+
321
+ elif name == "screen_patient":
322
+ patient_id = arguments["patient_id"]
323
+ nct_id = arguments["nct_id"]
324
+ use_live_fhir = arguments.get("use_live_fhir", False)
325
+ fhir_token = arguments.get("fhir_token")
326
+
327
+ # Build SHARP context envelope
328
+ sharp_ctx = build_sharp_context(
329
+ patient_id=patient_id,
330
+ fhir_ref=f"Patient/{patient_id}",
331
+ )
332
+ if fhir_token:
333
+ sharp_ctx["fhir_token"] = fhir_token
334
+
335
+ # Optionally fetch from live FHIR server
336
+ if use_live_fhir:
337
+ live_profile = get_live_patient_profile(patient_id, sharp_context=sharp_ctx)
338
+ if not live_profile:
339
+ return _error("FHIR_PATIENT_NOT_FOUND",
340
+ f"Patient '{patient_id}' not found on FHIR server {sharp_ctx['patient_context']['fhir_base']}")
341
+
342
+ trial = get_trial_details_sync(nct_id)
343
+ if not trial:
344
+ return _error("TRIAL_NOT_FOUND", f"Trial {nct_id} not found in ClinicalTrials.gov")
345
+ result = score_patient_for_trial(patient_id, trial)
346
+ if "error" in result:
347
+ return _error("SCREENING_ERROR", result["error"])
348
+ result["sharp_context"] = sharp_ctx
349
+
350
+ score = result.get("overall_score", 0)
351
+ eligible = result.get("eligible", False)
352
+ output = f"## Eligibility Assessment: {patient_id} → {nct_id}\n\n"
353
+ output += f"**Overall Score:** {score:.0%} | **Eligible:** {'YES' if eligible else 'NO'}\n\n"
354
+ output += f"**Clinical Reasoning:** {result.get('summary', '')}\n\n"
355
+
356
+ incl = result.get("inclusion_results", [])
357
+ if incl:
358
+ output += "**Inclusion Criteria:**\n"
359
+ for c in incl:
360
+ icon = "✓" if c.get("met") else "✗"
361
+ output += f" {icon} {c.get('criterion', '')} [{c.get('confidence', '')}]\n"
362
+ excl = result.get("exclusion_results", [])
363
+ if excl:
364
+ output += "\n**Exclusion Criteria:**\n"
365
+ for c in excl:
366
+ icon = "⚠" if c.get("triggered") else "✓"
367
+ output += f" {icon} {c.get('criterion', '')} [{c.get('confidence', '')}]\n"
368
+ flags = result.get("risk_flags", [])
369
+ if flags:
370
+ output += f"\n**Risk Flags:** {'; '.join(flags)}"
371
+ return [types.TextContent(type="text", text=output)]
372
+
373
+ elif name == "match_patient_to_trials":
374
+ patient_id = arguments["patient_id"]
375
+ condition = arguments.get("condition")
376
+ top_n = min(int(arguments.get("top_n", 5)), 10)
377
+ use_live_fhir = arguments.get("use_live_fhir", False)
378
+ fhir_token = arguments.get("fhir_token")
379
+
380
+ sharp_ctx = build_sharp_context(patient_id=patient_id, fhir_ref=f"Patient/{patient_id}")
381
+ if fhir_token:
382
+ sharp_ctx["fhir_token"] = fhir_token
383
+
384
+ if use_live_fhir:
385
+ profile = get_live_patient_profile(patient_id, sharp_context=sharp_ctx)
386
+ if not profile:
387
+ return _error("FHIR_PATIENT_NOT_FOUND", f"Patient '{patient_id}' not found on FHIR server")
388
+ if not condition and profile.get("diagnosis_names"):
389
+ condition = profile["diagnosis_names"][0]
390
+ else:
391
+ profile = get_patient_profile(patient_id)
392
+
393
+ matches = match_patient_to_trials(patient_id, condition, top_n)
394
+ output = f"## Top {len(matches)} Trial Matches for {patient_id}\n"
395
+ output += f"SHARP: fhir_ref={sharp_ctx['patient_context']['fhir_ref']} session={sharp_ctx['patient_context']['session_id'][:8]}...\n"
396
+ if profile:
397
+ output += f"Patient: {profile['age']}y {profile['gender']} | Diagnoses: {', '.join(profile.get('diagnosis_names', []))}\n\n"
398
+ for i, m in enumerate(matches, 1):
399
+ output += f"{i}. **{m['title']}** ({m['nct_id']})\n"
400
+ output += f" Match Score: {m['match_score']:.0%} | Eligible: {'YES' if m['eligible'] else 'NO'} | Phase: {m.get('phase','N/A')}\n"
401
+ if m.get("match_summary"):
402
+ output += f" {m['match_summary'][:150]}...\n"
403
+ output += "\n"
404
+ return [types.TextContent(type="text", text=output)]
405
+
406
+ elif name == "generate_recruitment_outreach":
407
+ patient_id = arguments["patient_id"]
408
+ nct_id = arguments["nct_id"]
409
+ channel = arguments.get("channel", "patient_email")
410
+ trial = get_trial_details_sync(nct_id) or {"nct_id": nct_id, "title": "Clinical Trial", "brief_summary": "", "phase": "N/A", "sponsor": "N/A", "locations": []}
411
+ patient_profile = get_patient_profile(patient_id)
412
+ if not patient_profile:
413
+ return _error("PATIENT_NOT_FOUND", f"Patient '{patient_id}' not found.")
414
+ message = generate_outreach_message(patient_profile, trial, channel)
415
+ output = f"## Recruitment Outreach ({channel.replace('_', ' ').title()})\n"
416
+ output += f"Patient: {patient_id} | Trial: {nct_id}\n\n"
417
+ output += "---\n\n" + message
418
+ return [types.TextContent(type="text", text=output)]
419
+
420
+ elif name == "get_trial_analytics":
421
+ trial_id = arguments.get("trial_id")
422
+ kpis = get_kpi_summary()
423
+ funnel = get_enrollment_funnel(trial_id)
424
+ output = "## Clinical Trial Analytics\n\n"
425
+ output += f"**Active Trials:** {kpis['active_trials']}\n"
426
+ output += f"**Patients Identified:** {kpis['patients_identified']}\n"
427
+ output += f"**Enrollment Rate:** {kpis['enrollment_rate']:.0%}\n"
428
+ output += f"**Avg Days to Match:** {kpis['avg_days_to_match']}\n"
429
+ output += f"**Cost Savings:** ${kpis['cost_saved_usd']:,}\n\n"
430
+ output += "**Enrollment Funnel:**\n"
431
+ for stage in funnel:
432
+ output += f" {stage['stage']}: {stage['count']}\n"
433
+ return [types.TextContent(type="text", text=output)]
434
+
435
+ elif name == "summarize_trial_protocol":
436
+ nct_id = arguments["nct_id"]
437
+ trial = get_trial_details_sync(nct_id)
438
+ if not trial:
439
+ return _error("TRIAL_NOT_FOUND", f"Trial {nct_id} not found in ClinicalTrials.gov")
440
+ summary = summarize_trial(trial)
441
+ output = f"## {trial['title']} ({nct_id})\n\n"
442
+ output += f"**Phase:** {trial['phase']} | **Status:** {trial['status']} | **Enrollment:** {trial['enrollment']}\n"
443
+ output += f"**Sponsor:** {trial['sponsor']}\n\n"
444
+ output += summary
445
+ return [types.TextContent(type="text", text=output)]
446
+
447
+ else:
448
+ return [types.TextContent(type="text", text=f"Unknown tool: {name}")]
449
+
450
+ except Exception as e:
451
+ return _error("TOOL_ERROR", f"Tool '{name}' failed: {str(e)}")
452
+
453
+
454
+ async def main():
455
+ async with stdio_server() as (read_stream, write_stream):
456
+ await server.run(read_stream, write_stream, server.create_initialization_options())
457
+
458
+
459
+ if __name__ == "__main__":
460
+ asyncio.run(main())
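The geo filter inside the `list_recruiting_trials` branch can be pulled out into a small pure function, which makes the abbreviation handling easier to test on its own. The following is a minimal sketch, not the project's implementation: `STATE_ABBR` is a hypothetical two-entry stand-in for the module's `_STATE_ABBR` table, and `filter_by_location` is an illustrative name.

```python
from typing import Any

# Hypothetical subset; the real table is assumed to cover every state abbreviation.
STATE_ABBR = {"MA": "Massachusetts", "CA": "California"}


def filter_by_location(
    trials: list[dict[str, Any]], city: str = "", state: str = ""
) -> list[dict[str, Any]]:
    """Keep trials with at least one site matching the city or the state.

    Trials with no listed locations are kept, and the unfiltered list is
    returned when nothing matches, mirroring the fallback in the tool.
    """
    city = city.strip().lower()
    state_abbr = state.strip().upper()
    # Expand "MA" to "massachusetts"; unknown values fall back to the input.
    state_full = STATE_ABBR.get(state_abbr, state_abbr).lower()
    if not city and not state_abbr:
        return trials

    def site_matches(loc: dict[str, Any]) -> bool:
        loc_city = (loc.get("city") or "").lower()
        loc_state = loc.get("state") or ""
        city_hit = bool(city) and city in loc_city
        state_hit = bool(state_abbr) and (
            loc_state.upper() == state_abbr or state_full in loc_state.lower()
        )
        return city_hit or state_hit

    filtered = [
        t
        for t in trials
        if not t.get("locations")
        or any(site_matches(loc) for loc in t["locations"])
    ]
    return filtered or trials
```

Keeping the filter free of I/O means it can be exercised with plain dictionaries, without calling ClinicalTrials.gov.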
backend/neo4j_setup.py ADDED
@@ -0,0 +1,53 @@
1
+ from neo4j import GraphDatabase
2
+ import os
3
+ from dotenv import load_dotenv
4
+
5
+ load_dotenv()
6
+
7
+
8
+ class Neo4jConnection:
9
+ def __init__(self, uri: str, user: str, password: str, database: str = "neo4j"):
10
+ self.driver = GraphDatabase.driver(uri, auth=(user, password))
11
+ self.database = database
12
+
13
+ def close(self):
14
+ self.driver.close()
15
+
16
+ def run_query(self, query: str, parameters: dict | None = None) -> list:
17
+ with self.driver.session(database=self.database) as session:
18
+ result = session.run(query, parameters or {})
19
+ return [record.data() for record in result]
20
+
21
+
22
+ neo4j_conn = Neo4jConnection(
23
+ uri=os.getenv("NEO4J_URI", ""),
24
+ user=os.getenv("NEO4J_USERNAME", "neo4j"),
25
+ password=os.getenv("NEO4J_PASSWORD", ""),
26
+ database=os.getenv("NEO4J_DATABASE", "neo4j"),
27
+ )
28
+
29
+
30
+ def setup_schema():
31
+ constraints = [
32
+ "CREATE CONSTRAINT patient_id IF NOT EXISTS FOR (p:Patient) REQUIRE p.id IS UNIQUE",
33
+ "CREATE CONSTRAINT trial_id IF NOT EXISTS FOR (t:Trial) REQUIRE t.id IS UNIQUE",
34
+ "CREATE CONSTRAINT diagnosis_code IF NOT EXISTS FOR (d:Diagnosis) REQUIRE d.code IS UNIQUE",
35
+ "CREATE CONSTRAINT site_id IF NOT EXISTS FOR (s:StudySite) REQUIRE s.id IS UNIQUE",
36
+ ]
37
+ indexes = [
38
+ "CREATE INDEX patient_age IF NOT EXISTS FOR (p:Patient) ON (p.age)",
39
+ "CREATE INDEX trial_phase IF NOT EXISTS FOR (t:Trial) ON (t.phase)",
40
+ "CREATE INDEX trial_condition IF NOT EXISTS FOR (t:Trial) ON (t.condition)",
41
+ "CREATE INDEX trial_status IF NOT EXISTS FOR (t:Trial) ON (t.status)",
42
+ ]
43
+ for query in constraints + indexes:
44
+ try:
45
+ neo4j_conn.run_query(query)
46
+ except Exception as e:
47
+ print(f"Schema warning: {e}")
48
+ print("Schema setup complete.")
49
+
50
+
51
+ if __name__ == "__main__":
52
+ setup_schema()
53
+ neo4j_conn.close()
backend/recruitment_pipeline.py ADDED
@@ -0,0 +1,122 @@
1
+ """Recruitment pipeline — state tracking and communication management."""
2
+ import uuid
3
+ from datetime import datetime
4
+ from enum import Enum
5
+ from fhir_adapter import get_patient_profile, MOCK_FHIR_PATIENTS
6
+ from llm_client import generate_outreach_message
7
+
8
+ class RecruitmentStatus(str, Enum):
9
+ IDENTIFIED = "IDENTIFIED"
10
+ CONTACTED = "CONTACTED"
11
+ SCREENING = "SCREENING"
12
+ CONSENTED = "CONSENTED"
13
+ ENROLLED = "ENROLLED"
14
+ DECLINED = "DECLINED"
15
+ INELIGIBLE = "INELIGIBLE"
16
+
17
+
18
+ # In-memory pipeline store
19
+ _pipeline: dict[str, dict] = {}
20
+
21
+
22
+ def _seed_demo_records():
23
+ """Seed realistic demo records across pipeline stages."""
24
+ demo = [
25
+ ("P001", "NCT04889131", "Precision Breast Cancer Study", 0.91, RecruitmentStatus.SCREENING),
26
+ ("P001", "NCT05123456", "Immunotherapy Combination Trial", 0.78, RecruitmentStatus.CONTACTED),
27
+ ("P002", "NCT05456789", "Prostate Cancer BRCA2 Study", 0.85, RecruitmentStatus.IDENTIFIED),
28
+ ("P003", "NCT04889131", "Precision Breast Cancer Study", 0.65, RecruitmentStatus.IDENTIFIED),
29
+ ("P004", "NCT06112233", "EGFR-Mutant NSCLC Trial", 0.93, RecruitmentStatus.CONSENTED),
30
+ ("P004", "NCT05987654", "PD-L1 Immunotherapy Study", 0.81, RecruitmentStatus.SCREENING),
31
+ ("P005", "NCT05334455", "MSI-H Colorectal Cancer Study", 0.88, RecruitmentStatus.ENROLLED),
32
+ ("P002", "NCT04223344", "Androgen Receptor Pathway Study", 0.72, RecruitmentStatus.DECLINED),
33
+ ]
34
+ for patient_id, nct_id, trial_title, score, status in demo:
35
+ record_id = str(uuid.uuid4())
36
+ _pipeline[record_id] = {
37
+ "record_id": record_id,
38
+ "patient_id": patient_id,
39
+ "nct_id": nct_id,
40
+ "trial_title": trial_title,
41
+ "match_score": score,
42
+ "status": status,
43
+ "outreach_history": [],
44
+ "created_at": datetime.utcnow().isoformat(),
45
+ "updated_at": datetime.utcnow().isoformat(),
46
+ }
47
+
48
+
49
+ _seed_demo_records()
50
+
51
+
52
+ def get_kanban_board() -> dict:
53
+ """Return records grouped by status for kanban view."""
54
+ board: dict[str, list] = {s: [] for s in RecruitmentStatus}
55
+ for record in _pipeline.values():
56
+ board[record["status"]].append(record)
57
+ return board
58
+
59
+
60
+ def get_all_records() -> list[dict]:
61
+ return list(_pipeline.values())
62
+
63
+
64
+ def get_record(record_id: str) -> dict | None:
65
+ return _pipeline.get(record_id)
66
+
67
+
68
+ def create_record(patient_id: str, nct_id: str, trial_title: str, match_score: float) -> dict:
69
+ record_id = str(uuid.uuid4())
70
+ record = {
71
+ "record_id": record_id,
72
+ "patient_id": patient_id,
73
+ "nct_id": nct_id,
74
+ "trial_title": trial_title,
75
+ "match_score": match_score,
76
+ "status": RecruitmentStatus.IDENTIFIED,
77
+ "outreach_history": [],
78
+ "created_at": datetime.utcnow().isoformat(),
79
+ "updated_at": datetime.utcnow().isoformat(),
80
+ }
81
+ _pipeline[record_id] = record
82
+ return record
83
+
84
+
85
+ def update_status(record_id: str, new_status: RecruitmentStatus) -> dict:
86
+ if record_id not in _pipeline:
87
+ raise ValueError(f"Record {record_id} not found")
88
+ _pipeline[record_id]["status"] = new_status
89
+ _pipeline[record_id]["updated_at"] = datetime.utcnow().isoformat()
90
+ return _pipeline[record_id]
91
+
92
+
93
+ def generate_and_store_outreach(patient_id: str, nct_id: str, trial_title: str, trial: dict, channel: str) -> dict:
94
+ patient_profile = get_patient_profile(patient_id)
95
+ if not patient_profile:
96
+ raise ValueError(f"Patient {patient_id} not found")
97
+
98
+ message = generate_outreach_message(patient_profile, trial, channel)
99
+
100
+ outreach = {
101
+ "id": str(uuid.uuid4()),
102
+ "channel": channel,
103
+ "message": message,
104
+ "generated_at": datetime.utcnow().isoformat(),
105
+ "status": "GENERATED",
106
+ }
107
+
108
+ # Find or create pipeline record
109
+ record_id = None
110
+ for rid, record in _pipeline.items():
111
+ if record["patient_id"] == patient_id and record["nct_id"] == nct_id:
112
+ record_id = rid
113
+ break
114
+
115
+ if not record_id:
116
+ record = create_record(patient_id, nct_id, trial_title, 0.75)
117
+ record_id = record["record_id"]
118
+
119
+ _pipeline[record_id]["outreach_history"].append(outreach)
120
+ _pipeline[record_id]["updated_at"] = datetime.utcnow().isoformat()
121
+
122
+ return {"record_id": record_id, "outreach": outreach}
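The pipeline records are stamped with `datetime.utcnow()`, which returns a naive datetime and is deprecated as of Python 3.12. A minimal sketch of a timezone-aware alternative follows. `utc_now_iso` and `new_record` are hypothetical helper names used only for illustration, and the record shape is copied from `create_record`.

```python
import uuid
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Return the current UTC time as an ISO 8601 string with an explicit offset."""
    return datetime.now(timezone.utc).isoformat()


def new_record(patient_id: str, nct_id: str, trial_title: str, match_score: float) -> dict:
    """Build a pipeline record shaped like those in `_pipeline` (illustrative only)."""
    now = utc_now_iso()  # one timestamp, so created_at and updated_at agree
    return {
        "record_id": str(uuid.uuid4()),
        "patient_id": patient_id,
        "nct_id": nct_id,
        "trial_title": trial_title,
        "match_score": match_score,
        "status": "IDENTIFIED",
        "outreach_history": [],
        "created_at": now,
        "updated_at": now,
    }
```

Taking the timestamp once per record also avoids the small drift between `created_at` and `updated_at` that two separate clock reads can produce.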
backend/requirements.txt ADDED
@@ -0,0 +1,11 @@
1
+ fastapi
2
+ uvicorn[standard]
3
+ neo4j
4
+ langchain
5
+ langchain-community
6
+ langchain-openai
7
+ openai
8
+ httpx
9
+ mcp
10
+ pydantic
11
+ python-dotenv
backend/rl_enrichment.py ADDED
@@ -0,0 +1,62 @@
1
+ import torch
2
+ import torch.nn as nn
3
+ import torch.optim as optim
4
+ from torch_geometric.data import Data
5
+ from torch_geometric.nn import GCNConv
6
+ import random
7
+
8
+ # Simple GNN for graph representation
9
+ class GNN(nn.Module):
10
+ def __init__(self, in_channels, hidden_channels, out_channels):
11
+ super(GNN, self).__init__()
12
+ self.conv1 = GCNConv(in_channels, hidden_channels)
13
+ self.conv2 = GCNConv(hidden_channels, out_channels)
14
+
15
+ def forward(self, x, edge_index):
16
+ x = self.conv1(x, edge_index)
17
+ x = torch.relu(x)
18
+ x = self.conv2(x, edge_index)
19
+ return x
20
+
21
+ # Simple RL Agent for graph enrichment
22
+ class GraphRLEnrichment:
23
+ def __init__(self, graph_data):
24
+ self.model = GNN(in_channels=graph_data.x.shape[1], hidden_channels=64, out_channels=32)
25
+ self.optimizer = optim.Adam(self.model.parameters(), lr=0.01)
26
+ self.graph_data = graph_data
27
+
28
+ def get_state(self):
29
+ # Get GNN embedding as state
30
+ with torch.no_grad():
31
+ state = self.model(self.graph_data.x, self.graph_data.edge_index)
32
+ return state.mean(dim=0) # Aggregate to single vector
33
+
34
+ def select_action(self, state):
35
+ # Simple policy: random for now, in full impl use policy network
36
+ return random.choice([0, 1]) # 0: no edge, 1: add edge
37
+
38
+ def train_step(self, reward):
39
+ # Simple training: minimize negative reward
40
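+ loss = -float(reward) * self.model(self.graph_data.x, self.graph_data.edge_index).mean() # Placeholder loss: a tensor tied to the model so backward() works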
41
+ self.optimizer.zero_grad()
42
+ loss.backward()
43
+ self.optimizer.step()
44
+
45
+ # Mock graph data (in practice, convert Neo4j graph to PyG Data)
46
+ # Assume nodes: 0-1 patients, 2-3 diagnoses, 4-5 trials
47
+ edge_index = torch.tensor([[0, 1, 2, 3, 4],
48
+ [2, 3, 4, 5, 5]], dtype=torch.long) # Mock edges
49
+ x = torch.randn(6, 10) # 6 nodes, 10 features
50
+ graph_data = Data(x=x, edge_index=edge_index)
51
+
52
+ rl_agent = GraphRLEnrichment(graph_data)
53
+
54
+ def enrich_graph():
55
+ state = rl_agent.get_state()
56
+ action = rl_agent.select_action(state)
57
+ # Simulate reward: if action=1, add edge and reward=1 if successful
58
+ reward = random.random() if action == 1 else 0
59
+ rl_agent.train_step(reward)
60
+ if action == 1:
61
+ print("Added potential edge via RL enrichment.")
62
+ return reward
backend/trial_enrichment.py ADDED
@@ -0,0 +1,233 @@
1
+ """
2
+ Passive graph enrichment — called automatically when users search for trials.
3
+ Each search result is upserted into Neo4j so the graph grows richer over time.
4
+ Also provides graph-intelligence queries for the UI.
5
+ """
6
+ from neo4j_setup import neo4j_conn
7
+ import json
8
+
9
+
10
+ def upsert_trial(trial: dict) -> None:
11
+ """Write/update a Trial node from a ClinicalTrials.gov result."""
12
+ nct_id = trial.get("nct_id", "")
13
+ if not nct_id:
14
+ return
15
+ neo4j_conn.run_query(
16
+ """
17
+ MERGE (t:Trial {id: $id})
18
+ SET t += {
19
+ title: $title,
20
+ status: $status,
21
+ phase: $phase,
22
+ condition: $condition,
23
+ brief_summary: $brief_summary,
24
+ eligibility_criteria: $eligibility_criteria,
25
+ min_age: $min_age,
26
+ max_age: $max_age,
27
+ sex: $sex,
28
+ enrollment: $enrollment,
29
+ start_date: $start_date,
30
+ completion_date: $completion_date,
31
+ last_updated: $last_updated,
32
+ sponsor: $sponsor,
33
+ location_count: $location_count,
34
+ ctgov_url: $ctgov_url,
35
+ ingested_at: datetime()
36
+ }
37
+ """,
38
+ {
39
+ "id": nct_id,
40
+ "title": trial.get("title", "")[:200],
41
+ "status": trial.get("status", ""),
42
+ "phase": trial.get("phase", "N/A"),
43
+ "condition": trial.get("condition", "").lower(),
44
+ "brief_summary": trial.get("brief_summary", "")[:1000],
45
+ "eligibility_criteria": trial.get("eligibility_criteria", "")[:2000],
46
+ "min_age": trial.get("min_age", ""),
47
+ "max_age": trial.get("max_age", ""),
48
+ "sex": trial.get("sex", "ALL"),
49
+ "enrollment": trial.get("enrollment", 0),
50
+ "start_date": trial.get("start_date", ""),
51
+ "completion_date": trial.get("completion_date", ""),
52
+ "last_updated": trial.get("last_updated", ""),
53
+ "sponsor": trial.get("sponsor", "")[:100],
54
+ "location_count": trial.get("location_count", 0),
55
+ "ctgov_url": trial.get("ctgov_url", f"https://clinicaltrials.gov/study/{nct_id}"),
56
+ },
57
+ )
58
+
59
+ # Upsert StudySite nodes for each location
60
+ for loc in trial.get("locations", []):
61
+ if not loc.get("city"):
62
+ continue
63
+ site_id = f"SITE_{nct_id}_{loc['city'].replace(' ', '_').upper()}"
64
+ neo4j_conn.run_query(
65
+ """
66
+ MERGE (s:StudySite {id: $id})
67
+ SET s += {name: $name, city: $city, state: $state, country: $country,
68
+ lat: $lat, lon: $lon}
69
+ WITH s
70
+ MATCH (t:Trial {id: $nct_id})
71
+ MERGE (t)-[:LOCATED_AT]->(s)
72
+ """,
73
+ {
74
+ "id": site_id,
75
+ "name": loc.get("facility", f"{loc['city']} Site"),
76
+ "city": loc["city"],
77
+ "state": loc.get("state", ""),
78
+ "country": loc.get("country", "US"),
79
+ "lat": loc.get("lat"),
80
+ "lon": loc.get("lon"),
81
+ "nct_id": nct_id,
82
+ },
83
+ )
84
+
85
+
86
+ def enrich_trials_from_search(trials: list[dict], condition: str) -> None:
87
+ """Background-safe: upsert all search results into Neo4j, then LLM-parse eligibility."""
88
+ for trial in trials:
89
+ if not trial.get("condition"):
90
+ trial["condition"] = condition
91
+ try:
92
+ upsert_trial(trial)
93
+ # LLM-parse eligibility criteria and store as structured graph properties
94
+ _enrich_eligibility_structured(trial)
95
+ except Exception as e:
96
+ print(f"[enrichment] failed to upsert {trial.get('nct_id')}: {e}")
97
+
98
+
99
+ def _enrich_eligibility_structured(trial: dict) -> None:
100
+ """
101
+ Parse eligibility_criteria text with LLM and store structured fields on the Trial node.
102
+ Only runs if the node doesn't already have parsed criteria (idempotent).
103
+ """
104
+ nct_id = trial.get("nct_id", "")
105
+ if not nct_id or not trial.get("eligibility_criteria"):
106
+ return
107
+
108
+ # Skip if already parsed
109
+ existing = neo4j_conn.run_query(
110
+ "MATCH (t:Trial {id: $id}) RETURN t.parsed_at AS pa", {"id": nct_id}
111
+ )
112
+ if existing and existing[0].get("pa"):
113
+ return
114
+
115
+ try:
116
+ from llm_client import parse_trial_protocol
117
+ criteria = parse_trial_protocol(trial["eligibility_criteria"])
118
+
119
+ neo4j_conn.run_query(
120
+ """
121
+ MATCH (t:Trial {id: $id})
122
+ SET t.parsed_inclusion = $inclusion,
123
+ t.parsed_exclusion = $exclusion,
124
+ t.parsed_age_min = $age_min,
125
+ t.parsed_age_max = $age_max,
126
+ t.parsed_biomarkers = $biomarkers,
127
+ t.parsed_ecog_max = $ecog_max,
128
+ t.parsed_at = datetime()
129
+ """,
130
+ {
131
+ "id": nct_id,
132
+ "inclusion": json.dumps(criteria.get("inclusion_criteria", [])[:10]),
133
+ "exclusion": json.dumps(criteria.get("exclusion_criteria", [])[:10]),
134
+ "age_min": criteria.get("age_range", {}).get("min"),
135
+ "age_max": criteria.get("age_range", {}).get("max"),
136
+ "biomarkers": json.dumps(criteria.get("required_biomarkers", [])),
137
+ "ecog_max": _extract_ecog_max(criteria.get("performance_status", "")),
138
+ },
139
+ )
140
+ print(f"[enrichment] parsed eligibility for {nct_id}")
141
+ except Exception as e:
142
+ print(f"[enrichment] LLM parse failed for {nct_id}: {e}")
143
+
144
+
145
+ def _extract_ecog_max(perf_status: str) -> int | None:
146
+ """Extract numeric ECOG upper bound from strings like 'ECOG 0-2' or 'ECOG ≤ 1'."""
147
+ import re
148
+ if not perf_status:
149
+ return None
150
+ m = re.search(r"(\d)\s*[-–]\s*(\d)", perf_status)
151
+ if m:
152
+ return int(m.group(2))
153
+ m = re.search(r"[≤<=]\s*(\d)", perf_status)
154
+ if m:
155
+ return int(m.group(1))
156
+ m = re.search(r"(\d)", perf_status)
157
+ if m:
158
+ return int(m.group(1))
159
+ return None
160
+
161
+
162
+ def get_eligible_patient_count(nct_id: str) -> int:
163
+ """Count patients in the graph with an ELIGIBLE_FOR edge to this trial."""
164
+ rows = neo4j_conn.run_query(
165
+ "MATCH (p:Patient)-[:ELIGIBLE_FOR]->(t:Trial {id: $id}) RETURN count(p) AS n",
166
+ {"id": nct_id},
167
+ )
168
+ return rows[0]["n"] if rows else 0
169
+
170
+
171
+ def get_eligible_patient_counts(nct_ids: list[str]) -> dict[str, int]:
172
+ """Batch version — returns {nct_id: count} for a list of trials."""
173
+ if not nct_ids:
174
+ return {}
175
+ rows = neo4j_conn.run_query(
176
+ """
177
+ MATCH (p:Patient)-[:ELIGIBLE_FOR]->(t:Trial)
178
+ WHERE t.id IN $ids
179
+ RETURN t.id AS nct_id, count(p) AS n
180
+ """,
181
+ {"ids": nct_ids},
182
+ )
183
+ return {row["nct_id"]: row["n"] for row in rows}
184
+
185
+
186
+ def get_similar_trials(nct_id: str, limit: int = 5) -> list[dict]:
187
+ """Graph-walk: find trials sharing eligible patients with this trial."""
188
+ rows = neo4j_conn.run_query(
189
+ """
190
+ MATCH (p:Patient)-[:ELIGIBLE_FOR]->(seed:Trial {id: $id})
191
+ MATCH (p)-[:ELIGIBLE_FOR]->(other:Trial)
192
+ WHERE other.id <> $id
193
+ RETURN other.id AS nct_id, other.title AS title, other.phase AS phase,
194
+ other.condition AS condition, count(p) AS shared_patients
195
+ ORDER BY shared_patients DESC LIMIT $limit
196
+ """,
197
+ {"id": nct_id, "limit": limit},
198
+ )
199
+ return rows
200
+
201
+
202
+ def get_graph_intelligence(nct_id: str) -> dict:
203
+ """Aggregate graph-derived insights for a single trial."""
204
+ eligible_count = get_eligible_patient_count(nct_id)
205
+ similar = get_similar_trials(nct_id, limit=3)
206
+
207
+ # Biomarker coverage — which biomarkers do eligible patients carry?
208
+ bm_rows = neo4j_conn.run_query(
209
+ """
210
+ MATCH (p:Patient)-[:ELIGIBLE_FOR]->(t:Trial {id: $id})
211
+ MATCH (p)-[:HAS_BIOMARKER]->(b:Biomarker)
212
+ RETURN b.name AS biomarker, count(p) AS patient_count
213
+ ORDER BY patient_count DESC LIMIT 5
214
+ """,
215
+ {"id": nct_id},
216
+ )
217
+
218
+ # Site density — patients near trial sites
219
+ site_rows = neo4j_conn.run_query(
220
+ """
221
+ MATCH (t:Trial {id: $id})-[:LOCATED_AT]->(s:StudySite)
222
+ RETURN s.city AS city, s.state AS state
223
+ LIMIT 5
224
+ """,
225
+ {"id": nct_id},
226
+ )
227
+
228
+ return {
229
+ "eligible_patients": eligible_count,
230
+ "similar_trials": similar,
231
+ "top_biomarkers": bm_rows,
232
+ "sites": site_rows,
233
+ }
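The `_extract_ecog_max` helper falls back to the first digit it finds, so a string such as "Karnofsky ≥ 70" would yield 7. The sketch below is a stricter variant under two stated assumptions: strings that never mention ECOG are ignored, and a strict bound ("ECOG < 2") is lowered by one. `extract_ecog_max` is an illustrative name, not the module's function.

```python
import re
from typing import Optional


def extract_ecog_max(perf_status: str) -> Optional[int]:
    """Return the ECOG upper bound stated in free text, or None if there is none."""
    # Assumption: other scales (for example Karnofsky) are out of scope.
    if not perf_status or "ecog" not in perf_status.lower():
        return None

    # Range form such as "ECOG 0-2" or "ECOG 0–1": take the upper end.
    m = re.search(r"(\d)\s*[-–]\s*(\d)", perf_status)
    if m:
        return int(m.group(2))

    # Bound form such as "ECOG ≤ 1", "ECOG <= 2", or the strict "ECOG < 2".
    m = re.search(r"(≤|<=|<)\s*(\d)", perf_status)
    if m:
        value = int(m.group(2))
        return max(value - 1, 0) if m.group(1) == "<" else value

    # Single value such as "ECOG performance status 1".
    m = re.search(r"(\d)", perf_status)
    return int(m.group(1)) if m else None
```

For instance, `extract_ecog_max("ECOG 0-2")` evaluates to `2`, while a Karnofsky-only string evaluates to `None` instead of a misleading digit.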
docker-compose.yml ADDED
@@ -0,0 +1,84 @@
1
+ version: "3.9"
2
+
3
+ # ── Local development stack ────────────────────────────────────────────────────
4
+ # Usage:
5
+ # docker compose up -d
6
+ # docker compose logs -f backend # watch logs
7
+ # docker compose exec backend python graph_seeder.py # seed real data
8
+ #
9
+ # Frontend: http://localhost:3000
10
+ # Backend: http://localhost:8000
11
+ # Neo4j Browser: http://localhost:7476 (neo4j / clinicalmatch2024)
12
+
13
+ services:
14
+
15
+ # ── Neo4j Community (free, no expiry) ────────────────────────────────────────
16
+ neo4j:
17
+ image: neo4j:5.18-community
18
+ container_name: clinicalmatch-neo4j
19
+ restart: unless-stopped
20
+ ports:
21
+ - "7476:7474" # Neo4j Browser
22
+ - "7687:7687" # Bolt
23
+ volumes:
24
+ - neo4j_data:/data
25
+ - neo4j_logs:/logs
26
+ environment:
27
+ NEO4J_AUTH: "neo4j/clinicalmatch2024"
28
+ NEO4J_PLUGINS: '["apoc"]'
29
+ NEO4J_dbms_security_procedures_unrestricted: "apoc.*"
30
+ NEO4J_dbms_security_procedures_allowlist: "apoc.*"
31
+ NEO4J_server_memory_heap_initial__size: "512m"
32
+ NEO4J_server_memory_heap_max__size: "1g"
33
+ NEO4J_server_memory_pagecache_size: "256m"
34
+ NEO4J_dbms_logs_query_enabled: "OFF"
35
+ healthcheck:
36
+ test: ["CMD-SHELL", "wget -qO- http://localhost:7474 || exit 1"]
37
+ interval: 20s
38
+ timeout: 10s
39
+ retries: 10
40
+ start_period: 60s
41
+
42
+ # ── FastAPI backend ───────────────────────────────────────────────────────────
43
+ backend:
44
+ build:
45
+ context: .
46
+ dockerfile: docker/Dockerfile.backend
47
+ container_name: clinicalmatch-backend
48
+ restart: unless-stopped
49
+ ports:
50
+ - "8000:8000"
51
+ depends_on:
52
+ neo4j:
53
+ condition: service_healthy
54
+ env_file: .env.local
55
+ environment:
56
+ NEO4J_URI: "bolt://neo4j:7687"
57
+ NEO4J_USERNAME: "neo4j"
58
+ NEO4J_PASSWORD: "clinicalmatch2024"
59
+ NEO4J_DATABASE: "neo4j"
60
+ command: >
61
+ sh -c "python3 neo4j_setup.py &&
62
+ python3 data_ingestion.py &&
63
+ uvicorn main:app --host 0.0.0.0 --port 8000 --workers 2"
64
+ volumes:
65
+ - ./backend:/app # hot-reload for local dev
66
+ working_dir: /app
67
+
68
+ # ── Next.js frontend ──────────────────────────────────────────────────────────
69
+ frontend:
70
+ build:
71
+ context: .
72
+ dockerfile: docker/Dockerfile.frontend
73
+ container_name: clinicalmatch-frontend
74
+ restart: unless-stopped
75
+ ports:
76
+ - "3000:3000"
77
+ depends_on:
78
+ - backend
79
+ environment:
80
+ NEXT_PUBLIC_API_URL: "http://localhost:8000"
81
+
82
+ volumes:
83
+ neo4j_data:
84
+ neo4j_logs:
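
The `healthcheck` / `depends_on: condition: service_healthy` pair above amounts to "probe until healthy, then start the dependent service". A minimal Python sketch of that polling loop follows, assuming a caller-supplied probe; `wait_until_healthy` and its parameters are hypothetical names for illustration, not part of Docker or of this repository, and the sketch ignores `start_period` and per-probe timeouts.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 10,
    interval: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe() until it returns True and return the attempts used.

    Raises TimeoutError once `retries` probes have failed.
    """
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return attempt
        except OSError:
            # Connection refused or timed out: treat as "not healthy yet".
            pass
        if attempt < retries:
            sleep(interval)
    raise TimeoutError(f"service not healthy after {retries} attempts")


if __name__ == "__main__":
    # Simulated service that becomes healthy on the third probe.
    answers = iter([False, False, True])
    attempts = wait_until_healthy(lambda: next(answers), interval=0.0)
    print(f"healthy after {attempts} attempts")
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting; a real probe would issue the HTTP request against the port the service listens on inside its own container.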
docker/Dockerfile ADDED
@@ -0,0 +1,128 @@
+ # ═══════════════════════════════════════════════════════════════════════════════
+ # ClinicalMatch AI — HuggingFace Spaces Dockerfile
+ # Single container: Neo4j Community + FastAPI + Next.js + Nginx (supervisord)
+ # Exposed port: 7860 (HF Spaces default)
+ # Persistent storage: /data (Neo4j data lives here — survives restarts)
+ # ═══════════════════════════════════════════════════════════════════════════════
+
+ # ── Stage 1: Build Next.js ────────────────────────────────────────────────────
+ FROM node:20-slim AS frontend-builder
+
+ WORKDIR /build/frontend
+
+ COPY frontend/package*.json ./
+ RUN npm install --legacy-peer-deps --prefer-offline
+
+ COPY frontend/ ./
+
+ # Build with empty API URL so all requests are relative (Nginx routes them)
+ ENV NEXT_PUBLIC_API_URL=""
+ RUN npm run build
+
+ # ── Stage 2: Final runtime image ──────────────────────────────────────────────
+ FROM ubuntu:22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV LANG=C.UTF-8
+
+ # ── System dependencies ────────────────────────────────────────────────────────
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     # Java for Neo4j
+     openjdk-17-jre-headless \
+     # Python
+     python3.11 python3-pip python3.11-venv \
+     # Web / infra
+     nginx \
+     supervisor \
+     # Utilities
+     curl wget ca-certificates gnupg \
+     && rm -rf /var/lib/apt/lists/*
+
+ # ── Node.js 20 ────────────────────────────────────────────────────────────────
+ RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
+     && apt-get install -y --no-install-recommends nodejs \
+     && rm -rf /var/lib/apt/lists/*
+
+ # ── Neo4j Community 5.x ───────────────────────────────────────────────────────
+ ENV NEO4J_VERSION=5.18.0
+ ENV NEO4J_HOME=/opt/neo4j
+ ENV PATH="${NEO4J_HOME}/bin:${PATH}"
+
+ ENV APOC_VERSION=5.18.0
+
+ RUN wget -q "https://dist.neo4j.org/neo4j-community-${NEO4J_VERSION}-unix.tar.gz" \
+     && tar -xzf "neo4j-community-${NEO4J_VERSION}-unix.tar.gz" -C /opt \
+     && mv "/opt/neo4j-community-${NEO4J_VERSION}" /opt/neo4j \
+     && rm "neo4j-community-${NEO4J_VERSION}-unix.tar.gz" \
+     && rm -rf /opt/neo4j/data # will be symlinked to /data at runtime
+
+ # Download APOC plugin (Community-compatible jar)
+ RUN wget -q \
+     "https://github.com/neo4j/apoc/releases/download/${APOC_VERSION}/apoc-${APOC_VERSION}-core.jar" \
+     -O /opt/neo4j/plugins/apoc-${APOC_VERSION}-core.jar
+
+ # Neo4j configuration — listen on all interfaces, use /data for persistence
+ RUN { \
+     echo "server.bolt.listen_address=0.0.0.0:7687"; \
+     echo "server.http.listen_address=0.0.0.0:7474"; \
+     echo "server.directories.data=/data/neo4j/data"; \
+     echo "server.directories.logs=/data/neo4j/logs"; \
+     echo "server.directories.plugins=/opt/neo4j/plugins"; \
+     echo "dbms.security.auth_enabled=true"; \
+     echo "dbms.security.procedures.unrestricted=apoc.*"; \
+     echo "dbms.security.procedures.allowlist=apoc.*"; \
+     echo "server.memory.heap.initial_size=512m"; \
+     echo "server.memory.heap.max_size=1g"; \
+     echo "server.memory.pagecache.size=256m"; \
+     echo "db.transaction.timeout=60s"; \
+     echo "dbms.logs.query.enabled=OFF"; \
+     } >> /opt/neo4j/conf/neo4j.conf
+
+ # ── Python backend ────────────────────────────────────────────────────────────
+ WORKDIR /app/backend
+
+ COPY backend/requirements.txt .
+ RUN pip3 install --no-cache-dir -r requirements.txt
+
+ COPY backend/ .
+
+ # ── Next.js frontend (pre-built) ───────────────────────────────────────────────
+ WORKDIR /app/frontend
+
+ # Copy only what Next.js needs to run (not dev deps)
+ COPY --from=frontend-builder /build/frontend/.next/standalone ./
+ COPY --from=frontend-builder /build/frontend/.next/static ./.next/static
+ COPY --from=frontend-builder /build/frontend/public ./public
+
+ # ── Config files ───────────────────────────────────────────────────────────────
+ COPY docker/nginx.conf /app/docker/nginx.conf
+ COPY docker/supervisord.conf /app/docker/supervisord.conf
+ COPY docker/entrypoint.sh /app/docker/entrypoint.sh
+
+ RUN chmod +x /app/docker/entrypoint.sh
+
+ # ── Nginx writable dirs (runs without root after init) ────────────────────────
+ RUN mkdir -p /tmp/nginx-cache /tmp/nginx-body /tmp/nginx-run \
+     && chown -R www-data:www-data /var/log/nginx /var/lib/nginx 2>/dev/null || true
+
+ # ── Expose & environment ───────────────────────────────────────────────────────
+ EXPOSE 7860
+
+ # Neo4j — local Community instance (no Aura)
+ ENV NEO4J_URI=bolt://127.0.0.1:7687
+ ENV NEO4J_USERNAME=neo4j
+ ENV NEO4J_PASSWORD=clinicalmatch2024
+ ENV NEO4J_DATABASE=neo4j
+
+ # LLM — OpenAI-compatible (set real values via HF Spaces secrets)
+ ENV OPENAI_API_KEY=""
+ ENV OPENAI_BASE_URL=https://ai.aimlapi.com/v1
+ ENV OPENAI_MODEL=claude-opus-4-7
+
+ # Next.js standalone listens on 3000 internally; Nginx routes externally
+ ENV PORT=3000
+ ENV HOSTNAME=127.0.0.1
+
+ WORKDIR /app
+
+ ENTRYPOINT ["/app/docker/entrypoint.sh"]
docker/Dockerfile.backend ADDED
@@ -0,0 +1,15 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ RUN apt-get update && apt-get install -y --no-install-recommends curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ COPY backend/requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY backend/ .
+
+ EXPOSE 8000
+
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
docker/Dockerfile.frontend ADDED
@@ -0,0 +1,29 @@
+ FROM node:20-slim AS builder
+
+ WORKDIR /app
+
+ COPY frontend/package*.json ./
+ RUN npm install --legacy-peer-deps
+
+ COPY frontend/ ./
+
+ ARG NEXT_PUBLIC_API_URL=http://localhost:8000
+ ENV NEXT_PUBLIC_API_URL=${NEXT_PUBLIC_API_URL}
+
+ RUN npm run build
+
+ # ── Runtime ────────────────────────────────────────────────────────────────────
+ FROM node:20-slim
+
+ WORKDIR /app
+
+ COPY --from=builder /app/.next/standalone ./
+ COPY --from=builder /app/.next/static ./.next/static
+ COPY --from=builder /app/public ./public
+
+ EXPOSE 3000
+
+ ENV PORT=3000
+ ENV HOSTNAME=0.0.0.0
+
+ CMD ["node", "server.js"]
docker/entrypoint.sh ADDED
@@ -0,0 +1,61 @@
+ #!/bin/bash
+ set -e
+
+ log() { echo "[entrypoint] $*"; }
+
+ # ── Persistent data dirs (HF Spaces mounts /data) ─────────────────────────────
+ mkdir -p /data/neo4j/data /data/neo4j/logs /data/neo4j/plugins
+
+ # Symlink Neo4j data dir to persistent volume
+ if [ ! -L /opt/neo4j/data ]; then
+   rm -rf /opt/neo4j/data
+   ln -sf /data/neo4j/data /opt/neo4j/data
+ fi
+ if [ ! -L /opt/neo4j/logs ]; then
+   rm -rf /opt/neo4j/logs
+   ln -sf /data/neo4j/logs /opt/neo4j/logs
+ fi
+
+ # ── Neo4j password bootstrap (first-boot only) ────────────────────────────────
+ NEO4J_PASS="${NEO4J_PASSWORD:-clinicalmatch2024}"
+
+ if [ ! -f /data/.neo4j_ready ]; then
+   log "First boot — initialising Neo4j password..."
+   # Start Neo4j with default password, change it, seed, then stop cleanly
+   /opt/neo4j/bin/neo4j start
+   log "Waiting for Neo4j to accept connections..."
+   for i in $(seq 1 30); do
+     if /opt/neo4j/bin/cypher-shell -u neo4j -p neo4j \
+         "RETURN 1;" >/dev/null 2>&1; then
+       break
+     fi
+     sleep 2
+   done
+   /opt/neo4j/bin/cypher-shell -u neo4j -p neo4j \
+     "ALTER CURRENT USER SET PASSWORD FROM 'neo4j' TO '$NEO4J_PASS';" 2>/dev/null || true
+
+   # Run schema + sample data seeding while Neo4j is still running
+   log "Seeding schema and sample data..."
+   cd /app/backend
+   NEO4J_URI=bolt://127.0.0.1:7687 \
+   NEO4J_USERNAME=neo4j \
+   NEO4J_PASSWORD="$NEO4J_PASS" \
+   python3 -c "
+ from neo4j_setup import setup_schema
+ from data_ingestion import ingest_sample_data
+ setup_schema()
+ ingest_sample_data()
+ print('Schema and sample data ready.')
+ " 2>/dev/null || log "Seeding deferred — Neo4j not yet ready (will retry via /setup endpoint)"
+
+   /opt/neo4j/bin/neo4j stop
+   sleep 3
+   touch /data/.neo4j_ready
+   log "Neo4j initialisation complete."
+ fi
+
+ # ── Nginx tmp dirs (runs as non-root) ─────────────────────────────────────────
+ mkdir -p /tmp/nginx-cache /tmp/nginx-body
+
+ log "Starting all services via supervisord..."
+ exec /usr/bin/supervisord -c /app/docker/supervisord.conf
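
The `/data/.neo4j_ready` check above is a sentinel-file guard: one-time initialisation runs only while the marker is absent. A minimal Python sketch of the same pattern follows; `run_once` and the paths are hypothetical names for illustration. Unlike the shell script, which touches the marker even when seeding is deferred, this sketch writes the marker only after `init()` returns without raising.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_once(marker: Path, init: Callable[[], None]) -> bool:
    """Run init() only when `marker` is absent; return True if it ran.

    The marker is created after init() succeeds, so a failed
    initialisation is retried on the next start.
    """
    if marker.exists():
        return False
    init()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "data" / ".neo4j_ready"
        print(run_once(marker, lambda: print("first boot: initialising")))
        print(run_once(marker, lambda: print("never printed")))
```

Whether a partially failed first boot should be retried is a design choice; the script above opts to mark the boot as done and defer seeding to the `/setup` endpoint.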
docker/nginx.conf ADDED
@@ -0,0 +1,80 @@
+ worker_processes 1;
+ error_log /tmp/nginx-error.log warn;
+ pid /tmp/nginx.pid;
+
+ events {
+     worker_connections 512;
+ }
+
+ http {
+     include /etc/nginx/mime.types;
+     default_type application/octet-stream;
+     access_log /tmp/nginx-access.log;
+     sendfile on;
+     keepalive_timeout 65;
+
+     # Upstream services (all internal)
+     upstream frontend {
+         server 127.0.0.1:3000;
+     }
+
+     upstream backend {
+         server 127.0.0.1:8000;
+     }
+
+     upstream neo4j_browser {
+         server 127.0.0.1:7474;
+     }
+
+     server {
+         listen 7860;
+         server_name _;
+
+         client_max_body_size 20M;
+
+         # ── FastAPI backend ────────────────────────────────────────────────
+         # Routes: /api/*, /docs, /openapi.json, /health, /seed, /setup
+         location /api/ {
+             proxy_pass http://backend/api/;
+             proxy_http_version 1.1;
+             proxy_set_header Host $host;
+             proxy_set_header X-Real-IP $remote_addr;
+             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+             proxy_set_header X-Forwarded-Proto $scheme;
+             proxy_read_timeout 120s;
+         }
+
+         location ~ ^/(docs|openapi\.json|redoc|health|seed|setup|ingest_patient|match_trials|enrich_graph|rag_query|setup_sample_data) {
+             proxy_pass http://backend;
+             proxy_http_version 1.1;
+             proxy_set_header Host $host;
+             proxy_set_header X-Real-IP $remote_addr;
+             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+             proxy_set_header X-Forwarded-Proto $scheme;
+             proxy_read_timeout 120s;
+         }
+
+         # ── Neo4j Browser (admin only — /neo4j/) ──────────────────────────
+         location /neo4j/ {
+             proxy_pass http://neo4j_browser/;
+             proxy_http_version 1.1;
+             proxy_set_header Host $host;
+             proxy_set_header X-Real-IP $remote_addr;
+             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+             proxy_set_header X-Forwarded-Proto $scheme;
+         }
+
+         # ── Next.js frontend (catch-all) ───────────────────────────────────
+         location / {
+             proxy_pass http://frontend;
+             proxy_http_version 1.1;
+             proxy_set_header Upgrade $http_upgrade;
+             proxy_set_header Connection "upgrade";
+             proxy_set_header Host $host;
+             proxy_set_header X-Real-IP $remote_addr;
+             proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+             proxy_set_header X-Forwarded-Proto $scheme;
+             proxy_read_timeout 60s;
+         }
+     }
+ }
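
To make the routing in this `server` block easier to reason about, here is a small Python model of how nginx chooses among the four `location` blocks: it remembers the longest matching prefix, then tries regex locations in order, and a regex match wins. This is a simplified sketch (no `=` or `^~` modifiers, which this config does not use); `resolve_upstream` and the table names are hypothetical, not nginx APIs.

```python
import re

# Alternatives copied from the regex location in nginx.conf.
BACKEND_ROUTES = [
    "docs", r"openapi\.json", "redoc", "health", "seed", "setup",
    "ingest_patient", "match_trials", "enrich_graph", "rag_query",
    "setup_sample_data",
]

PREFIX_LOCATIONS = {
    "/api/": "backend",
    "/neo4j/": "neo4j_browser",
    "/": "frontend",
}
REGEX_LOCATIONS = [
    (re.compile("^/(" + "|".join(BACKEND_ROUTES) + ")"), "backend"),
]


def resolve_upstream(path: str) -> str:
    """Return the upstream name nginx would pick for a request path."""
    prefixes = [p for p in PREFIX_LOCATIONS if path.startswith(p)]
    if not prefixes:
        raise ValueError(f"not an absolute request path: {path!r}")
    longest = max(prefixes, key=len)
    # Regex locations are tried in configuration order; first match wins.
    for pattern, upstream in REGEX_LOCATIONS:
        if pattern.search(path):
            return upstream
    return PREFIX_LOCATIONS[longest]


if __name__ == "__main__":
    for path in ["/api/patients", "/docs", "/neo4j/browser/", "/dashboard", "/healthz"]:
        print(f"{path} -> {resolve_upstream(path)}")
```

One consequence the model makes visible: the regex has no trailing anchor, so a path such as `/healthz` or `/seeding` is also sent to the backend rather than to the Next.js frontend.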
docker/supervisord.conf ADDED
@@ -0,0 +1,72 @@
+ [unix_http_server]
+ file=/tmp/supervisor.sock
+
+ [supervisord]
+ nodaemon=true
+ logfile=/tmp/supervisord.log
+ pidfile=/tmp/supervisord.pid
+ loglevel=info
+
+ [rpcinterface:supervisor]
+ supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
+
+ [supervisorctl]
+ serverurl=unix:///tmp/supervisor.sock
+
+ # ── Neo4j Community ────────────────────────────────────────────────────────────
+ [program:neo4j]
+ command=/opt/neo4j/bin/neo4j console
+ environment=NEO4J_HOME=/opt/neo4j,JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
+ autostart=true
+ autorestart=true
+ startsecs=30
+ startretries=3
+ stdout_logfile=/tmp/neo4j.log
+ stderr_logfile=/tmp/neo4j.log
+ redirect_stderr=true
+ priority=10
+
+ # ── FastAPI backend ────────────────────────────────────────────────────────────
+ [program:backend]
+ command=python3 -m uvicorn main:app --host 127.0.0.1 --port 8000 --workers 2
+ directory=/app/backend
+ environment=
+     NEO4J_URI="bolt://127.0.0.1:7687",
+     NEO4J_USERNAME="%(ENV_NEO4J_USERNAME)s",
+     NEO4J_PASSWORD="%(ENV_NEO4J_PASSWORD)s",
+     NEO4J_DATABASE="%(ENV_NEO4J_DATABASE)s",
+     OPENAI_API_KEY="%(ENV_OPENAI_API_KEY)s",
+     OPENAI_BASE_URL="%(ENV_OPENAI_BASE_URL)s",
+     OPENAI_MODEL="%(ENV_OPENAI_MODEL)s"
+ autostart=true
+ autorestart=true
+ startsecs=10
+ startretries=5
+ stdout_logfile=/tmp/backend.log
+ stderr_logfile=/tmp/backend.log
+ redirect_stderr=true
+ priority=30
+
+ # ── Next.js frontend ───────────────────────────────────────────────────────────
+ [program:frontend]
+ command=node server.js
+ directory=/app/frontend
+ environment=PORT="3000",HOSTNAME="127.0.0.1"
+ autostart=true
+ autorestart=true
+ startsecs=5
+ stdout_logfile=/tmp/frontend.log
+ stderr_logfile=/tmp/frontend.log
+ redirect_stderr=true
+ priority=40
+
+ # ── Nginx reverse proxy ────────────────────────────────────────────────────────
+ [program:nginx]
+ command=nginx -c /app/docker/nginx.conf -g "daemon off;"
+ autostart=true
+ autorestart=true
+ startsecs=3
+ stdout_logfile=/tmp/nginx.log
+ stderr_logfile=/tmp/nginx.log
+ redirect_stderr=true
+ priority=50
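
The `%(ENV_NEO4J_USERNAME)s` placeholders in `[program:backend]` follow Python's named `%`-substitution syntax, with each environment variable exposed under an `ENV_` prefix. The sketch below reproduces that expansion with plain string formatting so its behaviour is easy to check; `expand_env` is a hypothetical helper for illustration, not supervisor's own code.

```python
from typing import Mapping


def expand_env(template: str, environ: Mapping[str, str]) -> str:
    """Expand %(ENV_NAME)s placeholders from an environment mapping.

    A placeholder with no matching variable raises KeyError, and a
    literal percent sign in the template must be written as %%.
    """
    mapping = {f"ENV_{name}": value for name, value in environ.items()}
    return template % mapping


if __name__ == "__main__":
    env = {"NEO4J_USERNAME": "neo4j", "NEO4J_DATABASE": "neo4j"}
    line = 'NEO4J_USERNAME="%(ENV_NEO4J_USERNAME)s"'
    print(expand_env(line, env))
```

In this file every referenced variable (`NEO4J_*`, `OPENAI_*`) is given a default by an `ENV` instruction in `docker/Dockerfile`, so each placeholder has a value to expand to even when no Spaces secret is set.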
frontend/.gitignore ADDED
@@ -0,0 +1,41 @@
+ # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+ # dependencies
+ /node_modules
+ /.pnp
+ .pnp.*
+ .yarn/*
+ !.yarn/patches
+ !.yarn/plugins
+ !.yarn/releases
+ !.yarn/versions
+
+ # testing
+ /coverage
+
+ # next.js
+ /.next/
+ /out/
+
+ # production
+ /build
+
+ # misc
+ .DS_Store
+ *.pem
+
+ # debug
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+ .pnpm-debug.log*
+
+ # env files (can opt-in for committing if needed)
+ .env*
+
+ # vercel
+ .vercel
+
+ # typescript
+ *.tsbuildinfo
+ next-env.d.ts
frontend/README.md ADDED
@@ -0,0 +1,36 @@
+ This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
+
+ ## Getting Started
+
+ First, run the development server:
+
+ ```bash
+ npm run dev
+ # or
+ yarn dev
+ # or
+ pnpm dev
+ # or
+ bun dev
+ ```
+
+ Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
+
+ You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file.
+
+ This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Geist](https://vercel.com/font), a new font family for Vercel.
+
+ ## Learn More
+
+ To learn more about Next.js, take a look at the following resources:
+
+ - [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
+ - [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
+
+ You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
+
+ ## Deploy on Vercel
+
+ The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
+
+ Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
frontend/eslint.config.mjs ADDED
@@ -0,0 +1,18 @@
+ import { defineConfig, globalIgnores } from "eslint/config";
+ import nextVitals from "eslint-config-next/core-web-vitals";
+ import nextTs from "eslint-config-next/typescript";
+
+ const eslintConfig = defineConfig([
+   ...nextVitals,
+   ...nextTs,
+   // Override default ignores of eslint-config-next.
+   globalIgnores([
+     // Default ignores of eslint-config-next:
+     ".next/**",
+     "out/**",
+     "build/**",
+     "next-env.d.ts",
+   ]),
+ ]);
+
+ export default eslintConfig;
frontend/next.config.ts ADDED
@@ -0,0 +1,30 @@
+ import type { NextConfig } from "next";
+
+ const nextConfig: NextConfig = {
+   ...(process.env.NODE_ENV === "production" ? { output: "standalone" } : {}),
+
+   experimental: {
+     // Tree-shake large icon/chart libs — only bundle exports that are used
+     optimizePackageImports: ["lucide-react", "recharts"],
+   },
+
+   webpack(config, { dev }) {
+     if (dev) {
+       // Persist compiled modules to disk so server restarts reuse the cache
+       config.cache = {
+         type: "filesystem",
+         allowCollectingMemory: true,
+       };
+     }
+     return config;
+   },
+   turbopack: {}, // explicit empty Turbopack config alongside the custom webpack() hook
+
+ };
+
+ export default nextConfig;
+
+
+
+
+
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "name": "frontend",
+   "version": "0.1.0",
+   "private": true,
+   "scripts": {
+     "dev": "next dev --webpack",
+     "prewarm": "node scripts/prewarm.mjs",
+     "build": "next build",
+     "start": "next start",
+     "lint": "eslint"
+   },
+   "dependencies": {
+     "autoprefixer": "^10.5.0",
+     "clsx": "^2.1.1",
+     "geist": "^1.7.0",
+     "leaflet": "^1.9.4",
+     "lucide-react": "^0.511.0",
+     "next": "16.2.4",
+     "react": "19.2.4",
+     "react-dom": "19.2.4",
+     "react-leaflet": "^5.0.0",
+     "recharts": "^2.15.0"
+   },
+   "devDependencies": {
+     "@types/leaflet": "^1.9.19",
+     "@types/node": "^20",
+     "@types/react": "^19",
+     "@types/react-dom": "^19",
+     "eslint": "^9",
+     "eslint-config-next": "16.2.4",
+     "tailwindcss": "^3.4.19",
+     "typescript": "^5"
+   }
+ }
frontend/postcss.config.mjs ADDED
@@ -0,0 +1,8 @@
+ const config = {
+   plugins: {
+     tailwindcss: {},
+     autoprefixer: {},
+   },
+ };
+
+ export default config;
frontend/public/file.svg ADDED
frontend/public/globe.svg ADDED
frontend/public/next.svg ADDED
frontend/public/vercel.svg ADDED
frontend/public/window.svg ADDED
frontend/scripts/prewarm.mjs ADDED
@@ -0,0 +1,34 @@
+ #!/usr/bin/env node
+ // Hits every route once so webpack compiles them before the user navigates.
+ // Run alongside the dev server: npm run prewarm
+
+ const ROUTES = ["/", "/screening", "/recruitment", "/dashboard", "/map", "/graph"];
+ const BASE = process.env.NEXT_PUBLIC_API_URL?.replace("/api", "") ?? "http://localhost:3000";
+
+ async function waitForServer(url, retries = 30) {
+   for (let i = 0; i < retries; i++) {
+     try {
+       const r = await fetch(url, { signal: AbortSignal.timeout(3000) });
+       if (r.ok || r.status < 500) return true;
+     } catch {}
+     await new Promise((r) => setTimeout(r, 2000));
+   }
+   return false;
+ }
+
+ const base = "http://localhost:3000";
+ console.log("Waiting for dev server…");
+ const up = await waitForServer(base);
+ if (!up) { console.error("Dev server never came up"); process.exit(1); }
+
+ console.log("Pre-warming routes (this compiles each page bundle once):");
+ for (const route of ROUTES) {
+   const start = Date.now();
+   try {
+     await fetch(`${base}${route}`, { signal: AbortSignal.timeout(120_000) });
+     console.log(` ✓ ${route} — ${Date.now() - start}ms`);
+   } catch (e) {
+     console.log(` ✗ ${route} — ${e.message}`);
+   }
+ }
+ console.log("All routes compiled. Navigation will now be instant.");
frontend/src/app/consent/page.tsx ADDED
@@ -0,0 +1,214 @@
+ "use client";
+
+ import { useState, useEffect } from "react";
+ import { getConsents, getConsentStats, updateConsentStatus, getAppointments, confirmAppointment } from "@/lib/api";
+ import { FileSignature, Calendar, CheckCircle, XCircle, Clock, Loader2, RefreshCw } from "lucide-react";
+ import { clsx } from "clsx";
+
+ const STATUS_COLORS: Record<string, string> = {
+   SENT: "bg-blue-100 text-blue-700",
+   SIGNED: "bg-emerald-100 text-emerald-700",
+   DECLINED: "bg-red-100 text-red-700",
+   EXPIRED: "bg-slate-100 text-slate-500",
+   PENDING: "bg-amber-100 text-amber-700",
+ };
+
+ const APPT_COLORS: Record<string, string> = {
+   PROPOSED: "bg-amber-100 text-amber-700",
+   CONFIRMED: "bg-emerald-100 text-emerald-700",
+ };
+
+ function StatCard({ label, value, color }: { label: string; value: number; color: string }) {
+   return (
+     <div className="bg-white rounded-xl border border-slate-200 p-4 text-center">
+       <div className={clsx("text-3xl font-bold", color)}>{value}</div>
+       <div className="text-xs text-slate-500 mt-1">{label}</div>
+     </div>
+   );
+ }
+
+ export default function ConsentPage() {
+   const [consents, setConsents] = useState<any[]>([]);
+   const [appointments, setAppointments] = useState<any[]>([]);
+   const [stats, setStats] = useState<any>(null);
+   const [loading, setLoading] = useState(true);
+   const [tab, setTab] = useState<"consents" | "appointments">("consents");
+   const [expandedConsent, setExpandedConsent] = useState<string | null>(null);
+   const [updating, setUpdating] = useState<string | null>(null);
+
+   const refresh = async () => {
+     setLoading(true);
+     try {
+       const [c, a, s] = await Promise.all([
+         getConsents(),
+         getAppointments(),
+         getConsentStats(),
+       ]);
+       setConsents(c.consents);
+       setAppointments(a.appointments);
+       setStats(s);
+     } catch {}
+     setLoading(false);
+   };
+
+   useEffect(() => { refresh(); }, []);
+
+   const handleConsentAction = async (consentId: string, status: string) => {
+     setUpdating(consentId);
+     try {
+       await updateConsentStatus(consentId, status);
+       await refresh();
+     } catch {}
+     setUpdating(null);
+   };
+
+   const handleConfirmAppt = async (apptId: string) => {
+     setUpdating(apptId);
+     try {
+       await confirmAppointment(apptId);
+       await refresh();
+     } catch {}
+     setUpdating(null);
+   };
+
+   return (
+     <div className="p-6 max-w-5xl mx-auto">
+       <div className="flex items-center justify-between mb-6">
+         <div>
+           <h1 className="text-2xl font-bold text-slate-900 mb-1">Consent & Scheduling</h1>
+           <p className="text-slate-500 text-sm">A2A-powered consent workflow and appointment management</p>
+         </div>
+         <button onClick={refresh} className="flex items-center gap-2 text-sm text-slate-500 hover:text-slate-700 border border-slate-200 rounded-lg px-3 py-1.5">
+           <RefreshCw className="w-3.5 h-3.5" /> Refresh
+         </button>
+       </div>
+
+       {/* Stats */}
+       {stats && (
+         <div className="grid grid-cols-5 gap-3 mb-6">
+           <StatCard label="Total Consents" value={stats.total} color="text-slate-700" />
+           <StatCard label="Sent" value={stats.sent} color="text-blue-600" />
+           <StatCard label="Signed" value={stats.signed} color="text-emerald-600" />
+           <StatCard label="Declined" value={stats.declined} color="text-red-600" />
+           <StatCard label="Appointments" value={stats.appointments_scheduled} color="text-indigo-600" />
+         </div>
+       )}
+
+       {/* Tabs */}
+       <div className="flex gap-2 mb-4">
+         {(["consents", "appointments"] as const).map((t) => (
+           <button
+             key={t}
+             onClick={() => setTab(t)}
+             className={clsx("flex items-center gap-2 px-4 py-2 rounded-lg text-sm font-medium transition-colors",
+               tab === t ? "bg-indigo-600 text-white" : "bg-white border border-slate-200 text-slate-600 hover:bg-slate-50"
+             )}
+           >
+             {t === "consents" ? <FileSignature className="w-4 h-4" /> : <Calendar className="w-4 h-4" />}
+             {t === "consents" ? `Consent Forms (${consents.length})` : `Appointments (${appointments.length})`}
+           </button>
+         ))}
+       </div>
+
+       {loading ? (
+         <div className="flex items-center justify-center py-16 text-slate-400">
+           <Loader2 className="w-6 h-6 animate-spin mr-2" /> Loading...
+         </div>
+       ) : tab === "consents" ? (
+         <div className="space-y-3">
+           {consents.length === 0 ? (
+             <div className="bg-white rounded-xl border border-slate-200 p-8 text-center text-slate-400 text-sm">
+               No consent records yet. Run the A2A Pipeline on the Screening page to generate consent requests automatically.
+             </div>
+           ) : consents.map((c: any) => (
+             <div key={c.consent_id} className="bg-white rounded-xl border border-slate-200 overflow-hidden">
+               <div
+                 className="flex items-center gap-4 p-4 cursor-pointer hover:bg-slate-50"
+                 onClick={() => setExpandedConsent(expandedConsent === c.consent_id ? null : c.consent_id)}
+               >
+                 <FileSignature className="w-5 h-5 text-indigo-400 shrink-0" />
+                 <div className="flex-1 min-w-0">
+                   <div className="font-medium text-slate-900 text-sm truncate">{c.trial_title || c.nct_id}</div>
+                   <div className="text-xs text-slate-500 mt-0.5">
+                     Patient: {c.patient_id} · NCT: {c.nct_id} · Score: {Math.round((c.match_score || 0) * 100)}%
+                   </div>
+                 </div>
+                 <span className={clsx("text-xs font-medium px-2.5 py-1 rounded-full shrink-0", STATUS_COLORS[c.status] || STATUS_COLORS.PENDING)}>
+                   {c.status}
+                 </span>
+                 {c.status === "SENT" && (
+                   <div className="flex gap-2 shrink-0">
+                     <button
+                       disabled={updating === c.consent_id}
+                       onClick={(e) => { e.stopPropagation(); handleConsentAction(c.consent_id, "SIGNED"); }}
+                       className="flex items-center gap-1 bg-emerald-600 text-white text-xs px-3 py-1.5 rounded-lg hover:bg-emerald-700 disabled:opacity-50"
+                     >
+                       {updating === c.consent_id ? <Loader2 className="w-3 h-3 animate-spin" /> : <CheckCircle className="w-3 h-3" />}
+                       Sign
+                     </button>
+                     <button
+                       disabled={updating === c.consent_id}
+                       onClick={(e) => { e.stopPropagation(); handleConsentAction(c.consent_id, "DECLINED"); }}
+                       className="flex items-center gap-1 border border-red-200 text-red-600 text-xs px-3 py-1.5 rounded-lg hover:bg-red-50 disabled:opacity-50"
+                     >
+                       <XCircle className="w-3 h-3" /> Decline
+                     </button>
+                   </div>
+                 )}
+               </div>
+
+               {expandedConsent === c.consent_id && c.consent_document && (
+                 <div className="px-4 pb-4 border-t border-slate-100">
+                   <div className="mt-3 bg-slate-50 rounded-lg p-4 text-xs text-slate-700 whitespace-pre-wrap leading-relaxed max-h-64 overflow-y-auto font-mono">
+                     {c.consent_document}
+                   </div>
+                   <div className="flex items-center gap-4 mt-2 text-xs text-slate-400">
+                     <span>Created: {new Date(c.created_at).toLocaleDateString()}</span>
+                     <span>Expires: {new Date(c.expires_at).toLocaleDateString()}</span>
+                     {c.signed_at && <span className="text-emerald-600">Signed: {new Date(c.signed_at).toLocaleDateString()}</span>}
+                   </div>
+                 </div>
+               )}
+             </div>
+           ))}
+         </div>
+       ) : (
+         <div className="space-y-3">
+           {appointments.length === 0 ? (
+             <div className="bg-white rounded-xl border border-slate-200 p-8 text-center text-slate-400 text-sm">
+               No appointments scheduled. Appointments are automatically created when a consent is signed.
+             </div>
+           ) : appointments.map((a: any) => (
+             <div key={a.appointment_id} className="bg-white rounded-xl border border-slate-200 p-4 flex items-center gap-4">
+               <Calendar className="w-5 h-5 text-indigo-400 shrink-0" />
+               <div className="flex-1 min-w-0">
+                 <div className="font-medium text-slate-900 text-sm">{a.nct_id}</div>
+                 <div className="text-xs text-slate-500 mt-0.5">
+                   Patient: {a.patient_id}
+                   {a.site_city && ` · Site: ${a.site_city}${a.site_state ? ", " + a.site_state : ""}`}
+                 </div>
+                 <div className="flex items-center gap-1 mt-1 text-xs text-slate-600">
+                   <Clock className="w-3 h-3" />
+                   {new Date(a.proposed_datetime).toLocaleString()}
+                 </div>
+               </div>
+               <span className={clsx("text-xs font-medium px-2.5 py-1 rounded-full shrink-0", APPT_COLORS[a.status] || "bg-slate-100 text-slate-600")}>
+                 {a.status}
+               </span>
+               {a.status === "PROPOSED" && (
+                 <button
+                   disabled={updating === a.appointment_id}
+                   onClick={() => handleConfirmAppt(a.appointment_id)}
+                   className="flex items-center gap-1 bg-indigo-600 text-white text-xs px-3 py-1.5 rounded-lg hover:bg-indigo-700 disabled:opacity-50 shrink-0"
+                 >
+                   {updating === a.appointment_id ? <Loader2 className="w-3 h-3 animate-spin" /> : <CheckCircle className="w-3 h-3" />}
+                   Confirm
+                 </button>
+               )}
+             </div>
+           ))}
+         </div>
+       )}
+     </div>
+   );
+ }
frontend/src/app/dashboard/page.tsx ADDED
@@ -0,0 +1,182 @@
+ "use client";
+
+ import { useEffect, useState } from "react";
+ import { getKPIs, getEnrollmentFunnel, getSitePerformance, getDemographics, getTimeline } from "@/lib/api";
+ import {
+   BarChart, Bar, XAxis, YAxis, Tooltip, ResponsiveContainer, PieChart, Pie, Cell,
+   LineChart, Line, CartesianGrid, Legend,
+ } from "recharts";
+ import { TrendingUp, Users, FlaskConical, Clock, DollarSign, Loader2 } from "lucide-react";
+
+ function KPICard({ label, value, sub, icon: Icon, color }: { label: string; value: string; sub?: string; icon: any; color: string }) {
+   return (
+     <div className="bg-white rounded-xl border border-slate-200 p-5">
+       <div className="flex items-start justify-between">
+         <div>
+           <p className="text-xs text-slate-500 mb-1">{label}</p>
+           <p className="text-2xl font-bold text-slate-900">{value}</p>
+           {sub && <p className="text-xs text-slate-400 mt-0.5">{sub}</p>}
+         </div>
+         <div className={`p-2.5 rounded-lg ${color}`}>
+           <Icon className="w-5 h-5 text-white" />
+         </div>
+       </div>
+     </div>
+   );
+ }
+
+ export default function DashboardPage() {
+   const [kpis, setKpis] = useState<any>(null);
+   const [funnel, setFunnel] = useState<any[]>([]);
+   const [sites, setSites] = useState<any[]>([]);
+   const [demographics, setDemographics] = useState<any>(null);
+   const [timeline, setTimeline] = useState<any[]>([]);
+   const [loading, setLoading] = useState(true);
+
+   useEffect(() => {
+     Promise.all([
+       getKPIs(),
+       getEnrollmentFunnel(),
+       getSitePerformance(),
+       getDemographics(),
+       getTimeline(30),
+     ]).then(([k, f, s, d, t]) => {
+       setKpis(k);
+       setFunnel(f.funnel);
+       setSites(s.sites);
+       setDemographics(d);
+       setTimeline(t.timeline.filter((_: any, i: number) => i % 3 === 0)); // Sample every 3 days
+     }).catch(console.error).finally(() => setLoading(false)); // log failures instead of leaving the rejection unhandled
+   }, []);
+
+   if (loading) {
+     return (
+       <div className="flex items-center justify-center h-64">
+         <Loader2 className="w-6 h-6 animate-spin text-indigo-500" />
+       </div>
+     );
+   }
+
+   return (
+     <div className="p-6 max-w-6xl mx-auto space-y-6">
+       <div>
+         <h1 className="text-2xl font-bold text-slate-900 mb-1">Analytics Dashboard</h1>
+         <p className="text-slate-500 text-sm">Real-time recruitment metrics and trial performance analytics</p>
+       </div>
+
+       {/* KPI cards */}
+       {kpis && (
+         <div className="grid grid-cols-4 gap-4">
+           <KPICard label="Active Trials" value={kpis.active_trials} icon={FlaskConical} color="bg-indigo-500" />
+           <KPICard label="Patients Identified" value={kpis.patients_identified.toLocaleString()} sub={`${kpis.patients_enrolled} enrolled`} icon={Users} color="bg-violet-500" />
+           <KPICard label="Enrollment Rate" value={`${Math.round(kpis.enrollment_rate * 100)}%`} sub={`${kpis.avg_days_to_match} avg days to match`} icon={TrendingUp} color="bg-emerald-500" />
+           <KPICard label="Cost Savings" value={`$${(kpis.cost_saved_usd / 1000).toFixed(0)}K`} sub="vs manual screening" icon={DollarSign} color="bg-amber-500" />
+         </div>
+       )}
+
+       <div className="grid grid-cols-2 gap-6">
+         {/* Enrollment funnel */}
+         <div className="bg-white rounded-xl border border-slate-200 p-5">
+           <h2 className="font-semibold text-slate-900 text-sm mb-4">Enrollment Funnel</h2>
+           <ResponsiveContainer width="100%" height={220}>
+             <BarChart data={funnel} layout="vertical" margin={{ left: 20, right: 20 }}>
+               <XAxis type="number" tick={{ fontSize: 11 }} />
+               <YAxis type="category" dataKey="stage" tick={{ fontSize: 11 }} width={80} />
+               <Tooltip contentStyle={{ fontSize: 12 }} />
+               <Bar dataKey="count" radius={[0, 4, 4, 0]}>
+                 {funnel.map((entry, i) => <Cell key={i} fill={entry.fill} />)}
+               </Bar>
+             </BarChart>
+           </ResponsiveContainer>
+         </div>
+
+         {/* Gender pie */}
+         {demographics?.gender_distribution && (
+           <div className="bg-white rounded-xl border border-slate-200 p-5">
+             <h2 className="font-semibold text-slate-900 text-sm mb-4">Patient Demographics — Gender</h2>
+             <div className="flex items-center gap-4">
+               <ResponsiveContainer width="60%" height={200}>
+                 <PieChart>
+                   <Pie data={demographics.gender_distribution} dataKey="value" cx="50%" cy="50%" outerRadius={80} label={false}>
+                     {demographics.gender_distribution.map((entry: any, i: number) => <Cell key={i} fill={entry.fill} />)}
+                   </Pie>
+                   <Tooltip contentStyle={{ fontSize: 12 }} formatter={(v: any) => [`${v}%`]} />
+                 </PieChart>
+               </ResponsiveContainer>
+               <div className="space-y-2">
+                 {demographics.gender_distribution.map((d: any, i: number) => (
+                   <div key={i} className="flex items-center gap-2 text-xs">
+                     <span className="w-3 h-3 rounded-full shrink-0" style={{ background: d.fill }} />
+                     <span className="text-slate-600">{d.name}</span>
+                     <span className="text-slate-400 ml-auto">{d.value}%</span>
+                   </div>
+                 ))}
+               </div>
+             </div>
+           </div>
+         )}
+       </div>
+
+       {/* Enrollment timeline */}
+       {timeline.length > 0 && (
+         <div className="bg-white rounded-xl border border-slate-200 p-5">
+           <h2 className="font-semibold text-slate-900 text-sm mb-4">Enrollment Progress (30 days)</h2>
+           <ResponsiveContainer width="100%" height={200}>
+             <LineChart data={timeline} margin={{ left: 0, right: 20 }}>
+               <CartesianGrid strokeDasharray="3 3" stroke="#f1f5f9" />
+               <XAxis dataKey="date" tick={{ fontSize: 10 }} interval={2} />
+               <YAxis tick={{ fontSize: 11 }} />
+               <Tooltip contentStyle={{ fontSize: 12 }} />
+               <Legend wrapperStyle={{ fontSize: 12 }} />
+               <Line type="monotone" dataKey="cumulative_enrolled" stroke="#6366f1" name="Enrolled" strokeWidth={2} dot={false} />
+               <Line type="monotone" dataKey="target" stroke="#e2e8f0" name="Target" strokeWidth={1.5} strokeDasharray="4 4" dot={false} />
+             </LineChart>
+           </ResponsiveContainer>
+         </div>
+       )}
+
+       {/* Site performance table */}
+       {sites.length > 0 && (
+         <div className="bg-white rounded-xl border border-slate-200 p-5">
+           <h2 className="font-semibold text-slate-900 text-sm mb-4">Site Performance</h2>
+           <div className="overflow-x-auto">
+             <table className="w-full text-sm">
+               <thead>
+                 <tr className="border-b border-slate-100">
+                   <th className="text-left py-2 text-xs font-semibold text-slate-500">Site</th>
+                   <th className="text-left py-2 text-xs font-semibold text-slate-500">City</th>
+                   <th className="text-center py-2 text-xs font-semibold text-slate-500">Trials</th>
+                   <th className="text-center py-2 text-xs font-semibold text-slate-500">Enrolled</th>
+                   <th className="text-center py-2 text-xs font-semibold text-slate-500">Capacity</th>
+                   <th className="text-left py-2 text-xs font-semibold text-slate-500 w-36">Fill Rate</th>
+                 </tr>
+               </thead>
+               <tbody>
+                 {sites.slice(0, 6).map((site: any, i: number) => (
+                   <tr key={i} className="border-b border-slate-50 hover:bg-slate-50">
+                     <td className="py-2.5 font-medium text-slate-800 text-xs">{site.name}</td>
+                     <td className="py-2.5 text-slate-500 text-xs">{site.city}, {site.state}</td>
+                     <td className="py-2.5 text-center text-slate-600 text-xs">{site.trials}</td>
+                     <td className="py-2.5 text-center text-slate-600 text-xs">{site.enrolled}</td>
+                     <td className="py-2.5 text-center text-slate-600 text-xs">{site.capacity}</td>
+                     <td className="py-2.5">
+                       <div className="flex items-center gap-2">
+                         <div className="flex-1 h-1.5 bg-slate-100 rounded-full overflow-hidden">
+                           <div
+                             className="h-full bg-indigo-500 rounded-full"
+                             style={{ width: `${site.fill_percentage}%` }}
+                           />
+                         </div>
+                         <span className="text-xs text-slate-500 shrink-0">{site.fill_percentage}%</span>
+                       </div>
+                     </td>
+                   </tr>
+                 ))}
+               </tbody>
+             </table>
+           </div>
+         </div>
+       )}
+     </div>
+   );
+ }
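Review note on `frontend/src/app/dashboard/page.tsx`: the page downsamples the timeline and formats the KPI values inline inside JSX. A minimal sketch of how that logic could be extracted into small pure helpers so it can be unit-tested; `sampleEvery`, `formatEnrollmentRate`, and `formatCostSavings` are hypothetical names that do not exist in this commit, and the sketch assumes `enrollment_rate` is a fraction between 0 and 1, as the `* 100` in the page suggests.

```typescript
// Hypothetical helpers, not part of the commit. They mirror the inline
// expressions used by DashboardPage.

// Keep every `step`-th element, starting with the first one.
// DashboardPage does the same with `i % 3 === 0`.
function sampleEvery<T>(items: T[], step: number): T[] {
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError("step must be a positive integer");
  }
  return items.filter((_, i) => i % step === 0);
}

// Assumes `rate` is a fraction (0.42 means 42%).
function formatEnrollmentRate(rate: number): string {
  return `${Math.round(rate * 100)}%`;
}

// Formats a USD amount in thousands, e.g. 125000 becomes "$125K".
function formatCostSavings(usd: number): string {
  return `$${(usd / 1000).toFixed(0)}K`;
}
```

With helpers like these, the `then` callback could call `setTimeline(sampleEvery(t.timeline, 3))`, and the sampling step would be named in one place.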
frontend/src/app/favicon.ico ADDED
frontend/src/app/globals.css ADDED
@@ -0,0 +1,15 @@
+ @tailwind base;
+ @tailwind components;
+ @tailwind utilities;
+
+ :root {
+   --background: #f8fafc;
+   --foreground: #0f172a;
+ }
+
+ body {
+   background: var(--background);
+   color: var(--foreground);
+ }
+
+ html, body { height: 100%; }
frontend/src/app/graph/page.tsx ADDED
@@ -0,0 +1,110 @@
+ "use client";
+
+ import { useState } from "react";
+ import { graphQuery, getGraphStats } from "@/lib/api";
+ import { MessageSquare, Loader2, Database } from "lucide-react";
+ import { useEffect } from "react";
+
+ const SAMPLE_QUESTIONS = [
+   "Which patients are eligible for breast cancer trials?",
+   "What trials are in Phase II?",
+   "List all patients with HER2 positive biomarker",
+   "How many active trials are there for prostate cancer?",
+   "Which study sites have the most active trials?",
+ ];
+
+ export default function GraphPage() {
+   const [question, setQuestion] = useState("");
+   const [response, setResponse] = useState("");
+   const [loading, setLoading] = useState(false);
+   const [error, setError] = useState("");
+   const [stats, setStats] = useState<any>(null);
+
+   useEffect(() => {
+     getGraphStats().then(setStats).catch(() => {});
+   }, []);
+
+   const handleQuery = async (q = question) => {
+     if (!q.trim()) return;
+     setLoading(true);
+     setError("");
+     setResponse("");
+     setQuestion(q);
+     try {
+       const data = await graphQuery(q);
+       setResponse(data.response);
+     } catch (e: any) {
+       setError(e.message);
+     }
+     setLoading(false);
+   };
+
+   return (
+     <div className="p-6 max-w-3xl mx-auto">
+       <div className="mb-6">
+         <h1 className="text-2xl font-bold text-slate-900 mb-1">Graph RAG</h1>
+         <p className="text-slate-500 text-sm">Ask natural language questions about the clinical trial knowledge graph</p>
+       </div>
+
+       {stats && (
+         <div className="flex gap-3 mb-6">
+           {Object.entries(stats).map(([k, v]: any) => (
+             <div key={k} className="bg-white border border-slate-200 rounded-lg px-4 py-2.5 flex items-center gap-2">
+               <Database className="w-4 h-4 text-indigo-400" />
+               <span className="text-sm font-bold text-slate-800">{v}</span>
+               <span className="text-xs text-slate-500 capitalize">{k}</span>
+             </div>
+           ))}
+         </div>
+       )}
+
+       <div className="mb-4">
+         <p className="text-xs font-semibold text-slate-500 mb-2">Sample Questions</p>
+         <div className="flex flex-wrap gap-2">
+           {SAMPLE_QUESTIONS.map((q) => (
+             <button
+               key={q}
+               onClick={() => handleQuery(q)}
+               className="text-xs bg-indigo-50 text-indigo-700 hover:bg-indigo-100 px-3 py-1.5 rounded-full transition-colors"
+             >
+               {q}
+             </button>
+           ))}
+         </div>
+       </div>
+
+       <div className="flex gap-3 mb-6">
+         <input
+           type="text"
+           value={question}
+           onChange={(e) => setQuestion(e.target.value)}
+           onKeyDown={(e) => e.key === "Enter" && handleQuery()}
+           placeholder="Ask anything about patients, trials, or biomarkers..."
+           className="flex-1 border border-slate-200 rounded-lg px-4 py-2.5 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-500 bg-white"
+         />
+         <button
+           onClick={() => handleQuery()}
+           disabled={loading || !question.trim()}
+           className="flex items-center gap-2 bg-indigo-600 text-white px-4 py-2.5 rounded-lg text-sm font-medium hover:bg-indigo-700 disabled:opacity-50 transition-colors"
+         >
+           {loading ? <Loader2 className="w-4 h-4 animate-spin" /> : <MessageSquare className="w-4 h-4" />}
+           {loading ? "Querying..." : "Ask"}
+         </button>
+       </div>
+
+       {error && (
+         <div className="bg-red-50 border border-red-200 text-red-700 rounded-lg px-4 py-3 text-sm mb-4">{error}</div>
+       )}
+
+       {response && (
+         <div className="bg-white rounded-xl border border-slate-200 p-5">
+           <div className="flex items-center gap-2 mb-3">
+             <MessageSquare className="w-4 h-4 text-indigo-500" />
+             <span className="text-xs font-semibold text-slate-600">Response</span>
+           </div>
+           <p className="text-sm text-slate-700 leading-relaxed whitespace-pre-wrap">{response}</p>
+         </div>
+       )}
+     </div>
+   );
+ }
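Review note on `frontend/src/app/graph/page.tsx`: the sample-question buttons and the Enter key are not disabled while a query is pending, so `handleQuery` can start a second request before the first one resolves, and whichever response arrives last is the one displayed. One possible way to keep only the newest answer is a small token guard. This is a sketch under that assumption; `createRequestGuard` is a hypothetical helper and is not part of this commit.

```typescript
// Hypothetical "latest request wins" guard. Each call to begin() returns a
// new token and invalidates all earlier ones.
interface RequestGuard {
  begin(): number;
  isCurrent(token: number): boolean;
}

function createRequestGuard(): RequestGuard {
  let latest = 0;
  return {
    begin(): number {
      latest += 1;
      return latest;
    },
    isCurrent(token: number): boolean {
      return token === latest;
    },
  };
}

// Possible use inside handleQuery (the guard would live in a useRef so it
// survives re-renders):
//
//   const token = guard.begin();
//   const data = await graphQuery(q);
//   if (guard.isCurrent(token)) setResponse(data.response);
```

A stale response is simply dropped here; aborting the underlying request would need support in `graphQuery`, which is not shown in this diff.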