# ClinicalMatch AI - Agent Instructions
> Project memory (build state, completed features, constraints) is also tracked in `.claude/project_memory.md` in this repo.
This is a hackathon submission for **"Agents Assemble: Healthcare AI Endgame Challenge"** on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, feasibility.
## Stack at a glance
| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), uvicorn |
| Graph DB | Neo4j Community 5.x via bolt |
| LLM | claude-opus-4-7 via aimlapi.com (OpenAI-compatible) |
| GraphRAG | LangChain `GraphCypherQAChain` + custom Cypher prompt |
| Frontend | Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet |
| Standards | FHIR R4 · MCP (stdio) · A2A state machine |
## Critical: LLM API
**Never use the Anthropic SDK directly.** All LLM calls go through aimlapi.com or a compatible alternative using the OpenAI-compatible interface:
```python
import os

from openai import OpenAI

# Credentials and endpoint come from the environment (see "Environment variables").
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
)
model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")
```
See `backend/llm_client.py` for the canonical pattern. Do not add `import anthropic` anywhere.
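
One way to enforce the no-`anthropic` rule mechanically is a small import check. The sketch below is not part of the repo; `find_forbidden_imports` and `FORBIDDEN_MODULES` are hypothetical names, and the check only looks at static `import` statements.

```python
import ast

# Hypothetical guard for the rule above; extend the set if more SDKs are banned.
FORBIDDEN_MODULES = {"anthropic"}


def find_forbidden_imports(source: str) -> list[int]:
    """Return the line numbers of imports that pull in a forbidden module."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # Compare on the top-level package so `anthropic.types` is caught too.
        if any(name.split(".")[0] in FORBIDDEN_MODULES for name in names):
            hits.append(node.lineno)
    return sorted(hits)
```

It could be run over each file under `backend/` in a pre-commit hook or test, failing when the returned list is non-empty.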
## Starting the services
```bash
# Backend: always use --reload for hot reload
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000
# Frontend: always use --webpack (Turbopack is broken on this system)
cd frontend && npm run dev # runs: next dev --webpack
# MCP server (separate process, stdio transport)
cd backend && python mcp_server.py
# Seed graph data (~15 min first run)
curl -X POST http://localhost:8000/seed
```
After changing backend Python files, uvicorn `--reload` should pick them up. If a 404 appears for a newly added endpoint or old errors persist, the server needs a manual restart: kill the process and re-run the uvicorn command.
## Project layout
```
promptop/
├── CLAUDE.md                        ← you are here
├── README.md                        ← user-facing docs
├── backend/
│   ├── main.py                      ← FastAPI app, all routes
│   ├── clinicaltrials_api.py        ← ClinicalTrials.gov v2 API (async + sync)
│   ├── intake_matching.py           ← SI-unit clinical intake → trial scoring
│   ├── trial_enrichment.py          ← passive graph enrichment on search
│   ├── matching_engine.py           ← FHIR patient → trial scoring (LLM-assisted)
│   ├── a2a_workflow.py              ← A2A state machine (INGEST→PARSE→MATCH→SCORE→RECRUIT)
│   ├── graphrag.py                  ← LangChain GraphCypherQAChain with custom prompt
│   ├── graph_seeder.py              ← seeds 500 patients + real NCT trials from APIs
│   ├── fhir_adapter.py              ← FHIR R4 patient models (P001–P005 mock patients)
│   ├── neo4j_setup.py               ← Neo4j connection + schema setup
│   ├── analytics.py                 ← dashboard KPIs, funnel, demographics, map data
│   ├── recruitment_pipeline.py      ← kanban board, outreach generation
│   ├── llm_client.py                ← all LLM calls (aimlapi.com / claude-opus-4-7)
│   ├── mcp_server.py                ← MCP stdio server (6 tools)
│   └── requirements.txt
├── frontend/
│   ├── src/app/
│   │   ├── page.tsx                 ← Trial Finder (real-time CT.gov, recency sort)
│   │   ├── intake/page.tsx          ← Eligibility Check (SI-unit clinical intake form)
│   │   ├── screening/page.tsx       ← Patient Screening (A2A pipeline, FHIR patients)
│   │   ├── recruitment/page.tsx     ← Recruitment Hub (kanban + outreach generation)
│   │   ├── dashboard/page.tsx       ← Analytics dashboard (Recharts)
│   │   ├── map/page.tsx             ← Leaflet site map
│   │   ├── graph/page.tsx           ← GraphRAG natural language query
│   │   └── layout.tsx               ← App shell with Sidebar
│   ├── src/components/
│   │   ├── Sidebar.tsx              ← Navigation sidebar
│   │   └── MapComponent.tsx         ← Raw Leaflet map (no react-leaflet SSR issues)
│   ├── src/lib/api.ts               ← Typed API client for all backend endpoints
│   └── next.config.ts               ← webpack mode, filesystem cache, optimizePackageImports
└── docker/                          ← Docker + Nginx for HuggingFace Spaces deployment
```
## Neo4j graph schema
```
(Patient)     id, name, age, sex, ecog, condition, city, state, ethnicity,
              biomarkers[], medications[], source, stage
(Trial)       id (NCT), title, condition, phase, status, sponsor,
              eligibility_criteria, min_age, max_age, sex, enrollment,
              start_date, completion_date, last_updated, ctgov_url
(Diagnosis)   id, name, icd10
(Biomarker)   id (e.g. HER2_POS), name (e.g. "HER2 Positive")
(Medication)  id (e.g. TAMOXIFEN), name
(StudySite)   id, name, city, state, lat, lon, trials, enrolled, capacity

Relationships:
(Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
(Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
(Patient)-[:HAS_BIOMARKER]->(Biomarker)
(Patient)-[:TAKES_MEDICATION]->(Medication)
(Trial)-[:LOCATED_AT]->(StudySite)
```
**Graph scale after seeding:** ~500 patients, ~250 trials, ~9,100 ELIGIBLE_FOR edges.
Patient IDs from seeder: `P_C50_0001` (breast), `P_C61_0001` (prostate), etc.
Mock FHIR patients: `P001`–`P005` (used by screening/workflow pages).
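
Because the two ID families live in different stores, it can help to tell them apart before choosing a lookup path. The sketch below is illustrative only: the regular expressions are inferred from the example IDs above (an assumption, not a documented format), and `classify_patient_id` is a hypothetical helper.

```python
import re

# Assumed ID shapes, inferred from the examples in this section.
MOCK_FHIR_ID = re.compile(r"P00[1-5]")               # P001 ... P005
SEEDED_ID = re.compile(r"P_[A-Z]\d{2}_\d{4}")        # e.g. P_C50_0001


def classify_patient_id(patient_id: str) -> str:
    """Return 'mock_fhir', 'seeded_graph', or 'unknown' for a patient ID."""
    if MOCK_FHIR_ID.fullmatch(patient_id):
        return "mock_fhir"      # defined in fhir_adapter.py
    if SEEDED_ID.fullmatch(patient_id):
        return "seeded_graph"   # exists in Neo4j only
    return "unknown"            # e.g. intake-saved profiles or typos
```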
## Key backend modules
### `clinicaltrials_api.py`
- `search_trials()`: async, `sort=LastUpdatePostDate:desc`
- `get_trial_details()`: async
- `search_trials_sync()` / `get_trial_details_sync()`: sync using `httpx.Client` (NOT `asyncio.run()`). Safe to call from both sync functions and FastAPI async handlers.
- `_normalize_study()`: extracts `last_updated`, `ctgov_url` in addition to core fields.
**Do not** use `asyncio.run()` inside these sync wrappers; it breaks when called from a running FastAPI event loop. The sync wrappers use `httpx.Client` directly.
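
The failure mode can be reproduced with the standard library alone. In this minimal sketch, `fetch`, `naive_sync_wrapper`, and `handler` are stand-ins invented for illustration; `handler` plays the role of an async FastAPI route calling a sync wrapper built on `asyncio.run()`.

```python
import asyncio


async def fetch() -> str:
    """Stand-in for an async HTTP call."""
    return "ok"


def naive_sync_wrapper() -> str:
    # The pattern to avoid: asyncio.run() tries to start a fresh event loop.
    coro = fetch()
    try:
        return asyncio.run(coro)
    finally:
        coro.close()  # no-op if it ran; avoids a "never awaited" warning if not


async def handler() -> str:
    # Simulates an async route handler calling the sync wrapper.
    try:
        return naive_sync_wrapper()
    except RuntimeError as exc:
        return f"failed: {exc}"


outside_loop = naive_sync_wrapper()    # works: no event loop is running yet
inside_loop = asyncio.run(handler())   # RuntimeError inside the running loop
```

A wrapper that performs blocking I/O directly, as the `httpx.Client`-based functions do, never touches the event loop and so avoids the error.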
### `intake_matching.py`
Implements SI-unit clinical intake → trial eligibility matching without requiring a patient ID:
- `BIOMARKER_REGISTRY`: maps graph node IDs to labels and eligibility text search terms
- `score_intake_against_trial()`: weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
- `_check_labs()`: parses thresholds from eligibility criteria text, converts SI units (creatinine μmol/L → mg/dL, bilirubin μmol/L → mg/dL)
- `save_intake_as_patient()`: persists intake as `Patient` node for long-term graph enrichment
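
The weighting and unit conversions above can be sketched as follows. This is not the code in `intake_matching.py`: the function names are hypothetical, the conversion factors are the standard ones (88.4 μmol/L per mg/dL for creatinine, 17.1 for bilirubin), and whether the module uses exactly these values or partial credit per criterion is an assumption.

```python
# Weights from the list above; they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# Standard conversion factors, in umol/L per mg/dL.
CREATININE_UMOL_PER_MG_DL = 88.4
BILIRUBIN_UMOL_PER_MG_DL = 17.1


def creatinine_to_mg_dl(umol_per_l: float) -> float:
    """Convert serum creatinine from umol/L to mg/dL."""
    return umol_per_l / CREATININE_UMOL_PER_MG_DL


def bilirubin_to_mg_dl(umol_per_l: float) -> float:
    """Convert total bilirubin from umol/L to mg/dL."""
    return umol_per_l / BILIRUBIN_UMOL_PER_MG_DL


def weighted_score(checks: dict[str, float]) -> float:
    """Combine per-criterion results into a 0-100 score.

    `checks` maps a criterion name to the fraction satisfied in [0, 1];
    criteria that are missing from the dict contribute nothing.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        fraction = max(0.0, min(1.0, checks.get(name, 0.0)))
        total += weight * fraction
    return total
```

For example, a creatinine of 88 μmol/L converts to roughly 1.0 mg/dL, which is the value a threshold such as "creatinine ≤ 1.5 mg/dL" in eligibility text would be compared against.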
### `trial_enrichment.py`
- `enrich_trials_from_search()`: called as a `BackgroundTask` on every `/api/v1/trials/search` response; upserts Trial + StudySite nodes
- `get_eligible_patient_counts()`: batch graph query, returns `{nct_id: count}`
- `get_graph_intelligence()`: per-trial: eligible count + top biomarkers + similar trials
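
A batch count of this kind might look like the sketch below. The Cypher text and the `counts_from_rows` helper are assumptions written against the schema in this file, not the module's actual query; rows are modelled as plain mappings so the folding logic can be read without a database driver.

```python
from collections.abc import Iterable, Mapping
from typing import Any

# Hypothetical batch query; labels and properties follow the graph schema section.
ELIGIBLE_COUNTS_CYPHER = """
MATCH (p:Patient)-[:ELIGIBLE_FOR]->(t:Trial)
WHERE t.id IN $nct_ids
RETURN t.id AS nct_id, count(p) AS count
"""


def counts_from_rows(
    nct_ids: Iterable[str], rows: Iterable[Mapping[str, Any]]
) -> dict[str, int]:
    """Fold query rows into {nct_id: count}.

    Trials with no eligible patients produce no row from the MATCH,
    so every requested ID is initialised to 0 first.
    """
    counts = {nct_id: 0 for nct_id in nct_ids}
    for row in rows:
        counts[str(row["nct_id"])] = int(row["count"])
    return counts
```

Issuing one query for the whole result page keeps the number of round trips to Neo4j constant regardless of how many trials a search returns.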
### `graphrag.py`
Uses a custom `_CYPHER_PROMPT` with explicit schema examples. Critical rules in the prompt:
- Biomarker lookups use `id` property (`{id: 'HER2_POS'}`), never `{name: 'HER2', status: 'positive'}`
- Condition lookups use lowercase on Trial nodes
- Patient eligibility always via `(Patient)-[:ELIGIBLE_FOR]->(Trial)`
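
A query that follows all three rules might be built like this. The helper name and the exact Cypher are illustrative assumptions; in particular, matching the condition with `toLower(...) CONTAINS` is one reading of the lowercase rule, not necessarily what `_CYPHER_PROMPT` prescribes.

```python
def eligible_patients_query(biomarker_id: str, condition: str) -> tuple[str, dict[str, str]]:
    """Build a parameterised Cypher query that respects the prompt rules."""
    cypher = (
        # Rule 1: match the Biomarker node by its `id` property.
        "MATCH (p:Patient)-[:HAS_BIOMARKER]->(b:Biomarker {id: $biomarker_id}) "
        # Rule 3: eligibility always goes through ELIGIBLE_FOR.
        "MATCH (p)-[e:ELIGIBLE_FOR]->(t:Trial) "
        # Rule 2: compare Trial conditions in lowercase.
        "WHERE toLower(t.condition) CONTAINS $condition "
        "RETURN p.id AS patient_id, t.id AS nct_id, e.score AS score "
        "ORDER BY score DESC"
    )
    params = {"biomarker_id": biomarker_id, "condition": condition.strip().lower()}
    return cypher, params
```

Calling `eligible_patients_query("HER2_POS", "Breast Cancer")` yields parameters `{"biomarker_id": "HER2_POS", "condition": "breast cancer"}`.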
### `a2a_workflow.py`
Five-state machine: `INGESTING → PARSING_PROTOCOL → MATCHING → SCORING → RECRUITING`
- Calls `search_trials_sync()` / `get_trial_details_sync()`; these are safe (use httpx.Client)
- `run_pipeline()` is synchronous; called from async FastAPI endpoint without `await`
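
The state sequence can be modelled as a linear enum. This is a minimal sketch under the assumption that states advance strictly in order with no branching or failure states; `WorkflowState`, `next_state`, and `run_states` are hypothetical names, not the contents of `a2a_workflow.py`.

```python
from enum import Enum
from typing import Optional


class WorkflowState(Enum):
    INGESTING = "INGESTING"
    PARSING_PROTOCOL = "PARSING_PROTOCOL"
    MATCHING = "MATCHING"
    SCORING = "SCORING"
    RECRUITING = "RECRUITING"


# Enum members iterate in definition order, which is the pipeline order.
ORDER = list(WorkflowState)


def next_state(state: WorkflowState) -> Optional[WorkflowState]:
    """Return the following state, or None once RECRUITING is reached."""
    idx = ORDER.index(state)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def run_states() -> list[str]:
    """Walk the machine from the first state and record each state visited."""
    visited: list[str] = []
    state: Optional[WorkflowState] = ORDER[0]
    while state is not None:
        visited.append(state.value)
        state = next_state(state)
    return visited
```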
## Key frontend pages
### `/intake` - Eligibility Check
The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.
- Six sections, including Diagnosis & Demographics, Biomarkers, Lab Values, and Treatment History
- Biomarker registry loaded from `GET /api/v1/intake/biomarkers`
- Submits to `POST /api/v1/intake/match`
- Optional "Save to graph" checkbox persists profile as Patient node
### `/` - Trial Finder
- Sorted by `LastUpdatePostDate:desc` (most recently updated first)
- Each search result triggers background graph enrichment
- Expanded cards show Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
- Direct ClinicalTrials.gov link per trial
### `/screening` - Patient Screening
- Patient ID field is a `<input list="...">` combobox loading from `GET /api/v1/graph/patients`
- NCT ID field is a combobox with quick-pick suggestions
- Validates non-empty inputs before submitting
- Two modes: Single Trial Screen and A2A Full Pipeline
## API endpoints (key ones)
```
GET  /api/v1/trials/search                  - real-time CT.gov search, sorted by recency, graph-enriched
POST /api/v1/intake/match                   - SI-unit clinical intake → ranked trial matches
GET  /api/v1/intake/biomarkers              - biomarker registry for the intake form
GET  /api/v1/trials/{nct_id}/intelligence   - graph-derived insights per trial
GET  /api/v1/graph/patients                 - query Neo4j for seeded patient IDs
POST /api/v1/patients/{id}/screen/{nct_id}  - screen FHIR patient against trial
POST /api/v1/workflow/run                   - run full A2A pipeline
GET  /api/v1/analytics/kpi                  - dashboard KPIs
GET  /api/v1/map/data                       - site coordinates + patient clusters
POST /api/v1/graph/query                    - GraphRAG natural language
POST /seed                                  - trigger full graph seeding
GET  /api/v1/graph/stats                    - node/edge counts
```
Full interactive docs at `http://localhost:8000/docs`.
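
For scripted checks against a local backend, a request to the screening route can be built with the standard library. This sketch assumes the route takes its inputs from the path alone; whether it also expects a request body is not stated here, and `screen_request` is a hypothetical helper. The NCT ID in the usage note is a placeholder.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # dev default; see NEXT_PUBLIC_API_URL


def screen_request(patient_id: str, nct_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v1/patients/{id}/screen/{nct_id}."""
    path = (
        f"/api/v1/patients/{quote(patient_id, safe='')}"
        f"/screen/{quote(nct_id, safe='')}"
    )
    return urllib.request.Request(BASE_URL + path, method="POST")
```

With the backend running, `urllib.request.urlopen(screen_request("P001", "NCT01234567"))` would send the request.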
## Environment variables
```env
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=clinicalmatch2024
NEO4J_DATABASE=neo4j
OPENAI_API_KEY=<aimlapi.com key>
OPENAI_BASE_URL=https://ai.aimlapi.com/v1
OPENAI_MODEL=claude-opus-4-7
NEXT_PUBLIC_API_URL=http://localhost:8000 # dev only; empty string in Docker
```
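
A small loader can make missing variables fail fast at startup. The sketch below is an illustration, not the repo's configuration code: `Settings` and `load_settings` are hypothetical, the defaults mirror the values listed above, and treating the password and API key as required is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: str
    openai_base_url: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings from `env`, raising if a secret is missing."""
    missing = [key for key in ("NEO4J_PASSWORD", "OPENAI_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env["OPENAI_API_KEY"],
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```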
## Known issues and constraints
- **Turbopack is broken** on this machine; always use `next dev --webpack`. Never suggest `next dev` without `--webpack`.
- **`next/font/google`** causes compilation to hang (network request during bundling). Geist font is installed as a package but the `next/font/google` import is removed. Use plain Tailwind `font-sans`.
- **`asyncio.run()` from async context**: the sync CT.gov wrappers use `httpx.Client` to avoid this. Never re-introduce `asyncio.run()` into the sync wrappers; it will fail when called from FastAPI's running event loop.
- **Leaflet SSR**: `MapComponent.tsx` uses raw Leaflet (not react-leaflet) via `useEffect`. The `MapComponent` dynamic import has `ssr: false`. Do not switch to react-leaflet's `MapContainer`.
- **`suppressHydrationWarning`** on `<body>` in `layout.tsx`: required for Grammarly browser extension compatibility.
- **Mock FHIR patients** (P001–P005) live in `fhir_adapter.py`. The 500 seeded graph patients (`P_C50_0001` etc.) are in Neo4j only. The screening page loads graph patients from `GET /api/v1/graph/patients` for the combobox.
## Adding new features
1. **New backend route**: add to `main.py`, import the module at the top, add a Pydantic request model if needed
2. **New API function**: add a typed function to `frontend/src/lib/api.ts`
3. **New page**: create `frontend/src/app/<name>/page.tsx`, add to `nav` array in `Sidebar.tsx`
4. **Graph schema change**: update `neo4j_setup.py` constraints/indexes, update `_CYPHER_PROMPT` in `graphrag.py` with the new node/property examples
5. **New biomarker**: add to `BIOMARKER_REGISTRY` in `intake_matching.py` and to `BM_GROUPS` in `frontend/src/app/intake/page.tsx`
## Demo script (for judges)
1. `GET /api/v1/graph/stats` → confirm 500+ patients and 9,100+ edges
2. `/` → search "breast cancer" → observe recency sort, graph-matched patient count badges
3. Expand a trial → Graph Intelligence panel shows eligible patients, top biomarkers, similar trials
4. `/intake` → enter: Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 μmol/L → ranked trials with pass/fail breakdown
5. `/screening` → select P_C50_0001 from combobox → run A2A Pipeline → observe 5-state machine
6. `/recruitment` → kanban board, generate PCP letter outreach
7. `/dashboard` → KPI cards, enrollment funnel, demographics
8. `/graph` → ask "which patients are eligible for breast cancer trials?"
9. In Prompt Opinion: call MCP tool `find_trials(condition="breast cancer")`