# ClinicalMatch AI: Agent Instructions

> Project memory (build state, completed features, constraints) is also tracked in `.claude/project_memory.md` in this repo.

This is a hackathon submission for **"Agents Assemble: Healthcare AI Endgame Challenge"** on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, and feasibility.

## Stack at a glance

| Layer | Technology |
|---|---|
| Backend | FastAPI (Python 3.12), uvicorn |
| Graph DB | Neo4j Community 5.x via Bolt |
| LLM | claude-opus-4-7 via aimlapi.com (OpenAI-compatible) |
| GraphRAG | LangChain `GraphCypherQAChain` + custom Cypher prompt |
| Frontend | Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet |
| Standards | FHIR R4 · MCP (stdio) · A2A state machine |

## Critical: LLM API

**Never use the Anthropic SDK directly.** All LLM calls go through aimlapi.com (or a compatible alternative) using the OpenAI-compatible interface:

```python
import os

from openai import OpenAI

# OpenAI-compatible client, pointed at aimlapi.com unless OPENAI_BASE_URL overrides it
client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
)
model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")
```

See `backend/llm_client.py` for the canonical pattern. Do not add `import anthropic` anywhere.

## Starting the services

```bash
# Run each block from the repo root, in its own terminal.

# Backend: always use --reload for hot reload
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000

# Frontend: always use --webpack (Turbopack is broken on this system)
cd frontend && npm run dev   # runs: next dev --webpack

# MCP server (separate process, stdio transport)
cd backend && python mcp_server.py

# Seed graph data (~15 min on first run; the backend must be up)
curl -X POST http://localhost:8000/seed
```

After changing backend Python files, uvicorn `--reload` should pick them up. If a newly added endpoint returns 404 or old errors persist, restart the server manually: kill the process and re-run the uvicorn command.

## Project layout

```
promptop/
├── CLAUDE.md                     ← you are here
├── README.md                     ← user-facing docs
├── backend/
│   ├── main.py                   ← FastAPI app, all routes
│   ├── clinicaltrials_api.py     ← ClinicalTrials.gov v2 API (async + sync)
│   ├── intake_matching.py        ← SI-unit clinical intake → trial scoring
│   ├── trial_enrichment.py       ← passive graph enrichment on search
│   ├── matching_engine.py        ← FHIR patient → trial scoring (LLM-assisted)
│   ├── a2a_workflow.py           ← A2A state machine (INGEST→PARSE→MATCH→SCORE→RECRUIT)
│   ├── graphrag.py               ← LangChain GraphCypherQAChain with custom prompt
│   ├── graph_seeder.py           ← seeds 500 patients + real NCT trials from APIs
│   ├── fhir_adapter.py           ← FHIR R4 patient models (P001–P005 mock patients)
│   ├── neo4j_setup.py            ← Neo4j connection + schema setup
│   ├── analytics.py              ← dashboard KPIs, funnel, demographics, map data
│   ├── recruitment_pipeline.py   ← kanban board, outreach generation
│   ├── llm_client.py             ← all LLM calls (aimlapi.com / claude-opus-4-7)
│   ├── mcp_server.py             ← MCP stdio server (6 tools)
│   └── requirements.txt
├── frontend/
│   ├── src/app/
│   │   ├── page.tsx              ← Trial Finder (real-time CT.gov, recency sort)
│   │   ├── intake/page.tsx       ← Eligibility Check (SI-unit clinical intake form)
│   │   ├── screening/page.tsx    ← Patient Screening (A2A pipeline, FHIR patients)
│   │   ├── recruitment/page.tsx  ← Recruitment Hub (kanban + outreach generation)
│   │   ├── dashboard/page.tsx    ← Analytics dashboard (Recharts)
│   │   ├── map/page.tsx          ← Leaflet site map
│   │   ├── graph/page.tsx        ← GraphRAG natural language query
│   │   └── layout.tsx            ← App shell with Sidebar
│   ├── src/components/
│   │   ├── Sidebar.tsx           ← Navigation sidebar
│   │   └── MapComponent.tsx      ← Raw Leaflet map (no react-leaflet SSR issues)
│   ├── src/lib/api.ts            ← Typed API client for all backend endpoints
│   └── next.config.ts            ← webpack mode, filesystem cache, optimizePackageImports
└── docker/                       ← Docker + Nginx for HuggingFace Spaces deployment
```

## Neo4j graph schema

```
(Patient)     id, name, age, sex, ecog, condition, city, state, ethnicity,
              biomarkers[], medications[], source, stage
(Trial)       id (NCT), title, condition, phase, status, sponsor,
              eligibility_criteria, min_age, max_age, sex, enrollment,
              start_date, completion_date, last_updated, ctgov_url
(Diagnosis)   id, name, icd10
(Biomarker)   id (e.g. HER2_POS), name (e.g. "HER2 Positive")
(Medication)  id (e.g. TAMOXIFEN), name
(StudySite)   id, name, city, state, lat, lon, trials, enrolled, capacity

Relationships:
(Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
(Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
(Patient)-[:HAS_BIOMARKER]->(Biomarker)
(Patient)-[:TAKES_MEDICATION]->(Medication)
(Trial)-[:LOCATED_AT]->(StudySite)
```

**Graph scale after seeding:** ~500 patients, ~250 trials, ~9,100 `ELIGIBLE_FOR` edges.

Patient IDs from the seeder: `P_C50_0001` (breast), `P_C61_0001` (prostate), etc.

Mock FHIR patients: `P001`–`P005` (used by the screening and workflow pages).

## Key backend modules

### `clinicaltrials_api.py`

- `search_trials()`: async, queries with `sort=LastUpdatePostDate:desc`
- `get_trial_details()`: async
- `search_trials_sync()` / `get_trial_details_sync()`: sync, using `httpx.Client` (NOT `asyncio.run()`). Safe to call from both sync functions and FastAPI async handlers; from an async handler they do not raise, but they do block the event loop while the HTTP request is in flight.
- `_normalize_study()`: extracts `last_updated` and `ctgov_url` in addition to the core fields.

**Do not** use `asyncio.run()` inside these sync wrappers: it raises `RuntimeError` when called from FastAPI's running event loop. The sync wrappers use `httpx.Client` directly.
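
The failure is easy to reproduce with the standard library alone. In this sketch, `fetch_trials_async()` and `broken_sync_wrapper()` are hypothetical stand-ins that perform no HTTP; `fastapi_like_handler()` simulates an async FastAPI endpoint calling a sync wrapper that internally uses `asyncio.run()`.

```python
import asyncio


async def fetch_trials_async(condition: str) -> list[str]:
    """Hypothetical stand-in for an async CT.gov call; no network I/O."""
    await asyncio.sleep(0)
    return [f"trial matching {condition}"]


def broken_sync_wrapper(condition: str) -> list[str]:
    """Anti-pattern: a sync wrapper that drives the coroutine with asyncio.run()."""
    coro = fetch_trials_async(condition)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def fastapi_like_handler() -> str:
    """Simulates an async endpoint: an event loop is already running here."""
    try:
        broken_sync_wrapper("breast cancer")
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    return "no error"


if __name__ == "__main__":
    # Outside any event loop the wrapper works...
    print(broken_sync_wrapper("breast cancer"))
    # ...but inside a running loop asyncio.run() refuses to start.
    print(asyncio.run(fastapi_like_handler()))
```

A blocking `httpx.Client` call sidesteps this error because it never touches the event loop; the trade-off is that the loop is blocked for the duration of the request.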

### `intake_matching.py`

Implements SI-unit clinical intake → trial eligibility matching without requiring a patient ID:

- `BIOMARKER_REGISTRY`: maps graph node IDs to labels and eligibility-text search terms
- `score_intake_against_trial()`: weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
- `_check_labs()`: parses thresholds from the eligibility-criteria text and converts SI units (creatinine μmol/L → mg/dL, bilirubin μmol/L → mg/dL)
- `save_intake_as_patient()`: persists the intake as a `Patient` node for long-term graph enrichment
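
The unit conversions rely on standard factors (1 mg/dL creatinine = 88.4 μmol/L; 1 mg/dL bilirubin = 17.1 μmol/L), and the documented weights sum to 100. A minimal sketch, assuming each criterion yields a pass fraction between 0 and 1 that is multiplied by its weight; `weighted_score()` and the conversion helpers are hypothetical, and the real `score_intake_against_trial()` may handle partial or unknown criteria differently.

```python
# Standard conversion factors: micromol/L equivalent to 1 mg/dL.
CREATININE_UMOL_L_PER_MG_DL = 88.4
BILIRUBIN_UMOL_L_PER_MG_DL = 17.1

# Criterion weights as documented for score_intake_against_trial(); they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}


def creatinine_to_mg_dl(umol_per_l: float) -> float:
    """Convert serum creatinine from SI (umol/L) to conventional units (mg/dL)."""
    return umol_per_l / CREATININE_UMOL_L_PER_MG_DL


def bilirubin_to_mg_dl(umol_per_l: float) -> float:
    """Convert total bilirubin from SI (umol/L) to conventional units (mg/dL)."""
    return umol_per_l / BILIRUBIN_UMOL_L_PER_MG_DL


def weighted_score(pass_fractions: dict[str, float]) -> float:
    """Combine per-criterion pass fractions (0.0 to 1.0) into a 0-100 score.

    Assumption: a criterion missing from pass_fractions scores 0, and values
    outside [0, 1] are clamped.
    """
    score = 0.0
    for criterion, weight in WEIGHTS.items():
        fraction = min(max(pass_fractions.get(criterion, 0.0), 0.0), 1.0)
        score += weight * fraction
    return score


if __name__ == "__main__":
    print(f"88 umol/L creatinine = {creatinine_to_mg_dl(88):.2f} mg/dL")
    # Age, sex and ECOG pass; 1 of 2 required biomarkers present; labs fail.
    print(weighted_score({"age": 1, "sex": 1, "ecog": 1, "biomarkers": 0.5}))
```

For the demo input of 88 μmol/L creatinine, the conversion gives roughly 1.0 mg/dL.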

### `trial_enrichment.py`

- `enrich_trials_from_search()`: runs as a `BackgroundTask` on every `/api/v1/trials/search` response; upserts Trial + StudySite nodes
- `get_eligible_patient_counts()`: batch graph query, returns `{nct_id: count}`
- `get_graph_intelligence()`: per trial, returns the eligible count + top biomarkers + similar trials
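
A batch lookup like `get_eligible_patient_counts()` can be done in one round trip by unwinding the list of NCT IDs. The Cypher below is an assumption about the shape of such a query, not the module's actual text; `get_eligible_patient_counts_sketch()` and the injected `run_query` callable are hypothetical, so the logic can be exercised without a Neo4j connection.

```python
from collections.abc import Callable, Iterable, Mapping

# Hypothetical batch query: one row per requested NCT ID; count(p) is 0 when nothing matches.
ELIGIBLE_COUNTS_CYPHER = """
UNWIND $nct_ids AS nct_id
OPTIONAL MATCH (p:Patient)-[:ELIGIBLE_FOR]->(:Trial {id: nct_id})
RETURN nct_id, count(p) AS eligible
"""

# Any callable that runs Cypher with parameters and yields mapping-like rows.
RunQuery = Callable[[str, Mapping[str, object]], Iterable[Mapping[str, object]]]


def get_eligible_patient_counts_sketch(run_query: RunQuery, nct_ids: list[str]) -> dict[str, int]:
    """Return {nct_id: eligible_patient_count}, defaulting to 0 for every requested ID."""
    counts = {nct_id: 0 for nct_id in nct_ids}
    if not nct_ids:
        return counts  # nothing to look up; skip the database round trip
    for row in run_query(ELIGIBLE_COUNTS_CYPHER, {"nct_ids": nct_ids}):
        counts[str(row["nct_id"])] = int(row["eligible"])
    return counts


if __name__ == "__main__":
    def fake_run(query: str, params: Mapping[str, object]) -> list[dict[str, object]]:
        # Stand-in for a Neo4j session; the IDs are placeholders, not real trials.
        return [{"nct_id": "NCT00000001", "eligible": 42}]

    print(get_eligible_patient_counts_sketch(fake_run, ["NCT00000001", "NCT00000002"]))
```

With the official `neo4j` Python driver, `run_query` could plausibly be `lambda q, p: session.run(q, p)`, since driver records support key lookup such as `record["nct_id"]`.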

### `graphrag.py`

Uses a custom `_CYPHER_PROMPT` with explicit schema examples. Critical rules in the prompt:

- Biomarker lookups use the `id` property (`{id: 'HER2_POS'}`), never `{name: 'HER2', status: 'positive'}`
- Condition lookups on Trial nodes use lowercase
- Patient eligibility always goes through `(Patient)-[:ELIGIBLE_FOR]->(Trial)`
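
Because the Cypher examples contain literal curly braces, a prompt template using Python-style `{placeholder}` substitution (LangChain's default `PromptTemplate` format) must double them as `{{ }}`. The template below is a hypothetical excerpt written in the spirit of `_CYPHER_PROMPT`, not its actual text. It assumes the template variables are `{schema}` and `{question}`, and it interprets the lowercase rule as a `toLower()` comparison, which is also an assumption. It uses `str.format()`, which follows the same escaping rules, so it runs without LangChain.

```python
# Hypothetical excerpt in the spirit of _CYPHER_PROMPT (not the real prompt text).
# Literal braces in the Cypher examples are doubled so str.format() leaves them intact.
CYPHER_PROMPT_TEMPLATE = """Task: Generate a Cypher statement to query a Neo4j graph database.
Schema:
{schema}

Rules:
- Match biomarkers by id, e.g. (:Biomarker {{id: 'HER2_POS'}}); never by name/status.
- Compare Trial.condition in lowercase, e.g. toLower(t.condition) CONTAINS 'breast cancer'.
- Resolve eligibility only through (p:Patient)-[:ELIGIBLE_FOR]->(t:Trial).

Example question: Which HER2-positive patients are eligible for breast cancer trials?
Example Cypher:
MATCH (p:Patient)-[:HAS_BIOMARKER]->(:Biomarker {{id: 'HER2_POS'}})
MATCH (p)-[e:ELIGIBLE_FOR]->(t:Trial)
WHERE toLower(t.condition) CONTAINS 'breast cancer'
RETURN p.id, t.id, e.score
ORDER BY e.score DESC

Question: {question}
Cypher:"""


def render_cypher_prompt(schema: str, question: str) -> str:
    """Fill the template; braces inside `schema` are inserted verbatim, not re-parsed."""
    return CYPHER_PROMPT_TEMPLATE.format(schema=schema, question=question)


if __name__ == "__main__":
    print(render_cypher_prompt("(Patient)-[:ELIGIBLE_FOR {score}]->(Trial)",
                               "Which patients are eligible for breast cancer trials?"))
```

Forgetting to double a brace typically surfaces as a `KeyError` (or a missing-variable error in LangChain) the first time the template is formatted.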

### `a2a_workflow.py`

Five-state machine: `INGESTING → PARSING_PROTOCOL → MATCHING → SCORING → RECRUITING`

- Calls `search_trials_sync()` / `get_trial_details_sync()`, which are safe here because they use `httpx.Client`
- `run_pipeline()` is synchronous; the async FastAPI endpoint calls it without `await`, so the event loop is blocked for the duration of the run

## Key frontend pages

### `/intake`: Eligibility Check

The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.

- Six sections, including Diagnosis & Demographics, Biomarkers, Lab Values, and Treatment History
- Biomarker registry loaded from `GET /api/v1/intake/biomarkers`
- Submits to `POST /api/v1/intake/match`
- Optional "Save to graph" checkbox persists the profile as a `Patient` node

### `/`: Trial Finder

- Sorted by `LastUpdatePostDate:desc` (most recently updated first)
- Each search triggers background graph enrichment
- Expanded cards show a Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
- Direct ClinicalTrials.gov link per trial

### `/screening`: Patient Screening

- The Patient ID field is an `<input list="...">` combobox populated from `GET /api/v1/graph/patients`
- The NCT ID field is a combobox with quick-pick suggestions
- Validates non-empty inputs before submitting
- Two modes: Single Trial Screen and A2A Full Pipeline

## API endpoints (key ones)

```
GET  /api/v1/trials/search                  # real-time CT.gov search, sorted by recency, graph-enriched
POST /api/v1/intake/match                   # SI-unit clinical intake → ranked trial matches
GET  /api/v1/intake/biomarkers              # biomarker registry for the intake form
GET  /api/v1/trials/{nct_id}/intelligence   # graph-derived insights per trial
GET  /api/v1/graph/patients                 # query Neo4j for seeded patient IDs
POST /api/v1/patients/{id}/screen/{nct_id}  # screen a FHIR patient against a trial
POST /api/v1/workflow/run                   # run the full A2A pipeline
GET  /api/v1/analytics/kpi                  # dashboard KPIs
GET  /api/v1/map/data                       # site coordinates + patient clusters
POST /api/v1/graph/query                    # GraphRAG natural-language query
POST /seed                                  # trigger full graph seeding
GET  /api/v1/graph/stats                    # node/edge counts
```

Full interactive docs are at `http://localhost:8000/docs`.
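
For scripted checks (for example, the first demo step), a small standard-library client is enough. This is a sketch: `api_url()` and `get_json()` are hypothetical helpers, the base URL mirrors the dev value of `NEXT_PUBLIC_API_URL`, and nothing is assumed about response bodies beyond them being JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # dev default; adjust for Docker deployments


def api_url(path_template: str, base_url: str = BASE_URL, **path_params: str) -> str:
    """Fill {placeholders} in an endpoint path, URL-encoding each value."""
    quoted = {key: urllib.parse.quote(str(value), safe="") for key, value in path_params.items()}
    return base_url.rstrip("/") + path_template.format(**quoted)


def get_json(url: str, timeout: float = 10.0) -> object:
    """GET a URL and decode the JSON body (raises urllib.error.URLError if unreachable)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Offline demo: only builds URLs. get_json() applies to the GET endpoints once the backend runs.
    print(api_url("/api/v1/graph/stats"))
    print(api_url("/api/v1/patients/{id}/screen/{nct_id}", id="P_C50_0001", nct_id="NCT00000000"))
```

With the backend running, `get_json(api_url("/api/v1/graph/stats"))` returns the node/edge counts; `NCT00000000` above is a placeholder, not a real trial.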

## Environment variables

```env
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=clinicalmatch2024
NEO4J_DATABASE=neo4j
OPENAI_API_KEY=<aimlapi.com key>
OPENAI_BASE_URL=https://ai.aimlapi.com/v1
OPENAI_MODEL=claude-opus-4-7
NEXT_PUBLIC_API_URL=http://localhost:8000  # dev only; empty string in Docker
```
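
A typed settings loader keeps the backend variables in one place. This is a sketch assuming plain `os.environ` lookups (no `python-dotenv` or Pydantic settings); `Settings`, `MissingSettingError` and `load_settings()` are hypothetical names. Defaults mirror the values above, except that the Neo4j password and API key are treated as required rather than hard-coded.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: str
    openai_base_url: str
    openai_model: str


class MissingSettingError(RuntimeError):
    """Raised when a required variable is unset or empty."""


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read backend settings from `env`, applying the documented defaults."""

    def required(name: str) -> str:
        value = env.get(name, "")
        if not value:
            raise MissingSettingError(f"{name} must be set")
        return value

    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=required("NEO4J_PASSWORD"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=required("OPENAI_API_KEY"),
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Failing fast on a missing key surfaces configuration problems at startup instead of on the first LLM or Neo4j call.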

## Known issues and constraints

- **Turbopack is broken** on this machine: always use `next dev --webpack`. Never suggest `next dev` without `--webpack`.
- **`next/font/google`** causes compilation to hang (it makes a network request during bundling). The Geist font is installed as a package, but the `next/font/google` import has been removed. Use plain Tailwind `font-sans`.
- **`asyncio.run()` from an async context**: the sync CT.gov wrappers use `httpx.Client` to avoid this. Never re-introduce `asyncio.run()` into the sync wrappers; it fails when called from FastAPI's running event loop.
- **Leaflet SSR**: `MapComponent.tsx` uses raw Leaflet (not react-leaflet) via `useEffect`, and the `MapComponent` dynamic import sets `ssr: false`. Do not switch to react-leaflet's `MapContainer`.
- **`suppressHydrationWarning`** on `<body>` in `layout.tsx` is required for Grammarly browser extension compatibility.
- **Mock FHIR patients** (P001–P005) live in `fhir_adapter.py`. The 500 seeded graph patients (`P_C50_0001` etc.) exist in Neo4j only. The screening page loads graph patients from `GET /api/v1/graph/patients` for the combobox.

## Adding new features

1. **New backend route**: add it to `main.py`, import the module at the top, and add a Pydantic request model if needed
2. **New API function**: add a typed function to `frontend/src/lib/api.ts`
3. **New page**: create `frontend/src/app/<name>/page.tsx` and add it to the `nav` array in `Sidebar.tsx`
4. **Graph schema change**: update the constraints/indexes in `neo4j_setup.py`, then update `_CYPHER_PROMPT` in `graphrag.py` with examples of the new nodes/properties
5. **New biomarker**: add it to `BIOMARKER_REGISTRY` in `intake_matching.py` and to `BM_GROUPS` in `frontend/src/app/intake/page.tsx`
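
The registry is described as mapping graph node IDs to labels and eligibility-text search terms. The entry shape below (a label plus lowercase search phrases) is an assumption based on that description; `BiomarkerEntry` and `mentioned_biomarkers()` are hypothetical, and `EGFR_MUT` is an illustrative ID rather than a confirmed graph node.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BiomarkerEntry:
    label: str               # display label, e.g. "HER2 Positive"
    terms: tuple[str, ...]   # lowercase phrases to look for in eligibility text


# Hypothetical registry shape keyed by graph node ID (EGFR_MUT is illustrative only).
BIOMARKER_REGISTRY: dict[str, BiomarkerEntry] = {
    "HER2_POS": BiomarkerEntry("HER2 Positive", ("her2-positive", "her2+", "her2 positive")),
    "EGFR_MUT": BiomarkerEntry("EGFR Mutation", ("egfr mutation", "egfr-mutant")),
}


def mentioned_biomarkers(criteria_text: str) -> list[str]:
    """Return registry IDs whose terms appear in the text (case-insensitive, no negation handling)."""
    text = criteria_text.lower()
    found: list[str] = []
    for node_id, entry in BIOMARKER_REGISTRY.items():
        for term in entry.terms:
            # Reject matches glued to letters/digits, so "her2+" does not match inside "xher2+".
            pattern = rf"(?<![a-z0-9]){re.escape(term)}(?![a-z0-9])"
            if re.search(pattern, text):
                found.append(node_id)
                break
    return found


if __name__ == "__main__":
    print(mentioned_biomarkers("Inclusion: HER2+ metastatic breast cancer"))
```

Plain phrase matching like this ignores negation ("not HER2-positive"), so criteria text may need more careful parsing before a match is treated as a requirement.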

## Demo script (for judges)

1. `GET /api/v1/graph/stats` → confirm 500+ patients and 9,100+ edges
2. `/` → search "breast cancer" → observe the recency sort and graph-matched patient count badges
3. Expand a trial → the Graph Intelligence panel shows eligible patients, top biomarkers, and similar trials
4. `/intake` → enter Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 μmol/L → ranked trials with a pass/fail breakdown
5. `/screening` → select `P_C50_0001` from the combobox → run the A2A Pipeline → observe the 5-state machine
6. `/recruitment` → kanban board; generate PCP letter outreach
7. `/dashboard` → KPI cards, enrollment funnel, demographics
8. `/graph` → ask "which patients are eligible for breast cancer trials?"
9. In Prompt Opinion: call the MCP tool `find_trials(condition="breast cancer")`