CTA / CLAUDE.md

Initial deployment: ClinicalMatch AI v2.0 — FHIR R4 · MCP (9 tools) · A2A workflow · SHARP compliance · 100k synthetic patients · Neo4j graph · GraphRAG chatbot

59abb4f 11 days ago


	# ClinicalMatch AI — Agent Instructions

	> Project memory (build state, completed features, constraints) is also tracked in `.claude/project_memory.md` in this repo.

	This is a hackathon submission for "Agents Assemble: Healthcare AI Endgame Challenge" on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, feasibility.

	## Stack at a glance

	\| Layer \| Technology \|
	\|---\|---\|
	\| Backend \| FastAPI (Python 3.12), uvicorn \|
	\| Graph DB \| Neo4j Community 5.x via bolt \|
	\| LLM \| claude-opus-4-7 via aimlapi.com (OpenAI-compatible) \|
	\| GraphRAG \| LangChain `GraphCypherQAChain` + custom Cypher prompt \|
	\| Frontend \| Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet \|
	\| Standards \| FHIR R4 · MCP (stdio) · A2A state machine \|

	## Critical: LLM API

	Never use the Anthropic SDK directly. All LLM calls go through aimlapi.com or a compatible alternative using the OpenAI-compatible interface:

	```python
	import os

	from openai import OpenAI

	# Route every call through the OpenAI-compatible endpoint, never the Anthropic SDK.
	client = OpenAI(
	    api_key=os.getenv("OPENAI_API_KEY"),
	    base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
	)
	model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")
	```

	See `backend/llm_client.py` for the canonical pattern. Do not add `import anthropic` anywhere.
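
The no-`anthropic` rule can also be checked mechanically. The following is a minimal sketch, not something that exists in the repo: the names `find_anthropic_imports` and `scan_tree` are hypothetical, and skipping `venv` directories is an assumption based on the `backend/venv` path used in the start-up commands.

```python
import ast
from pathlib import Path


def find_anthropic_imports(source: str) -> list[int]:
    """Return the line numbers of statements that import the anthropic package."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) refer to local modules, not the SDK.
            names = [node.module or ""] if node.level == 0 else []
        else:
            continue
        if any(n == "anthropic" or n.startswith("anthropic.") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[int]]:
    """Map each offending .py file under root to its offending line numbers."""
    report = {}
    for path in Path(root).rglob("*.py"):
        if "venv" in path.parts:  # assumption: skip the virtualenv
            continue
        hits = find_anthropic_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree("backend")` from the repo root should return an empty dict when the rule holds.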

	## Starting the services

	```bash
	# Backend — always use --reload for hot reload
	cd backend && source venv/bin/activate
	uvicorn main:app --reload --port 8000

	# Frontend — always use --webpack (Turbopack is broken on this system)
	cd frontend && npm run dev # runs: next dev --webpack

	# MCP server (separate process, stdio transport)
	cd backend && python mcp_server.py

	# Seed graph data (~15 min first run)
	curl -X POST http://localhost:8000/seed
	```

	After changing backend Python files, uvicorn `--reload` should pick them up. If a 404 appears for a newly-added endpoint or old errors persist, the server needs a manual restart — kill the process and re-run the uvicorn command.

	## Project layout

	```
	promptop/
	├── CLAUDE.md ← you are here
	├── README.md ← user-facing docs
	├── backend/
	│ ├── main.py ← FastAPI app, all routes
	│ ├── clinicaltrials_api.py ← ClinicalTrials.gov v2 API (async + sync)
	│ ├── intake_matching.py ← SI-unit clinical intake → trial scoring
	│ ├── trial_enrichment.py ← passive graph enrichment on search
	│ ├── matching_engine.py ← FHIR patient → trial scoring (LLM-assisted)
	│ ├── a2a_workflow.py ← A2A state machine (INGEST→PARSE→MATCH→SCORE→RECRUIT)
	│ ├── graphrag.py ← LangChain GraphCypherQAChain with custom prompt
	│ ├── graph_seeder.py ← seeds 500 patients + real NCT trials from APIs
	│ ├── fhir_adapter.py ← FHIR R4 patient models (P001–P005 mock patients)
	│ ├── neo4j_setup.py ← Neo4j connection + schema setup
	│ ├── analytics.py ← dashboard KPIs, funnel, demographics, map data
	│ ├── recruitment_pipeline.py ← kanban board, outreach generation
	│ ├── llm_client.py ← all LLM calls (aimlapi.com / claude-opus-4-7)
	│ ├── mcp_server.py ← MCP stdio server (6 tools)
	│ └── requirements.txt
	├── frontend/
	│ ├── src/app/
	│ │ ├── page.tsx ← Trial Finder (real-time CT.gov, recency sort)
	│ │ ├── intake/page.tsx ← Eligibility Check (SI-unit clinical intake form)
	│ │ ├── screening/page.tsx ← Patient Screening (A2A pipeline, FHIR patients)
	│ │ ├── recruitment/page.tsx ← Recruitment Hub (kanban + outreach generation)
	│ │ ├── dashboard/page.tsx ← Analytics dashboard (Recharts)
	│ │ ├── map/page.tsx ← Leaflet site map
	│ │ ├── graph/page.tsx ← GraphRAG natural language query
	│ │ └── layout.tsx ← App shell with Sidebar
	│ ├── src/components/
	│ │ ├── Sidebar.tsx ← Navigation sidebar
	│ │ └── MapComponent.tsx ← Raw Leaflet map (no react-leaflet SSR issues)
	│ ├── src/lib/api.ts ← Typed API client for all backend endpoints
	│ └── next.config.ts ← webpack mode, filesystem cache, optimizePackageImports
	└── docker/ ← Docker + Nginx for HuggingFace Spaces deployment
	```

	## Neo4j graph schema

	```
	(Patient)    id, name, age, sex, ecog, condition, city, state, ethnicity,
	             biomarkers[], medications[], source, stage
	(Trial)      id (NCT), title, condition, phase, status, sponsor,
	             eligibility_criteria, min_age, max_age, sex, enrollment,
	             start_date, completion_date, last_updated, ctgov_url
	(Diagnosis)  id, name, icd10
	(Biomarker)  id (e.g. HER2_POS), name (e.g. "HER2 Positive")
	(Medication) id (e.g. TAMOXIFEN), name
	(StudySite)  id, name, city, state, lat, lon, trials, enrolled, capacity

	Relationships:
	(Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
	(Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
	(Patient)-[:HAS_BIOMARKER]->(Biomarker)
	(Patient)-[:TAKES_MEDICATION]->(Medication)
	(Trial)-[:LOCATED_AT]->(StudySite)
	```

	Graph scale after seeding: ~500 patients, ~250 trials, ~9,100 ELIGIBLE_FOR edges.

	Patient IDs from seeder: `P_C50_0001` (breast), `P_C61_0001` (prostate), etc.
	Mock FHIR patients: `P001`–`P005` (used by screening/workflow pages).

	## Key backend modules

	### `clinicaltrials_api.py`
	- `search_trials()` — async, `sort=LastUpdatePostDate:desc`
	- `get_trial_details()` — async
	- `search_trials_sync()` / `get_trial_details_sync()` — sync using `httpx.Client` (NOT `asyncio.run()`). Safe to call from both sync functions and FastAPI async handlers.
	- `_normalize_study()` — extracts `last_updated`, `ctgov_url` in addition to core fields.

	Do not use `asyncio.run()` inside these sync wrappers — it breaks when called from a running FastAPI event loop. The sync wrappers use `httpx.Client` directly.
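
The failure mode is easy to reproduce with the standard library alone. In this sketch `fetch_status`, `bad_sync_wrapper`, and `handler` are illustrative stand-ins rather than functions from the repo. The wrapper works when no event loop is running and raises `RuntimeError` when called from inside one, which is the situation inside a FastAPI async handler.

```python
import asyncio


async def fetch_status() -> str:
    """Stand-in for an async HTTP call."""
    return "ok"


def bad_sync_wrapper() -> str:
    """Sync wrapper built on asyncio.run(): the pattern to avoid."""
    coro = fetch_status()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def handler() -> str:
    """Simulates an async request handler that calls the sync wrapper."""
    try:
        return bad_sync_wrapper()
    except RuntimeError as exc:
        return f"failed: {exc}"


# No loop is running here, so the wrapper succeeds.
outside = bad_sync_wrapper()

# Here the wrapper is called from inside a running loop and raises.
inside = asyncio.run(handler())
```

A wrapper built on `httpx.Client` never touches the event loop, so it avoids this error in both contexts.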

	### `intake_matching.py`
	Implements SI-unit clinical intake → trial eligibility matching without requiring a patient ID:
	- `BIOMARKER_REGISTRY` — maps graph node IDs to labels and eligibility text search terms
	- `score_intake_against_trial()` — weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
	- `_check_labs()` — parses thresholds from eligibility criteria text, converts SI units (creatinine μmol/L ↔ mg/dL, bilirubin μmol/L ↔ mg/dL)
	- `save_intake_as_patient()` — persists intake as `Patient` node for long-term graph enrichment
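
The weights and unit conversions above can be sketched as follows. This is an illustration under stated assumptions, not the code in `intake_matching.py`: the helper names are hypothetical, and treating each component as a pass fraction between 0 and 1 is an assumption about how partial matches are handled. The conversion factors are the standard ones (1 mg/dL creatinine = 88.4 μmol/L, 1 mg/dL bilirubin = 17.1 μmol/L).

```python
# μmol/L per mg/dL for each supported analyte.
UMOL_PER_MGDL = {"creatinine": 88.4, "bilirubin": 17.1}

# Component weights as listed above; they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}


def umol_to_mgdl(analyte: str, value_umol_l: float) -> float:
    """Convert an SI lab value (μmol/L) to conventional units (mg/dL)."""
    return value_umol_l / UMOL_PER_MGDL[analyte]


def mgdl_to_umol(analyte: str, value_mg_dl: float) -> float:
    """Convert a conventional lab value (mg/dL) to SI units (μmol/L)."""
    return value_mg_dl * UMOL_PER_MGDL[analyte]


def weighted_score(fractions: dict[str, float]) -> float:
    """Combine per-component pass fractions (0.0 to 1.0) into a 0-100 score.

    Missing components count as 0; out-of-range fractions are clamped.
    """
    total = 0.0
    for component, weight in WEIGHTS.items():
        fraction = min(max(fractions.get(component, 0.0), 0.0), 1.0)
        total += weight * fraction
    return total
```

For the demo intake value, `umol_to_mgdl("creatinine", 88)` gives roughly 1.0 mg/dL, which can then be compared with a threshold parsed from criteria text such as "creatinine ≤ 1.5 mg/dL".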

	### `trial_enrichment.py`
	- `enrich_trials_from_search()` — called as a `BackgroundTask` on every `/api/v1/trials/search` response; upserts Trial + StudySite nodes
	- `get_eligible_patient_counts()` — batch graph query, returns `{nct_id: count}`
	- `get_graph_intelligence()` — per-trial: eligible count + top biomarkers + similar trials
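
A batch count of this kind can be expressed as a single Cypher query plus a small post-processing step. The sketch below is an assumption about the shape of such a query, built from the schema in this file; it is not copied from `trial_enrichment.py`, and `rows_to_counts` is a hypothetical helper. Running the query against Neo4j is left out so the block stays self-contained.

```python
# One round trip for all trials: UNWIND the ID list, then count matching patients.
ELIGIBLE_COUNTS_QUERY = """
UNWIND $nct_ids AS nct_id
MATCH (p:Patient)-[:ELIGIBLE_FOR]->(:Trial {id: nct_id})
RETURN nct_id, count(DISTINCT p) AS eligible
"""


def rows_to_counts(rows: list[dict], nct_ids: list[str]) -> dict[str, int]:
    """Build {nct_id: count}, filling in 0 for trials the query returned no row for.

    A plain MATCH drops trials with no eligible patients, so zeros are added here.
    """
    counts = {nct_id: 0 for nct_id in nct_ids}
    for row in rows:
        counts[row["nct_id"]] = row["eligible"]
    return counts
```

The query takes a single parameter, `nct_ids`, holding the list of trial IDs from the search response.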

	### `graphrag.py`
	Uses a custom `_CYPHER_PROMPT` with explicit schema examples. Critical rules in the prompt:
	- Biomarker lookups use `id` property (`{id: 'HER2_POS'}`), never `{name: 'HER2', status: 'positive'}`
	- Condition lookups on Trial nodes use lowercase values (for example `'breast cancer'`, not `'Breast Cancer'`)
	- Patient eligibility always via `(Patient)-[:ELIGIBLE_FOR]->(Trial)`
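
Queries that follow these three rules might look like the ones below. They are illustrative examples written against the schema in this file, not excerpts from `_CYPHER_PROMPT`; `condition_param` is a hypothetical helper, and normalising the search term in Python is one possible reading of the lowercase rule.

```python
# Rule 1: match biomarkers by id, never by name/status properties.
HER2_PATIENTS = """
MATCH (p:Patient)-[:HAS_BIOMARKER]->(:Biomarker {id: 'HER2_POS'})
RETURN p.id, p.age, p.city
"""

# Rules 2 and 3: lowercase condition matching on Trial, eligibility via ELIGIBLE_FOR.
ELIGIBLE_FOR_CONDITION = """
MATCH (p:Patient)-[e:ELIGIBLE_FOR]->(t:Trial)
WHERE toLower(t.condition) CONTAINS $condition
RETURN p.id, t.id, e.score
ORDER BY e.score DESC
"""


def condition_param(text: str) -> dict[str, str]:
    """Normalise a user-supplied condition to the lowercase form used in lookups."""
    return {"condition": text.strip().lower()}
```

The second query would be run with `condition_param("Breast Cancer")` as its parameters.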

	### `a2a_workflow.py`
	Five-state machine: `INGESTING → PARSING_PROTOCOL → MATCHING → SCORING → RECRUITING`
	- Calls `search_trials_sync()` / `get_trial_details_sync()` — these are safe (use httpx.Client)
	- `run_pipeline()` is synchronous; it is called from an async FastAPI endpoint without `await`, so the event loop is blocked until the pipeline returns
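
The linear five-state progression can be modelled with an `Enum` and a small driver loop. This is a simplified sketch, not the contents of `a2a_workflow.py`: only the state names come from this file, while `next_state`, `run_states`, and the handler signature are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class WorkflowState(Enum):
    INGESTING = "INGESTING"
    PARSING_PROTOCOL = "PARSING_PROTOCOL"
    MATCHING = "MATCHING"
    SCORING = "SCORING"
    RECRUITING = "RECRUITING"


# Enum members iterate in definition order, which is the pipeline order.
_ORDER = list(WorkflowState)

Handler = Callable[[dict], dict]


def next_state(state: WorkflowState) -> Optional[WorkflowState]:
    """Return the state that follows, or None after the final state."""
    index = _ORDER.index(state) + 1
    return _ORDER[index] if index < len(_ORDER) else None


def run_states(handlers: dict[WorkflowState, Handler], context: dict) -> tuple[dict, list[str]]:
    """Run each state's handler in order, threading a context dict through them."""
    visited = []
    state: Optional[WorkflowState] = WorkflowState.INGESTING
    while state is not None:
        handler = handlers.get(state)
        if handler is not None:
            context = handler(context)
        visited.append(state.value)
        state = next_state(state)
    return context, visited
```

Because the driver is synchronous, it mirrors the constraint above: any HTTP call a handler makes has to use a sync client.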

	## Key frontend pages

	### `/intake` — Eligibility Check
	The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.
	- Six sections, including Diagnosis & Demographics, Biomarkers, Lab Values, and Treatment History
	- Biomarker registry loaded from `GET /api/v1/intake/biomarkers`
	- Submits to `POST /api/v1/intake/match`
	- Optional "Save to graph" checkbox persists profile as Patient node

	### `/` — Trial Finder
	- Sorted by `LastUpdatePostDate:desc` (most recently updated first)
	- Each search result triggers background graph enrichment
	- Expanded cards show Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
	- Direct ClinicalTrials.gov link per trial

	### `/screening` — Patient Screening
	- Patient ID field is an `<input list="...">` combobox that loads its options from `GET /api/v1/graph/patients`
	- NCT ID field is a combobox with quick-pick suggestions
	- Validates non-empty inputs before submitting
	- Two modes: Single Trial Screen and A2A Full Pipeline

	## API endpoints (key ones)

	```
	GET /api/v1/trials/search — real-time CT.gov search, sorted by recency, graph-enriched
	POST /api/v1/intake/match — SI-unit clinical intake → ranked trial matches
	GET /api/v1/intake/biomarkers — biomarker registry for the intake form
	GET /api/v1/trials/{nct_id}/intelligence — graph-derived insights per trial
	GET /api/v1/graph/patients — query Neo4j for seeded patient IDs
	POST /api/v1/patients/{id}/screen/{nct_id} — screen FHIR patient against trial
	POST /api/v1/workflow/run — run full A2A pipeline
	GET /api/v1/analytics/kpi — dashboard KPIs
	GET /api/v1/map/data — site coordinates + patient clusters
	POST /api/v1/graph/query — GraphRAG natural language
	POST /seed — trigger full graph seeding
	GET /api/v1/graph/stats — node/edge counts
	```
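
As a quick way to exercise an endpoint without the frontend, a request to `POST /api/v1/intake/match` can be built with the standard library. The payload field names below (`age`, `sex`, `ecog`, `biomarkers`, `labs`) are assumptions for illustration only; the real request model is defined in `main.py` and shown at `/docs`. The values mirror the demo intake case.

```python
import json
import urllib.request


def build_intake_request(base_url: str, intake: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the intake matching endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/intake/match",
        data=json.dumps(intake).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload shape; check the Pydantic model before relying on it.
request = build_intake_request(
    "http://localhost:8000",
    {
        "age": 52,
        "sex": "female",
        "ecog": 1,
        "biomarkers": ["HER2_POS"],
        "labs": {"hemoglobin_g_dl": 12.5, "creatinine_umol_l": 88},
    },
)

# With the backend running, send it with:
#   with urllib.request.urlopen(request) as response:
#       matches = json.load(response)
```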

	Full interactive docs at `http://localhost:8000/docs`.

	## Environment variables

	```env
	NEO4J_URI=bolt://localhost:7687
	NEO4J_USERNAME=neo4j
	NEO4J_PASSWORD=clinicalmatch2024
	NEO4J_DATABASE=neo4j

	OPENAI_API_KEY=<aimlapi.com key>
	OPENAI_BASE_URL=https://ai.aimlapi.com/v1
	OPENAI_MODEL=claude-opus-4-7

	NEXT_PUBLIC_API_URL=http://localhost:8000 # dev only; empty string in Docker
	```
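
One way to read these variables in a single place is a small settings loader. This is a sketch, not code from the repo: the `Settings` class and `load_settings` function are hypothetical. The LLM defaults match the ones shown in the LLM API section; using the Neo4j values above as defaults, and treating the password and API key as required, are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: str
    openai_base_url: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast when secrets are missing."""
    env = os.environ if env is None else env
    missing = [key for key in ("NEO4J_PASSWORD", "OPENAI_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env["OPENAI_API_KEY"],
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.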

	## Known issues and constraints

	- Turbopack is broken on this machine — always use `next dev --webpack`. Never suggest `next dev` without `--webpack`.
	- `next/font/google` causes compilation to hang (network request during bundling). Geist font is installed as a package but the `next/font/google` import is removed. Use plain Tailwind `font-sans`.
	- `asyncio.run()` from async context — the sync CT.gov wrappers use `httpx.Client` to avoid this. Never re-introduce `asyncio.run()` into the sync wrappers; it will fail when called from FastAPI's running event loop.
	- Leaflet SSR — `MapComponent.tsx` uses raw Leaflet (not react-leaflet) via `useEffect`. The `MapComponent` dynamic import has `ssr: false`. Do not switch to react-leaflet's `MapContainer`.
	- `suppressHydrationWarning` on `<body>` in `layout.tsx` — required for Grammarly browser extension compatibility.
	- Mock FHIR patients (P001–P005) live in `fhir_adapter.py`. The 500 seeded graph patients (`P_C50_0001` etc.) are in Neo4j only. The screening page loads graph patients from `GET /api/v1/graph/patients` for the combobox.

	## Adding new features

	1. New backend route: add to `main.py`, import the module at the top, add a Pydantic request model if needed
	2. New API function: add a typed function to `frontend/src/lib/api.ts`
	3. New page: create `frontend/src/app/<name>/page.tsx`, add to `nav` array in `Sidebar.tsx`
	4. Graph schema change: update `neo4j_setup.py` constraints/indexes, update `_CYPHER_PROMPT` in `graphrag.py` with the new node/property examples
	5. New biomarker: add to `BIOMARKER_REGISTRY` in `intake_matching.py` and to `BM_GROUPS` in `frontend/src/app/intake/page.tsx`
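
For step 5, a registry entry might take the shape below. This is a guess at the structure based on the description of `BIOMARKER_REGISTRY` (graph node ID mapped to a label and eligibility-text search terms). The dict name, the key names, the `EGFR_MUT` entry, and `criteria_mention` are all hypothetical; only `HER2_POS` and its label "HER2 Positive" come from this file.

```python
EXAMPLE_REGISTRY = {
    "HER2_POS": {
        "label": "HER2 Positive",
        "search_terms": ["her2-positive", "her2 positive", "her2+"],
    },
    # Hypothetical new entry: the key should match the Biomarker node id in the graph.
    "EGFR_MUT": {
        "label": "EGFR Mutation",
        "search_terms": ["egfr mutation", "egfr-mutated", "egfr mutant"],
    },
}


def criteria_mention(biomarker_id: str, criteria_text: str, registry: dict = EXAMPLE_REGISTRY) -> bool:
    """Return True if any search term for the biomarker appears in the criteria text."""
    text = criteria_text.lower()
    return any(term in text for term in registry[biomarker_id]["search_terms"])
```

Whatever the real structure is, the frontend `BM_GROUPS` entry needs to use the same ID so the form value lines up with the graph node.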

	## Demo script (for judges)

	1. `GET /api/v1/graph/stats` — confirm 500+ patients and 9,100+ edges
	2. `/` — search "breast cancer" → observe recency sort, graph-matched patient count badges
	3. Expand a trial → Graph Intelligence panel shows eligible patients, top biomarkers, similar trials
	4. `/intake` — enter: Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 μmol/L → ranked trials with pass/fail breakdown
	5. `/screening` — select P_C50_0001 from combobox → run A2A Pipeline → observe 5-state machine
	6. `/recruitment` — kanban board, generate PCP letter outreach
	7. `/dashboard` — KPI cards, enrollment funnel, demographics
	8. `/graph` — ask "which patients are eligible for breast cancer trials?"
	9. In Prompt Opinion: call MCP tool `find_trials(condition="breast cancer")`
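
For step 9, it can help to know what the tool call looks like on the wire. MCP uses JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds that message; `tools_call_message` is a hypothetical helper, and a real session must complete the MCP `initialize` handshake before any tool call is accepted.

```python
import json
from itertools import count

# Monotonic JSON-RPC request IDs for this process.
_request_ids = count(1)


def tools_call_message(tool: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the message must be one line.
    return json.dumps(request)


message = tools_call_message("find_trials", {"condition": "breast cancer"})
```

Over the stdio transport, the client writes this line followed by a newline to the stdin of `mcp_server.py` and reads the response from its stdout.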