	---
	title: ClinicalMatch AI
	emoji: 🧬
	colorFrom: indigo
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: true
	---

	# ClinicalMatch AI — Precision Clinical Trial Matching & Recruitment Agent

	"Agents Assemble: Healthcare AI Endgame Challenge" — Prompt Opinion platform
	Standards: FHIR R4 · MCP · A2A

	> 80% of clinical trials fail to meet enrollment deadlines, and 85% of eligible patients are never identified. ClinicalMatch AI targets both problems.

	---

	## What it does

	ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

	Key capabilities:

	| Feature | Description |
	|---|---|
	| Eligibility Check | A user enters raw clinical data (age, labs in SI units, biomarkers), with no patient ID required, and receives ranked, explainable trial matches |
	| Trial Finder | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
	| Graph Intelligence | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
	| A2A Pipeline | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
	| Recruitment Hub | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
	| GraphRAG | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
	| MCP Server | 6 tools callable by Prompt Opinion directly via stdio transport |
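
The five A2A pipeline states in the table can be read as a linear state machine. A minimal sketch, assuming each state is handled by a plain function that returns updates to a shared context; the `Stage` enum, `run_pipeline`, and the handler signature are illustrative names, not taken from the project's code:

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    INGEST = "INGEST"
    PARSE = "PARSE"
    MATCH = "MATCH"
    SCORE = "SCORE"
    RECRUIT = "RECRUIT"


# Stage order as listed in the feature table.
PIPELINE: List[Stage] = [
    Stage.INGEST, Stage.PARSE, Stage.MATCH, Stage.SCORE, Stage.RECRUIT
]

# Hypothetical handler type: receives the current state, returns updates to merge.
Handler = Callable[[dict], dict]


def run_pipeline(context: dict, handlers: Dict[Stage, Handler]) -> dict:
    """Run every stage in order and record which stages were visited."""
    state = dict(context)
    history = []
    for stage in PIPELINE:
        handler = handlers.get(stage)
        if handler is not None:
            state.update(handler(state))
        history.append(stage.value)
    state["history"] = history
    return state


# Example with a single stubbed stage; the patient ID and score are made up.
result = run_pipeline(
    {"patient_id": "patient-001"},
    {Stage.SCORE: lambda state: {"score": 82}},
)
print(result["history"])
```

Keeping the stage order in one list makes it easy to surface progress in a UI, such as a state tracker, without the handlers needing to know about each other.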

	---

	## Architecture

	```
	            Prompt Opinion Platform
	                       │ MCP Protocol (stdio)
	                       ▼
	┌────────────────────────────────────────────────────┐
	│  MCP Server (mcp_server.py)                        │
	│  find_trials · screen_patient · match_patient      │
	│  generate_outreach · get_analytics · summarize     │
	└──────────────────────┬─────────────────────────────┘
	                       │ A2A Orchestration
	                       ▼
	┌────────────────────────────────────────────────────┐
	│  FastAPI Backend (main.py, port 8000)              │
	│  30+ REST endpoints                                │
	├──────────┬────────────┬────────────┬───────────────┤
	│ CT.gov   │ FHIR R4    │ Claude     │ Neo4j Graph   │
	│ live API │ adapter    │ LLM        │ RAG + match   │
	└──────────┴────────────┴────────────┴───────────────┘
	                       │
	                       ▼
	┌────────────────────────────────────────────────────┐
	│  Next.js 16 Frontend (port 3000)                   │
	│  Trial Finder · Eligibility Check · Screening      │
	│  Recruitment Hub · Dashboard · Map · GraphRAG      │
	└────────────────────────────────────────────────────┘
	                       │ Nginx (port 7860)
	                       ▼
	              HuggingFace Spaces
	```

	Data sources (all free, no auth):

	| Source | Data |
	|---|---|
	| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
	| RxNorm (NIH) | Medication RxCUI codes |
	| ICD-10 CM (NLM) | Cancer diagnosis codes |
	| PubMed (NCBI) | Supporting literature PMIDs |
	| OpenFDA | Drug labels and adverse events |
	| Synthetic | 500 realistic patient profiles matched to real trials |
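
As an illustration of the first row, the sketch below builds a search URL for recruiting trials sorted by recency. The helper name `build_search_url` is hypothetical, and the query parameter names reflect the public ClinicalTrials.gov v2 API as understood here; check them against the official API reference before relying on them.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(condition: str, page_size: int = 20) -> str:
    """Build a URL for recruiting studies, most recently updated first."""
    params = {
        "query.cond": condition,                # free-text condition search
        "filter.overallStatus": "RECRUITING",   # only trials open to enrollment
        "sort": "LastUpdatePostDate:desc",      # recency ordering
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


url = build_search_url("breast cancer")
print(url)
```

The URL can be fetched with any HTTP client; no API key is needed, consistent with the "free, no auth" note above.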

	---

	## Graph Knowledge Base

	After seeding, the Neo4j graph contains:

	| Node type | Count | Key properties |
	|---|---|---|
	| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
	| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
	| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
	| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
	| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
	| StudySite | ~200 | lat/lon coordinates |
	| ELIGIBLE_FOR edges | ~9,100 | score, linking patients to trials |

	The graph grows passively — every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
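
An upsert of this kind is commonly written with Cypher `MERGE`, which creates a node only if no match exists. The sketch below is an assumption about how that could look, not the project's actual query: the `Trial` and `StudySite` labels come from the table above, while the property names, the `CONDUCTED_AT` relationship, and the `trial_upsert` helper are hypothetical.

```python
from typing import Tuple

# MERGE matches an existing node or creates it, so repeated searches
# do not duplicate trials or sites.
UPSERT_TRIAL_AND_SITE = """
MERGE (t:Trial {nct_id: $nct_id})
SET t.phase = $phase, t.last_updated = $last_updated
MERGE (s:StudySite {name: $site_name})
SET s.lat = $lat, s.lon = $lon
MERGE (t)-[:CONDUCTED_AT]->(s)
"""


def trial_upsert(study: dict) -> Tuple[str, dict]:
    """Return the Cypher statement and its parameters for one search result."""
    params = {
        "nct_id": study["nct_id"],
        "phase": study.get("phase"),
        "last_updated": study.get("last_updated"),
        "site_name": study["site"]["name"],
        "lat": study["site"]["lat"],
        "lon": study["site"]["lon"],
    }
    return UPSERT_TRIAL_AND_SITE, params


# Placeholder values for illustration only.
query, params = trial_upsert({
    "nct_id": "NCT00000000",
    "phase": "PHASE2",
    "last_updated": "2024-01-01",
    "site": {"name": "Example Site", "lat": 51.5, "lon": -0.12},
})
# With the official neo4j driver this could be run as: session.run(query, params)
```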

	---

	## Clinical Eligibility Check (SI Units)

	The `/intake` page accepts raw clinical data — no patient ID or account required. Fields:

	Demographics: Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

	Biomarker status (toggles):
	- Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
	- Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
	- GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
	- Hematology: FLT3, IDH1/2, BCR-ABL

	Lab values (SI units):

	| Field | Unit | Conversion |
	|---|---|---|
	| Haemoglobin | g/dL | — |
	| WBC | ×10⁹/L | — |
	| ANC | ×10⁹/L | — |
	| Platelets | ×10⁹/L | — |
	| Creatinine | μmol/L | auto-converted ÷88.4 → mg/dL for trial text |
	| eGFR | mL/min/1.73m² | — |
	| Bilirubin | μmol/L | auto-converted ÷17.1 → mg/dL for trial text |
	| ALT / AST | U/L | — |
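
The two conversions in the table amount to a single division. A small sketch using the factors listed above; the function and constant names are illustrative:

```python
# Divisors from the table: μmol/L divided by the factor gives mg/dL.
UMOL_L_PER_MG_DL = {"creatinine": 88.4, "bilirubin": 17.1}


def to_mg_dl(analyte: str, value_umol_l: float) -> float:
    """Convert an SI lab value (μmol/L) to mg/dL, rounded to two decimals."""
    return round(value_umol_l / UMOL_L_PER_MG_DL[analyte], 2)


creatinine_mg_dl = to_mg_dl("creatinine", 132.6)
bilirubin_mg_dl = to_mg_dl("bilirubin", 34.2)
print(creatinine_mg_dl, bilirubin_mg_dl)
```

Converting once at intake lets patient values be compared directly with thresholds such as "creatinine ≤ 1.5 mg/dL", the unit the table says is used for trial text.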

	Matching score breakdown:
	- Age 25 pts — compared against trial min/max age
	- Sex 15 pts — compared against trial sex restriction
	- ECOG 15 pts — extracted via regex from eligibility criteria text
	- Biomarkers 30 pts — checks whether biomarker terms appear in trial eligibility text
	- Lab values 15 pts — parses thresholds from text, converts SI units, checks patient values

	Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
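
The weighting above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the half credit for "uncertain" criteria, the ECOG regular expression, and all function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Point weights from the score breakdown (they sum to 100).
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# Assumption: "uncertain" earns half credit; the source does not specify this.
CREDIT = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}

# Illustrative pattern for phrases like "ECOG performance status 0-2" or "ECOG ≤ 1".
# It captures the last digit of a range, i.e. the highest ECOG allowed.
ECOG_PATTERN = re.compile(
    r"ECOG[^\d\n]{0,40}?(?:\d\s*(?:[-–]|to|or)\s*)?(\d)", re.IGNORECASE
)


def max_ecog_allowed(criteria_text: str) -> Optional[int]:
    match = ECOG_PATTERN.search(criteria_text)
    return int(match.group(1)) if match else None


def check_ecog(patient_ecog: int, criteria_text: str) -> str:
    limit = max_ecog_allowed(criteria_text)
    if limit is None:
        return "uncertain"  # the criteria text does not mention ECOG
    return "pass" if patient_ecog <= limit else "fail"


def check_range(value: float, low: Optional[float], high: Optional[float]) -> str:
    """Pass when the value lies within the bounds; None means unbounded."""
    if low is not None and value < low:
        return "fail"
    if high is not None and value > high:
        return "fail"
    return "pass"


def total_score(results: Dict[str, str]) -> float:
    """Weighted sum over all criteria; missing criteria count as uncertain."""
    return sum(WEIGHTS[name] * CREDIT[results.get(name, "uncertain")] for name in WEIGHTS)


results = {
    "age": check_range(58, 18, 75),
    "sex": "pass",
    "ecog": check_ecog(1, "ECOG performance status 0-2"),
    "biomarkers": "uncertain",
    "labs": "fail",
}
print(results["ecog"], total_score(results))  # pass 70.0
```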

	---

	## Running Locally (without Docker Compose)

	```bash
	# 1. Start Neo4j
	docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

	# 2. Backend (uvicorn keeps running in this terminal; run later steps in new terminals)
	cd backend
	python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
	cp ../.env.example ../.env.local # fill in credentials
	uvicorn main:app --reload --port 8000

	# 3. Schema setup (once)
	curl -X POST http://localhost:8000/setup

	# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
	curl -X POST http://localhost:8000/seed

	# 5. Frontend (in a new terminal, from the repo root)
	cd frontend
	npm install --legacy-peer-deps
	npm run dev # http://localhost:3000 (uses --webpack, not Turbopack)

	# 6. MCP server (for Prompt Opinion integration; in a new terminal, from the repo root)
	cd backend
	python mcp_server.py
	```

	---

	## Running with Docker Compose

	```bash
	cp .env.example .env.local # fill in OPENAI_API_KEY etc.
	docker compose up -d

	# Wait ~60s for Neo4j to be healthy, then:
	curl -X POST http://localhost:7860/setup
	curl -X POST http://localhost:7860/seed
	```

	Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474
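
Instead of waiting a fixed ~60 seconds, the setup call can be retried until the stack responds. A hedged sketch using only the standard library; `post_once` and `post_with_retry` are hypothetical helpers, and the long timeout assumes that `/seed` may block while seeding, which the source does not state.

```python
import time
import urllib.request
from typing import Callable


def post_once(url: str, timeout_s: float = 1800.0) -> int:
    """Send an empty-body POST and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


def post_with_retry(
    url: str,
    send: Callable[[str], int] = post_once,
    attempts: int = 12,
    delay_s: float = 10.0,
) -> int:
    """Retry while the service is unreachable or answers with an HTTP error."""
    for attempt in range(1, attempts + 1):
        try:
            return send(url)
        except OSError:  # URLError, HTTPError and refused connections are OSErrors
            if attempt == attempts:
                raise
            time.sleep(delay_s)
    raise ValueError("attempts must be at least 1")


# Usage (requires the Docker Compose stack to be running):
#   post_with_retry("http://localhost:7860/setup")
#   post_once("http://localhost:7860/seed")
```

Only the setup call is retried here, so a slow seed run is not accidentally started twice.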

	---

	## Deploying to HuggingFace Spaces

	1. Create a Space → Docker SDK → blank template
	2. Push repo to the Space:
	```bash
	git remote add hf https://huggingface.co/spaces/<username>/<space-name>
	git push hf main
	```
	3. Set Repository Secrets:
	```
	OPENAI_API_KEY = <aimlapi.com key>
	OPENAI_BASE_URL = https://ai.aimlapi.com/v1
	OPENAI_MODEL = claude-opus-4-7
	NEO4J_PASSWORD = clinicalmatch2024
	```
	4. After first boot, seed data:
	```
	POST https://<space>.hf.space/seed
	```

	---

	## MCP Tools (Prompt Opinion integration)

	```bash
	python backend/mcp_server.py # stdio transport
	```

	| Tool | Arguments | Description |
	|---|---|---|
	| `find_trials` | `condition, phase?` | Real-time trial search |
	| `screen_patient` | `patient_id, nct_id` | Eligibility screening |
	| `match_patient_to_trials` | `patient_id` | Top-N trial matches |
	| `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
	| `get_trial_analytics` | — | Enrollment funnel + KPIs |
	| `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |
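
On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, sent as a single line over stdio. The sketch below builds such a message for `find_trials`; the helper name and the argument values are illustrative, and a real client must complete the MCP `initialize` handshake before calling tools.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique IDs within a session


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The condition value is an example; `phase` is optional per the table above.
line = tool_call_request("find_trials", {"condition": "breast cancer"})
print(line)
```

In practice an MCP client library handles this framing; the raw message is shown only to make the tool and argument names in the table concrete.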

	---

	## Key API Endpoints

	| Method | Path | Description |
	|---|---|---|
	| POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
	| GET | `/api/v1/intake/biomarkers` | Biomarker registry |
	| GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
	| GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
	| GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
	| POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
	| POST | `/api/v1/workflow/run` | Run full A2A pipeline |
	| GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
	| GET | `/api/v1/map/data` | Site coordinates + patient clusters |
	| POST | `/api/v1/graph/query` | GraphRAG natural language query |
	| POST | `/seed` | Seed full graph from live APIs |
	| GET | `/api/v1/graph/stats` | Node and edge counts |

	Full interactive docs: `http://localhost:8000/docs`
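
A request to the intake endpoint might be built as shown below. The path and method come from the table above, but every field name in the payload is a guess made for illustration; the actual request schema should be taken from the interactive docs.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_intake_request(profile: dict, base_url: str = API_BASE) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the intake matching endpoint."""
    return urllib.request.Request(
        f"{base_url}/api/v1/intake/match",
        data=json.dumps(profile).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload: field names are assumptions, values are examples.
profile = {
    "age": 58,
    "sex": "female",
    "ecog": 1,
    "biomarkers": ["HER2+"],
    "labs": {"creatinine_umol_l": 80, "bilirubin_umol_l": 12},
}
request = build_intake_request(profile)

# To send it against a running backend:
#   with urllib.request.urlopen(request) as response:
#       matches = json.load(response)
```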

	---

	## Environment Variables

	| Variable | Description | Default |
	|---|---|---|
	| `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
	| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
	| `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
	| `NEO4J_DATABASE` | Database name | `neo4j` |
	| `OPENAI_API_KEY` | aimlapi.com API key | — |
	| `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
	| `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
	| `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |
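
For reference, the backend-side variables and defaults in the table could be read as in the sketch below. The `Settings` class and `load_settings` function are hypothetical; the project may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]  # no default in the table; must be supplied
    openai_base_url: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )


defaults = load_settings({})
print(defaults.neo4j_uri, defaults.openai_model)
```

Passing the environment as a mapping keeps the loader easy to test, for example `load_settings({"NEO4J_URI": "bolt://neo4j:7687"})` for a containerized setup.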

	---

	## Frontend Pages

	| Route | Page | Description |
	|---|---|---|
	| `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
	| `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
	| `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
	| `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
	| `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
	| `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
	| `/graph` | GraphRAG | Natural language queries over the knowledge graph |