metadata

title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true

ClinicalMatch AI — Precision Clinical Trial Matching & Recruitment Agent

"Agents Assemble: Healthcare AI Endgame Challenge" — Prompt Opinion platform
Standards: FHIR R4 · MCP · A2A

80% of clinical trials fail to meet enrollment deadlines, and 85% of eligible patients are never identified. ClinicalMatch AI targets both problems.

What it does

ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

Key capabilities:

Feature	Description
Eligibility Check	A user enters raw clinical data (age, labs in SI units, biomarkers), with no patient ID required, and receives ranked, explainable trial matches
Trial Finder	Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph
Graph Intelligence	Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk
A2A Pipeline	5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles
Recruitment Hub	Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post)
GraphRAG	Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?")
MCP Server	6 tools callable by Prompt Opinion directly via stdio transport
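The five-state A2A pipeline in the table can be pictured as a linear state machine. The sketch below is illustrative only: the `Stage` enum, `run_pipeline`, and the handler signature are hypothetical names, not the project's actual orchestration code.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    INGEST = "INGEST"
    PARSE = "PARSE"
    MATCH = "MATCH"
    SCORE = "SCORE"
    RECRUIT = "RECRUIT"


# Stages run strictly in the order listed in the feature table.
PIPELINE_ORDER: List[Stage] = list(Stage)


def run_pipeline(
    context: dict,
    handlers: Dict[Stage, Callable[[dict], dict]],
) -> dict:
    """Pass the context through each stage in order and record the path taken."""
    history = []
    for stage in PIPELINE_ORDER:
        handler = handlers.get(stage)
        if handler is not None:  # stages without a handler are treated as no-ops
            context = handler(context)
        history.append(stage.value)
    return {**context, "history": history}
```

A caller would register one handler per stage (FHIR ingestion, criteria parsing, and so on) and inspect `history` to drive a state tracker in the UI.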

Architecture

Prompt Opinion Platform
        │  MCP Protocol (stdio)
        ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────┬─────────────────────────────┘
                       │ A2A Orchestration
                       ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend  (main.py, port 8000)             │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │  FHIR R4   │  Claude    │  Neo4j Graph  │
│ live API │  adapter   │  LLM       │  RAG + match  │
└──────────┴────────────┴────────────┴───────────────┘
                       │
                       ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend  (port 3000)                  │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                       │  Nginx (port 7860)
                       ▼
              HuggingFace Spaces

Data sources (all free, no auth):

Source	Data
ClinicalTrials.gov v2	Real recruiting NCT trials, sorted by recency
RxNorm (NIH)	Medication RxCUI codes
ICD-10-CM (NLM)	Cancer diagnosis codes
PubMed (NCBI)	Supporting literature PMIDs
OpenFDA	Drug labels and adverse events
Synthetic	500 realistic patient profiles matched to real trials
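As an illustration of the ClinicalTrials.gov v2 source, the sketch below builds a search URL for recruiting trials sorted by most recent update. The parameter names (`query.cond`, `filter.overallStatus`, `sort`, `pageSize`) reflect the public v2 API as best understood here and should be checked against the official documentation; `build_trial_search_url` is a hypothetical helper, not necessarily how the backend queries the API.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_trial_search_url(condition: str, page_size: int = 20) -> str:
    """Build a URL for recruiting studies matching a condition, newest first."""
    params = {
        "query.cond": condition,                 # free-text condition search
        "filter.overallStatus": "RECRUITING",    # recruiting trials only
        "sort": "LastUpdatePostDate:desc",       # most recently updated first
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"
```

Only the URL is built here; fetching and paging through results is left to whichever HTTP client the backend uses.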

Graph Knowledge Base

After seeding, the Neo4j graph contains:

Node type	Count	Key properties
Patient	500	age, sex, ECOG, condition, city, biomarkers[], medications[]
Trial	~250	NCT ID, eligibility criteria, phase, last_updated
Diagnosis	~130	ICD-10 codes across 10 oncology conditions
Biomarker	20	HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc.
Medication	16	Trastuzumab, Pembrolizumab, Olaparib, etc.
StudySite	~200	lat/lon coordinates
ELIGIBLE_FOR edges	~9,100	score, linking patients to trials

The graph grows passively — every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
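"Upsert" in a Neo4j graph is usually expressed with Cypher's `MERGE` clause, which matches an existing node or creates it if absent. The sketch below builds such a query for a Trial node. The property names follow the table above, but the exact schema and the helper name `build_trial_upsert` are assumptions.

```python
from typing import Dict, Tuple


def build_trial_upsert(
    nct_id: str, phase: str, last_updated: str
) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized Cypher upsert for one Trial node."""
    query = (
        "MERGE (t:Trial {nct_id: $nct_id}) "                       # match or create by NCT ID
        "SET t.phase = $phase, t.last_updated = $last_updated "    # refresh mutable fields
        "RETURN t.nct_id AS nct_id"
    )
    params = {"nct_id": nct_id, "phase": phase, "last_updated": last_updated}
    return query, params
```

With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`. Keying the `MERGE` on the NCT ID alone keeps repeated searches from creating duplicate Trial nodes.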

Clinical Eligibility Check (SI Units)

The /intake page accepts raw clinical data — no patient ID or account required. Fields:

Demographics: Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

Biomarker status (toggles):

Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
Hematology: FLT3, IDH1/2, BCR-ABL

Lab values (SI units, except haemoglobin in g/dL):

Field	Unit	Conversion
Haemoglobin	g/dL	—
WBC	×10⁹/L	—
ANC	×10⁹/L	—
Platelets	×10⁹/L	—
Creatinine	μmol/L	auto-converted ÷88.4 → mg/dL for trial text
eGFR	mL/min/1.73m²	—
Bilirubin	μmol/L	auto-converted ÷17.1 → mg/dL for trial text
ALT / AST	U/L	—
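The two conversions in the table are simple divisions by a fixed factor. A minimal sketch, assuming a hypothetical helper named `to_mg_dl` (the backend's real function may differ):

```python
# Divisors from the table above: μmol/L ÷ factor → mg/dL.
UMOL_L_PER_MG_DL = {
    "creatinine": 88.4,
    "bilirubin": 17.1,
}


def to_mg_dl(analyte: str, value_umol_l: float) -> float:
    """Convert a creatinine or bilirubin value from μmol/L to mg/dL."""
    try:
        factor = UMOL_L_PER_MG_DL[analyte]
    except KeyError:
        raise ValueError(f"no conversion defined for {analyte!r}") from None
    return round(value_umol_l / factor, 2)
```

For example, a creatinine of 132.6 μmol/L becomes 1.5 mg/dL, which can then be compared against a trial threshold written as "creatinine ≤ 1.5 mg/dL".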

Matching score breakdown:

Age 25 pts — compared against trial min/max age
Sex 15 pts — compared against trial sex restriction
ECOG 15 pts — extracted via regex from eligibility criteria text
Biomarkers 30 pts — checks whether biomarker terms appear in trial eligibility text
Lab values 15 pts — parses thresholds from text, converts SI units, checks patient values
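To illustrate the regex-based ECOG extraction, the sketch below pulls the highest permitted ECOG status out of free-text criteria. The pattern is an assumption written for this example, not the project's actual regex. It covers phrasings such as "ECOG 0-1", "ECOG ≤ 2", and "ECOG PS of 0 or 1", and does not handle strict "<" bounds or Karnofsky scores.

```python
import re
from typing import Optional

_ECOG_PATTERN = re.compile(
    r"ECOG[^.\n\d]{0,40}?"                    # short gap, e.g. " performance status "
    r"(?:(?:<=|≤)\s*(\d)"                     # upper bound: "≤ 2"
    r"|(\d)\s*(?:-|–|to|or)\s*(\d)"           # range: "0-1", "0 to 1", "0 or 1"
    r"|(\d))",                                # single value: "ECOG 1"
    re.IGNORECASE,
)


def max_allowed_ecog(criteria_text: str) -> Optional[int]:
    """Return the highest ECOG status the criteria allow, or None if not stated."""
    match = _ECOG_PATTERN.search(criteria_text)
    if match is None:
        return None
    upper_bound, _range_low, range_high, single = match.groups()
    for candidate in (upper_bound, range_high, single):
        if candidate is not None:
            return int(candidate)
    return None
```

A patient passes the ECOG criterion when their status is at or below the returned value; a `None` result maps naturally to an "uncertain" outcome.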

Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
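The weights above sum to 100, so a trial's score can be read as a percentage. The sketch below combines per-criterion outcomes into a score and ranks trials. The half credit for "uncertain" is an assumption made for this example (the README does not say how uncertain criteria are weighted), and `score_trial` and `rank_trials` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Point weights from the score breakdown above (total 100).
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# Assumption: an "uncertain" criterion earns half of its points.
CREDIT = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}


def score_trial(outcomes: Dict[str, str]) -> float:
    """Sum weighted credit per criterion; missing criteria count as uncertain."""
    return sum(
        WEIGHTS[criterion] * CREDIT[outcomes.get(criterion, "uncertain")]
        for criterion in WEIGHTS
    )


def rank_trials(results: Dict[str, Dict[str, str]]) -> List[Tuple[str, float]]:
    """Return (trial_id, score) pairs sorted from best to worst match."""
    scored = [(trial_id, score_trial(outcomes)) for trial_id, outcomes in results.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the per-criterion outcomes alongside the total is what makes the ranking explainable: the UI can show which criteria passed, failed, or could not be determined.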

Running Locally (without Docker Compose)

# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

# 2. Backend (from the repo root; uvicorn keeps running, so use a new terminal for later steps)
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local   # fill in credentials
uvicorn main:app --reload --port 8000

# 3. Schema setup (once)
curl -X POST http://localhost:8000/setup

# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed

# 5. Frontend (new terminal, from the repo root)
cd frontend
npm install --legacy-peer-deps
npm run dev        # http://localhost:3000  (uses --webpack, not Turbopack)

# 6. MCP server for Prompt Opinion integration (new terminal, from the repo root)
cd backend
source venv/bin/activate
python mcp_server.py

Running with Docker Compose

cp .env.example .env.local   # fill in OPENAI_API_KEY etc.
docker compose up -d

# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed

Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474

Deploying to HuggingFace Spaces

Create a Space → Docker SDK → blank template

Push repo to the Space:

git remote add hf https://huggingface.co/spaces/<username>/<space-name>
git push hf main

Set Repository Secrets:

OPENAI_API_KEY    = <aimlapi.com key>
OPENAI_BASE_URL   = https://ai.aimlapi.com/v1
OPENAI_MODEL      = claude-opus-4-7
NEO4J_PASSWORD    = clinicalmatch2024

After first boot, seed data:
```
POST https://<space>.hf.space/seed
```

MCP Tools (Prompt Opinion integration)

python backend/mcp_server.py   # stdio transport

Tool	Arguments	Description
`find_trials`	`condition, phase?`	Real-time trial search
`screen_patient`	`patient_id, nct_id`	Eligibility screening
`match_patient_to_trials`	`patient_id`	Top-N trial matches
`generate_recruitment_outreach`	`patient_id, nct_id, channel`	Personalized outreach
`get_trial_analytics`	—	Enrollment funnel + KPIs
`summarize_trial_protocol`	`nct_id`	AI-parsed protocol summary
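Over the stdio transport, an MCP client invokes a tool by writing a JSON-RPC 2.0 `tools/call` request as a single line on the server's stdin. The sketch below builds such a message for `find_trials`. The argument names come from the table above, while the argument values and the helper `build_tool_call` are illustrative assumptions.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no embedded newlines by default, as stdio framing requires.
    return json.dumps(message) + "\n"


# Example: search for Phase 2 breast cancer trials (values are hypothetical).
line = build_tool_call(1, "find_trials", {"condition": "breast cancer", "phase": "2"})
```

A real client must complete the MCP `initialize` handshake before sending tool calls; an MCP SDK or the Prompt Opinion platform would normally handle that.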

Key API Endpoints

Method	Path	Description
POST	`/api/v1/intake/match`	SI-unit intake → ranked trial matches
GET	`/api/v1/intake/biomarkers`	Biomarker registry
GET	`/api/v1/trials/search`	Real-time CT.gov search (recency-sorted, graph-enriched)
GET	`/api/v1/trials/{nct_id}/intelligence`	Graph intelligence per trial
GET	`/api/v1/graph/patients`	Query seeded patient IDs from Neo4j
POST	`/api/v1/patients/{id}/screen/{nct_id}`	Screen FHIR patient against trial
POST	`/api/v1/workflow/run`	Run full A2A pipeline
GET	`/api/v1/analytics/kpi`	Dashboard KPIs
GET	`/api/v1/map/data`	Site coordinates + patient clusters
POST	`/api/v1/graph/query`	GraphRAG natural language query
POST	`/seed`	Seed full graph from live APIs
GET	`/api/v1/graph/stats`	Node and edge counts

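Full interactive docs: http://localhost:8000/docs when running the backend directly, or http://localhost:7860/api/docs behind Nginx (Docker Compose).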
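As a rough illustration of calling the intake endpoint, the sketch below prepares a POST request to `/api/v1/intake/match` using only the standard library. The payload field names (`age`, `sex`, `ecog`, `creatinine_umol_l`, `biomarkers`) are hypothetical; the real request schema is listed in the interactive docs.

```python
import json
import urllib.request


def build_intake_request(base_url: str, profile: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the intake matching endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/intake/match",
        data=json.dumps(profile).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload; field names are assumptions, not the documented schema.
request = build_intake_request(
    "http://localhost:8000",
    {"age": 58, "sex": "female", "ecog": 1, "creatinine_umol_l": 80, "biomarkers": ["HER2+"]},
)
```

Passing `request` to `urllib.request.urlopen` sends it to a running backend and returns the ranked matches as JSON.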

Environment Variables

Variable	Description	Default
`NEO4J_URI`	Neo4j bolt URI	`bolt://localhost:7687`
`NEO4J_USERNAME`	Neo4j username	`neo4j`
`NEO4J_PASSWORD`	Neo4j password	`clinicalmatch2024`
`NEO4J_DATABASE`	Database name	`neo4j`
`OPENAI_API_KEY`	aimlapi.com API key	—
`OPENAI_BASE_URL`	LLM base URL	`https://ai.aimlapi.com/v1`
`OPENAI_MODEL`	Model identifier	`claude-opus-4-7`
`NEXT_PUBLIC_API_URL`	Frontend API base URL	`""` (relative, via Nginx)
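One way to read the backend variables with the defaults from the table is sketched below. The `Settings` dataclass and `load_settings` are hypothetical names for illustration; the backend may load its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]
    openai_base_url: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),  # no default: must be supplied
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Accepting an optional mapping makes the loader easy to test without touching the real process environment.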

Frontend Pages

Route	Page	Description
`/`	Trial Finder	Real-time CT.gov search, recency-sorted, graph intelligence on expand
`/intake`	Eligibility Check	SI-unit clinical intake form, no patient ID required
`/screening`	Patient Screening	FHIR patient + trial combobox, A2A pipeline with state tracker
`/recruitment`	Recruitment Hub	Kanban board, AI outreach generation (PCP / email / social)
`/dashboard`	Dashboard	KPI cards, enrollment funnel, demographics, site performance
`/map`	Site Map	Leaflet map of trial sites and patient density clusters
`/graph`	GraphRAG	Natural language queries over the knowledge graph