title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true

ClinicalMatch AI: Precision Clinical Trial Matching & Recruitment Agent

"Agents Assemble: Healthcare AI Endgame Challenge" (Prompt Opinion platform)
Standards: FHIR R4 · MCP · A2A

80% of clinical trials fail to meet enrollment deadlines, and 85% of eligible patients are never identified. ClinicalMatch AI targets both problems.


What it does

ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

Key capabilities:

| Feature | Description |
| --- | --- |
| Eligibility Check | Individual enters raw clinical data (age, labs in SI units, biomarkers), with no patient ID required, and receives ranked, explainable trial matches |
| Trial Finder | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
| Graph Intelligence | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
| A2A Pipeline | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
| Recruitment Hub | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
| GraphRAG | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
| MCP Server | 6 tools callable by Prompt Opinion directly via stdio transport |
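The five A2A stages named in the table can be pictured as a simple linear state machine. The following is a minimal sketch under stated assumptions, not the project's actual orchestrator: the `Stage` enum, `run_pipeline`, and the handler mapping are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    INGEST = "INGEST"
    PARSE = "PARSE"
    MATCH = "MATCH"
    SCORE = "SCORE"
    RECRUIT = "RECRUIT"


# Stage order exactly as listed in the feature table.
PIPELINE: List[Stage] = [
    Stage.INGEST, Stage.PARSE, Stage.MATCH, Stage.SCORE, Stage.RECRUIT,
]

# Hypothetical handler type: takes the current state dict, returns a new one.
Handler = Callable[[dict], dict]


def run_pipeline(context: dict, handlers: Dict[Stage, Handler]) -> dict:
    """Run every stage in order and record which stages completed."""
    state = dict(context)
    history: List[str] = []
    for stage in PIPELINE:
        handler = handlers.get(stage)
        if handler is not None:  # stages without a handler are passed through
            state = handler(state)
        history.append(stage.value)
    state["history"] = history
    return state


result = run_pipeline(
    {"patient_id": "example-patient"},
    {Stage.SCORE: lambda s: {**s, "score": 80}},
)
```

A state tracker in a UI could read `result["history"]` to show which stage a run has reached.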

Architecture

Prompt Opinion Platform
        │  MCP Protocol (stdio)
        ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────┬─────────────────────────────┘
                       │ A2A Orchestration
                       ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend  (main.py, port 8000)             │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │  FHIR R4   │  Claude    │  Neo4j Graph  │
│ live API │  adapter   │  LLM       │  RAG + match  │
└──────────┴────────────┴────────────┴───────────────┘
                       │
                       ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend  (port 3000)                  │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                       │  Nginx (port 7860)
                       ▼
              HuggingFace Spaces

Data sources (all free, no auth):

| Source | Data |
| --- | --- |
| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
| RxNorm (NIH) | Medication RxCUI codes |
| ICD-10 CM (NLM) | Cancer diagnosis codes |
| PubMed (NCBI) | Supporting literature PMIDs |
| OpenFDA | Drug labels and adverse events |
| Synthetic | 500 realistic patient profiles matched to real trials |
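As an illustration of the first row, a recency-sorted search for recruiting trials could be expressed as a ClinicalTrials.gov v2 query URL. This is a sketch, not the project's adapter: `build_search_url` is a hypothetical helper, and the query parameter names reflect the public v2 API as commonly documented, so they should be checked against the official API reference before use.

```python
from urllib.parse import urlencode

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(condition: str, page_size: int = 20) -> str:
    """Build a query URL for recruiting trials, most recently updated first."""
    params = {
        "query.cond": condition,                # free-text condition search
        "filter.overallStatus": "RECRUITING",   # only recruiting studies
        "sort": "LastUpdatePostDate:desc",      # newest updates first
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


url = build_search_url("breast cancer", page_size=5)
```

The URL is only constructed here; fetching it (for example with `urllib.request.urlopen`) requires no API key, in line with the "no auth" note above.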

Graph Knowledge Base

After seeding, the Neo4j graph contains:

| Node or edge type | Count | Key properties |
| --- | --- | --- |
| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
| StudySite | ~200 | lat/lon coordinates |
| ELIGIBLE_FOR edges | ~9,100 | score, linking patients to trials |

The graph grows passively: every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
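An upsert of this kind is typically written in Cypher with `MERGE`, which matches an existing node or creates it, so repeated searches do not duplicate trials. The sketch below is an assumption about how that might look: the property names (`nct_id`, `phase`, `last_updated`) and the `trial_upsert` helper are hypothetical, not taken from the project's code.

```python
from typing import Dict, Tuple

# MERGE keys on the NCT ID; SET refreshes the mutable properties on every search.
UPSERT_TRIAL = (
    "MERGE (t:Trial {nct_id: $nct_id}) "
    "SET t.phase = $phase, t.last_updated = $last_updated"
)


def trial_upsert(nct_id: str, phase: str, last_updated: str) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized Cypher statement and its parameters."""
    params = {"nct_id": nct_id, "phase": phase, "last_updated": last_updated}
    return UPSERT_TRIAL, params


# Placeholder values for illustration only.
query, params = trial_upsert("NCT00000000", "PHASE2", "2024-01-01")
```

The returned pair could then be handed to a Neo4j session, for example `session.run(query, params)` with the official Python driver. Using parameters rather than string interpolation keeps the statement cacheable and avoids injection issues.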


Clinical Eligibility Check (SI Units)

The /intake page accepts raw clinical data; no patient ID or account is required. Fields:

Demographics: Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

Biomarker status (toggles):

  • Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
  • Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
  • GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
  • Hematology: FLT3, IDH1/2, BCR-ABL

Lab values (SI units):

| Field | Unit | Conversion |
| --- | --- | --- |
| Haemoglobin | g/dL | none |
| WBC | ×10⁹/L | none |
| ANC | ×10⁹/L | none |
| Platelets | ×10⁹/L | none |
| Creatinine | μmol/L | auto-converted ÷88.4 → mg/dL for trial text |
| eGFR | mL/min/1.73m² | none |
| Bilirubin | μmol/L | auto-converted ÷17.1 → mg/dL for trial text |
| ALT / AST | U/L | none |
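The two conversions in the table are plain divisions by the factors shown (88.4 for creatinine, 17.1 for bilirubin). A minimal sketch follows; the function and dictionary names are hypothetical and the rounding to two decimals is an assumption, not documented behaviour.

```python
# Divisors from the table: μmol/L divided by the factor gives mg/dL.
UMOL_L_PER_MG_DL = {
    "creatinine": 88.4,
    "bilirubin": 17.1,
}


def umol_to_mg_dl(analyte: str, value_umol_l: float) -> float:
    """Convert an SI lab value (μmol/L) to mg/dL for comparison with trial text."""
    try:
        factor = UMOL_L_PER_MG_DL[analyte]
    except KeyError:
        raise ValueError(f"no conversion defined for {analyte!r}") from None
    return round(value_umol_l / factor, 2)  # rounding is an assumption


creatinine_mg_dl = umol_to_mg_dl("creatinine", 88.4)
bilirubin_mg_dl = umol_to_mg_dl("bilirubin", 34.2)
```

A threshold such as "creatinine ≤ 1.5 mg/dL" in a trial's eligibility text can then be compared directly against the converted value.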

Matching score breakdown (100 points total):

  • Age (25 pts): compared against trial min/max age
  • Sex (15 pts): compared against trial sex restriction
  • ECOG (15 pts): extracted via regex from eligibility criteria text
  • Biomarkers (30 pts): checks whether biomarker terms appear in trial eligibility text
  • Lab values (15 pts): parses thresholds from text, converts SI units, checks patient values

Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
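The weighting and ranking described above can be sketched as follows. The weights come from the breakdown; how an "uncertain" criterion contributes to the total is not stated in this README, so the half-credit rule below is purely an assumption, and `score_trial` and `rank_trials` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Point weights from the score breakdown (they sum to 100).
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# ASSUMPTION: uncertain criteria earn half credit; the real rule is not documented.
CREDIT = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}


def score_trial(results: Dict[str, str]) -> float:
    """Sum weighted credit over all criteria; missing criteria count as uncertain."""
    return sum(
        weight * CREDIT[results.get(criterion, "uncertain")]
        for criterion, weight in WEIGHTS.items()
    )


def rank_trials(per_trial: Dict[str, Dict[str, str]]) -> List[Tuple[str, float]]:
    """Return (trial id, score) pairs ordered from best to worst match."""
    scored = [(nct_id, score_trial(results)) for nct_id, results in per_trial.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder trial identifiers for illustration only.
ranking = rank_trials({
    "trial-a": {"age": "pass", "sex": "pass", "ecog": "fail",
                "biomarkers": "pass", "labs": "uncertain"},
    "trial-b": {c: "pass" for c in WEIGHTS},
})
```

Keeping the per-criterion verdicts alongside the total is what makes each match explainable: the same dictionary drives both the score and the pass/fail/uncertain display.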


Running Locally (without Docker Compose)

# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

# 2. Backend
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local   # fill in credentials
uvicorn main:app --reload --port 8000

# 3. Schema setup (once; run in a second terminal while the backend is up)
curl -X POST http://localhost:8000/setup

# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed

# 5. Frontend (new terminal, from the repo root)
cd frontend
npm install --legacy-peer-deps
npm run dev        # http://localhost:3000  (uses --webpack, not Turbopack)

# 6. MCP server (for Prompt Opinion integration; new terminal, from the repo root)
cd backend
source venv/bin/activate
python mcp_server.py

Running with Docker Compose

cp .env.example .env.local   # fill in OPENAI_API_KEY etc.
docker compose up -d

# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed

Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474


Deploying to HuggingFace Spaces

  1. Create a Space → Docker SDK → blank template
  2. Push repo to the Space:
    git remote add hf https://huggingface.co/spaces/<username>/<space-name>
    git push hf main
    
  3. Set Repository Secrets:
    OPENAI_API_KEY    = <aimlapi.com key>
    OPENAI_BASE_URL   = https://ai.aimlapi.com/v1
    OPENAI_MODEL      = claude-opus-4-7
    NEO4J_PASSWORD    = clinicalmatch2024
    
  4. After first boot, seed data:
    POST https://<space>.hf.space/seed
    

MCP Tools (Prompt Opinion integration)

python backend/mcp_server.py   # stdio transport
| Tool | Arguments | Description |
| --- | --- | --- |
| find_trials | condition, phase? | Real-time trial search |
| screen_patient | patient_id, nct_id | Eligibility screening |
| match_patient_to_trials | patient_id | Top-N trial matches |
| generate_recruitment_outreach | patient_id, nct_id, channel | Personalized outreach |
| get_trial_analytics | (none) | Enrollment funnel + KPIs |
| summarize_trial_protocol | nct_id | AI-parsed protocol summary |
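Over the stdio transport, an MCP client invokes a tool by writing a JSON-RPC 2.0 `tools/call` message as a single line to the server's standard input. The sketch below only builds such a message for `find_trials`; `tool_call_request` is a hypothetical helper, and a real client must first complete the MCP `initialize` handshake, which is omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request as one newline-free line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Argument names follow the table above; the values are illustrative.
line = tool_call_request("find_trials", {"condition": "breast cancer"})
```

In practice an MCP client library would handle this framing, so the sketch is mainly useful for understanding what Prompt Opinion sends when it calls one of the six tools.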

Key API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/intake/match | SI-unit intake → ranked trial matches |
| GET | /api/v1/intake/biomarkers | Biomarker registry |
| GET | /api/v1/trials/search | Real-time CT.gov search (recency-sorted, graph-enriched) |
| GET | /api/v1/trials/{nct_id}/intelligence | Graph intelligence per trial |
| GET | /api/v1/graph/patients | Query seeded patient IDs from Neo4j |
| POST | /api/v1/patients/{id}/screen/{nct_id} | Screen FHIR patient against trial |
| POST | /api/v1/workflow/run | Run full A2A pipeline |
| GET | /api/v1/analytics/kpi | Dashboard KPIs |
| GET | /api/v1/map/data | Site coordinates + patient clusters |
| POST | /api/v1/graph/query | GraphRAG natural language query |
| POST | /seed | Seed full graph from live APIs |
| GET | /api/v1/graph/stats | Node and edge counts |

Full interactive docs: http://localhost:8000/docs
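As a rough illustration of calling the intake endpoint listed above, the following builds a POST request with the standard library. The request body schema is not documented in this README, so every field name in `example_payload` is a guess; the interactive docs are the authority on the real schema. `build_intake_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_intake_request(base_url: str, payload: dict) -> Request:
    """Prepare (but do not send) a JSON POST to the intake matching endpoint."""
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/intake/match",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# HYPOTHETICAL field names; check the API docs for the actual schema.
example_payload = {
    "age": 54,
    "sex": "female",
    "ecog": 1,
    "biomarkers": ["HER2+"],
    "labs": {"creatinine_umol_l": 80},
}

request = build_intake_request("http://localhost:8000", example_payload)
# To send it against a running backend: urllib.request.urlopen(request)
```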


Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| NEO4J_URI | Neo4j bolt URI | bolt://localhost:7687 |
| NEO4J_USERNAME | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | clinicalmatch2024 |
| NEO4J_DATABASE | Database name | neo4j |
| OPENAI_API_KEY | aimlapi.com API key | (none) |
| OPENAI_BASE_URL | LLM base URL | https://ai.aimlapi.com/v1 |
| OPENAI_MODEL | Model identifier | claude-opus-4-7 |
| NEXT_PUBLIC_API_URL | Frontend API base URL | "" (relative, via Nginx) |
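One way a backend might read these variables, falling back to the defaults in the table, is sketched below. The `Settings` class and `load_settings` function are hypothetical and not taken from the project's code; only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]  # no default: must be supplied by the user
    openai_base_url: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from the environment, using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )


defaults = load_settings({})
```

Accepting an explicit mapping makes the loader easy to test without touching the real process environment.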

Frontend Pages

| Route | Page | Description |
| --- | --- | --- |
| / | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
| /intake | Eligibility Check | SI-unit clinical intake form, no patient ID required |
| /screening | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
| /recruitment | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
| /dashboard | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
| /map | Site Map | Leaflet map of trial sites and patient density clusters |
| /graph | GraphRAG | Natural language queries over the knowledge graph |