---
title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
---
# ClinicalMatch AI: Precision Clinical Trial Matching & Recruitment Agent
**"Agents Assemble: Healthcare AI Endgame Challenge"** (Prompt Opinion platform)
Standards: **FHIR R4 · MCP · A2A**
> 80% of clinical trials fail to meet enrollment deadlines. 85% of eligible patients are never identified. This agent directly addresses that.
---
## What it does
ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.
**Key capabilities:**
| Feature | Description |
|---|---|
| **Eligibility Check** | An individual enters raw clinical data (age, labs in SI units, biomarkers), with no patient ID required, and receives ranked, explainable trial matches |
| **Trial Finder** | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
| **Graph Intelligence** | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
| **A2A Pipeline** | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
| **Recruitment Hub** | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
| **GraphRAG** | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
| **MCP Server** | 6 tools callable by Prompt Opinion directly via stdio transport |
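
The A2A Pipeline row above describes a fixed, linear sequence of five states. As a rough illustration only, that ordering could be modeled with an enum and a small transition helper. The names `PipelineState`, `next_state`, and `run_order` are hypothetical and are not taken from the project's code:

```python
from enum import Enum
from typing import List, Optional


class PipelineState(Enum):
    """The five A2A states, in the order listed in the table above."""
    INGEST = 1
    PARSE = 2
    MATCH = 3
    SCORE = 4
    RECRUIT = 5


def next_state(state: PipelineState) -> Optional[PipelineState]:
    """Return the state that follows `state`, or None after RECRUIT."""
    members = list(PipelineState)  # Enum iteration preserves definition order
    idx = members.index(state)
    return members[idx + 1] if idx + 1 < len(members) else None


def run_order() -> List[str]:
    """Walk the pipeline from INGEST to the end and collect the state names."""
    order: List[str] = []
    state: Optional[PipelineState] = PipelineState.INGEST
    while state is not None:
        order.append(state.name)
        state = next_state(state)
    return order
```

A real orchestrator would attach work to each transition (for example, fetching the FHIR profile during INGEST); this sketch only captures the ordering.
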
---
## Architecture
```
                Prompt Opinion Platform
                           │  MCP Protocol (stdio)
                           ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────────┬─────────────────────────┘
                           │  A2A Orchestration
                           ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend (main.py, port 8000)              │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │ FHIR R4    │ Claude     │ Neo4j Graph   │
│ live API │ adapter    │ LLM        │ RAG + match   │
└──────────┴────────────┴────────────┴───────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend (port 3000)                   │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                           │  Nginx (port 7860)
                           ▼
                  HuggingFace Spaces
```
**Data sources (all free, no auth):**
| Source | Data |
|---|---|
| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
| RxNorm (NIH) | Medication RxCUI codes |
| ICD-10 CM (NLM) | Cancer diagnosis codes |
| PubMed (NCBI) | Supporting literature PMIDs |
| OpenFDA | Drug labels and adverse events |
| Synthetic | 500 realistic patient profiles matched to real trials |
---
## Graph Knowledge Base
After seeding, the Neo4j graph contains:
| Node type | Count | Key properties |
|---|---|---|
| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
| StudySite | ~200 | lat/lon coordinates |
| **ELIGIBLE_FOR edges** | **~9,100** | score, linking patients to trials |
The graph grows passively: every Trial Finder search automatically upserts new Trial and StudySite nodes, and every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
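
An upsert of this kind is typically expressed in Cypher with `MERGE`, which creates the node only if no match exists, so repeated searches do not duplicate trials. The sketch below builds such a statement and its parameters. It is an assumption about how this could be done, not the project's actual query: the key name `nct_id` and the helper `build_trial_upsert` are hypothetical.

```python
from typing import Any, Dict, Tuple

# ASSUMPTION: `nct_id` as the Trial key is illustrative. The README only says
# Trial nodes carry an NCT ID, eligibility criteria, phase and last_updated.
UPSERT_TRIAL = (
    "MERGE (t:Trial {nct_id: $nct_id}) "  # match on the NCT ID, or create the node
    "SET t += $props"                      # then merge in the remaining properties
)


def build_trial_upsert(trial: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return a Cypher MERGE statement and its parameters for one trial record."""
    # Drop the key itself and any unset values so existing properties are not nulled.
    props = {k: v for k, v in trial.items() if k != "nct_id" and v is not None}
    return UPSERT_TRIAL, {"nct_id": trial["nct_id"], "props": props}
```

With the official `neo4j` Python driver, the returned pair could be passed to `session.run(query, params)`.
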
---
## Clinical Eligibility Check (SI Units)
The `/intake` page accepts raw clinical data; no patient ID or account is required. Fields:
**Demographics:** Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)
**Biomarker status (toggles):**
- Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
- Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
- GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
- Hematology: FLT3, IDH1/2, BCR-ABL
**Lab values (SI units):**
| Field | Unit | Conversion |
|---|---|---|
| Haemoglobin | g/dL | none |
| WBC | ×10⁹/L | none |
| ANC | ×10⁹/L | none |
| Platelets | ×10⁹/L | none |
| Creatinine | **μmol/L** | auto-converted ÷88.4 → mg/dL for trial text |
| eGFR | mL/min/1.73m² | none |
| Bilirubin | **μmol/L** | auto-converted ÷17.1 → mg/dL for trial text |
| ALT / AST | U/L | none |
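
The two conversions in the table are plain divisions by a fixed factor. A minimal sketch, using hypothetical function names (the backend's own helpers are not shown in this README):

```python
# Conversion factors from the table above: µmol/L per 1 mg/dL.
CREATININE_UMOL_PER_MG_DL = 88.4
BILIRUBIN_UMOL_PER_MG_DL = 17.1


def creatinine_to_mg_dl(umol_per_l: float) -> float:
    """Convert serum creatinine from µmol/L (form input) to mg/dL (trial text)."""
    return umol_per_l / CREATININE_UMOL_PER_MG_DL


def bilirubin_to_mg_dl(umol_per_l: float) -> float:
    """Convert bilirubin from µmol/L (form input) to mg/dL (trial text)."""
    return umol_per_l / BILIRUBIN_UMOL_PER_MG_DL
```

For example, a creatinine of 132.6 μmol/L corresponds to about 1.5 mg/dL, which can then be compared against a threshold such as "creatinine ≤ 1.5 mg/dL" in a trial's eligibility text.
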
Matching score breakdown:
- **Age** (25 pts): compared against trial min/max age
- **Sex** (15 pts): compared against trial sex restriction
- **ECOG** (15 pts): extracted via regex from eligibility criteria text
- **Biomarkers** (30 pts): checks whether biomarker terms appear in trial eligibility text
- **Lab values** (15 pts): parses thresholds from text, converts SI units, checks patient values
Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
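
The weights above sum to 100 points. The following sketch shows one way per-criterion outcomes could be combined into a total. The weights come from the list above; everything else is an assumption. In particular, the README does not say how an "uncertain" result is scored, so the half credit used here is a placeholder, and `total_score` is a hypothetical name.

```python
from typing import Dict

# Maximum points per criterion, as listed in the breakdown above (total 100).
WEIGHTS: Dict[str, int] = {
    "age": 25,
    "sex": 15,
    "ecog": 15,
    "biomarkers": 30,
    "labs": 15,
}

# ASSUMPTION: half credit for "uncertain" is a placeholder for this sketch;
# the actual treatment of uncertain criteria is not documented here.
CREDIT: Dict[str, float] = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}


def total_score(results: Dict[str, str]) -> float:
    """Sum weighted credit; criteria missing from `results` count as uncertain."""
    return sum(
        weight * CREDIT[results.get(name, "uncertain")]
        for name, weight in WEIGHTS.items()
    )
```

Under these assumptions, a patient who passes every criterion except biomarkers scores 70, and a trial with no parseable criteria at all scores 50.
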
---
## Running Locally (no Docker)
```bash
# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community
# 2. Backend
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local # fill in credentials
uvicorn main:app --reload --port 8000
# 3. Schema setup (once; run in a second terminal while the backend is up)
curl -X POST http://localhost:8000/setup
# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed
# 5. Frontend (new terminal, from the repo root)
cd frontend
npm install --legacy-peer-deps
npm run dev # http://localhost:3000 (uses --webpack, not Turbopack)
# 6. MCP server (for Prompt Opinion integration; new terminal, from the repo root)
cd backend
python mcp_server.py
```
---
## Running with Docker Compose
```bash
cp .env.example .env.local # fill in OPENAI_API_KEY etc.
docker compose up -d
# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed
```
Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474
---
## Deploying to HuggingFace Spaces
1. Create a Space → **Docker SDK** → blank template
2. Push repo to the Space:
```bash
git remote add hf https://huggingface.co/spaces/<username>/<space-name>
git push hf main
```
3. Set **Repository Secrets**:
```
OPENAI_API_KEY = <aimlapi.com key>
OPENAI_BASE_URL = https://ai.aimlapi.com/v1
OPENAI_MODEL = claude-opus-4-7
NEO4J_PASSWORD = clinicalmatch2024
```
4. After first boot, seed data:
```
POST https://<space>.hf.space/seed
```
---
## MCP Tools (Prompt Opinion integration)
```bash
python backend/mcp_server.py # stdio transport
```
| Tool | Arguments | Description |
|---|---|---|
| `find_trials` | `condition, phase?` | Real-time trial search |
| `screen_patient` | `patient_id, nct_id` | Eligibility screening |
| `match_patient_to_trials` | `patient_id` | Top-N trial matches |
| `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
| `get_trial_analytics` | none | Enrollment funnel + KPIs |
| `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |
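
Over the stdio transport, an MCP client invokes a tool by writing a JSON-RPC 2.0 `tools/call` message as a single line to the server's stdin. The sketch below builds such a message for the `find_trials` tool from the table. The helper name `tools_call_request` and the example argument value are illustrative, and a real client must complete the MCP `initialize` handshake before calling tools.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as one line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Example: ask the server for trials matching a condition.
request_line = tools_call_request(1, "find_trials", {"condition": "breast cancer"})
```
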
---
## Key API Endpoints
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
| GET | `/api/v1/intake/biomarkers` | Biomarker registry |
| GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
| GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
| GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
| POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
| POST | `/api/v1/workflow/run` | Run full A2A pipeline |
| GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
| GET | `/api/v1/map/data` | Site coordinates + patient clusters |
| POST | `/api/v1/graph/query` | GraphRAG natural language query |
| POST | `/seed` | Seed full graph from live APIs |
| GET | `/api/v1/graph/stats` | Node and edge counts |
Full interactive docs: `http://localhost:8000/docs`
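
As a quick illustration of calling the API from a script, the sketch below builds a URL for the trial search endpoint listed above. The query parameter name `condition` is an assumption borrowed from the `find_trials` MCP tool; check the interactive docs for the actual parameter names. `trial_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urljoin

API_BASE = "http://localhost:8000"  # local backend port used in this README


def trial_search_url(condition: str, base: str = API_BASE) -> str:
    """Build a GET URL for /api/v1/trials/search."""
    # ASSUMPTION: the endpoint accepts a `condition` query parameter.
    query = urlencode({"condition": condition})
    return urljoin(base, "/api/v1/trials/search") + "?" + query


url = trial_search_url("breast cancer")
# http://localhost:8000/api/v1/trials/search?condition=breast+cancer
```

With the backend running, the URL can be fetched with `urllib.request.urlopen(url)` or any HTTP client.
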
---
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
| `NEO4J_DATABASE` | Database name | `neo4j` |
| `OPENAI_API_KEY` | aimlapi.com API key | β€” |
| `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
| `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
| `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |
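
For reference, the backend-side variables and defaults in the table could be read in Python roughly as follows. This is a sketch of the pattern, not the project's actual configuration module; the `Settings` class and `load_settings` function are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]  # no default in the table, so it may be unset
    openai_base_url: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from `env`, falling back to the defaults in the table above."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.
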
---
## Frontend Pages
| Route | Page | Description |
|---|---|---|
| `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
| `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
| `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
| `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
| `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
| `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
| `/graph` | GraphRAG | Natural language queries over the knowledge graph |