---
title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
---

# ClinicalMatch AI — Precision Clinical Trial Matching & Recruitment Agent

**"Agents Assemble: Healthcare AI Endgame Challenge"** — Prompt Opinion platform  
Standards: **FHIR R4 · MCP · A2A**

> 80% of clinical trials fail to meet enrollment deadlines. 85% of eligible patients are never identified. This agent directly addresses that.

---

## What it does

ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

**Key capabilities:**

| Feature | Description |
|---|---|
| **Eligibility Check** | A user enters raw clinical data (age, labs in SI units, biomarkers) without a patient ID and receives ranked, explainable trial matches |
| **Trial Finder** | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
| **Graph Intelligence** | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
| **A2A Pipeline** | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
| **Recruitment Hub** | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
| **GraphRAG** | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
| **MCP Server** | 6 tools callable by Prompt Opinion directly via stdio transport |
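
The A2A pipeline runs its five states in a fixed linear order, so it can be pictured as a small state machine. The sketch below is illustrative only: `PipelineState`, `advance`, and `run_pipeline` are hypothetical names, not the project's code, and a real orchestrator would also carry the FHIR profile, intermediate results, and errors between states.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class PipelineState(Enum):
    """Hypothetical enum mirroring the five A2A pipeline states, in order."""
    INGEST = 1
    PARSE = 2
    MATCH = 3
    SCORE = 4
    RECRUIT = 5


def advance(state: PipelineState) -> Optional[PipelineState]:
    """Return the next state in definition order, or None after RECRUIT."""
    order = list(PipelineState)
    i = order.index(state)
    return order[i + 1] if i + 1 < len(order) else None


def run_pipeline() -> list[str]:
    """Visit every state once and return the visited state names."""
    visited: list[str] = []
    state: Optional[PipelineState] = PipelineState.INGEST
    while state is not None:
        visited.append(state.name)  # a real pipeline would do this state's work here
        state = advance(state)
    return visited


print(" -> ".join(run_pipeline()))
```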

---

## Architecture

```
Prompt Opinion Platform
        │  MCP Protocol (stdio)
        ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────┬─────────────────────────────┘
                       │ A2A Orchestration
                       ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend  (main.py, port 8000)             │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │  FHIR R4   │  Claude    │  Neo4j Graph  │
│ live API │  adapter   │  LLM       │  RAG + match  │
└──────────┴────────────┴────────────┴───────────────┘
                       │
                       ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend  (port 3000)                  │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                       │  Nginx (port 7860)
                       ▼
              HuggingFace Spaces
```

**Data sources (all free, no auth):**

| Source | Data |
|---|---|
| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
| RxNorm (NIH) | Medication RxCUI codes |
| ICD-10-CM (NLM) | Cancer diagnosis codes |
| PubMed (NCBI) | Supporting literature PMIDs |
| OpenFDA | Drug labels and adverse events |
| Synthetic | 500 realistic patient profiles matched to real trials |
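
A recency-sorted search against ClinicalTrials.gov v2 needs only the standard library. This is a sketch under assumptions, not the backend's code: it relies on the public `/api/v2/studies` endpoint with its `query.cond`, `filter.overallStatus`, `sort`, and `pageSize` parameters, and on v2 responses nesting each NCT ID under `protocolSection.identificationModule.nctId`. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(condition: str, page_size: int = 20) -> str:
    """Hypothetical helper: recruiting studies for a condition, most recently updated first."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": "RECRUITING",
        "sort": "LastUpdatePostDate:desc",
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


def fetch_nct_ids(condition: str, page_size: int = 20) -> list:
    """Fetch one page of results and return its NCT IDs (requires network access)."""
    with urlopen(build_search_url(condition, page_size), timeout=30) as resp:
        payload = json.load(resp)
    return [
        study["protocolSection"]["identificationModule"]["nctId"]
        for study in payload.get("studies", [])
    ]


print(build_search_url("breast cancer"))
```

Further pages would be requested with the `nextPageToken` value the API returns; that loop is omitted here.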

---

## Graph Knowledge Base

After seeding, the Neo4j graph contains:

| Node / edge type | Count | Key properties |
|---|---|---|
| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
| StudySite | ~200 | lat/lon coordinates |
| **ELIGIBLE_FOR edges** | **~9,100** | `score`; each edge links a Patient to a Trial |
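
With ELIGIBLE_FOR edges in place, the per-trial Graph Intelligence figures (eligible patient count, top biomarkers among matches) reduce to a single aggregation. The Cypher below is a hypothetical sketch: the `nct_id` and `name` property keys, the edge direction, and the `HAS_BIOMARKER` relationship type are assumptions, since the table lists labels and counts but not relationship names. With the Neo4j 5.x Python driver it could be run as `driver.execute_query(TRIAL_INTELLIGENCE, intelligence_params(nct_id))`.

```python
# Hypothetical query: property keys and HAS_BIOMARKER are assumptions, not the project's schema.
TRIAL_INTELLIGENCE = """
MATCH (p:Patient)-[:ELIGIBLE_FOR]->(:Trial {nct_id: $nct_id})
WITH collect(p) AS patients
UNWIND patients AS p
OPTIONAL MATCH (p)-[:HAS_BIOMARKER]->(b:Biomarker)
WITH size(patients) AS eligible_patients, b.name AS biomarker, count(b) AS n
ORDER BY n DESC
RETURN eligible_patients,
       [x IN collect({biomarker: biomarker, n: n}) WHERE x.n > 0][..$top_n] AS top_biomarkers
"""


def intelligence_params(nct_id: str, top_n: int = 5) -> dict:
    """Query parameters; top_n caps the length of the biomarker list."""
    return {"nct_id": nct_id, "top_n": top_n}
```

As written, the query returns no row at all for a trial with no eligible patients, so a caller would treat an empty result as a count of zero.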

The graph grows passively — every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.
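
Passive growth depends on idempotent upserts: Cypher `MERGE` matches an existing node or creates it, so repeating a search does not duplicate trials or sites. The sketch below is hypothetical. The property keys, the `CONDUCTED_AT` relationship, and the assumed ClinicalTrials.gov v2 field paths (`identificationModule.nctId`, `designModule.phases`, `statusModule.lastUpdatePostDateStruct.date`, `contactsLocationsModule.locations[].geoPoint`) are illustrative, not taken from this project. With the Neo4j 5.x Python driver it could be executed as `driver.execute_query(UPSERT_TRIAL, upsert_params(study))`.

```python
# Hypothetical Cypher: nct_id/name keys and CONDUCTED_AT are assumptions.
UPSERT_TRIAL = """
MERGE (t:Trial {nct_id: $nct_id})
SET t.phase = $phase, t.last_updated = $last_updated
WITH t
UNWIND $sites AS site
MERGE (s:StudySite {name: site.name})
SET s.lat = site.lat, s.lon = site.lon
MERGE (t)-[:CONDUCTED_AT]->(s)
"""


def upsert_params(study: dict) -> dict:
    """Map one (assumed) ClinicalTrials.gov v2 study record to UPSERT_TRIAL parameters."""
    proto = study.get("protocolSection", {})
    sites = []
    for loc in proto.get("contactsLocationsModule", {}).get("locations", []):
        geo = loc.get("geoPoint")
        if geo and loc.get("facility"):  # skip sites without coordinates or a name
            sites.append({"name": loc["facility"], "lat": geo["lat"], "lon": geo["lon"]})
    return {
        "nct_id": proto["identificationModule"]["nctId"],
        "phase": ", ".join(proto.get("designModule", {}).get("phases", [])),
        "last_updated": proto.get("statusModule", {})
        .get("lastUpdatePostDateStruct", {})
        .get("date"),
        "sites": sites,
    }
```

Keying StudySite on facility name alone is a simplification: two facilities with the same name in different cities would collapse into one node.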

---

## Clinical Eligibility Check (SI Units)

The `/intake` page accepts raw clinical data — no patient ID or account required. Fields:

**Demographics:** Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

**Biomarker status (toggles):**
- Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
- Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
- GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
- Hematology: FLT3, IDH1/2, BCR-ABL

**Lab values (SI units; haemoglobin is entered in g/dL):**

| Field | Unit | Conversion |
|---|---|---|
| Haemoglobin | g/dL | — |
| WBC | ×10⁹/L | — |
| ANC | ×10⁹/L | — |
| Platelets | ×10⁹/L | — |
| Creatinine | **μmol/L** | auto-converted ÷88.4 → mg/dL for trial text |
| eGFR | mL/min/1.73m² | — |
| Bilirubin | **μmol/L** | auto-converted ÷17.1 → mg/dL for trial text |
| ALT / AST | U/L | — |
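
The two conversions in the table are plain divisions (88.4 μmol/L per mg/dL for creatinine, 17.1 μmol/L per mg/dL for bilirubin). The sketch below applies them and compares patient values against simple `≤ X mg/dL` upper limits found in criteria text. The regex and `check_labs` are hypothetical, not the project's parser; limits written as multiples of the upper limit of normal (e.g. `≤ 2 × ULN`) are not matched by this pattern.

```python
import re

# μmol/L per mg/dL, from the table above: mg/dL = μmol/L ÷ factor.
SI_PER_MG_DL = {"creatinine": 88.4, "bilirubin": 17.1}

# Hypothetical pattern for upper limits such as "creatinine ≤ 1.5 mg/dL".
# Disallowing digits and periods between the analyte and the operator keeps
# one match from running into a neighbouring criterion.
THRESHOLD_RE = re.compile(
    r"(creatinine|bilirubin)[^.\d\n]*?(?:≤|<=|<)\s*(\d+(?:\.\d+)?)\s*mg/dL",
    re.IGNORECASE,
)


def si_to_mg_dl(analyte: str, value_umol_l: float) -> float:
    """Convert a creatinine or bilirubin value from μmol/L to mg/dL."""
    return value_umol_l / SI_PER_MG_DL[analyte]


def check_labs(criteria_text: str, labs_si: dict) -> str:
    """Return 'pass', 'fail', or 'uncertain' for the mg/dL upper limits found in the text."""
    thresholds = THRESHOLD_RE.findall(criteria_text)
    if not thresholds:
        return "uncertain"  # no machine-readable limit found
    missing = False
    for analyte, limit in thresholds:
        key = analyte.lower()
        if key not in labs_si:
            missing = True  # the trial sets a limit the patient did not report
            continue
        if si_to_mg_dl(key, labs_si[key]) > float(limit):
            return "fail"
    return "uncertain" if missing else "pass"


criteria = "Serum creatinine ≤ 1.5 mg/dL and total bilirubin ≤ 1.5 mg/dL."
print(check_labs(criteria, {"creatinine": 97.0, "bilirubin": 30.0}))  # prints: fail
```

Here creatinine 97 μmol/L is about 1.10 mg/dL (within the limit), but bilirubin 30 μmol/L is about 1.75 mg/dL, which exceeds 1.5 mg/dL.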

Matching score breakdown (100 points total):
- **Age** 25 pts — compared against trial min/max age
- **Sex** 15 pts — compared against trial sex restriction
- **ECOG** 15 pts — extracted via regex from eligibility criteria text
- **Biomarkers** 30 pts — checks whether biomarker terms appear in trial eligibility text
- **Lab values** 15 pts — parses thresholds from text, converts SI units, checks patient values

Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.
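
A minimal sketch of how the five weighted criteria could combine into a 0 to 100 score, assuming each check yields `pass`, `fail`, or `uncertain`. The half credit for `uncertain`, the ECOG regex, and all function names are assumptions for illustration; the sex, biomarker, and lab checks are passed in as precomputed statuses.

```python
from __future__ import annotations

import re

# Criterion weights from the breakdown above; they sum to 100.
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}
# Assumption (not from the source): "uncertain" earns half credit.
CREDIT = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}

# Hypothetical pattern for "ECOG 0-1", "ECOG ≤ 2", "ECOG performance status of 0 or 1".
ECOG_RE = re.compile(
    r"ECOG[^.\n]*?(?:≤|<=|of)?\s*([0-4])(?:\s*(?:-|–|to|or)\s*([0-4]))?",
    re.IGNORECASE,
)


def max_ecog(criteria_text: str) -> int | None:
    """Highest ECOG score the text allows, or None if no ECOG limit is found."""
    m = ECOG_RE.search(criteria_text)
    if m is None:
        return None
    return int(m.group(2) or m.group(1))


def check_age(age: float, min_age: float | None, max_age: float | None) -> str:
    """'fail' outside the trial's limits; a missing limit means no restriction."""
    if min_age is not None and age < min_age:
        return "fail"
    if max_age is not None and age > max_age:
        return "fail"
    return "pass"


def check_ecog(patient_ecog: int, criteria_text: str) -> str:
    limit = max_ecog(criteria_text)
    if limit is None:
        return "uncertain"
    return "pass" if patient_ecog <= limit else "fail"


def total_score(statuses: dict[str, str]) -> float:
    """Weighted score; criteria that were not evaluated count as 'uncertain'."""
    return sum(w * CREDIT[statuses.get(name, "uncertain")] for name, w in WEIGHTS.items())


text = "Inclusion: ECOG performance status of 0 or 1. Adequate organ function."
statuses = {
    "age": check_age(54, 18, None),
    "sex": "pass",
    "ecog": check_ecog(2, text),
    "biomarkers": "uncertain",
    "labs": "pass",
}
print(statuses["ecog"], total_score(statuses))  # prints: fail 70.0
```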

---

## Running Locally (no Docker)

```bash
# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

# 2. Backend (runs in the foreground; keep this terminal open)
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local   # fill in credentials
uvicorn main:app --reload --port 8000

# 3. Schema setup (once; run from a second terminal)
curl -X POST http://localhost:8000/setup

# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed

# 5. Frontend (new terminal, from the repo root)
cd frontend
npm install --legacy-peer-deps
npm run dev        # http://localhost:3000  (uses --webpack, not Turbopack)

# 6. MCP server for Prompt Opinion integration (new terminal, from the repo root)
cd backend
source venv/bin/activate
python mcp_server.py
```

---

## Running with Docker Compose

```bash
cp .env.example .env.local   # fill in OPENAI_API_KEY etc.
docker compose up -d

# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed
```
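
Instead of a fixed 60 s wait, a script can poll until a service answers. A standard-library sketch: `wait_for` is a hypothetical helper, and treating an HTTP 2xx from Neo4j's browser port (`http://localhost:7474`) as "healthy" is an assumption. Neo4j may be up before the app behind Nginx is ready, so the same helper could also poll `http://localhost:7860`.

```python
import time
from urllib.request import urlopen


def wait_for(url: str, timeout_s: float = 90.0, interval_s: float = 2.0) -> bool:
    """Poll url until it returns HTTP 2xx or timeout_s elapses; True if it came up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urlopen(url, timeout=interval_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            pass  # refused, reset, timed out, or HTTP error: not ready yet
        time.sleep(interval_s)
    return False
```

A script would call `wait_for("http://localhost:7474")` and, if it returns `True`, issue the `/setup` and `/seed` requests shown above.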

Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474

---

## Deploying to HuggingFace Spaces

1. Create a Space → **Docker SDK** → blank template
2. Push the repo to the Space:
   ```bash
   git remote add hf https://huggingface.co/spaces/<username>/<space-name>
   git push hf main
   ```
3. Add these as Space **secrets** (Settings → Variables and secrets):
   ```
   OPENAI_API_KEY    = <aimlapi.com key>
   OPENAI_BASE_URL   = https://ai.aimlapi.com/v1
   OPENAI_MODEL      = claude-opus-4-7
   NEO4J_PASSWORD    = clinicalmatch2024
   ```
4. After the first boot, seed the data:
   ```bash
   curl -X POST https://<username>-<space-name>.hf.space/seed
   ```

---

## MCP Tools (Prompt Opinion integration)

```bash
python backend/mcp_server.py   # stdio transport
```

| Tool | Arguments | Description |
|---|---|---|
| `find_trials` | `condition, phase?` | Real-time trial search |
| `screen_patient` | `patient_id, nct_id` | Eligibility screening |
| `match_patient_to_trials` | `patient_id` | Top-N trial matches |
| `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
| `get_trial_analytics` | — | Enrollment funnel + KPIs |
| `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |
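
Under the hood, MCP over stdio exchanges newline-delimited JSON-RPC 2.0 messages: the client sends `initialize`, then the `notifications/initialized` notification, then `tools/call` requests. The sketch below only builds the messages for one `find_trials` call. The client name, the `2024-11-05` protocol version string, and the helper names are assumptions, and a conforming client waits for the `initialize` response before sending anything else; in practice Prompt Opinion or an MCP client library handles that handshake.

```python
import json
from typing import Optional


def jsonrpc(method: str, params: Optional[dict] = None, msg_id: Optional[int] = None) -> str:
    """Serialize one JSON-RPC 2.0 message; omitting msg_id makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    # json.dumps emits a single line by default; stdio messages must not contain newlines.
    return json.dumps(msg)


def find_trials_messages(condition: str) -> list:
    """Messages a client would send, in order, to call find_trials once."""
    return [
        jsonrpc(
            "initialize",
            {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.1.0"},
            },
            msg_id=1,
        ),
        jsonrpc("notifications/initialized"),
        jsonrpc(
            "tools/call",
            {"name": "find_trials", "arguments": {"condition": condition}},
            msg_id=2,
        ),
    ]


for line in find_trials_messages("non-small cell lung cancer"):
    print(line)
```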

---

## Key API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
| GET | `/api/v1/intake/biomarkers` | Biomarker registry |
| GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
| GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
| GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
| POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
| POST | `/api/v1/workflow/run` | Run full A2A pipeline |
| GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
| GET | `/api/v1/map/data` | Site coordinates + patient clusters |
| POST | `/api/v1/graph/query` | GraphRAG natural language query |
| POST | `/seed` | Seed full graph from live APIs |
| GET | `/api/v1/graph/stats` | Node and edge counts |

Full interactive docs: `http://localhost:8000/docs` when running the backend directly, or `http://localhost:7860/api/docs` with Docker Compose.
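
A standard-library sketch of calling `POST /api/v1/intake/match` on a locally running backend. The request body is hypothetical: its field names only mirror the intake form described earlier (age, sex, ECOG, stage, biomarkers, SI-unit labs), so check the real schema in the interactive docs before relying on it.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8000"  # backend started with uvicorn, as in "Running Locally"

# Hypothetical payload: field names mirror the intake form, not a confirmed schema.
payload = {
    "age": 54,
    "sex": "female",
    "ecog": 1,
    "stage": "II",
    "biomarkers": ["HER2+", "ER+"],
    "labs": {"creatinine_umol_l": 80, "bilirubin_umol_l": 12, "anc_10e9_l": 2.1},
}


def build_match_request(body: dict, base: str = API_BASE) -> Request:
    """Prepare a JSON POST to the intake match endpoint without sending it."""
    return Request(
        f"{base}/api/v1/intake/match",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def match(body: dict, base: str = API_BASE) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    with urlopen(build_match_request(body, base), timeout=60) as resp:
        return json.load(resp)
```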

---

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
| `NEO4J_DATABASE` | Database name | `neo4j` |
| `OPENAI_API_KEY` | aimlapi.com API key | — |
| `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
| `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
| `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |
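
A minimal sketch of reading the backend variables with the defaults listed above. `Settings` and `load_settings` are hypothetical, not the project's config module. `NEXT_PUBLIC_API_URL` is left out because Next.js inlines `NEXT_PUBLIC_` variables into the frontend bundle at build time.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    openai_api_key: Optional[str]
    openai_base_url: str
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the environment, falling back to the documented defaults."""
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "clinicalmatch2024"),
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        openai_api_key=env.get("OPENAI_API_KEY"),  # no default: must be supplied
        openai_base_url=env.get("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
        openai_model=env.get("OPENAI_MODEL", "claude-opus-4-7"),
    )
```

Passing an explicit mapping, e.g. `load_settings({"OPENAI_MODEL": "..."})`, makes the defaults easy to test without touching the real environment.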

---

## Frontend Pages

| Route | Page | Description |
|---|---|---|
| `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
| `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
| `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
| `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
| `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
| `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
| `/graph` | GraphRAG | Natural language queries over the knowledge graph |