---
title: ClinicalMatch AI
emoji: 🧬
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: true
---

# ClinicalMatch AI: Precision Clinical Trial Matching & Recruitment Agent

**"Agents Assemble: Healthcare AI Endgame Challenge"** (Prompt Opinion platform)

Standards: **FHIR R4 · MCP · A2A**

> 80% of clinical trials fail to meet enrollment deadlines. 85% of eligible patients are never identified. This agent directly addresses that.

---

## What it does

ClinicalMatch AI is a full-stack AI agent that matches patients to recruiting clinical trials using a knowledge graph, real-time data from ClinicalTrials.gov, and structured clinical eligibility scoring.

**Key capabilities:**

| Feature | Description |
|---|---|
| **Eligibility Check** | Individual enters raw clinical data (age, labs in SI units, biomarkers), no patient ID required, and receives ranked, explainable trial matches |
| **Trial Finder** | Real-time search of ClinicalTrials.gov sorted by most recently updated; results auto-ingest into the knowledge graph |
| **Graph Intelligence** | Per-trial: eligible patient count, top biomarkers among matches, similar trials via graph-neighborhood walk |
| **A2A Pipeline** | 5-state orchestration (INGEST → PARSE → MATCH → SCORE → RECRUIT) for FHIR patient profiles |
| **Recruitment Hub** | Kanban board tracking patients through IDENTIFIED → ENROLLED; generates personalized outreach (PCP letter, patient email, social post) |
| **GraphRAG** | Natural language queries over the knowledge graph ("which patients are eligible for breast cancer trials?") |
| **MCP Server** | 6 tools callable by Prompt Opinion directly via stdio transport |

---

## Architecture

```
               Prompt Opinion Platform
                       │  MCP Protocol (stdio)
                       ▼
┌────────────────────────────────────────────────────┐
│  MCP Server (mcp_server.py)                        │
│  find_trials · screen_patient · match_patient      │
│  generate_outreach · get_analytics · summarize     │
└──────────────────────┬─────────────────────────────┘
                       │  A2A Orchestration
                       ▼
┌────────────────────────────────────────────────────┐
│  FastAPI Backend (main.py, port 8000)              │
│  30+ REST endpoints                                │
├──────────┬────────────┬────────────┬───────────────┤
│ CT.gov   │ FHIR R4    │ Claude     │ Neo4j Graph   │
│ live API │ adapter    │ LLM        │ RAG + match   │
└──────────┴────────────┴────────────┴───────────────┘
                       │
                       ▼
┌────────────────────────────────────────────────────┐
│  Next.js 16 Frontend (port 3000)                   │
│  Trial Finder · Eligibility Check · Screening      │
│  Recruitment Hub · Dashboard · Map · GraphRAG      │
└────────────────────────────────────────────────────┘
                       │  Nginx (port 7860)
                       ▼
                 HuggingFace Spaces
```

**Data sources (all free, no auth):**

| Source | Data |
|---|---|
| ClinicalTrials.gov v2 | Real recruiting NCT trials, sorted by recency |
| RxNorm (NIH) | Medication RxCUI codes |
| ICD-10 CM (NLM) | Cancer diagnosis codes |
| PubMed (NCBI) | Supporting literature PMIDs |
| OpenFDA | Drug labels and adverse events |
| Synthetic | 500 realistic patient profiles matched to real trials |

---

## Graph Knowledge Base

After seeding, the Neo4j graph contains:

| Node type | Count | Key properties |
|---|---|---|
| Patient | 500 | age, sex, ECOG, condition, city, biomarkers[], medications[] |
| Trial | ~250 | NCT ID, eligibility criteria, phase, last_updated |
| Diagnosis | ~130 | ICD-10 codes across 10 oncology conditions |
| Biomarker | 20 | HER2+/−, EGFR, ALK, BRCA1/2, MSI-H, FLT3, etc. |
| Medication | 16 | Trastuzumab, Pembrolizumab, Olaparib, etc. |
| StudySite | ~200 | lat/lon coordinates |
| **ELIGIBLE_FOR edges** | **~9,100** | score, linking patients to trials |

The graph grows passively: every Trial Finder search automatically upserts new Trial and StudySite nodes. Every Eligibility Check submission (with "Save to graph" enabled) adds a new Patient node with biomarker edges.

---

## Clinical Eligibility Check (SI Units)

The `/intake` page accepts raw clinical data (no patient ID or account required). Fields:

**Demographics:** Age (years), Sex, ECOG performance status (0–4), Disease stage (I–IV)

**Biomarker status (toggles):**
- Breast/Gynecologic: HER2+/−, ER+, PR+, BRCA1/2 mutation, Triple-Negative
- Lung (NSCLC): EGFR mutation, ALK, ROS1 rearrangement, PD-L1
- GI/Colorectal: MSI-High, KRAS wild-type, BRAF V600E
- Hematology: FLT3, IDH1/2, BCR-ABL

**Lab values (SI units):**

| Field | Unit | Conversion |
|---|---|---|
| Haemoglobin | g/dL | none |
| WBC | ×10⁹/L | none |
| ANC | ×10⁹/L | none |
| Platelets | ×10⁹/L | none |
| Creatinine | **μmol/L** | auto-converted ÷88.4 → mg/dL for trial text |
| eGFR | mL/min/1.73m² | none |
| Bilirubin | **μmol/L** | auto-converted ÷17.1 → mg/dL for trial text |
| ALT / AST | U/L | none |

Matching score breakdown:
- **Age** (25 pts): compared against trial min/max age
- **Sex** (15 pts): compared against trial sex restriction
- **ECOG** (15 pts): extracted via regex from eligibility criteria text
- **Biomarkers** (30 pts): checks whether biomarker terms appear in trial eligibility text
- **Lab values** (15 pts): parses thresholds from text, converts SI units, checks patient values

Results are ranked by score with pass/fail/uncertain per criterion and direct ClinicalTrials.gov links.

---

## Running Locally (no Docker)

```bash
# 1. Start Neo4j
docker run -d --name neo4j -p 7474:7474 -p 7687:7687 -e NEO4J_AUTH=neo4j/clinicalmatch2024 neo4j:5.18-community

# 2. Backend
cd backend
python -m venv venv && source venv/bin/activate && pip install -r requirements.txt
cp ../.env.example ../.env.local  # fill in credentials
uvicorn main:app --reload --port 8000

# 3. Schema setup (once)
curl -X POST http://localhost:8000/setup

# 4. Seed graph data from live APIs (~15 min, ~250 real trials + 500 patients)
curl -X POST http://localhost:8000/seed

# 5. Frontend
cd frontend
npm install --legacy-peer-deps
npm run dev  # http://localhost:3000 (uses --webpack, not Turbopack)

# 6. MCP server (for Prompt Opinion integration)
cd backend
python mcp_server.py
```

---

## Running with Docker Compose

```bash
cp .env.example .env.local  # fill in OPENAI_API_KEY etc.
docker compose up -d
# Wait ~60s for Neo4j to be healthy, then:
curl -X POST http://localhost:7860/setup
curl -X POST http://localhost:7860/seed
```

Services: app → http://localhost:7860 | API docs → http://localhost:7860/api/docs | Neo4j → http://localhost:7474

---

## Deploying to HuggingFace Spaces

1. Create a Space → **Docker SDK** → blank template
2. Push repo to the Space:
   ```bash
   git remote add hf https://huggingface.co/spaces//
   git push hf main
   ```
3. Set **Repository Secrets**:
   ```
   OPENAI_API_KEY =
   OPENAI_BASE_URL = https://ai.aimlapi.com/v1
   OPENAI_MODEL = claude-opus-4-7
   NEO4J_PASSWORD = clinicalmatch2024
   ```
4. After first boot, seed data:
   ```
   POST https://.hf.space/seed
   ```

---

## MCP Tools (Prompt Opinion integration)

```bash
python backend/mcp_server.py  # stdio transport
```

| Tool | Arguments | Description |
|---|---|---|
| `find_trials` | `condition, phase?` | Real-time trial search |
| `screen_patient` | `patient_id, nct_id` | Eligibility screening |
| `match_patient_to_trials` | `patient_id` | Top-N trial matches |
| `generate_recruitment_outreach` | `patient_id, nct_id, channel` | Personalized outreach |
| `get_trial_analytics` | (none) | Enrollment funnel + KPIs |
| `summarize_trial_protocol` | `nct_id` | AI-parsed protocol summary |

---

## Key API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/intake/match` | SI-unit intake → ranked trial matches |
| GET | `/api/v1/intake/biomarkers` | Biomarker registry |
| GET | `/api/v1/trials/search` | Real-time CT.gov search (recency-sorted, graph-enriched) |
| GET | `/api/v1/trials/{nct_id}/intelligence` | Graph intelligence per trial |
| GET | `/api/v1/graph/patients` | Query seeded patient IDs from Neo4j |
| POST | `/api/v1/patients/{id}/screen/{nct_id}` | Screen FHIR patient against trial |
| POST | `/api/v1/workflow/run` | Run full A2A pipeline |
| GET | `/api/v1/analytics/kpi` | Dashboard KPIs |
| GET | `/api/v1/map/data` | Site coordinates + patient clusters |
| POST | `/api/v1/graph/query` | GraphRAG natural language query |
| POST | `/seed` | Seed full graph from live APIs |
| GET | `/api/v1/graph/stats` | Node and edge counts |

Full interactive docs: `http://localhost:8000/docs`

---

## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `NEO4J_URI` | Neo4j bolt URI | `bolt://localhost:7687` |
| `NEO4J_USERNAME` | Neo4j username | `neo4j` |
| `NEO4J_PASSWORD` | Neo4j password | `clinicalmatch2024` |
| `NEO4J_DATABASE` | Database name | `neo4j` |
| `OPENAI_API_KEY` | aimlapi.com API key | (none) |
| `OPENAI_BASE_URL` | LLM base URL | `https://ai.aimlapi.com/v1` |
| `OPENAI_MODEL` | Model identifier | `claude-opus-4-7` |
| `NEXT_PUBLIC_API_URL` | Frontend API base URL | `""` (relative, via Nginx) |

---

## Frontend Pages

| Route | Page | Description |
|---|---|---|
| `/` | Trial Finder | Real-time CT.gov search, recency-sorted, graph intelligence on expand |
| `/intake` | Eligibility Check | SI-unit clinical intake form, no patient ID required |
| `/screening` | Patient Screening | FHIR patient + trial combobox, A2A pipeline with state tracker |
| `/recruitment` | Recruitment Hub | Kanban board, AI outreach generation (PCP / email / social) |
| `/dashboard` | Dashboard | KPI cards, enrollment funnel, demographics, site performance |
| `/map` | Site Map | Leaflet map of trial sites and patient density clusters |
| `/graph` | GraphRAG | Natural language queries over the knowledge graph |
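
---

## Example: Eligibility Scoring Sketch

The SI-unit conversions and the 100-point score breakdown described under "Clinical Eligibility Check (SI Units)" can be illustrated with a short Python sketch. Only the conversion factors (88.4 and 17.1) and the point weights come from this README. The function names, the two regex patterns, and the half credit given to an "uncertain" criterion are assumptions made for illustration; they are not taken from the backend code and may differ from it.

```python
import re
from typing import Dict, Optional

# Conversion factors quoted in the lab-values table (umol/L per mg/dL).
CREATININE_UMOL_PER_MG_DL = 88.4
BILIRUBIN_UMOL_PER_MG_DL = 17.1

# Maximum points per criterion, from the score breakdown (sums to 100).
WEIGHTS = {"age": 25, "sex": 15, "ecog": 15, "biomarkers": 30, "labs": 15}

# ASSUMPTION: credit per outcome. The half credit for "uncertain" is
# invented for this sketch; the README does not say how it is scored.
CREDIT = {"pass": 1.0, "uncertain": 0.5, "fail": 0.0}

# ASSUMPTION: illustrative patterns only; the project's real regexes are not shown.
_ECOG_RANGE = re.compile(
    r"ECOG[^\d\n]{0,40}?(\d)\s*(?:-|–|to|or)\s*(\d)", re.IGNORECASE
)
_ECOG_LIMIT = re.compile(r"ECOG[^\d\n]{0,40}?(?:≤|<=)\s*(\d)", re.IGNORECASE)


def creatinine_to_mg_dl(umol_per_l: float) -> float:
    """Convert serum creatinine from umol/L to mg/dL."""
    return umol_per_l / CREATININE_UMOL_PER_MG_DL


def bilirubin_to_mg_dl(umol_per_l: float) -> float:
    """Convert total bilirubin from umol/L to mg/dL."""
    return umol_per_l / BILIRUBIN_UMOL_PER_MG_DL


def max_ecog_allowed(criteria_text: str) -> Optional[int]:
    """Return the highest ECOG status the text appears to allow, or None."""
    match = _ECOG_RANGE.search(criteria_text)
    if match:
        return max(int(match.group(1)), int(match.group(2)))
    match = _ECOG_LIMIT.search(criteria_text)
    if match:
        return int(match.group(1))
    return None


def total_score(outcomes: Dict[str, str]) -> float:
    """Sum weighted credit; criteria missing from `outcomes` count as uncertain."""
    return sum(
        points * CREDIT[outcomes.get(name, "uncertain")]
        for name, points in WEIGHTS.items()
    )
```

Under these assumed credits, a patient who passes age, sex, and ECOG, is uncertain on biomarkers, and fails the lab check would score 70 out of 100, and `creatinine_to_mg_dl(88.4)` gives 1.0 mg/dL.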