CTA / CLAUDE.md
TheQuantEd's picture
Initial deployment: ClinicalMatch AI v2.0 β€” FHIR R4 Β· MCP (9 tools) Β· A2A workflow Β· SHARP compliance Β· 100k synthetic patients Β· Neo4j graph Β· GraphRAG chatbot
59abb4f

ClinicalMatch AI β€” Agent Instructions

Project memory (build state, completed features, constraints) is also tracked in .claude/project_memory.md in this repo.

This is a hackathon submission for "Agents Assemble: Healthcare AI Endgame Challenge" on the Prompt Opinion platform. Judging criteria: MCP compliance, A2A workflow, FHIR R4 standards, AI quality, impact, feasibility.

Stack at a glance

Layer Technology
Backend FastAPI (Python 3.12), uvicorn
Graph DB Neo4j Community 5.x via bolt
LLM claude-opus-4-7 via aimlapi.com (OpenAI-compatible)
GraphRAG LangChain GraphCypherQAChain + custom Cypher prompt
Frontend Next.js 16 (webpack mode), React 19, Tailwind CSS 3, Recharts, Leaflet
Standards FHIR R4 Β· MCP (stdio) Β· A2A state machine

Critical: LLM API

Never use the Anthropic SDK directly. All LLM calls go through aimlapi.com or a compatible alternative using the OpenAI-compatible interface:

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url=os.getenv("OPENAI_BASE_URL", "https://ai.aimlapi.com/v1"),
)
model = os.getenv("OPENAI_MODEL", "claude-opus-4-7")

See backend/llm_client.py for the canonical pattern. Do not add import anthropic anywhere.

Starting the services

# Backend β€” always use --reload for hot reload
cd backend && source venv/bin/activate
uvicorn main:app --reload --port 8000

# Frontend β€” always use --webpack (Turbopack is broken on this system)
cd frontend && npm run dev   # runs: next dev --webpack

# MCP server (separate process, stdio transport)
cd backend && python mcp_server.py

# Seed graph data (~15 min first run)
curl -X POST http://localhost:8000/seed

After changing backend Python files, uvicorn --reload should pick them up. If a 404 appears for a newly-added endpoint or old errors persist, the server needs a manual restart β€” kill the process and re-run the uvicorn command.

Project layout

promptop/
β”œβ”€β”€ CLAUDE.md                   ← you are here
β”œβ”€β”€ README.md                   ← user-facing docs
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py                 ← FastAPI app, all routes
β”‚   β”œβ”€β”€ clinicaltrials_api.py   ← ClinicalTrials.gov v2 API (async + sync)
β”‚   β”œβ”€β”€ intake_matching.py      ← SI-unit clinical intake β†’ trial scoring
β”‚   β”œβ”€β”€ trial_enrichment.py     ← passive graph enrichment on search
β”‚   β”œβ”€β”€ matching_engine.py      ← FHIR patient β†’ trial scoring (LLM-assisted)
β”‚   β”œβ”€β”€ a2a_workflow.py         ← A2A state machine (INGESTβ†’PARSEβ†’MATCHβ†’SCOREβ†’RECRUIT)
β”‚   β”œβ”€β”€ graphrag.py             ← LangChain GraphCypherQAChain with custom prompt
β”‚   β”œβ”€β”€ graph_seeder.py         ← seeds 500 patients + real NCT trials from APIs
β”‚   β”œβ”€β”€ fhir_adapter.py         ← FHIR R4 patient models (P001–P005 mock patients)
β”‚   β”œβ”€β”€ neo4j_setup.py          ← Neo4j connection + schema setup
β”‚   β”œβ”€β”€ analytics.py            ← dashboard KPIs, funnel, demographics, map data
β”‚   β”œβ”€β”€ recruitment_pipeline.py ← kanban board, outreach generation
β”‚   β”œβ”€β”€ llm_client.py           ← all LLM calls (aimlapi.com / claude-opus-4-7)
β”‚   β”œβ”€β”€ mcp_server.py           ← MCP stdio server (6 tools)
β”‚   └── requirements.txt
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/app/
β”‚   β”‚   β”œβ”€β”€ page.tsx            ← Trial Finder (real-time CT.gov, recency sort)
β”‚   β”‚   β”œβ”€β”€ intake/page.tsx     ← Eligibility Check (SI-unit clinical intake form)
β”‚   β”‚   β”œβ”€β”€ screening/page.tsx  ← Patient Screening (A2A pipeline, FHIR patients)
β”‚   β”‚   β”œβ”€β”€ recruitment/page.tsx← Recruitment Hub (kanban + outreach generation)
β”‚   β”‚   β”œβ”€β”€ dashboard/page.tsx  ← Analytics dashboard (Recharts)
β”‚   β”‚   β”œβ”€β”€ map/page.tsx        ← Leaflet site map
β”‚   β”‚   β”œβ”€β”€ graph/page.tsx      ← GraphRAG natural language query
β”‚   β”‚   └── layout.tsx          ← App shell with Sidebar
β”‚   β”œβ”€β”€ src/components/
β”‚   β”‚   β”œβ”€β”€ Sidebar.tsx         ← Navigation sidebar
β”‚   β”‚   └── MapComponent.tsx    ← Raw Leaflet map (no react-leaflet SSR issues)
β”‚   β”œβ”€β”€ src/lib/api.ts          ← Typed API client for all backend endpoints
β”‚   └── next.config.ts          ← webpack mode, filesystem cache, optimizePackageImports
└── docker/                     ← Docker + Nginx for HuggingFace Spaces deployment

Neo4j graph schema

(Patient)         id, name, age, sex, ecog, condition, city, state, ethnicity,
                  biomarkers[], medications[], source, stage
(Trial)           id (NCT), title, condition, phase, status, sponsor,
                  eligibility_criteria, min_age, max_age, sex, enrollment,
                  start_date, completion_date, last_updated, ctgov_url
(Diagnosis)       id, name, icd10
(Biomarker)       id (e.g. HER2_POS), name (e.g. "HER2 Positive")
(Medication)      id (e.g. TAMOXIFEN), name
(StudySite)       id, name, city, state, lat, lon, trials, enrolled, capacity

Relationships:
  (Patient)-[:ELIGIBLE_FOR {score}]->(Trial)
  (Patient)-[:HAS_DIAGNOSIS]->(Diagnosis)
  (Patient)-[:HAS_BIOMARKER]->(Biomarker)
  (Patient)-[:TAKES_MEDICATION]->(Medication)
  (Trial)-[:LOCATED_AT]->(StudySite)

Graph scale after seeding: ~500 patients, ~250 trials, ~9,100 ELIGIBLE_FOR edges.

Patient IDs from seeder: P_C50_0001 (breast), P_C61_0001 (prostate), etc. Mock FHIR patients: P001–P005 (used by screening/workflow pages).

Key backend modules

clinicaltrials_api.py

  • search_trials() β€” async, sort=LastUpdatePostDate:desc
  • get_trial_details() β€” async
  • search_trials_sync() / get_trial_details_sync() β€” sync using httpx.Client (NOT asyncio.run()). Safe to call from both sync functions and FastAPI async handlers.
  • _normalize_study() β€” extracts last_updated, ctgov_url in addition to core fields.

Do not use asyncio.run() inside these sync wrappers β€” it breaks when called from a running FastAPI event loop. The sync wrappers use httpx.Client directly.

intake_matching.py

Implements SI-unit clinical intake β†’ trial eligibility matching without requiring a patient ID:

  • BIOMARKER_REGISTRY β€” maps graph node IDs to labels and eligibility text search terms
  • score_intake_against_trial() β€” weighted scoring: age (25), sex (15), ECOG (15), biomarkers (30), labs (15)
  • _check_labs() β€” parses thresholds from eligibility criteria text, converts SI units (creatinine ΞΌmol/L ↔ mg/dL, bilirubin ΞΌmol/L ↔ mg/dL)
  • save_intake_as_patient() β€” persists intake as Patient node for long-term graph enrichment

trial_enrichment.py

  • enrich_trials_from_search() β€” called as a BackgroundTask on every /api/v1/trials/search response; upserts Trial + StudySite nodes
  • get_eligible_patient_counts() β€” batch graph query, returns {nct_id: count}
  • get_graph_intelligence() β€” per-trial: eligible count + top biomarkers + similar trials

graphrag.py

Uses a custom _CYPHER_PROMPT with explicit schema examples. Critical rules in the prompt:

  • Biomarker lookups use id property ({id: 'HER2_POS'}), never {name: 'HER2', status: 'positive'}
  • Condition lookups use lowercase on Trial nodes
  • Patient eligibility always via (Patient)-[:ELIGIBLE_FOR]->(Trial)

a2a_workflow.py

Five-state machine: INGESTING β†’ PARSING_PROTOCOL β†’ MATCHING β†’ SCORING β†’ RECRUITING

  • Calls search_trials_sync() / get_trial_details_sync() β€” these are safe (use httpx.Client)
  • run_pipeline() is synchronous; called from async FastAPI endpoint without await

Key frontend pages

/intake β€” Eligibility Check

The primary self-service interface. Accepts raw clinical data in SI units; no patient ID needed.

  • Six sections: Diagnosis & Demographics, Biomarkers, Lab Values, Treatment History
  • Biomarker registry loaded from GET /api/v1/intake/biomarkers
  • Submits to POST /api/v1/intake/match
  • Optional "Save to graph" checkbox persists profile as Patient node

/ β€” Trial Finder

  • Sorted by LastUpdatePostDate:desc (most recently updated first)
  • Each search result triggers background graph enrichment
  • Expanded cards show Graph Intelligence panel: eligible patient count, top biomarkers, similar trials
  • Direct ClinicalTrials.gov link per trial

/screening β€” Patient Screening

  • Patient ID field is a <input list="..."> combobox loading from GET /api/v1/graph/patients
  • NCT ID field is a combobox with quick-pick suggestions
  • Validates non-empty inputs before submitting
  • Two modes: Single Trial Screen and A2A Full Pipeline

API endpoints (key ones)

GET  /api/v1/trials/search          β€” real-time CT.gov search, sorted by recency, graph-enriched
POST /api/v1/intake/match           β€” SI-unit clinical intake β†’ ranked trial matches
GET  /api/v1/intake/biomarkers      β€” biomarker registry for the intake form
GET  /api/v1/trials/{nct_id}/intelligence β€” graph-derived insights per trial
GET  /api/v1/graph/patients         β€” query Neo4j for seeded patient IDs
POST /api/v1/patients/{id}/screen/{nct_id} β€” screen FHIR patient against trial
POST /api/v1/workflow/run           β€” run full A2A pipeline
GET  /api/v1/analytics/kpi         β€” dashboard KPIs
GET  /api/v1/map/data               β€” site coordinates + patient clusters
POST /api/v1/graph/query            β€” GraphRAG natural language
POST /seed                          β€” trigger full graph seeding
GET  /api/v1/graph/stats            β€” node/edge counts

Full interactive docs at http://localhost:8000/docs.

Environment variables

NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=clinicalmatch2024
NEO4J_DATABASE=neo4j

OPENAI_API_KEY=<aimlapi.com key>
OPENAI_BASE_URL=https://ai.aimlapi.com/v1
OPENAI_MODEL=claude-opus-4-7

NEXT_PUBLIC_API_URL=http://localhost:8000   # dev only; empty string in Docker

Known issues and constraints

  • Turbopack is broken on this machine β€” always use next dev --webpack. Never suggest next dev without --webpack.
  • next/font/google causes compilation to hang (network request during bundling). Geist font is installed as a package but the next/font/google import is removed. Use plain Tailwind font-sans.
  • asyncio.run() from async context β€” the sync CT.gov wrappers use httpx.Client to avoid this. Never re-introduce asyncio.run() into the sync wrappers; it will fail when called from FastAPI's running event loop.
  • Leaflet SSR β€” MapComponent.tsx uses raw Leaflet (not react-leaflet) via useEffect. The MapComponent dynamic import has ssr: false. Do not switch to react-leaflet's MapContainer.
  • suppressHydrationWarning on <body> in layout.tsx β€” required for Grammarly browser extension compatibility.
  • Mock FHIR patients (P001–P005) live in fhir_adapter.py. The 500 seeded graph patients (P_C50_0001 etc.) are in Neo4j only. The screening page loads graph patients from GET /api/v1/graph/patients for the combobox.

Adding new features

  1. New backend route: add to main.py, import the module at the top, add a Pydantic request model if needed
  2. New API function: add a typed function to frontend/src/lib/api.ts
  3. New page: create frontend/src/app/<name>/page.tsx, add to nav array in Sidebar.tsx
  4. Graph schema change: update neo4j_setup.py constraints/indexes, update _CYPHER_PROMPT in graphrag.py with the new node/property examples
  5. New biomarker: add to BIOMARKER_REGISTRY in intake_matching.py and to BM_GROUPS in frontend/src/app/intake/page.tsx

Demo script (for judges)

  1. GET /api/v1/graph/stats β€” confirm 500+ patients and 9,100+ edges
  2. / β€” search "breast cancer" β†’ observe recency sort, graph-matched patient count badges
  3. Expand a trial β†’ Graph Intelligence panel shows eligible patients, top biomarkers, similar trials
  4. /intake β€” enter: Age 52, Female, ECOG 1, HER2+, Hgb 12.5 g/dL, Creatinine 88 ΞΌmol/L β†’ ranked trials with pass/fail breakdown
  5. /screening β€” select P_C50_0001 from combobox β†’ run A2A Pipeline β†’ observe 5-state machine
  6. /recruitment β€” kanban board, generate PCP letter outreach
  7. /dashboard β€” KPI cards, enrollment funnel, demographics
  8. /graph β€” ask "which patients are eligible for breast cancer trials?"
  9. In Prompt Opinion: call MCP tool find_trials(condition="breast cancer")