Spaces: yakilee/TrialPath

yakilee Claude Opus 4.6 committed on Feb 6

Commit

e05c99c

1 Parent(s): 6ba35c5

docs: add ARCHITECTURE directory with mermaid diagrams and process docs

Generated from codebase analysis: main README with system diagram,
module communities, and dependency graph, plus 5 process files covering
patient journey, search refinement loop, dual-model evaluation,
UI state management, and Parlant bridge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (6)

architecture/README.md +168 -0
architecture/dual-model-evaluation.md +71 -0
architecture/parlant-bridge.md +79 -0
architecture/patient-journey.md +102 -0
architecture/search-refinement-loop.md +56 -0
architecture/ui-state-management.md +72 -0

architecture/README.md ADDED

	@@ -0,0 +1,168 @@

+# TrialPath Architecture
+AI-powered clinical trial matching system for non-small cell lung cancer (NSCLC). Currently in the proof-of-concept (PoC) phase.
+## Stats
+| Metric | Value |
+|--------|-------|
+| Python files | 63 |
+| Lines of code | ~7,000 |
+| Test functions | 259 |
+| Data model types | 22 |
+| Parlant tools | 7 |
+| UI pages / components | 5 / 6 |
+## System Diagram
+```mermaid
+graph TB
+    subgraph UI["Streamlit UI"]
+        upload[1_upload]
+        profile[2_profile_review]
+        matching[3_trial_matching]
+        gaps[4_gap_analysis]
+        summary[5_summary]
+    end
+    subgraph Frontend_Services["Frontend Services"]
+        state_mgr[StateManager]
+        parlant_bridge[ParlantBridge]
+        mock_data[MockData]
+    end
+    subgraph Agent["Parlant Agent"]
+        journey[Journey<br/>5 states]
+        tools[7 Tools]
+        guidelines[10 Guidelines]
+    end
+    subgraph Backend_Services["Backend Services"]
+        medgemma[MedGemma 4B<br/>HF Endpoint]
+        gemini[Gemini 3 Pro<br/>LLM Planner]
+        mcp[ClinicalTrials<br/>MCP Client]
+    end
+    subgraph Models["Data Contracts"]
+        patient[PatientProfile]
+        anchors[SearchAnchors]
+        trial[TrialCandidate]
+        ledger[EligibilityLedger]
+        searchlog[SearchLog]
+    end
+    subgraph External["External APIs"]
+        hf_api[HuggingFace API]
+        gemini_api[Google Gemini API]
+        ct_api[ClinicalTrials.gov v2]
+    end
+    UI --> Frontend_Services
+    Frontend_Services -->|async bridge| Agent
+    Agent --> Backend_Services
+    Backend_Services --> Models
+    medgemma --> hf_api
+    gemini --> gemini_api
+    mcp --> ct_api
+```
+## Module Communities
+### 1. Data Models (`trialpath/models/`)
+Shared language for the entire system. 5 Pydantic v2 contracts, 22 exported types.
+| Contract | Purpose |
+|----------|---------|
+| `PatientProfile` | MedGemma output: demographics, diagnosis, biomarkers, labs, treatments, unknowns + evidence spans |
+| `SearchAnchors` | Gemini-generated query params with relaxation order |
+| `TrialCandidate` | Normalized ClinicalTrials.gov results |
+| `EligibilityLedger` | Per-trial criterion assessment with traffic-light status + gaps |
+| `SearchLog` | Iterative query refinement tracking (max 5 rounds) |
+### 2. Backend Services (`trialpath/services/`)
+4 service integrations, all currently stubbed.
+| Service | File | External Dependency |
+|---------|------|-------------------|
+| `MedGemmaExtractor` | `medgemma_extractor.py` | HuggingFace Inference Endpoint |
+| `GeminiPlanner` | `gemini_planner.py` | Google Gemini API (`google-genai`) |
+| `ClinicalTrialsMCPClient` | `mcp_client.py` | ClinicalTrials.gov REST API v2 |
+| `ParlantClient` | `parlant_client.py` | Parlant Engine REST API |
+### 3. Parlant Agent (`trialpath/agent/`)
+Orchestration layer using Parlant SDK. Defines the 5-state journey, 7 tools, and 10 guidelines.
+- **Tools** are thin async wrappers around backend services (lazy singleton pattern)
+- **Journey** defines state machine with conditional transitions and loops
+- **Guidelines** provide phase-specific and global behavioral rules
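A minimal sketch of the lazy singleton pattern mentioned in the Tools bullet. It assumes a hypothetical `GeminiPlannerStub` in place of the real service. The names `_get_planner` and `generate_search_anchors_tool` are illustrative and are not taken from `tools.py`.

```python
import asyncio
from typing import Optional


class GeminiPlannerStub:
    """Hypothetical stand-in for a backend service; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        GeminiPlannerStub.instances += 1

    async def generate_search_anchors(self, profile: dict) -> dict:
        # Placeholder logic: echo the diagnosis as the search condition.
        return {"condition": profile.get("diagnosis", "")}


_planner: Optional[GeminiPlannerStub] = None


def _get_planner() -> GeminiPlannerStub:
    """Create the service on first use, then reuse the same instance."""
    global _planner
    if _planner is None:
        _planner = GeminiPlannerStub()
    return _planner


async def generate_search_anchors_tool(profile: dict) -> dict:
    """Thin async wrapper: delegate to the lazily created service."""
    return await _get_planner().generate_search_anchors(profile)


async def _demo() -> dict:
    await generate_search_anchors_tool({"diagnosis": "NSCLC"})
    return await generate_search_anchors_tool({"diagnosis": "NSCLC"})


result = asyncio.run(_demo())
```

Deferring construction until the first tool call means importing the tools module does not require API keys or network access, which may be why the pattern suits a codebase whose services are still stubbed.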
+### 4. Streamlit Frontend (`app/`)
+5-page journey mirroring Parlant states. Currently running on mock data.
+| Page | State | Prerequisite |
+|------|-------|-------------|
+| Upload | INGEST | none |
+| Profile Review | PRESCREEN | `patient_profile` |
+| Trial Matching | VALIDATE_TRIALS | `trial_candidates` |
+| Gap Analysis | GAP_FOLLOWUP | `eligibility_ledger` |
+| Summary | SUMMARY | `eligibility_ledger` |
+### 5. Integration Tests (`tests/`)
+Cross-module testing: 18 integration + 14 service integration + 7 e2e tests.
+## Cross-Community Dependencies
+```mermaid
+graph LR
+    Models["Data Models"] --> Services["Backend Services"]
+    Models --> Agent["Parlant Agent"]
+    Models --> Frontend["Streamlit Frontend"]
+    Models --> Tests["Integration Tests"]
+    Services --> Agent
+    Agent -->|Parlant REST| Frontend
+    Config["config.py"] --> Services
+    Config --> Agent
+    MockData["mock_data.py"] --> Frontend
+    MockData --> Tests
+```
+## Key Processes
+| Process | File |
+|---------|------|
+| [Patient Journey (5-state flow)](patient-journey.md) | `trialpath/agent/journey.py` |
+| [Search Refinement Loop](search-refinement-loop.md) | `trialpath/agent/tools.py` |
+| [Dual-Model Eligibility Evaluation](dual-model-evaluation.md) | `trialpath/agent/tools.py` |
+| [UI State Management](ui-state-management.md) | `app/services/state_manager.py` |
+| [Parlant Bridge (sync/async)](parlant-bridge.md) | `app/services/parlant_bridge.py` |
+## Configuration
+All via environment variables (`trialpath/config.py`):
+| Variable | Default | Used By |
+|----------|---------|---------|
+| `MEDGEMMA_ENDPOINT_URL` | HF cloud URL | MedGemmaExtractor |
+| `HF_TOKEN` | `""` | MedGemmaExtractor |
+| `GEMINI_API_KEY` | `""` | GeminiPlanner |
+| `GEMINI_MODEL` | `gemini-3-pro` | GeminiPlanner |
+| `MCP_URL` | `localhost:3000` | ClinicalTrialsMCPClient |
+| `PARLANT_URL` | `localhost:8800` | ParlantClient |
+| `SESSION_COST_BUDGET` | `$0.50` | Cost guardrail |
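As a rough illustration of the table above, the sketch below reads the same variables with `os.environ`-style lookups. The `Settings` dataclass and `load_settings` function are hypothetical and do not reflect the actual layout of `trialpath/config.py`. The MedGemma default is left empty because the table does not spell out the URL, and the budget is assumed to parse as a float with an optional leading `$`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    medgemma_endpoint_url: str
    hf_token: str
    gemini_api_key: str
    gemini_model: str
    mcp_url: str
    parlant_url: str
    session_cost_budget: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment mapping, applying table defaults."""
    budget_raw = env.get("SESSION_COST_BUDGET", "0.50")
    return Settings(
        # Real default is an HF cloud URL not shown in the table; "" is a placeholder.
        medgemma_endpoint_url=env.get("MEDGEMMA_ENDPOINT_URL", ""),
        hf_token=env.get("HF_TOKEN", ""),
        gemini_api_key=env.get("GEMINI_API_KEY", ""),
        gemini_model=env.get("GEMINI_MODEL", "gemini-3-pro"),
        mcp_url=env.get("MCP_URL", "localhost:3000"),
        parlant_url=env.get("PARLANT_URL", "localhost:8800"),
        # Assumption: the budget is a dollar amount, tolerated with or without "$".
        session_cost_budget=float(budget_raw.lstrip("$")),
    )


defaults = load_settings({})
overridden = load_settings({"SESSION_COST_BUDGET": "$1.25", "GEMINI_MODEL": "other-model"})
```

Passing the mapping explicitly keeps the loader testable without mutating the process environment.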
+## Implementation Status
+| Component | Status |
+|-----------|--------|
+| Data Models (22 types) | **Complete** |
+| MedGemma Extractor | Prompts ready, HF integration **pending** |
+| Gemini Planner | Prompts stubbed, API integration **pending** |
+| ClinicalTrials MCP | Wrapper done, needs running MCP server |
+| Parlant Agent | Journey/tools/guidelines defined, live integration **pending** |
+| Streamlit UI | **Complete** with mock data |
+| Tests (259 total) | **Complete** |

architecture/dual-model-evaluation.md ADDED

	@@ -0,0 +1,71 @@

+# Dual-Model Eligibility Evaluation
+**Entry point:** `trialpath/agent/tools.py` > `evaluate_trial_eligibility()`
+Two-model approach for criterion-level eligibility assessment: MedGemma handles medical criteria, Gemini handles structural criteria.
+## Flow
+```mermaid
+flowchart TD
+    A[PatientProfile + TrialCandidate] --> B[GeminiPlanner.slice_criteria]
+    B --> C[Atomic criteria list]
+    C --> D{For each criterion}
+    D --> E{Category?}
+    E -->|medical| F[MedGemmaExtractor.evaluate_medical_criterion]
+    E -->|structural| G[GeminiPlanner.evaluate_structural_criterion]
+    F --> H[CriterionAssessment]
+    G --> H
+    H --> I[GeminiPlanner.aggregate_assessments]
+    I --> J[EligibilityLedger]
+```
+## Steps
+### 1. Slice Criteria
+`GeminiPlanner.slice_criteria(trial)` breaks trial eligibility text into atomic, evaluable items. Each criterion is tagged with a `category`:
+- **`medical`**: Biomarker presence, lab values, staging, ECOG status, treatment history
+- **`structural`**: Age range, geography, consent, insurance, enrollment status
+### 2. Per-Criterion Evaluation
+Each criterion is routed to the appropriate model:
+| Category | Model | Why |
+|----------|-------|-----|
+| `medical` | MedGemma 4B | Specialized for clinical/biomedical reasoning, temporal lab interpretation |
+| `structural` | Gemini 3 Pro | General reasoning for demographic/administrative checks |
+Each evaluation returns:
+- `decision`: `met` / `not_met` / `unknown`
+- `confidence`: 0.0 - 1.0
+- `patient_evidence`: pointer to source document
+- `trial_evidence`: pointer to criterion text
+- `reasoning`: explanation
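The routing step can be sketched as a dispatch table keyed on `category`. This is an illustrative sketch only: a plain dataclass stands in for the Pydantic `CriterionAssessment`, both evaluators are local stubs rather than model calls, and the `min_age` and `text` criterion keys are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CriterionAssessment:
    decision: str          # "met" / "not_met" / "unknown"
    confidence: float      # 0.0 - 1.0
    patient_evidence: str  # pointer to source document
    trial_evidence: str    # pointer to criterion text
    reasoning: str


def evaluate_medical_stub(criterion: dict, profile: dict) -> CriterionAssessment:
    # Stand-in for the MedGemma call: performs no clinical reasoning.
    return CriterionAssessment("unknown", 0.0, "", criterion["text"],
                               "stub: medical model not called")


def evaluate_structural_stub(criterion: dict, profile: dict) -> CriterionAssessment:
    # Stand-in for the Gemini call: handles only a minimum-age check.
    age, min_age = profile.get("age"), criterion.get("min_age")
    if age is None or min_age is None:
        return CriterionAssessment("unknown", 0.0, "", criterion["text"],
                                   "missing age data")
    decision = "met" if age >= min_age else "not_met"
    return CriterionAssessment(decision, 1.0, f"age={age}", criterion["text"],
                               f"age {age} vs minimum {min_age}")


Evaluator = Callable[[dict, dict], CriterionAssessment]
EVALUATORS: Dict[str, Evaluator] = {
    "medical": evaluate_medical_stub,
    "structural": evaluate_structural_stub,
}


def evaluate_criterion(criterion: dict, profile: dict) -> CriterionAssessment:
    """Route one criterion to the evaluator registered for its category."""
    category = criterion["category"]
    if category not in EVALUATORS:
        raise ValueError(f"unknown criterion category: {category!r}")
    return EVALUATORS[category](criterion, profile)


age_check = evaluate_criterion(
    {"category": "structural", "text": "Age >= 18", "min_age": 18}, {"age": 64})
egfr_check = evaluate_criterion(
    {"category": "medical", "text": "EGFR mutation positive"}, {"age": 64})
```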
+### 3. Aggregate into Ledger
+`GeminiPlanner.aggregate_assessments()` combines all criterion results into an `EligibilityLedger`:
+- **`overall_assessment`**: `eligible` / `ineligible` / `needs_review`
+- **`criteria_assessments[]`**: Full list of `CriterionAssessment` objects
+- **`gaps[]`**: `GapItem` objects for `unknown`/`not_met` criteria with recommended actions
+- **Traffic-light status**: Visual summary (green/yellow/red per criterion)
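The document does not state how `overall_assessment` is derived from the individual decisions. The sketch below assumes one plausible rule (any `not_met` gives `ineligible`, otherwise any `unknown` gives `needs_review`) and an assumed colour mapping. The `aggregate` function and the dict-based records are hypothetical stand-ins for `aggregate_assessments()` and the Pydantic models.

```python
from typing import Dict, List

# Assumed mapping from decision to traffic-light colour.
TRAFFIC_LIGHT = {"met": "green", "unknown": "yellow", "not_met": "red"}


def aggregate(assessments: List[Dict[str, str]]) -> Dict[str, object]:
    """Combine per-criterion decisions into a ledger-like dict."""
    decisions = [a["decision"] for a in assessments]
    if "not_met" in decisions:
        overall = "ineligible"
    elif not decisions or "unknown" in decisions:
        # Assumption: an empty or partly unknown list needs human review.
        overall = "needs_review"
    else:
        overall = "eligible"
    return {
        "overall_assessment": overall,
        "criteria_assessments": assessments,
        # Gaps are the criteria that are not clearly met.
        "gaps": [a["criterion"] for a in assessments if a["decision"] != "met"],
        "traffic_light": [TRAFFIC_LIGHT[d] for d in decisions],
    }


ledger = aggregate([
    {"criterion": "EGFR mutation positive", "decision": "met"},
    {"criterion": "ANC >= 1.5 within 14 days", "decision": "unknown"},
])
```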
+## Key Data Contracts
+- **`EligibilityLedger`**: Per-trial overall + criterion-level assessment
+- **`CriterionAssessment`**: Single criterion verdict with evidence pointers
+- **`GapItem`**: Actionable next step for a gap (what to provide, why it matters)
+- **`TemporalCheck`**: For lab values with date requirements (e.g., "ANC >= 1.5 within 14 days")
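To make the `TemporalCheck` example concrete, here is a small sketch of a windowed lab check. The field names (`lab_name`, `min_value`, `window_days`) and the `evaluate` method are assumptions for illustration and are not the actual model definition.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TemporalCheck:
    lab_name: str
    min_value: float
    window_days: int

    def evaluate(self, value: float, measured_on: date, as_of: date) -> str:
        """Return met / not_met, or unknown when the result is outside the window."""
        age = as_of - measured_on
        if age < timedelta(0) or age > timedelta(days=self.window_days):
            # Stale or future-dated result: a fresh lab is needed.
            return "unknown"
        return "met" if value >= self.min_value else "not_met"


anc = TemporalCheck(lab_name="ANC", min_value=1.5, window_days=14)
recent_ok = anc.evaluate(1.8, date(2025, 1, 10), date(2025, 1, 20))
stale = anc.evaluate(1.8, date(2024, 12, 1), date(2025, 1, 20))
recent_low = anc.evaluate(1.2, date(2025, 1, 15), date(2025, 1, 20))
```

Treating an out-of-window value as `unknown` rather than `not_met` fits the gap-oriented design described above, since a repeat lab is an actionable next step.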
+## Key Files
+| File | Role |
+|------|------|
+| `trialpath/agent/tools.py:183-226` | `evaluate_trial_eligibility` tool |
+| `trialpath/services/gemini_planner.py` | slice/evaluate/aggregate logic |
+| `trialpath/services/medgemma_extractor.py` | medical criterion evaluation |
+| `trialpath/models/eligibility_ledger.py` | EligibilityLedger + CriterionAssessment |

architecture/parlant-bridge.md ADDED

	@@ -0,0 +1,79 @@

+# Parlant Bridge (Sync/Async)
+**Entry point:** `app/services/parlant_bridge.py`
+Bridges synchronous Streamlit with the async Parlant agent engine via a dedicated thread pool.
+## Architecture
+```mermaid
+sequenceDiagram
+    participant UI as Streamlit UI (sync)
+    participant Bridge as ParlantBridge
+    participant Pool as ThreadPoolExecutor
+    participant Client as ParlantClient (async)
+    participant Engine as Parlant Engine
+    UI->>Bridge: start_session()
+    Bridge->>Pool: _run_async()
+    Pool->>Client: create_session()
+    Client->>Engine: POST /sessions
+    Engine-->>Client: session_id
+    Client-->>Pool: session_id
+    Pool-->>Bridge: session_id
+    Bridge-->>UI: session_id
+    UI->>Bridge: send_and_poll(message)
+    Bridge->>Pool: _run_async()
+    Pool->>Client: send_message()
+    Client->>Engine: POST /sessions/{id}/messages
+    Pool->>Client: poll_events()
+    Client->>Engine: GET /sessions/{id}/events
+    Engine-->>Client: tool_events[]
+    Client-->>Pool: events
+    Pool-->>Bridge: events
+    Bridge->>Bridge: sync_journey_state()
+    Bridge-->>UI: updated state
+```
+## Sync/Async Bridge
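+Streamlit runs synchronously. The Parlant client is fully async (`httpx.AsyncClient`). The bridge uses `concurrent.futures.ThreadPoolExecutor` to run async code from a sync context:
+```python
+import asyncio
+from concurrent.futures import ThreadPoolExecutor
+# Simplified pattern: run the coroutine on a worker thread with its own
+# event loop, so it also works if the calling thread already has a loop.
+def _run_async(coro):
+    with ThreadPoolExecutor(max_workers=1) as pool:
+        return pool.submit(asyncio.run, coro).result()
+```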
+## Event-to-State Mapping
+`sync_journey_state()` parses Parlant tool events and updates `st.session_state`:
+| Tool Event | Session State Key |
+|------------|------------------|
+| `extract_patient_profile` | `patient_profile_data` |
+| `search_clinical_trials` | `trial_candidates_data` |
+| `evaluate_trial_eligibility` | `eligibility_ledger_data` |
+| `analyze_gaps` | `gap_analysis_data` |
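The mapping in the table can be sketched as a lookup applied to each polled event. The event shape (`tool_name` and `result` keys) is an assumption, and a plain dict stands in for `st.session_state`; only the tool names and state keys come from the table.

```python
from typing import Any, Dict, List, MutableMapping

EVENT_TO_STATE_KEY = {
    "extract_patient_profile": "patient_profile_data",
    "search_clinical_trials": "trial_candidates_data",
    "evaluate_trial_eligibility": "eligibility_ledger_data",
    "analyze_gaps": "gap_analysis_data",
}


def sync_journey_state(events: List[Dict[str, Any]],
                       session_state: MutableMapping[str, Any]) -> List[str]:
    """Copy recognised tool results into session state; return the keys updated."""
    updated = []
    for event in events:
        key = EVENT_TO_STATE_KEY.get(event.get("tool_name"))
        if key is None:
            continue  # Ignore events from tools that have no mapped state key.
        session_state[key] = event.get("result")
        updated.append(key)
    return updated


state: Dict[str, Any] = {}
touched = sync_journey_state(
    [
        {"tool_name": "extract_patient_profile", "result": {"age": 64}},
        {"tool_name": "generate_search_anchors", "result": {"condition": "NSCLC"}},
    ],
    state,
)
```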
+## Two Parlant Clients
+The codebase has two separate Parlant clients:
+| Client | Location | Purpose |
+|--------|----------|---------|
+| Backend | `trialpath/services/parlant_client.py` | Async REST wrapper for engine admin |
+| Frontend | `app/services/parlant_client.py` | Session/event management for UI |
+Both target the same Parlant engine at `PARLANT_URL` (default `localhost:8800`).
+## Key Files
+| File | Role |
+|------|------|
+| `app/services/parlant_bridge.py` | Sync/async bridge + state sync |
+| `app/services/parlant_client.py` | Frontend async REST client |
+| `trialpath/services/parlant_client.py` | Backend async REST client |
+| `trialpath/config.py` | `PARLANT_URL` configuration |

architecture/patient-journey.md ADDED

	@@ -0,0 +1,102 @@

+# Patient Journey (5-State Flow)
+**Entry point:** `trialpath/agent/journey.py` > `create_clinical_trial_journey()`
+The core orchestration process. A Parlant `Journey` with 5 states, conditional transitions, and one backward loop.
+## State Machine
+```mermaid
+stateDiagram-v2
+    [*] --> INGEST
+    INGEST --> PRESCREEN : profile has minimum prescreen data
+    PRESCREEN --> VALIDATE_TRIALS : 1-50 results found
+    PRESCREEN --> PRESCREEN : refine (>50) or relax (0)
+    VALIDATE_TRIALS --> GAP_FOLLOWUP : all trials evaluated
+    GAP_FOLLOWUP --> SUMMARY : patient ready for summary
+    GAP_FOLLOWUP --> INGEST : patient uploads new documents
+    SUMMARY --> [*] : patient reviewed summary
+```
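The diagram can also be read as a transition table. The sketch below encodes it as plain Python data for illustration; it does not use the Parlant SDK, and `TRANSITIONS`, `is_valid_transition`, and `next_after_search` are hypothetical names.

```python
from typing import Dict, Set

# Allowed transitions, transcribed from the state diagram ("END" marks [*]).
TRANSITIONS: Dict[str, Set[str]] = {
    "INGEST": {"PRESCREEN"},
    "PRESCREEN": {"PRESCREEN", "VALIDATE_TRIALS"},
    "VALIDATE_TRIALS": {"GAP_FOLLOWUP"},
    "GAP_FOLLOWUP": {"SUMMARY", "INGEST"},
    "SUMMARY": {"END"},
}


def is_valid_transition(source: str, target: str) -> bool:
    """True when the diagram has an edge from source to target."""
    return target in TRANSITIONS.get(source, set())


def next_after_search(result_count: int) -> str:
    """PRESCREEN exit rule: 1-50 results advance, otherwise refine or relax."""
    return "VALIDATE_TRIALS" if 1 <= result_count <= 50 else "PRESCREEN"
```

The only backward edge is `GAP_FOLLOWUP` to `INGEST`, which matches the single backward loop described above.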
+## States
+### 1. INGEST (`ForkJourneyState`)
+- **Action:** Extract patient profile from uploaded medical documents
+- **Tools:** `extract_patient_profile`
+- **Input:** PDF/image document URLs + patient metadata (age, sex)
+- **Output:** `PatientProfile` (demographics, diagnosis, biomarkers, labs, treatments, unknowns)
+- **Transition:** Advances to PRESCREEN when profile has minimum data
+### 2. PRESCREEN (`ForkJourneyState`)
+- **Action:** Generate search anchors, query ClinicalTrials.gov, refine/relax iteratively
+- **Tools:** `generate_search_anchors`, `search_clinical_trials`, `refine_search_query`, `relax_search_query`
+- **Input:** `PatientProfile`
+- **Output:** `SearchAnchors` + `TrialCandidate[]`
+- **Loop:** Max 5 refinement rounds. Refine if >50 results, relax if 0 results.
+- **Transition:** Advances to VALIDATE_TRIALS when 1-50 results found
+### 3. VALIDATE_TRIALS (`ToolJourneyState`)
+- **Action:** Dual-model eligibility evaluation per trial
+- **Tools:** `evaluate_trial_eligibility`
+- **Input:** `PatientProfile` + `TrialCandidate`
+- **Output:** `EligibilityLedger[]` (criterion-level verdicts + gaps)
+- **Transition:** Advances to GAP_FOLLOWUP when all candidates evaluated
+### 4. GAP_FOLLOWUP (`ForkJourneyState`)
+- **Action:** Analyze gaps, present actionable next steps
+- **Tools:** `analyze_gaps`
+- **Input:** `PatientProfile` + `EligibilityLedger[]`
+- **Output:** `GapItem[]` (recommended actions)
+- **Fork:** Patient can upload new documents (loop to INGEST) or proceed to SUMMARY
+### 5. SUMMARY (`ChatJourneyState`)
+- **Action:** Present final summary, generate doctor packet
+- **Tools:** none (chat-only)
+- **Output:** Doctor Packet (JSON/Markdown export)
+- **Transition:** END_JOURNEY when patient has reviewed
+## Data Flow
+```
+Patient Document (PDF/image)
+    |
+    v
+[INGEST] MedGemmaExtractor.extract()
+    --> PatientProfile
+    |
+    v
+[PRESCREEN] GeminiPlanner.generate_search_anchors()
+           + ClinicalTrialsMCPClient.search()
+           + iterative refine/relax (max 5 rounds)
+    --> SearchAnchors --> TrialCandidate[]
+    |
+    v
+[VALIDATE_TRIALS] GeminiPlanner.slice_criteria()
+                 + dual-model evaluation
+                 + GeminiPlanner.aggregate_assessments()
+    --> EligibilityLedger[]
+    |
+    v
+[GAP_FOLLOWUP] GeminiPlanner.analyze_gaps()
+    --> GapItem[]
+    --> (optional: loop back to INGEST)
+    |
+    v
+[SUMMARY] Final report generation
+    --> Doctor Packet
+```
+## Key Files
+| File | Role |
+|------|------|
+| `trialpath/agent/journey.py` | State machine definition |
+| `trialpath/agent/tools.py` | Tool implementations |
+| `trialpath/agent/guidelines.py` | Phase-specific behavioral rules |
+| `trialpath/agent/orchestrator.py` | Parlant PluginServer setup |
+| `trialpath/agent/setup.py` | Agent + NLP services init |

architecture/search-refinement-loop.md ADDED

	@@ -0,0 +1,56 @@

+# Search Refinement Loop
+**Entry point:** `trialpath/agent/tools.py` > PRESCREEN state tools
+Iterative query refinement process that adjusts ClinicalTrials.gov queries until a manageable result set (1-50 trials) is found.
+## Flow
+```mermaid
+flowchart TD
+    A[PatientProfile] --> B[generate_search_anchors]
+    B --> C[SearchAnchors v1]
+    C --> D[search_clinical_trials]
+    D --> E{Result count?}
+    E -->|>50| F[refine_search_query]
+    E -->|0| G[relax_search_query]
+    E -->|1-50| H[Proceed to VALIDATE_TRIALS]
+    F --> I{Round < 5?}
+    G --> I
+    I -->|Yes| D
+    I -->|No| H
+```
+## How It Works
+1. **Generate anchors:** Gemini converts `PatientProfile` into `SearchAnchors` (condition, biomarkers, stage, geography, phase filters, relaxation order)
+2. **Search:** MCP client queries ClinicalTrials.gov REST API v2
+3. **Evaluate count:**
+   - **>50 results:** Call `refine_search_query` -- Gemini tightens filters (add biomarker, narrow geography, specific phase)
+   - **0 results:** Call `relax_search_query` -- Gemini loosens filters following the relaxation order in SearchAnchors
+   - **1-50 results:** Proceed to trial validation
+4. **Loop guard:** Maximum 5 refinement rounds (tracked in `SearchLog`)
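The steps above can be sketched as a bounded loop. This sketch assumes that one round equals one search call and that the loop exits with whatever the last search returned once the guard is hit. The `search`, `refine`, and `relax` callables are stubs passed in by the caller, not the real tools, and the list of dicts is a stand-in for `SearchLog`.

```python
from typing import Callable, Dict, List, Tuple

MAX_ROUNDS = 5
MAX_RESULTS = 50

Anchors = Dict[str, int]


def run_search_loop(
    anchors: Anchors,
    search: Callable[[Anchors], List[str]],
    refine: Callable[[Anchors], Anchors],
    relax: Callable[[Anchors], Anchors],
) -> Tuple[List[str], List[Dict[str, object]]]:
    """Search, then tighten or loosen the anchors until 1-50 results or 5 rounds."""
    log: List[Dict[str, object]] = []
    results: List[str] = []
    for round_no in range(1, MAX_ROUNDS + 1):
        results = search(anchors)
        count = len(results)
        if 1 <= count <= MAX_RESULTS:
            log.append({"round": round_no, "count": count, "action": "proceed"})
            break
        action = "refine" if count > MAX_RESULTS else "relax"
        log.append({"round": round_no, "count": count, "action": action})
        anchors = refine(anchors) if action == "refine" else relax(anchors)
    return results, log


# Stub backend: the more filters applied, the fewer results come back.
counts_by_filters = {0: 120, 1: 80, 2: 30}
trials, search_log = run_search_loop(
    {"filters": 0},
    search=lambda a: ["trial"] * counts_by_filters[a["filters"]],
    refine=lambda a: {"filters": a["filters"] + 1},
    relax=lambda a: {"filters": a["filters"] - 1},
)
```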
+## Tools Involved
+| Tool | When | What |
+|------|------|------|
+| `generate_search_anchors` | Start | Profile -> SearchAnchors |
+| `search_clinical_trials` | Each round | SearchAnchors -> TrialCandidate[] |
+| `refine_search_query` | Too many results | Tighten SearchAnchors |
+| `relax_search_query` | Zero results | Loosen SearchAnchors |
+## Key Data Contracts
+- **`SearchAnchors`**: `condition`, `biomarkers[]`, `stage`, `geography`, `phase_filter`, `relaxation_order[]`
+- **`SearchLog`**: Tracks each round with `SearchStep` (query params, result count, action taken)
+## Key Files
+| File | Role |
+|------|------|
+| `trialpath/agent/tools.py:82-180` | Tool implementations |
+| `trialpath/services/mcp_client.py` | ClinicalTrials.gov wrapper |
+| `trialpath/services/gemini_planner.py` | Refine/relax logic |
+| `trialpath/models/search_anchors.py` | SearchAnchors contract |
+| `trialpath/models/search_log.py` | Refinement tracking |

architecture/ui-state-management.md ADDED

	@@ -0,0 +1,72 @@

+# UI State Management
+**Entry point:** `app/services/state_manager.py`
+Streamlit session-based state management that mirrors the 5 Parlant journey states with prerequisite guards.
+## State Machine
+```mermaid
+stateDiagram-v2
+    [*] --> INGEST : init_session_state()
+    INGEST --> PRESCREEN : patient_profile set
+    PRESCREEN --> VALIDATE_TRIALS : trial_candidates set
+    VALIDATE_TRIALS --> GAP_FOLLOWUP : eligibility_ledger set
+    GAP_FOLLOWUP --> SUMMARY : eligibility_ledger set
+    GAP_FOLLOWUP --> INGEST : reset_to_ingest()
+```
+## Session State Variables
+| Key | Type | Default | Set By |
+|-----|------|---------|--------|
+| `journey_state` | `str` | `"INGEST"` | `advance_journey()` |
+| `parlant_session_id` | `str \| None` | `None` | Parlant bridge |
+| `parlant_agent_id` | `str \| None` | `None` | Parlant bridge |
+| `parlant_session_active` | `bool` | `False` | Parlant bridge |
+| `patient_profile` | `dict \| None` | `None` | INGEST tools |
+| `uploaded_files` | `list` | `[]` | Upload page |
+| `search_anchors` | `dict \| None` | `None` | PRESCREEN tools |
+| `trial_candidates` | `list` | `[]` | PRESCREEN tools |
+| `eligibility_ledger` | `list` | `[]` | VALIDATE tools |
+| `last_event_offset` | `int` | `0` | Parlant bridge polling |
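A minimal sketch of initializing these defaults without overwriting existing values. A plain dict stands in for `st.session_state`, and the body of `init_session_state` here is an assumption about behaviour, not the code in `state_manager.py`.

```python
import copy
from typing import Any, Dict, MutableMapping

# Defaults transcribed from the table above.
DEFAULTS: Dict[str, Any] = {
    "journey_state": "INGEST",
    "parlant_session_id": None,
    "parlant_agent_id": None,
    "parlant_session_active": False,
    "patient_profile": None,
    "uploaded_files": [],
    "search_anchors": None,
    "trial_candidates": [],
    "eligibility_ledger": [],
    "last_event_offset": 0,
}


def init_session_state(state: MutableMapping[str, Any]) -> None:
    """Fill in missing keys only, so reruns keep values already in the session."""
    for key, default in DEFAULTS.items():
        if key not in state:
            # Copy mutable defaults so sessions never share one list object.
            state[key] = copy.deepcopy(default)


session: Dict[str, Any] = {"journey_state": "PRESCREEN"}
init_session_state(session)
```

Because Streamlit reruns the script on every interaction, a no-overwrite initializer is what keeps the journey from snapping back to `INGEST` on each rerun.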
+## Key Functions
+| Function | Purpose |
+|----------|---------|
+| `init_session_state()` | Initialize defaults without overwriting existing keys |
+| `get_current_journey_state()` | Read current state |
+| `advance_journey(target)` | Forward-only transition with validation |
+| `can_advance_to(target)` | Prerequisite check |
+| `reset_to_ingest()` | Special backward transition for gap re-ingestion |
+| `reset_session_state()` | Full reset to defaults |
+## Prerequisite Guards
+| Target State | Requires |
+|-------------|----------|
+| PRESCREEN | `patient_profile` is set |
+| VALIDATE_TRIALS | `patient_profile` is set |
+| GAP_FOLLOWUP | `patient_profile` + `trial_candidates` |
+| SUMMARY | `patient_profile` + `trial_candidates` + `eligibility_ledger` |
+`advance_journey()` enforces forward-only movement (raises `ValueError` on backward). The only exception is `reset_to_ingest()` for the gap re-ingestion loop.
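The guard table and the forward-only rule can be sketched together as follows. This assumes "is set" means a truthy value and that a failed prerequisite also raises `ValueError`; the source confirms neither detail. A plain dict stands in for `st.session_state`.

```python
from typing import Any, Dict, List, MutableMapping

ORDER = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

# Prerequisites transcribed from the guard table.
PREREQUISITES: Dict[str, List[str]] = {
    "PRESCREEN": ["patient_profile"],
    "VALIDATE_TRIALS": ["patient_profile"],
    "GAP_FOLLOWUP": ["patient_profile", "trial_candidates"],
    "SUMMARY": ["patient_profile", "trial_candidates", "eligibility_ledger"],
}


def can_advance_to(state: MutableMapping[str, Any], target: str) -> bool:
    """True when every prerequisite key for the target holds a truthy value."""
    return all(state.get(key) for key in PREREQUISITES.get(target, []))


def advance_journey(state: MutableMapping[str, Any], target: str) -> None:
    """Move forward to target, rejecting backward moves and unmet prerequisites."""
    current = state["journey_state"]
    if ORDER.index(target) < ORDER.index(current):
        raise ValueError(f"backward transition {current} -> {target} not allowed")
    if not can_advance_to(state, target):
        raise ValueError(f"prerequisites for {target} are not met")
    state["journey_state"] = target


ui_state: Dict[str, Any] = {
    "journey_state": "INGEST",
    "patient_profile": {"age": 64},
    "trial_candidates": [],
    "eligibility_ledger": [],
}
advance_journey(ui_state, "PRESCREEN")
```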
+## Page Mapping
+| Page File | Journey State | Components Used |
+|-----------|--------------|-----------------|
+| `app/pages/1_upload.py` | INGEST | `file_uploader`, `disclaimer_banner` |
+| `app/pages/2_profile_review.py` | PRESCREEN | `profile_card` |
+| `app/pages/3_trial_matching.py` | VALIDATE_TRIALS | `trial_card` |
+| `app/pages/4_gap_analysis.py` | GAP_FOLLOWUP | `gap_card` |
+| `app/pages/5_summary.py` | SUMMARY | `profile_card`, `trial_card`, `gap_card` |
+## Key Files
+| File | Role |
+|------|------|
+| `app/services/state_manager.py` | State machine + prerequisites |
+| `streamlit_app.py` | Multi-page navigation entry point |
+| `app/components/progress_tracker.py` | Visual state indicator |