docs: add ARCHITECTURE directory with mermaid diagrams and process docs
Generated from codebase analysis: main README with system diagram,
module communities, and dependency graph, plus 5 process files covering
patient journey, search refinement loop, dual-model evaluation,
UI state management, and Parlant bridge.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- architecture/README.md +168 -0
- architecture/dual-model-evaluation.md +71 -0
- architecture/parlant-bridge.md +79 -0
- architecture/patient-journey.md +102 -0
- architecture/search-refinement-loop.md +56 -0
- architecture/ui-state-management.md +72 -0
architecture/README.md
ADDED
@@ -0,0 +1,168 @@
# TrialPath Architecture

AI-powered NSCLC clinical trial matching system. PoC phase.

## Stats

| Metric | Value |
|--------|-------|
| Python files | 63 |
| Lines of code | ~7,000 |
| Test functions | 259 |
| Data model types | 22 |
| Parlant tools | 7 |
| UI pages / components | 5 / 6 |

## System Diagram

```mermaid
graph TB
    subgraph UI["Streamlit UI"]
        upload[1_upload]
        profile[2_profile_review]
        matching[3_trial_matching]
        gaps[4_gap_analysis]
        summary[5_summary]
    end

    subgraph Frontend_Services["Frontend Services"]
        state_mgr[StateManager]
        parlant_bridge[ParlantBridge]
        mock_data[MockData]
    end

    subgraph Agent["Parlant Agent"]
        journey[Journey<br/>5 states]
        tools[7 Tools]
        guidelines[10 Guidelines]
    end

    subgraph Backend_Services["Backend Services"]
        medgemma[MedGemma 4B<br/>HF Endpoint]
        gemini[Gemini 3 Pro<br/>LLM Planner]
        mcp[ClinicalTrials<br/>MCP Client]
    end

    subgraph Models["Data Contracts"]
        patient[PatientProfile]
        anchors[SearchAnchors]
        trial[TrialCandidate]
        ledger[EligibilityLedger]
        searchlog[SearchLog]
    end

    subgraph External["External APIs"]
        hf_api[HuggingFace API]
        gemini_api[Google Gemini API]
        ct_api[ClinicalTrials.gov v2]
    end

    UI --> Frontend_Services
    Frontend_Services -->|async bridge| Agent
    Agent --> Backend_Services
    Backend_Services --> Models
    medgemma --> hf_api
    gemini --> gemini_api
    mcp --> ct_api
```

## Module Communities

### 1. Data Models (`trialpath/models/`)

Shared language for the entire system. 5 Pydantic v2 contracts, 22 exported types.

| Contract | Purpose |
|----------|---------|
| `PatientProfile` | MedGemma output: demographics, diagnosis, biomarkers, labs, treatments, unknowns + evidence spans |
| `SearchAnchors` | Gemini-generated query params with relaxation order |
| `TrialCandidate` | Normalized ClinicalTrials.gov results |
| `EligibilityLedger` | Per-trial criterion assessment with traffic-light status + gaps |
| `SearchLog` | Iterative query refinement tracking (max 5 rounds) |

### 2. Backend Services (`trialpath/services/`)

4 service integrations, all currently stubbed.

| Service | File | External Dependency |
|---------|------|---------------------|
| `MedGemmaExtractor` | `medgemma_extractor.py` | HuggingFace Inference Endpoint |
| `GeminiPlanner` | `gemini_planner.py` | Google Gemini API (`google-genai`) |
| `ClinicalTrialsMCPClient` | `mcp_client.py` | ClinicalTrials.gov REST API v2 |
| `ParlantClient` | `parlant_client.py` | Parlant Engine REST API |

### 3. Parlant Agent (`trialpath/agent/`)

Orchestration layer using Parlant SDK. Defines the 5-state journey, 7 tools, and 10 guidelines.

- **Tools** are thin async wrappers around backend services (lazy singleton pattern)
- **Journey** defines state machine with conditional transitions and loops
- **Guidelines** provide phase-specific and global behavioral rules

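
The "thin async wrapper, lazy singleton" pattern mentioned for the tools can be sketched as follows. This is a minimal illustration, not the code in `trialpath/agent/tools.py`: `StubPlanner`, `get_planner`, and the returned dictionary shape are hypothetical stand-ins, and no Parlant SDK decorator is applied.

```python
import asyncio


class StubPlanner:
    """Hypothetical stand-in for a backend service such as GeminiPlanner."""

    instances = 0  # counts constructions so the singleton behaviour is visible

    def __init__(self) -> None:
        StubPlanner.instances += 1

    async def generate_search_anchors(self, profile: dict) -> dict:
        # A real service would call an external API here.
        return {"condition": profile.get("diagnosis", "NSCLC")}


_planner = None  # module-level cache, filled on first use


def get_planner() -> StubPlanner:
    """Create the service on first call and reuse it afterwards."""
    global _planner
    if _planner is None:
        _planner = StubPlanner()
    return _planner


async def generate_search_anchors(profile: dict) -> dict:
    """Thin async tool wrapper: delegate to the lazily created service."""
    return await get_planner().generate_search_anchors(profile)
```

Deferring construction until the first tool call keeps module import cheap and means a missing API key would only surface when a tool is actually used.
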
### 4. Streamlit Frontend (`app/`)

5-page journey mirroring Parlant states. Currently running on mock data.

| Page | State | Prerequisite |
|------|-------|--------------|
| Upload | INGEST | none |
| Profile Review | PRESCREEN | `patient_profile` |
| Trial Matching | VALIDATE_TRIALS | `trial_candidates` |
| Gap Analysis | GAP_FOLLOWUP | `eligibility_ledger` |
| Summary | SUMMARY | `eligibility_ledger` |

### 5. Integration Tests (`tests/`)

Cross-module testing: 18 integration + 14 service integration + 7 e2e tests.

## Cross-Community Dependencies

```mermaid
graph LR
    Models["Data Models"] --> Services["Backend Services"]
    Models --> Agent["Parlant Agent"]
    Models --> Frontend["Streamlit Frontend"]
    Models --> Tests["Integration Tests"]
    Services --> Agent
    Agent -->|Parlant REST| Frontend
    Config["config.py"] --> Services
    Config --> Agent
    MockData["mock_data.py"] --> Frontend
    MockData --> Tests
```

## Key Processes

| Process | File |
|---------|------|
| [Patient Journey (5-state flow)](patient-journey.md) | `trialpath/agent/journey.py` |
| [Search Refinement Loop](search-refinement-loop.md) | `trialpath/agent/tools.py` |
| [Dual-Model Eligibility Evaluation](dual-model-evaluation.md) | `trialpath/agent/tools.py` |
| [UI State Management](ui-state-management.md) | `app/services/state_manager.py` |
| [Parlant Bridge (sync/async)](parlant-bridge.md) | `app/services/parlant_bridge.py` |

## Configuration

All settings are read from environment variables (`trialpath/config.py`):

| Variable | Default | Used By |
|----------|---------|---------|
| `MEDGEMMA_ENDPOINT_URL` | HF cloud URL | MedGemmaExtractor |
| `HF_TOKEN` | `""` | MedGemmaExtractor |
| `GEMINI_API_KEY` | `""` | GeminiPlanner |
| `GEMINI_MODEL` | `gemini-3-pro` | GeminiPlanner |
| `MCP_URL` | `localhost:3000` | ClinicalTrialsMCPClient |
| `PARLANT_URL` | `localhost:8800` | ParlantClient |
| `SESSION_COST_BUDGET` | `$0.50` | Cost guardrail |

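
As an illustration of the environment-variable approach, the sketch below loads the documented variables with their documented defaults. The `Settings` class and `load_settings` function are hypothetical names, not taken from `trialpath/config.py`. `MEDGEMMA_ENDPOINT_URL` is left out because its default URL is not spelled out here, and storing the budget as a plain float in dollars is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    hf_token: str
    gemini_api_key: str
    gemini_model: str
    mcp_url: str
    parlant_url: str
    session_cost_budget: float  # assumed to be dollars, e.g. 0.50


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        hf_token=env.get("HF_TOKEN", ""),
        gemini_api_key=env.get("GEMINI_API_KEY", ""),
        gemini_model=env.get("GEMINI_MODEL", "gemini-3-pro"),
        mcp_url=env.get("MCP_URL", "localhost:3000"),
        parlant_url=env.get("PARLANT_URL", "localhost:8800"),
        session_cost_budget=float(env.get("SESSION_COST_BUDGET", "0.50")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.
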
## Implementation Status

| Component | Status |
|-----------|--------|
| Data Models (22 types) | **Complete** |
| MedGemma Extractor | Prompts ready, HF integration **pending** |
| Gemini Planner | Prompts stubbed, API integration **pending** |
| ClinicalTrials MCP | Wrapper done, needs running MCP server |
| Parlant Agent | Journey/tools/guidelines defined, live integration **pending** |
| Streamlit UI | **Complete** with mock data |
| Tests (259 total) | **Complete** |
architecture/dual-model-evaluation.md
ADDED
@@ -0,0 +1,71 @@
# Dual-Model Eligibility Evaluation

**Entry point:** `trialpath/agent/tools.py` > `evaluate_trial_eligibility()`

Two-model approach for criterion-level eligibility assessment: MedGemma handles medical criteria, Gemini handles structural criteria.

## Flow

```mermaid
flowchart TD
    A[PatientProfile + TrialCandidate] --> B[GeminiPlanner.slice_criteria]
    B --> C[Atomic criteria list]
    C --> D{For each criterion}
    D --> E{Category?}
    E -->|medical| F[MedGemmaExtractor.evaluate_medical_criterion]
    E -->|structural| G[GeminiPlanner.evaluate_structural_criterion]
    F --> H[CriterionAssessment]
    G --> H
    H --> I[GeminiPlanner.aggregate_assessments]
    I --> J[EligibilityLedger]
```

## Steps

### 1. Slice Criteria

`GeminiPlanner.slice_criteria(trial)` breaks trial eligibility text into atomic, evaluable items. Each criterion is tagged with a `category`:

- **`medical`**: Biomarker presence, lab values, staging, ECOG status, treatment history
- **`structural`**: Age range, geography, consent, insurance, enrollment status

### 2. Per-Criterion Evaluation

Each criterion is routed to the appropriate model:

| Category | Model | Why |
|----------|-------|-----|
| `medical` | MedGemma 4B | Specialized for clinical/biomedical reasoning, temporal lab interpretation |
| `structural` | Gemini 3 Pro | General reasoning for demographic/administrative checks |

Each evaluation returns:
- `decision`: `met` / `not_met` / `unknown`
- `confidence`: 0.0 - 1.0
- `patient_evidence`: pointer to source document
- `trial_evidence`: pointer to criterion text
- `reasoning`: explanation

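
The routing step can be sketched as a dispatch on the criterion's `category`. This is only an illustration under stated assumptions: the two evaluator functions are stubs that make no model call, the dictionary shape of a sliced criterion (`text`, `category`) is assumed, and the `evaluated_by` field is a hypothetical addition that makes the routing visible.

```python
from dataclasses import dataclass


@dataclass
class CriterionAssessment:
    criterion: str
    decision: str        # "met" / "not_met" / "unknown"
    confidence: float    # 0.0 - 1.0
    reasoning: str
    evaluated_by: str    # hypothetical field, not part of the documented contract


def evaluate_medical_stub(text: str, profile: dict) -> CriterionAssessment:
    # Stand-in for MedGemmaExtractor.evaluate_medical_criterion (no model call).
    return CriterionAssessment(text, "unknown", 0.0, "stub: no model call", "medgemma")


def evaluate_structural_stub(text: str, profile: dict) -> CriterionAssessment:
    # Stand-in for GeminiPlanner.evaluate_structural_criterion (no model call).
    return CriterionAssessment(text, "unknown", 0.0, "stub: no model call", "gemini")


EVALUATORS = {
    "medical": evaluate_medical_stub,
    "structural": evaluate_structural_stub,
}


def evaluate_criterion(criterion: dict, profile: dict) -> CriterionAssessment:
    """Dispatch one sliced criterion to the evaluator for its category."""
    category = criterion["category"]
    if category not in EVALUATORS:
        raise ValueError(f"unknown criterion category: {category!r}")
    return EVALUATORS[category](criterion["text"], profile)
```

Keeping the category-to-evaluator mapping in one table means a new category would need a single new entry rather than another branch.
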
### 3. Aggregate into Ledger

`GeminiPlanner.aggregate_assessments()` combines all criterion results into an `EligibilityLedger`:

- **`overall_assessment`**: `eligible` / `ineligible` / `needs_review`
- **`criteria_assessments[]`**: Full list of `CriterionAssessment` objects
- **`gaps[]`**: `GapItem` objects for `unknown`/`not_met` criteria with recommended actions
- **Traffic-light status**: Visual summary (green/yellow/red per criterion)

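
One plausible reading of the aggregation step is sketched below. The roll-up rule (any `not_met` gives `ineligible`, otherwise any `unknown` gives `needs_review`) and the mapping of decisions to colours are assumptions made for illustration, since the actual logic lives in `GeminiPlanner` and is not specified here. Gaps are reduced to plain criterion strings rather than full `GapItem` objects.

```python
# Assumed mapping from decision to traffic-light colour.
TRAFFIC_LIGHT = {"met": "green", "unknown": "yellow", "not_met": "red"}


def aggregate_assessments(assessments: list) -> dict:
    """Roll criterion-level decisions up into a ledger-like dictionary."""
    decisions = [a["decision"] for a in assessments]
    if "not_met" in decisions:
        overall = "ineligible"
    elif "unknown" in decisions or not decisions:
        overall = "needs_review"   # nothing decided yet, or open questions remain
    else:
        overall = "eligible"
    return {
        "overall_assessment": overall,
        "criteria_assessments": assessments,
        # Every criterion that is not clearly met becomes a gap to follow up.
        "gaps": [a["criterion"] for a in assessments if a["decision"] != "met"],
        "traffic_light": [TRAFFIC_LIGHT[a["decision"]] for a in assessments],
    }
```
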
## Key Data Contracts

- **`EligibilityLedger`**: Per-trial overall + criterion-level assessment
- **`CriterionAssessment`**: Single criterion verdict with evidence pointers
- **`GapItem`**: Actionable next step for a gap (what to provide, why it matters)
- **`TemporalCheck`**: For lab values with date requirements (e.g., "ANC >= 1.5 within 14 days")

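
To make the `TemporalCheck` idea concrete, the sketch below evaluates a lab value that must be both above a threshold and recent enough, using the "ANC >= 1.5 within 14 days" example. The function names are hypothetical, and treating a stale or future-dated measurement as `unknown` (rather than `not_met`) is an assumption, not documented behaviour.

```python
from datetime import date


def within_window(measured_on: date, as_of: date, max_age_days: int) -> bool:
    """True if the measurement is no older than max_age_days and not in the future."""
    age_days = (as_of - measured_on).days
    return 0 <= age_days <= max_age_days


def check_lab(value: float, threshold: float, measured_on: date,
              as_of: date, max_age_days: int) -> str:
    """Return "met", "not_met", or "unknown" for a minimum-value lab criterion."""
    if not within_window(measured_on, as_of, max_age_days):
        return "unknown"  # assumption: an out-of-window lab needs a fresh measurement
    return "met" if value >= threshold else "not_met"
```

For example, an ANC of 1.8 measured 10 days before the evaluation date satisfies the criterion, while the same value measured 20 days earlier would come back as `unknown` and could be surfaced as a gap.
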
## Key Files

| File | Role |
|------|------|
| `trialpath/agent/tools.py:183-226` | `evaluate_trial_eligibility` tool |
| `trialpath/services/gemini_planner.py` | slice/evaluate/aggregate logic |
| `trialpath/services/medgemma_extractor.py` | medical criterion evaluation |
| `trialpath/models/eligibility_ledger.py` | EligibilityLedger + CriterionAssessment |
architecture/parlant-bridge.md
ADDED
@@ -0,0 +1,79 @@
# Parlant Bridge (Sync/Async)

**Entry point:** `app/services/parlant_bridge.py`

Bridges synchronous Streamlit with the async Parlant agent engine via a dedicated thread pool.

## Architecture

```mermaid
sequenceDiagram
    participant UI as Streamlit UI (sync)
    participant Bridge as ParlantBridge
    participant Pool as ThreadPoolExecutor
    participant Client as ParlantClient (async)
    participant Engine as Parlant Engine

    UI->>Bridge: start_session()
    Bridge->>Pool: _run_async()
    Pool->>Client: create_session()
    Client->>Engine: POST /sessions
    Engine-->>Client: session_id
    Client-->>Pool: session_id
    Pool-->>Bridge: session_id
    Bridge-->>UI: session_id

    UI->>Bridge: send_and_poll(message)
    Bridge->>Pool: _run_async()
    Pool->>Client: send_message()
    Client->>Engine: POST /sessions/{id}/messages
    Pool->>Client: poll_events()
    Client->>Engine: GET /sessions/{id}/events
    Engine-->>Client: tool_events[]
    Client-->>Pool: events
    Pool-->>Bridge: events
    Bridge->>Bridge: sync_journey_state()
    Bridge-->>UI: updated state
```

## Sync/Async Bridge

Streamlit runs synchronously. The Parlant client is fully async (`httpx.AsyncClient`). The bridge uses `concurrent.futures.ThreadPoolExecutor` to run async code from a sync context:

```python
# Simplified pattern
import asyncio
from concurrent.futures import ThreadPoolExecutor

def _run_async(coro):
    # asyncio.run() in a worker thread gives the coroutine its own event loop
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

## Event-to-State Mapping

`sync_journey_state()` parses Parlant tool events and updates `st.session_state`:

| Tool Event | Session State Key |
|------------|-------------------|
| `extract_patient_profile` | `patient_profile_data` |
| `search_clinical_trials` | `trial_candidates_data` |
| `evaluate_trial_eligibility` | `eligibility_ledger_data` |
| `analyze_gaps` | `gap_analysis_data` |

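
The mapping above can be sketched as a lookup table applied to each polled event. This is a hedged illustration rather than the code in `parlant_bridge.py`: the event shape (`tool_name` and `result` keys) is an assumption, and a plain dictionary stands in for `st.session_state`.

```python
# Tool name -> session state key, taken from the table above.
EVENT_TO_STATE_KEY = {
    "extract_patient_profile": "patient_profile_data",
    "search_clinical_trials": "trial_candidates_data",
    "evaluate_trial_eligibility": "eligibility_ledger_data",
    "analyze_gaps": "gap_analysis_data",
}


def sync_journey_state(events: list, session_state: dict) -> list:
    """Copy tool results into session state; return the keys that were updated."""
    updated = []
    for event in events:
        key = EVENT_TO_STATE_KEY.get(event.get("tool_name"))
        if key is None:
            continue  # not a tool event the UI tracks
        session_state[key] = event.get("result")
        updated.append(key)
    return updated
```

Returning the list of updated keys lets the caller decide whether a page rerun is needed after polling.
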
## Two Parlant Clients

The codebase has two separate Parlant clients:

| Client | Location | Purpose |
|--------|----------|---------|
| Backend | `trialpath/services/parlant_client.py` | Async REST wrapper for engine admin |
| Frontend | `app/services/parlant_client.py` | Session/event management for UI |

Both target the same Parlant engine at `PARLANT_URL` (default `localhost:8800`).

## Key Files

| File | Role |
|------|------|
| `app/services/parlant_bridge.py` | Sync/async bridge + state sync |
| `app/services/parlant_client.py` | Frontend async REST client |
| `trialpath/services/parlant_client.py` | Backend async REST client |
| `trialpath/config.py` | `PARLANT_URL` configuration |
architecture/patient-journey.md
ADDED
@@ -0,0 +1,102 @@
# Patient Journey (5-State Flow)

**Entry point:** `trialpath/agent/journey.py` > `create_clinical_trial_journey()`

The core orchestration process. A Parlant `Journey` with 5 states, conditional transitions, and one backward loop.

## State Machine

```mermaid
stateDiagram-v2
    [*] --> INGEST
    INGEST --> PRESCREEN : profile has minimum prescreen data
    PRESCREEN --> VALIDATE_TRIALS : 1-50 results found
    PRESCREEN --> PRESCREEN : refine (>50) or relax (0)
    VALIDATE_TRIALS --> GAP_FOLLOWUP : all trials evaluated
    GAP_FOLLOWUP --> SUMMARY : patient ready for summary
    GAP_FOLLOWUP --> INGEST : patient uploads new documents
    SUMMARY --> [*] : patient reviewed summary
```

## States

### 1. INGEST (`ForkJourneyState`)

- **Action:** Extract patient profile from uploaded medical documents
- **Tools:** `extract_patient_profile`
- **Input:** PDF/image document URLs + patient metadata (age, sex)
- **Output:** `PatientProfile` (demographics, diagnosis, biomarkers, labs, treatments, unknowns)
- **Transition:** Advances to PRESCREEN when profile has minimum data

### 2. PRESCREEN (`ForkJourneyState`)

- **Action:** Generate search anchors, query ClinicalTrials.gov, refine/relax iteratively
- **Tools:** `generate_search_anchors`, `search_clinical_trials`, `refine_search_query`, `relax_search_query`
- **Input:** `PatientProfile`
- **Output:** `SearchAnchors` + `TrialCandidate[]`
- **Loop:** Max 5 refinement rounds. Refine if >50 results, relax if 0 results.
- **Transition:** Advances to VALIDATE_TRIALS when 1-50 results found

### 3. VALIDATE_TRIALS (`ToolJourneyState`)

- **Action:** Dual-model eligibility evaluation per trial
- **Tools:** `evaluate_trial_eligibility`
- **Input:** `PatientProfile` + `TrialCandidate`
- **Output:** `EligibilityLedger[]` (criterion-level verdicts + gaps)
- **Transition:** Advances to GAP_FOLLOWUP when all candidates evaluated

### 4. GAP_FOLLOWUP (`ForkJourneyState`)

- **Action:** Analyze gaps, present actionable next steps
- **Tools:** `analyze_gaps`
- **Input:** `PatientProfile` + `EligibilityLedger[]`
- **Output:** `GapItem[]` (recommended actions)
- **Fork:** Patient can upload new documents (loop to INGEST) or proceed to SUMMARY

### 5. SUMMARY (`ChatJourneyState`)

- **Action:** Present final summary, generate doctor packet
- **Tools:** none (chat-only)
- **Output:** Doctor Packet (JSON/Markdown export)
- **Transition:** END_JOURNEY when patient has reviewed

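
The transitions described for the five states can be summarised as a small pure function. This sketch does not use the Parlant SDK and is not how `journey.py` defines the journey. The signal names (`profile_ready`, `result_count`, and so on) are hypothetical labels for the conditions listed above, and the PRESCREEN round limit is left out for brevity.

```python
END = "END"  # stands in for END_JOURNEY


def next_state(state: str, signals: dict) -> str:
    """Return the state that follows `state` given the observed signals."""
    if state == "INGEST":
        return "PRESCREEN" if signals.get("profile_ready") else "INGEST"
    if state == "PRESCREEN":
        count = signals.get("result_count", 0)
        # 1-50 results move on; 0 or >50 stay for another relax/refine round.
        return "VALIDATE_TRIALS" if 1 <= count <= 50 else "PRESCREEN"
    if state == "VALIDATE_TRIALS":
        return "GAP_FOLLOWUP" if signals.get("all_evaluated") else "VALIDATE_TRIALS"
    if state == "GAP_FOLLOWUP":
        if signals.get("new_documents"):
            return "INGEST"       # the single backward loop
        if signals.get("ready_for_summary"):
            return "SUMMARY"
        return "GAP_FOLLOWUP"
    if state == "SUMMARY":
        return END if signals.get("reviewed") else "SUMMARY"
    raise ValueError(f"unknown state: {state!r}")
```
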
## Data Flow

```
Patient Document (PDF/image)
        |
        v
[INGEST] MedGemmaExtractor.extract()
        --> PatientProfile
        |
        v
[PRESCREEN] GeminiPlanner.generate_search_anchors()
        + ClinicalTrialsMCPClient.search()
        + iterative refine/relax (max 5 rounds)
        --> SearchAnchors --> TrialCandidate[]
        |
        v
[VALIDATE_TRIALS] GeminiPlanner.slice_criteria()
        + dual-model evaluation
        + GeminiPlanner.aggregate_assessments()
        --> EligibilityLedger[]
        |
        v
[GAP_FOLLOWUP] GeminiPlanner.analyze_gaps()
        --> GapItem[]
        --> (optional: loop back to INGEST)
        |
        v
[SUMMARY] Final report generation
        --> Doctor Packet
```

## Key Files

| File | Role |
|------|------|
| `trialpath/agent/journey.py` | State machine definition |
| `trialpath/agent/tools.py` | Tool implementations |
| `trialpath/agent/guidelines.py` | Phase-specific behavioral rules |
| `trialpath/agent/orchestrator.py` | Parlant PluginServer setup |
| `trialpath/agent/setup.py` | Agent + NLP services init |
architecture/search-refinement-loop.md
ADDED
@@ -0,0 +1,56 @@
# Search Refinement Loop

**Entry point:** `trialpath/agent/tools.py` > PRESCREEN state tools

Iterative query refinement process that adjusts ClinicalTrials.gov queries until a manageable result set (1-50 trials) is found.

## Flow

```mermaid
flowchart TD
    A[PatientProfile] --> B[generate_search_anchors]
    B --> C[SearchAnchors v1]
    C --> D[search_clinical_trials]
    D --> E{Result count?}
    E -->|>50| F[refine_search_query]
    E -->|0| G[relax_search_query]
    E -->|1-50| H[Proceed to VALIDATE_TRIALS]
    F --> I{Round < 5?}
    G --> I
    I -->|Yes| D
    I -->|No| H
```

## How It Works

1. **Generate anchors:** Gemini converts `PatientProfile` into `SearchAnchors` (condition, biomarkers, stage, geography, phase filters, relaxation order)
2. **Search:** MCP client queries ClinicalTrials.gov REST API v2
3. **Evaluate count:**
   - **>50 results:** Call `refine_search_query` -- Gemini tightens filters (add biomarker, narrow geography, specific phase)
   - **0 results:** Call `relax_search_query` -- Gemini loosens filters following the relaxation order in SearchAnchors
   - **1-50 results:** Proceed to trial validation
4. **Loop guard:** Maximum 5 refinement rounds (tracked in `SearchLog`)

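
The steps above can be sketched as a loop with injected `search`, `refine`, and `relax` callables. This is a minimal illustration, assuming each refine or relax call counts as one round and that the loop proceeds with the last result set once the guard trips. The real tools are invoked by the Parlant agent rather than by a single function like this, and the log is a plain list of tuples rather than a `SearchLog`.

```python
MAX_ROUNDS = 5


def run_search_loop(anchors, search, refine, relax, max_rounds=MAX_ROUNDS):
    """Search, then tighten or loosen the anchors until 1-50 results are found.

    Returns (results, log), where log holds (result_count, action) tuples.
    """
    log = []
    rounds = 0
    while True:
        results = search(anchors)
        count = len(results)
        if 1 <= count <= 50:
            log.append((count, "proceed"))
            return results, log
        action = "refine" if count > 50 else "relax"
        anchors = refine(anchors) if action == "refine" else relax(anchors)
        rounds += 1
        log.append((count, action))
        if rounds >= max_rounds:
            # Loop guard: stop adjusting and hand over the last result set.
            return results, log
```

Passing the three operations in as callables keeps the control flow testable without a Gemini key or a running MCP server.
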
## Tools Involved

| Tool | When | What |
|------|------|------|
| `generate_search_anchors` | Start | Profile -> SearchAnchors |
| `search_clinical_trials` | Each round | SearchAnchors -> TrialCandidate[] |
| `refine_search_query` | Too many results | Tighten SearchAnchors |
| `relax_search_query` | Zero results | Loosen SearchAnchors |

## Key Data Contracts

- **`SearchAnchors`**: `condition`, `biomarkers[]`, `stage`, `geography`, `phase_filter`, `relaxation_order[]`
- **`SearchLog`**: Tracks each round with `SearchStep` (query params, result count, action taken)

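
To illustrate how `relaxation_order` could drive a relax step, the sketch below clears the first still-active filter named in the order. In the actual system this decision is delegated to Gemini, so a deterministic function like this is only an approximation. The plain-dictionary representation of `SearchAnchors` and the `relax_once` name are assumptions.

```python
def relax_once(anchors: dict) -> dict:
    """Return a copy of the anchors with the next relaxable filter cleared."""
    relaxed = dict(anchors)
    order = list(relaxed.get("relaxation_order", []))
    while order:
        field = order.pop(0)
        if relaxed.get(field):
            # Clear the filter: empty list for list fields, None otherwise.
            relaxed[field] = [] if isinstance(relaxed[field], list) else None
            break
    relaxed["relaxation_order"] = order  # fields already tried are dropped
    return relaxed
```

Because the function returns a modified copy, the anchors used in the previous round stay intact and can still be recorded in the `SearchLog`.
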
## Key Files

| File | Role |
|------|------|
| `trialpath/agent/tools.py:82-180` | Tool implementations |
| `trialpath/services/mcp_client.py` | ClinicalTrials.gov wrapper |
| `trialpath/services/gemini_planner.py` | Refine/relax logic |
| `trialpath/models/search_anchors.py` | SearchAnchors contract |
| `trialpath/models/search_log.py` | Refinement tracking |
architecture/ui-state-management.md
ADDED
@@ -0,0 +1,72 @@
# UI State Management

**Entry point:** `app/services/state_manager.py`

Streamlit session-based state management that mirrors the 5 Parlant journey states with prerequisite guards.

## State Machine

```mermaid
stateDiagram-v2
    [*] --> INGEST : init_session_state()
    INGEST --> PRESCREEN : patient_profile set
    PRESCREEN --> VALIDATE_TRIALS : trial_candidates set
    VALIDATE_TRIALS --> GAP_FOLLOWUP : eligibility_ledger set
    GAP_FOLLOWUP --> SUMMARY : eligibility_ledger set
    GAP_FOLLOWUP --> INGEST : reset_to_ingest()
```

## Session State Variables

| Key | Type | Default | Set By |
|-----|------|---------|--------|
| `journey_state` | `str` | `"INGEST"` | `advance_journey()` |
| `parlant_session_id` | `str \| None` | `None` | Parlant bridge |
| `parlant_agent_id` | `str \| None` | `None` | Parlant bridge |
| `parlant_session_active` | `bool` | `False` | Parlant bridge |
| `patient_profile` | `dict \| None` | `None` | INGEST tools |
| `uploaded_files` | `list` | `[]` | Upload page |
| `search_anchors` | `dict \| None` | `None` | PRESCREEN tools |
| `trial_candidates` | `list` | `[]` | PRESCREEN tools |
| `eligibility_ledger` | `list` | `[]` | VALIDATE tools |
| `last_event_offset` | `int` | `0` | Parlant bridge polling |

## Key Functions

| Function | Purpose |
|----------|---------|
| `init_session_state()` | Initialize defaults, no overwrite |
| `get_current_journey_state()` | Read current state |
| `advance_journey(target)` | Forward-only transition with validation |
| `can_advance_to(target)` | Prerequisite check |
| `reset_to_ingest()` | Special backward transition for gap re-ingestion |
| `reset_session_state()` | Full reset to defaults |

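
The difference between "initialize defaults, no overwrite" and "full reset" can be sketched as follows. A plain dictionary stands in for `st.session_state`, and the explicit `state` parameter is an assumption made so the example runs without Streamlit; the functions in `state_manager.py` are listed above without that parameter.

```python
import copy

# Defaults taken from the Session State Variables table.
DEFAULTS = {
    "journey_state": "INGEST",
    "parlant_session_id": None,
    "parlant_agent_id": None,
    "parlant_session_active": False,
    "patient_profile": None,
    "uploaded_files": [],
    "search_anchors": None,
    "trial_candidates": [],
    "eligibility_ledger": [],
    "last_event_offset": 0,
}


def init_session_state(state: dict) -> None:
    """Fill in missing keys only, so values survive Streamlit reruns."""
    for key, default in DEFAULTS.items():
        if key not in state:
            state[key] = copy.deepcopy(default)  # avoid sharing mutable lists


def reset_session_state(state: dict) -> None:
    """Overwrite every known key with a fresh copy of its default."""
    for key, default in DEFAULTS.items():
        state[key] = copy.deepcopy(default)
```

Copying the defaults matters for the list-valued keys: without it, two sessions initialized from the same `DEFAULTS` would append to the same list object.
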
## Prerequisite Guards

| Target State | Requires |
|--------------|----------|
| PRESCREEN | `patient_profile` is set |
| VALIDATE_TRIALS | `patient_profile` is set |
| GAP_FOLLOWUP | `patient_profile` + `trial_candidates` |
| SUMMARY | `patient_profile` + `trial_candidates` + `eligibility_ledger` |

`advance_journey()` enforces forward-only movement (raises `ValueError` on backward). The only exception is `reset_to_ingest()` for the gap re-ingestion loop.

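
The guard table and the forward-only rule can be sketched together. As before, a plain dictionary stands in for `st.session_state` and the explicit `state` parameter is an assumption. Returning `False` when prerequisites are missing is also an assumption, since only the `ValueError` on backward movement is documented.

```python
STATE_ORDER = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

# Keys that must hold a non-empty value before entering each state.
PREREQUISITES = {
    "PRESCREEN": ["patient_profile"],
    "VALIDATE_TRIALS": ["patient_profile"],
    "GAP_FOLLOWUP": ["patient_profile", "trial_candidates"],
    "SUMMARY": ["patient_profile", "trial_candidates", "eligibility_ledger"],
}


def can_advance_to(state: dict, target: str) -> bool:
    """True if every prerequisite key for `target` holds a non-empty value."""
    return all(state.get(key) for key in PREREQUISITES.get(target, []))


def advance_journey(state: dict, target: str) -> bool:
    """Move forward to `target`; raise ValueError on a backward move."""
    current = state["journey_state"]
    if STATE_ORDER.index(target) < STATE_ORDER.index(current):
        raise ValueError(f"cannot move backward from {current} to {target}")
    if not can_advance_to(state, target):
        return False  # assumption: missing prerequisites are reported, not raised
    state["journey_state"] = target
    return True


def reset_to_ingest(state: dict) -> None:
    """The one sanctioned backward transition, used for gap re-ingestion."""
    state["journey_state"] = "INGEST"
```
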
## Page Mapping

| Page File | Journey State | Components Used |
|-----------|---------------|-----------------|
| `app/pages/1_upload.py` | INGEST | `file_uploader`, `disclaimer_banner` |
| `app/pages/2_profile_review.py` | PRESCREEN | `profile_card` |
| `app/pages/3_trial_matching.py` | VALIDATE_TRIALS | `trial_card` |
| `app/pages/4_gap_analysis.py` | GAP_FOLLOWUP | `gap_card` |
| `app/pages/5_summary.py` | SUMMARY | `profile_card`, `trial_card`, `gap_card` |

## Key Files

| File | Role |
|------|------|
| `app/services/state_manager.py` | State machine + prerequisites |
| `streamlit_app.py` | Multi-page navigation entry point |
| `app/components/progress_tracker.py` | Visual state indicator |