# Clinical Trial Design Assistant
**Implementation Plan v1.0**
**Author**: Hesham Gibriel
---
## Executive Summary
The **Clinical Trial Design Assistant** is an AI-powered chatbot that helps clinical researchers design superior clinical trials by analyzing successful examples from ClinicalTrials.gov. It provides evidence-based recommendations on trial parameters and regulatory strategies, with a specific focus on rare diseases and orphan drug development.
**Primary Users**: Clinical researchers evaluating drug partnerships and designing superiority trials.
---
## 1. Problem & Solution Overview
| 🚨 **Current Challenges** | ✅ **Our Solution** |
|---------------------------|---------------------|
| **Information Overload**<br/>500,000+ trials make research slow. | **Evidence-Based Recommendations**<br/>Automated retrieval of proven trial designs. |
| **Rare Disease Complexity**<br/>Limited data on successful TCL trial patterns. | **Orphan Drug Expertise**<br/>Fallback logic searches similar rare diseases. |
| **Regulatory Uncertainty**<br/>Unclear reasons for past FDA/EMA outcomes. | **Regulatory Intelligence**<br/>Extracts specific objections and success factors. |
| **Statistical Rigor**<br/>Complex specialized expertise required. | **Statistical Guidance**<br/>Automated sample size and power calculations. |
**How It Works (Simple Example)**:
```
Clinician types: "TCL"
Chatbot returns:
→ Recommended trial design parameters
→ Sample size: 100-120 patients
→ Primary endpoint: ORR
→ Comparator: Physician's choice
→ Evidence: Based on NCT01482962, NCT00901147
```
---
## 2. Architecture Overview
### High-Level Architecture
**System Flow**: User → AI Agent → Web Search → ClinicalTrials.gov/PubMed → AI Analysis → Recommendations
```mermaid
flowchart LR
classDef large font-size:22px,stroke-width:2px;
A[User Question]:::large --> B[Claude<br/>with Web Search]:::large
B <--> C[Real-time<br/>Web Search]:::large
C <--> D[ClinicalTrials.gov<br/>+ PubMed]:::large
B --> E[Structured<br/>Recommendations]:::large
```
**High-Level Processing Flow**:
1. **User Query** → Natural language question about trial design.
2. **Session Memory** → Retrieve conversation history from `session_state`.
3. **Claude AI** → Interprets intent with full context and constructs a search strategy.
4. **Web Search** → Claude uses native web search to query ClinicalTrials.gov and the medical literature.
5. **Analysis** → AI analyzes retrieved data against the rules embedded in the system prompt.
6. **Output** → Formatted report with cited evidence and design parameters.
7. **Memory Update** → Store the response in the session for follow-up questions.
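The seven steps above can be sketched as a single orchestration function. This is a minimal sketch only; `ask_claude` and `format_report` are hypothetical placeholders for the components described in the detailed steps, passed in to keep the sketch self-contained:

```python
def handle_query(query, session, ask_claude, format_report):
    """One pass through the pipeline for a single user query."""
    context = session.setdefault("messages", [])          # 2. session memory
    reply = ask_claude(context, query)                    # 3-5. Claude + web search + analysis
    report = format_report(reply)                         # 6. formatted output
    context.append({"role": "user", "content": query})    # 7. memory update
    context.append({"role": "assistant", "content": reply})
    return report
```

Follow-up questions (step 7 looping back to step 1) work because `session` accumulates the conversation across calls.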
---
### Detailed Architecture
```mermaid
flowchart TD
A[User Query] -->|1. Submit question| B[Session Memory Check]
B -->|2. Load context| C[Claude + System Prompt]
C -->|3. Web search queries| D[ClinicalTrials.gov<br/>+ PubMed]
D -->|4. Return search results| C
C -->|5. Analyze with guardrails| E{Validation}
E -->|Pass| F[Generate Recommendations]
E -->|Fail: No results| G[Orphan Drug Fallback]
E -->|Fail: Invalid data| H[Error Handler]
G -->|Retry with related diseases| D
H -->|Retry or show error| I[User Notification]
F -->|6. Format output| J[Streamlit Display]
J -->|7. Store in session| K[Session Memory]
K -->|8. Follow-up question?| A
K -->|Done| L[End]
style C fill:#e1f5ff
style E fill:#fff4e1
style F fill:#e8f5e9
style J fill:#f3e5f5
```
---
### Step-by-Step Processing Details
#### **Step 1: User Query Submission**
```
Input: "Design a trial for R/R TCL"
  ↓
Streamlit captures query + checks session state
```
---
#### **Step 2: Session Memory Check**
**What happens**: Check if there's previous conversation history
- **If yes**: Load the last 5 messages to maintain context
- **If no**: Start a fresh conversation
**Purpose**: Enables follow-up questions like "What about sample size?"
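A minimal sketch of this step, assuming history is stored under a `st.session_state["messages"]` key (the key name is an assumption, not fixed by this plan):

```python
def load_context(history, max_messages=5):
    """Return the most recent messages to send to Claude as context.

    `history` mirrors st.session_state["messages"]: a list of
    {"role": ..., "content": ...} dicts. An empty list means a
    fresh conversation.
    """
    return history[-max_messages:]

# In app.py (sketch):
#   if "messages" not in st.session_state:
#       st.session_state["messages"] = []        # fresh conversation
#   context = load_context(st.session_state["messages"])
```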
---
#### **Step 3: Claude + System Prompt**
```python
# System prompt includes:
#   - Role definition
#   - Disease hierarchy (Appendix A)
#   - Trial design rules (Appendix B)
#   - Anti-hallucination rules
#   - Output format template

# Claude receives the prompt via the Messages API; note that the
# system prompt is a top-level parameter, not a message role:
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=2048,
    system=SYSTEM_PROMPT,
    messages=conversation_history + [{"role": "user", "content": new_query}],
)
```
**Claude's Internal Process**:
1. Parse user intent (disease, trial phase, endpoint preference)
2. Construct web search queries
3. Run the native web search tool with filters
4. Receive trial data (NCT IDs, protocols, outcomes)
5. Apply guardrails from system prompt
6. Generate recommendations
---
#### **Step 4: Trial Data Retrieval**
**What happens**: Claude uses its native web search tool to search ClinicalTrials.gov and retrieve trial data (NCT IDs, eligibility criteria, endpoints, sample sizes, outcomes).
*See Appendix C for web search details.*
---
#### **Step 5: Validation with Guardrails**
**Validation Checks**:
1. **Results found?**
- If no trials found → Trigger orphan drug fallback
- Search related diseases (AITL, ALCL-ALK-)
2. **NCT IDs valid?**
- Check each cited trial ID follows format: NCT + 8 digits
- Flag any invalid IDs
3. **Disease match?**
- Verify response mentions TCL or lymphoma
- Flag if disease mismatch detected
4. **Realistic values?**
- ORR must be between 0-100%
- Sample size must be reasonable (not >10,000)
- Flag unrealistic numbers
5. **Apply trial design rules**
- Check against Appendix B:
- Prior lines: 1 vs 2+
- Comparator: Single-agent salvage chemo, Single-agent novel, BBv, or Investigator's choice
- Endpoints: ORR/CRR/PFS valid for R/R TCL
**Guardrail Actions**:
- **Pass** → Proceed to recommendations
- **Fail (no results)** → Orphan drug fallback
- **Fail (invalid data)** → Error handler
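The validation checks above can be sketched as a single guardrail function. The function name and issue labels are illustrative, not the implementation:

```python
import re

NCT_ID = re.compile(r"^NCT\d{8}$")   # format check: NCT + 8 digits

def validate(response_text, cited_ids, orr_pct, sample_size):
    """Return a list of guardrail violations; an empty list means pass."""
    issues = []
    if not cited_ids:
        issues.append("no_results")              # triggers orphan fallback
    issues += [f"invalid_nct:{i}" for i in cited_ids if not NCT_ID.match(i)]
    if not any(term in response_text.lower() for term in ("tcl", "lymphoma")):
        issues.append("disease_mismatch")
    if not 0 <= orr_pct <= 100:
        issues.append("unrealistic_orr")
    if not 0 < sample_size <= 10_000:
        issues.append("unrealistic_sample_size")
    return issues
```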
---
#### **Step 6: Orphan Drug Fallback**
```
If no TCL trials found:
1. Search disease hierarchy (Appendix A)
2. Try related diseases:
- AITL (most common nodal TCL)
- ALCL-ALK- (CD30+)
- TFH-TCL (PI3K responsive)
3. Notify user: "No exact TCL trials. Showing related: AITL"
```
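A minimal sketch of the fallback, with `search_fn` and `notify_fn` as hypothetical stand-ins for the web search call and the Streamlit notification:

```python
# Fallback order taken from the disease hierarchy in Appendix A.
RELATED_DISEASES = ["AITL", "ALCL-ALK-", "TFH-TCL"]

def orphan_fallback(search_fn, notify_fn):
    """Retry the trial search across related diseases until one hits."""
    for disease in RELATED_DISEASES:
        trials = search_fn(disease)
        if trials:
            notify_fn(f"No exact TCL trials. Showing related: {disease}")
            return disease, trials
    return None, []   # no results even after fallback
```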
---
#### **Step 7: Generate Recommendations**
**What happens**: Claude synthesizes the trial data and generates a structured recommendation including inclusion/exclusion criteria, primary endpoint with rationale, sample size with power basis, comparator selection, and NCT citations for evidence.
*See User Input/Output Flow section for example output format.*
---
#### **Step 8: Error Handling**
```
Error Types:
1. API timeout → Retry 3x (1s, 2s, 4s backoff)
2. Rate limit → Wait + auto-retry
3. Connector down → Show cached example
4. Invalid response → Log + retry option
5. No results after fallback → "Unable to find trials"
```
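The timeout/backoff policy (error type 1) could look like this sketch:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Call `call()`, retrying on timeout with exponential backoff
    (1s, 2s, 4s by default)."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise                              # surface to the error handler
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
```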
---
#### **Step 9: Streamlit Display**
**What happens**:
- Display the response in the chat interface
- Add the response to session history for context
- User sees formatted recommendations with NCT citations
---
#### **Step 10: Follow-up Loop**
```
User can ask:
- "What about safety endpoints?"
- "Compare to BBv trial"
- "Show me AITL-specific trials"
→ Loop back to Step 1 with full context
```
---
### Key Architecture Principles
| Principle | Implementation |
|-----------|----------------|
| **Simplicity** | Claude handles all logic; no custom parsers |
| **Guardrails** | Embedded in system prompt + validation step |
| **Conversational** | Session memory enables follow-up questions |
| **Evidence-based** | All recommendations cite NCT IDs |
| **Fallback logic** | Orphan drug search for rare diseases |
| **Error resilience** | Retry logic + graceful degradation |
---
### Technology Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Streamlit | Simple chat interface, no coding required for users |
| **Memory** | `st.session_state` | Maintains conversation history across turns |
| **AI Engine** | Claude 3.5 Sonnet | Best-in-class medical text understanding |
| **Data Access** | Native Web Search | Claude's built-in web search tool (`web_search_20250305`) |
| **Sources** | ClinicalTrials.gov + PubMed | Real-time search of clinical trial registries and literature |
| **Deployment** | Docker on HuggingFace Spaces | Free hosting, web-accessible |
---
### Processing Pipeline
```mermaid
sequenceDiagram
autonumber
participant User
participant App as Streamlit/Orchestrator
participant AI as Claude AI
participant Data as ClinicalTrials.gov
User->>App: Enter disease "TCL"
App->>AI: Process query
AI->>Data: Web search for TCL trials
Data-->>AI: Trial list (NCT IDs)
AI->>Data: Get detailed protocols
Data-->>AI: Protocol documents
AI->>AI: Analyze & Compare
AI->>AI: Generate recommendations
AI-->>App: Results
App-->>User: Display report
```
---
### Project Structure
```
Clinical-Trial-Design-Assistant/
├── app.py             # Streamlit main entry + Claude integration + system prompt
├── requirements.txt   # Python dependencies
├── Dockerfile         # Container configuration
├── README.md          # Documentation
├── CHANGELOG.md       # Version history
└── .gitignore         # Git ignore file
```
**Note**: Claude handles all parsing, extraction, and analysis via web search. Single-file architecture for simplicity.
---
### System Prompt Template
```python
SYSTEM_PROMPT = """
You are a Clinical Trial Design Assistant specializing in R/R TCL.
ROLE:
- Help researchers design superiority trials
- Provide evidence-based recommendations from ClinicalTrials.gov
- Cite NCT IDs for all claims
RULES (NEVER VIOLATE):
1. NEVER invent trial data - only cite real NCT IDs
2. If no trials found, say "No matching trials found" and suggest related searches
3. Use orphan drug fallback for rare diseases with <5 results
4. Apply disease hierarchy (AITL > ALCL-ALK- > PTCL-NOS)
5. Validate against R/R TCL framework for all recommendations
6. If prior treatment lines not specified, ASK user before providing recommendations
7. If regulatory region (FDA/EMA) not specified, ASK user for target submission region
SCOPE:
- Focus: Nodal and extranodal TCL subtypes
- Out of scope: Cutaneous T-cell lymphomas (Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, etc.) - redirect to CTCL-specific resources
DISEASE HIERARCHY:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- Analyze separately: ALCL-ALK+ (best prognosis)
- Analyze separately: NKTCL/EATL/MEITL (rare, distinct biology)
COMPARATOR CATEGORIES:
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice
OUTPUT FORMAT:
- Recommended inclusion/exclusion criteria
- Primary endpoint with justification
- Sample size with power calculation basis
- Comparator with rationale
- NCT citations for evidence
"""
```
---
### Error Handling
| Error | Handling |
|-------|----------|
| **API timeout** | Retry 3x with exponential backoff (1s, 2s, 4s) |
| **Rate limit** | Show "Please wait" message, auto-retry after delay |
| **Connector unavailable** | Display cached example response + "Try again later" |
| **No results** | Trigger orphan drug fallback → search related diseases |
| **Invalid response** | Log error, show "Unable to process" + retry option |
---
### Response Validation
Before displaying results, validate:
- [ ] All NCT IDs cited exist (format check: NCT + 8 digits)
- [ ] Disease in response matches user query
- [ ] ORR/endpoint values are within realistic ranges
- [ ] No hallucinated drug names (cross-check against known TCL treatments)
---
### User Input/Output Flow
**Input**:
```
User enters: "TCL" (disease name)
Optional: Drug of interest, trial type filter
```
**Output**:
**1. Trial Discovery Report**
- List of relevant trials (NCT IDs, sponsors, status)
- Trial outcomes summary
**2. Feature Analysis**
- Common inclusion/exclusion patterns in successful trials
- Endpoint selection trends
- Sample size ranges by trial phase
- Comparator arm choices
**3. Recommendations**
```
┌────────────────────────────────────────────────────────────┐
│ RECOMMENDED TRIAL DESIGN PARAMETERS FOR TCL                │
├────────────────────────────────────────────────────────────┤
│ Inclusion Criteria:                                        │
│   ✓ Relapsed/refractory TCL (≥1 prior therapy)             │
│   ✓ ECOG PS 0-2                                            │
│   ✓ Measurable disease per Lugano criteria                 │
│                                                            │
│ Exclusion Criteria:                                        │
│   ✗ Prior allogeneic transplant                            │
│   ✗ Active CNS involvement                                 │
│                                                            │
│ Primary Endpoint: ORR (recommended based on precedent)     │
│ Sample Size: 100-120 (based on similar trials)             │
│ Comparator: Physician's choice (pralatrexate, belinostat,  │
│             romidepsin, or gemcitabine-based)              │
│                                                            │
│ Evidence: Based on NCT01482962, NCT00901147, NCT01280526   │
└────────────────────────────────────────────────────────────┘
```
---
## 3. Implementation Phases
### Phase 1: Setup & Configuration
- [x] Clinical Trials connector verified and available
- [ ] Obtain Anthropic API key
- [ ] Set up project structure (Streamlit + Python 3.11)
- [ ] Configure Docker environment
### Phase 2: Core Development
- [ ] Implement conversation memory with `st.session_state`
- [ ] Implement anti-hallucination system prompt
- [ ] Build orphan drug fallback logic
- [ ] Develop trial data extraction pipeline
- [ ] Add error handling for API failures/timeouts
### Phase 3: Testing & Refinement
- [ ] Test with TCL/Pralatrexate/Belinostat cases
- [ ] Validate anti-hallucination rules
- [ ] Verify statistical recommendations
- [ ] Clinical researcher user testing
### Phase 4: Deployment
- [ ] Deploy to HuggingFace Spaces
- [ ] User documentation
- [ ] Handoff to clinical team
---
## 4. Post-MVP Enhancements
### 4.1 Competitive Landscape Tool
| Aspect | Details |
|--------|---------|
| **Problem** | Need to compare a candidate drug against competitors in efficacy/safety |
| **Solution** | Drug comparison feature with structured competitive analysis |
| **Requested By** | Vittoria |
**Implementation Tasks**:
- [ ] Add drug comparison feature
- [ ] Extract efficacy metrics (ORR, CRR, PFS, OS)
- [ ] Extract safety profiles (AEs, SAEs, discontinuation rates)
- [ ] Generate comparative analysis report with visualizations
---
### 4.2 Hybrid Retrieval Architecture
| Aspect | Details |
|--------|---------|
| **Problem** | Web search returns only 5-10 results; PTCL has 1,000+ NCT IDs |
| **Solution** | Combine ClinicalTrials.gov API + Web Search for complete coverage |
**Architecture**:
```
User Query
  │
  ├──▶ ClinicalTrials.gov API (complete, filtered, reproducible)
  │
  └──▶ Claude Web Search (news, publications, context)
        │
        ▼
  Claude Synthesizes from BOTH
```
**Implementation Tasks**:
- [ ] Integrate ClinicalTrials.gov API v2
- [ ] Add query filters (phase, status, condition)
- [ ] Implement result pagination
- [ ] Merge API results with web search context
- [ ] Add audit logging for retrieved NCT IDs
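A sketch of the API leg of the hybrid design, assuming the v2 endpoint and parameter names (`query.cond`, `filter.overallStatus`, `pageSize`) match the public documentation; verify against the live spec before relying on them:

```python
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

def build_study_query(condition, status="COMPLETED", page_size=100):
    """Build a ClinicalTrials.gov API v2 studies URL (reproducible,
    unlike web-ranked search results)."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": status,
        "pageSize": page_size,
        "format": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"

# e.g. requests.get(build_study_query("peripheral T-cell lymphoma")).json()
```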
**Benefits**:
| MVP | Hybrid |
|-----|--------|
| ~10 trials | All matching trials |
| Web-ranked | Clinically-filtered |
| Non-reproducible | Fully auditable |
---
## 5. Next Steps
### Immediate Actions
1. Obtain Anthropic API key
2. Set up project structure
3. Begin Phase 1 development
---
## Appendix A: Disease Hierarchy
```
TCL (Parent)
├── Nodal: AITL, ALCL-ALK+/-, PTCL-NOS, TFH-TCL
├── Extranodal: NKTCL, EATL, MEITL
└── Cutaneous (CTCL): Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, Lymphomatoid Papulosis, Subcutaneous Panniculitis-like TCL, Primary Cutaneous Gamma-Delta TCL
```
**Subtype-Specific Notes**:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- ALCL-ALK+: Best prognosis (analyze separately if needed)
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- NKTCL/EATL/MEITL: Rare; distinct biology
**Exclusions**: Solid tumors mimicking TCL
---
## Appendix B: R/R TCL Trial Design Rules
### Patient Population
- Prior lines: 1 vs 2+ (changes prognosis significantly)
- Refractory definition: PD within 1 month of treatment end
- Relapsed vs primary refractory: Separate analysis (OS 1.97 vs 0.89 years)
- Transplant eligibility: Age, comorbidities, performance status, organ function
### Transplant Strategy
- Transplant-eligible: Goal = salvage → bridge to allo-SCT (curative intent)
- Transplant-ineligible: Salvage is potentially palliative
- Conversion endpoint: % converting from ineligible → eligible
### Comparator Categories
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice
### Endpoints
- **Primary**: ORR (rapid), CRR (stronger signal), PFS (durability), TFS (transplant-ineligible), OS
- **Secondary**: DoR, TTR, transplant conversion, post-transplant outcomes, TTNT
- ORR threshold: 50% or higher vs 30% historical = superiority
- ICR preferred for regulatory; investigator-assessed OK for exploratory
### Biomarkers
- TET2, DNMT3A, RHOA: Predict response to HDAC inhibitors (common in AITL/TFH-TCL)
- TP53: Poor prognosis
- KMT2D: AITL-enriched; epigenetic modifier sensitivity
- CD30: Expression level correlates with targeted therapy response
- Early PET-CT (2-4 cycles): Metabolic response, identify progressors
### Safety Considerations
- Acceptable toxicity for frail population
- Dose modifications for elderly/comorbid
- Cumulative organ toxicity monitoring (cardiac/renal/hepatic)
- Grade 3/4 cytopenia management
- Prophylactic antimicrobials (antibiotics/antivirals/antifungals)
### Regulatory Pathways
- Accelerated approval: ORR primary + clinical benefit demonstration
- Traditional approval: PFS/OS primary
- Breakthrough therapy designation for orphan drugs
- Confirmatory trial typically required post-accelerated approval
### Sample Size
- Historical controls: Use established benchmarks for each comparator category
- Power for: 50% or higher vs 30% (20% or greater difference)
- Dropout: Expect 5-10% screen failures in R/R population
- Consider subtype-stratified power (AITL vs ALCL-ALK- separately)
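As a worked example of the 50% vs 30% power basis, a standard two-proportion normal-approximation sample size calculation (a sketch only; a trial statistician should confirm any real design):

```python
import math
from statistics import NormalDist

def n_per_arm(p_control, p_experimental, alpha=0.05, power=0.80):
    """Two-proportion sample size, normal approximation,
    two-sided alpha."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_experimental) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_control * (1 - p_control)
                                   + p_experimental * (1 - p_experimental))) ** 2
    return math.ceil(numerator / (p_experimental - p_control) ** 2)

# 30% historical ORR vs 50% target gives roughly 90-95 patients per arm
# at 80% power, before inflating for the 5-10% dropout noted above.
```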
---
## Appendix C: Web Search Details
**Tool Used**: `web_search_20250305` (Claude's native web search)
**How It Works**:
- Claude autonomously constructs search queries based on user intent
- Searches ClinicalTrials.gov, PubMed, and other medical sources
- Returns structured results with URLs and content
- Content is encrypted (only Claude can read it internally)
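A sketch of the tool block that enables this search. The type string is taken from this plan; the exact request shape around it should be verified against current Anthropic tool-use documentation:

```python
def web_search_tool(max_uses=5):
    """Tool block enabling Claude's native web search."""
    return {
        "type": "web_search_20250305",
        "name": "web_search",
        "max_uses": max_uses,   # cap searches per request
    }

# Usage (sketch):
#   client.messages.create(..., tools=[web_search_tool()], ...)
```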
**Data Extracted**:
- Trial metadata (NCT ID, sponsor, phase, status)
- Eligibility criteria
- Endpoints (primary/secondary)
- Sample sizes
- Published outcomes (ORR, PFS, OS)
---
**Document Version**: 1.0
**Status**: 🔄 In Progress