# Clinical Trial Design Assistant
**Implementation Plan v1.0**
**Author**: Hesham Gibriel
---
## Executive Summary
The **Clinical Trial Design Assistant** is an AI-powered chatbot that helps clinical researchers design superior clinical trials by analyzing successful examples from ClinicalTrials.gov. It provides evidence-based recommendations on trial parameters and regulatory strategies, with a specific focus on rare diseases and orphan drug development.
**Primary Users**: Clinical researchers evaluating drug partnerships and designing superiority trials.
---
## 1. Problem & Solution Overview
| 🚨 **Current Challenges** | ✅ **Our Solution** |
|---------------------------|---------------------|
| **Information Overload**<br/>500,000+ trials make research slow. | **Evidence-Based Recommendations**<br/>Automated retrieval of proven trial designs. |
| **Rare Disease Complexity**<br/>Limited data on successful TCL trial patterns. | **Orphan Drug Expertise**<br/>Fallback logic searches similar rare diseases. |
| **Regulatory Uncertainty**<br/>Unclear reasons for past FDA/EMA outcomes. | **Regulatory Intelligence**<br/>Extracts specific objections and success factors. |
| **Statistical Rigor**<br/>Complex specialized expertise required. | **Statistical Guidance**<br/>Automated sample size and power calculations. |
**How It Works (Simple Example)**:
```
Clinician types: "TCL"
Chatbot returns:
→ Recommended trial design parameters
→ Sample size: 100-120 patients
→ Primary endpoint: ORR
→ Comparator: Physician's choice
→ Evidence: Based on NCT01482962, NCT00901147
```
---
## 2. Architecture Overview
### High-Level Architecture
**System Flow**: User → AI Agent → Web Search → ClinicalTrials.gov/PubMed → AI Analysis → Recommendations
```mermaid
flowchart LR
classDef large font-size:22px,stroke-width:2px;
A[User Question]:::large --> B[Claude<br/>with Web Search]:::large
B <--> C[Real-time<br/>Web Search]:::large
C <--> D[ClinicalTrials.gov<br/>+ PubMed]:::large
B --> E[Structured<br/>Recommendations]:::large
```
**High-Level Processing Flow**:
1. **User Query** → Natural language question about trial design.
2. **Session Memory** → Retrieve conversation history from `session_state`.
3. **Claude AI** → Interprets intent with full context and constructs a search strategy.
4. **Web Search** → Claude uses native web search to query ClinicalTrials.gov and the medical literature.
5. **Analysis** → AI analyzes retrieved data against the rules embedded in the system prompt.
6. **Output** → Formatted report with cited evidence and design parameters.
7. **Memory Update** → Store the response in the session for follow-up questions.
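The seven steps above can be sketched as a single orchestration function. This is a minimal sketch only; `ask_claude` and `format_report` are hypothetical placeholders for the components described in the detailed steps, passed in to keep the sketch self-contained:

```python
def handle_query(query, session, ask_claude, format_report):
    """One pass through the pipeline for a single user query."""
    context = session.setdefault("messages", [])          # 2. session memory
    reply = ask_claude(context, query)                    # 3-5. Claude + web search + analysis
    report = format_report(reply)                         # 6. formatted output
    context.append({"role": "user", "content": query})    # 7. memory update
    context.append({"role": "assistant", "content": reply})
    return report
```

Follow-up questions (step 7 looping back to step 1) work because `session` accumulates the conversation across calls.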
---
### Detailed Architecture
```mermaid
flowchart TD
A[User Query] -->|1. Submit question| B[Session Memory Check]
B -->|2. Load context| C[Claude + System Prompt]
C -->|3. Web search queries| D[ClinicalTrials.gov<br/>+ PubMed]
D -->|4. Return search results| C
C -->|5. Analyze with guardrails| E{Validation}
E -->|Pass| F[Generate Recommendations]
E -->|Fail: No results| G[Orphan Drug Fallback]
E -->|Fail: Invalid data| H[Error Handler]
G -->|Retry with related diseases| D
H -->|Retry or show error| I[User Notification]
F -->|6. Format output| J[Streamlit Display]
J -->|7. Store in session| K[Session Memory]
K -->|8. Follow-up question?| A
K -->|Done| L[End]
style C fill:#e1f5ff
style E fill:#fff4e1
style F fill:#e8f5e9
style J fill:#f3e5f5
```
---
### Step-by-Step Processing Details
#### **Step 1: User Query Submission**
```
Input: "Design a trial for R/R TCL"
  ↓
Streamlit captures query + checks session state
```
---
#### **Step 2: Session Memory Check**
**What happens**: Check if there's previous conversation history
- **If yes**: Load the last 5 messages to maintain context
- **If no**: Start a fresh conversation
**Purpose**: Enables follow-up questions like "What about sample size?"
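A minimal sketch of this step, assuming history is stored under a `st.session_state["messages"]` key (the key name is an assumption, not fixed by this plan):

```python
def load_context(history, max_messages=5):
    """Return the most recent messages to send to Claude as context.

    `history` mirrors st.session_state["messages"]: a list of
    {"role": ..., "content": ...} dicts. An empty list means a
    fresh conversation.
    """
    return history[-max_messages:]

# In app.py (sketch):
#   if "messages" not in st.session_state:
#       st.session_state["messages"] = []        # fresh conversation
#   context = load_context(st.session_state["messages"])
```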
---
#### **Step 3: Claude + System Prompt**
```python
# System prompt includes:
#   - Role definition
#   - Disease hierarchy (Appendix A)
#   - Trial design rules (Appendix B)
#   - Anti-hallucination rules
#   - Output format template

# Claude receives the prompt via the Messages API; note that the
# system prompt is a top-level parameter, not a message role:
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=2048,
    system=SYSTEM_PROMPT,
    messages=conversation_history + [{"role": "user", "content": new_query}],
)
```
**Claude's Internal Process**:
1. Parse user intent (disease, trial phase, endpoint preference)
2. Construct web search queries
3. Run the native web search tool with filters
4. Receive trial data (NCT IDs, protocols, outcomes)
5. Apply guardrails from system prompt
6. Generate recommendations
---
#### **Step 4: Trial Data Retrieval**
**What happens**: Claude uses its native web search tool to search ClinicalTrials.gov and retrieve trial data (NCT IDs, eligibility criteria, endpoints, sample sizes, outcomes).
*See Appendix C for web search details.*
---
#### **Step 5: Validation with Guardrails**
**Validation Checks**:
1. **Results found?**
- If no trials found → Trigger orphan drug fallback
- Search related diseases (AITL, ALCL-ALK-)
2. **NCT IDs valid?**
- Check each cited trial ID follows format: NCT + 8 digits
- Flag any invalid IDs
3. **Disease match?**
- Verify response mentions TCL or lymphoma
- Flag if disease mismatch detected
4. **Realistic values?**
- ORR must be between 0-100%
- Sample size must be reasonable (not >10,000)
- Flag unrealistic numbers
5. **Apply trial design rules**
- Check against Appendix B:
- Prior lines: 1 vs 2+
- Comparator: Single-agent salvage chemo, Single-agent novel, BBv, or Investigator's choice
- Endpoints: ORR/CRR/PFS valid for R/R TCL
**Guardrail Actions**:
- **Pass** → Proceed to recommendations
- **Fail (no results)** → Orphan drug fallback
- **Fail (invalid data)** → Error handler
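The validation checks above can be sketched as a single guardrail function. The function name and issue labels are illustrative, not the implementation:

```python
import re

NCT_ID = re.compile(r"^NCT\d{8}$")   # format check: NCT + 8 digits

def validate(response_text, cited_ids, orr_pct, sample_size):
    """Return a list of guardrail violations; an empty list means pass."""
    issues = []
    if not cited_ids:
        issues.append("no_results")              # triggers orphan fallback
    issues += [f"invalid_nct:{i}" for i in cited_ids if not NCT_ID.match(i)]
    if not any(term in response_text.lower() for term in ("tcl", "lymphoma")):
        issues.append("disease_mismatch")
    if not 0 <= orr_pct <= 100:
        issues.append("unrealistic_orr")
    if not 0 < sample_size <= 10_000:
        issues.append("unrealistic_sample_size")
    return issues
```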
---
#### **Step 6: Orphan Drug Fallback**
```
If no TCL trials found:
1. Search disease hierarchy (Appendix A)
2. Try related diseases:
- AITL (most common nodal TCL)
- ALCL-ALK- (CD30+)
- TFH-TCL (PI3K responsive)
3. Notify user: "No exact TCL trials. Showing related: AITL"
```
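A minimal sketch of the fallback, with `search_fn` and `notify_fn` as hypothetical stand-ins for the web search call and the Streamlit notification:

```python
# Fallback order taken from the disease hierarchy in Appendix A.
RELATED_DISEASES = ["AITL", "ALCL-ALK-", "TFH-TCL"]

def orphan_fallback(search_fn, notify_fn):
    """Retry the trial search across related diseases until one hits."""
    for disease in RELATED_DISEASES:
        trials = search_fn(disease)
        if trials:
            notify_fn(f"No exact TCL trials. Showing related: {disease}")
            return disease, trials
    return None, []   # no results even after fallback
```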
---
#### **Step 7: Generate Recommendations**
**What happens**: Claude synthesizes the trial data and generates a structured recommendation including inclusion/exclusion criteria, primary endpoint with rationale, sample size with power basis, comparator selection, and NCT citations for evidence.
*See User Input/Output Flow section for example output format.*
---
#### **Step 8: Error Handling**
```
Error Types:
1. API timeout → Retry 3x (1s, 2s, 4s backoff)
2. Rate limit → Wait + auto-retry
3. Connector down → Show cached example
4. Invalid response → Log + retry option
5. No results after fallback → "Unable to find trials"
```
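The timeout/backoff policy (error type 1) could look like this sketch:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Call `call()`, retrying on timeout with exponential backoff
    (1s, 2s, 4s by default)."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise                              # surface to the error handler
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s
```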
---
#### **Step 9: Streamlit Display**
**What happens**:
- Display the response in the chat interface
- Add the response to session history for context
- User sees formatted recommendations with NCT citations
---
#### **Step 10: Follow-up Loop**
```
User can ask:
- "What about safety endpoints?"
- "Compare to BBv trial"
- "Show me AITL-specific trials"
→ Loop back to Step 1 with full context
```
---
### Key Architecture Principles
| Principle | Implementation |
|-----------|----------------|
| **Simplicity** | Claude handles all logic; no custom parsers |
| **Guardrails** | Embedded in system prompt + validation step |
| **Conversational** | Session memory enables follow-up questions |
| **Evidence-based** | All recommendations cite NCT IDs |
| **Fallback logic** | Orphan drug search for rare diseases |
| **Error resilience** | Retry logic + graceful degradation |
---
### Technology Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Streamlit | Simple chat interface, no coding required for users |
| **Memory** | `st.session_state` | Maintains conversation history across turns |
| **AI Engine** | Claude 3.5 Sonnet | Best-in-class medical text understanding |
| **Data Access** | Native Web Search | Claude's built-in web search tool (`web_search_20250305`) |
| **Sources** | ClinicalTrials.gov + PubMed | Real-time search of clinical trial registries and literature |
| **Deployment** | Docker on HuggingFace Spaces | Free hosting, web-accessible |
---
### Processing Pipeline
```mermaid
sequenceDiagram
autonumber
participant User
participant App as Streamlit/Orchestrator
participant AI as Claude AI
participant Data as ClinicalTrials.gov
User->>App: Enter disease "TCL"
App->>AI: Process query
AI->>Data: Web search for TCL trials
Data-->>AI: Trial list (NCT IDs)
AI->>Data: Get detailed protocols
Data-->>AI: Protocol documents
AI->>AI: Analyze & Compare
AI->>AI: Generate recommendations
AI-->>App: Results
App-->>User: Display report
```
---
### Project Structure
```
Clinical-Trial-Design-Assistant/
├── app.py             # Streamlit main entry + Claude integration + system prompt
├── requirements.txt   # Python dependencies
├── Dockerfile         # Container configuration
├── README.md          # Documentation
├── CHANGELOG.md       # Version history
└── .gitignore         # Git ignore file
```
**Note**: Claude handles all parsing, extraction, and analysis via web search. Single-file architecture for simplicity.
---
### System Prompt Template
```python
SYSTEM_PROMPT = """
You are a Clinical Trial Design Assistant specializing in R/R TCL.
ROLE:
- Help researchers design superiority trials
- Provide evidence-based recommendations from ClinicalTrials.gov
- Cite NCT IDs for all claims
RULES (NEVER VIOLATE):
1. NEVER invent trial data - only cite real NCT IDs
2. If no trials found, say "No matching trials found" and suggest related searches
3. Use orphan drug fallback for rare diseases with <5 results
4. Apply disease hierarchy (AITL > ALCL-ALK- > PTCL-NOS)
5. Validate against R/R TCL framework for all recommendations
6. If prior treatment lines not specified, ASK user before providing recommendations
7. If regulatory region (FDA/EMA) not specified, ASK user for target submission region
SCOPE:
- Focus: Nodal and extranodal TCL subtypes
- Out of scope: Cutaneous T-cell lymphomas (Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, etc.) - redirect to CTCL-specific resources
DISEASE HIERARCHY:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- Analyze separately: ALCL-ALK+ (best prognosis)
- Analyze separately: NKTCL/EATL/MEITL (rare, distinct biology)
COMPARATOR CATEGORIES:
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice
OUTPUT FORMAT:
- Recommended inclusion/exclusion criteria
- Primary endpoint with justification
- Sample size with power calculation basis
- Comparator with rationale
- NCT citations for evidence
"""
```
---
### Error Handling
| Error | Handling |
|-------|----------|
| **API timeout** | Retry 3x with exponential backoff (1s, 2s, 4s) |
| **Rate limit** | Show "Please wait" message, auto-retry after delay |
| **Connector unavailable** | Display cached example response + "Try again later" |
| **No results** | Trigger orphan drug fallback → search related diseases |
| **Invalid response** | Log error, show "Unable to process" + retry option |
---
### Response Validation
Before displaying results, validate:
- [ ] All NCT IDs cited exist (format check: NCT + 8 digits)
- [ ] Disease in response matches user query
- [ ] ORR/endpoint values are within realistic ranges
- [ ] No hallucinated drug names (cross-check against known TCL treatments)
---
### User Input/Output Flow
**Input**:
```
User enters: "TCL" (disease name)
Optional: Drug of interest, trial type filter
```
**Output**:
**1. Trial Discovery Report**
- List of relevant trials (NCT IDs, sponsors, status)
- Trial outcomes summary
**2. Feature Analysis**
- Common inclusion/exclusion patterns in successful trials
- Endpoint selection trends
- Sample size ranges by trial phase
- Comparator arm choices
**3. Recommendations**
```
┌────────────────────────────────────────────────────────────┐
│ RECOMMENDED TRIAL DESIGN PARAMETERS FOR TCL                │
├────────────────────────────────────────────────────────────┤
│ Inclusion Criteria:                                        │
│   ✓ Relapsed/refractory TCL (≥1 prior therapy)             │
│   ✓ ECOG PS 0-2                                            │
│   ✓ Measurable disease per Lugano criteria                 │
│                                                            │
│ Exclusion Criteria:                                        │
│   ✗ Prior allogeneic transplant                            │
│   ✗ Active CNS involvement                                 │
│                                                            │
│ Primary Endpoint: ORR (recommended based on precedent)     │
│ Sample Size: 100-120 (based on similar trials)             │
│ Comparator: Physician's choice (pralatrexate, belinostat,  │
│             romidepsin, or gemcitabine-based)              │
│                                                            │
│ Evidence: Based on NCT01482962, NCT00901147, NCT01280526   │
└────────────────────────────────────────────────────────────┘
```
---
## 3. Implementation Phases
### Phase 1: Setup & Configuration
- [x] Clinical Trials connector verified and available
- [ ] Obtain Anthropic API key
- [ ] Set up project structure (Streamlit + Python 3.11)
- [ ] Configure Docker environment
### Phase 2: Core Development
- [ ] Implement conversation memory with `st.session_state`
- [ ] Implement anti-hallucination system prompt
- [ ] Build orphan drug fallback logic
- [ ] Develop trial data extraction pipeline
- [ ] Add error handling for API failures/timeouts
### Phase 3: Testing & Refinement
- [ ] Test with TCL/Pralatrexate/Belinostat cases
- [ ] Validate anti-hallucination rules
- [ ] Verify statistical recommendations
- [ ] Clinical researcher user testing
### Phase 4: Deployment
- [ ] Deploy to HuggingFace Spaces
- [ ] User documentation
- [ ] Handoff to clinical team
---
## 4. Post-MVP Enhancements
### 4.1 Competitive Landscape Tool
| Aspect | Details |
|--------|---------|
| **Problem** | Need to compare a candidate drug against competitors in efficacy/safety |
| **Solution** | Drug comparison feature with structured competitive analysis |
| **Requested By** | Vittoria |
**Implementation Tasks**:
- [ ] Add drug comparison feature
- [ ] Extract efficacy metrics (ORR, CRR, PFS, OS)
- [ ] Extract safety profiles (AEs, SAEs, discontinuation rates)
- [ ] Generate comparative analysis report with visualizations
---
### 4.2 Hybrid Retrieval Architecture
| Aspect | Details |
|--------|---------|
| **Problem** | Web search returns only 5-10 results; PTCL has 1,000+ NCT IDs |
| **Solution** | Combine ClinicalTrials.gov API + Web Search for complete coverage |
**Architecture**:
```
User Query
  │
  ├──▶ ClinicalTrials.gov API (complete, filtered, reproducible)
  │
  └──▶ Claude Web Search (news, publications, context)
        │
        ▼
  Claude Synthesizes from BOTH
```
**Implementation Tasks**:
- [ ] Integrate ClinicalTrials.gov API v2
- [ ] Add query filters (phase, status, condition)
- [ ] Implement result pagination
- [ ] Merge API results with web search context
- [ ] Add audit logging for retrieved NCT IDs
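A sketch of the API leg of the hybrid design, assuming the v2 endpoint and parameter names (`query.cond`, `filter.overallStatus`, `pageSize`) match the public documentation; verify against the live spec before relying on them:

```python
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

def build_study_query(condition, status="COMPLETED", page_size=100):
    """Build a ClinicalTrials.gov API v2 studies URL (reproducible,
    unlike web-ranked search results)."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": status,
        "pageSize": page_size,
        "format": "json",
    }
    return f"{BASE_URL}?{urlencode(params)}"

# e.g. requests.get(build_study_query("peripheral T-cell lymphoma")).json()
```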
**Benefits**:
| MVP | Hybrid |
|-----|--------|
| ~10 trials | All matching trials |
| Web-ranked | Clinically-filtered |
| Non-reproducible | Fully auditable |
---
## 5. Next Steps
### Immediate Actions
1. Obtain Anthropic API key
2. Set up project structure
3. Begin Phase 1 development
---
## Appendix A: Disease Hierarchy
```
TCL (Parent)
├── Nodal: AITL, ALCL-ALK+/-, PTCL-NOS, TFH-TCL
├── Extranodal: NKTCL, EATL, MEITL
└── Cutaneous (CTCL): Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, Lymphomatoid Papulosis, Subcutaneous Panniculitis-like TCL, Primary Cutaneous Gamma-Delta TCL
```
**Subtype-Specific Notes**:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- ALCL-ALK+: Best prognosis (analyze separately if needed)
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- NKTCL/EATL/MEITL: Rare; distinct biology
**Exclusions**: Solid tumors mimicking TCL
---
## Appendix B: R/R TCL Trial Design Rules
### Patient Population
- Prior lines: 1 vs 2+ (changes prognosis significantly)
- Refractory definition: PD within 1 month of treatment end
- Relapsed vs primary refractory: Separate analysis (OS 1.97 vs 0.89 years)
- Transplant eligibility: Age, comorbidities, performance status, organ function
### Transplant Strategy
- Transplant-eligible: Goal = salvage → bridge to allo-SCT (curative intent)
- Transplant-ineligible: Salvage is potentially palliative
- Conversion endpoint: % converting from ineligible → eligible
### Comparator Categories
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice
### Endpoints
- **Primary**: ORR (rapid), CRR (stronger signal), PFS (durability), TFS (transplant-ineligible), OS
- **Secondary**: DoR, TTR, transplant conversion, post-transplant outcomes, TTNT
- ORR threshold: 50% or higher vs 30% historical = superiority
- ICR preferred for regulatory; investigator-assessed OK for exploratory
### Biomarkers
- TET2, DNMT3A, RHOA: Predict response to HDAC inhibitors (common in AITL/TFH-TCL)
- TP53: Poor prognosis
- KMT2D: AITL-enriched; epigenetic modifier sensitivity
- CD30: Expression level correlates with targeted therapy response
- Early PET-CT (2-4 cycles): Metabolic response, identify progressors
### Safety Considerations
- Acceptable toxicity for frail population
- Dose modifications for elderly/comorbid
- Cumulative organ toxicity monitoring (cardiac/renal/hepatic)
- Grade 3/4 cytopenia management
- Prophylactic antimicrobials (antibiotics/antivirals/antifungals)
### Regulatory Pathways
- Accelerated approval: ORR primary + clinical benefit demonstration
- Traditional approval: PFS/OS primary
- Breakthrough therapy designation for orphan drugs
- Confirmatory trial typically required post-accelerated approval
### Sample Size
- Historical controls: Use established benchmarks for each comparator category
- Power for: 50% or higher vs 30% (20% or greater difference)
- Dropout: Expect 5-10% screen failures in R/R population
- Consider subtype-stratified power (AITL vs ALCL-ALK- separately)
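As a worked example of the 50% vs 30% power basis, a standard two-proportion normal-approximation sample size calculation (a sketch only; a trial statistician should confirm any real design):

```python
import math
from statistics import NormalDist

def n_per_arm(p_control, p_experimental, alpha=0.05, power=0.80):
    """Two-proportion sample size, normal approximation,
    two-sided alpha."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_experimental) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_control * (1 - p_control)
                                   + p_experimental * (1 - p_experimental))) ** 2
    return math.ceil(numerator / (p_experimental - p_control) ** 2)

# 30% historical ORR vs 50% target gives roughly 90-95 patients per arm
# at 80% power, before inflating for the 5-10% dropout noted above.
```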
---
## Appendix C: Web Search Details
**Tool Used**: `web_search_20250305` (Claude's native web search)
**How It Works**:
- Claude autonomously constructs search queries based on user intent
- Searches ClinicalTrials.gov, PubMed, and other medical sources
- Returns structured results with URLs and content
- Content is encrypted (only Claude can read it internally)
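A sketch of the tool block that enables this search. The type string is taken from this plan; the exact request shape around it should be verified against current Anthropic tool-use documentation:

```python
def web_search_tool(max_uses=5):
    """Tool block enabling Claude's native web search."""
    return {
        "type": "web_search_20250305",
        "name": "web_search",
        "max_uses": max_uses,   # cap searches per request
    }

# Usage (sketch):
#   client.messages.create(..., tools=[web_search_tool()], ...)
```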
**Data Extracted**:
- Trial metadata (NCT ID, sponsor, phase, status)
- Eligibility criteria
- Endpoints (primary/secondary)
- Sample sizes
- Published outcomes (ORR, PFS, OS)
---
**Document Version**: 1.0
**Status**: 🔄 In Progress