# Clinical Trial Design Assistant
**Implementation Plan v1.0**

---

## Executive Summary

The **Clinical Trial Design Assistant** is an AI-powered chatbot that helps clinical researchers design stronger clinical trials by analyzing successful examples from ClinicalTrials.gov. It provides evidence-based recommendations on trial parameters and regulatory strategies, with a focus on rare diseases and orphan drug development.

**Primary Users**: Clinical researchers evaluating drug partnerships and designing superiority trials.

---

## 1. Problem & Solution Overview

| 🚨 **Current Challenges** | ✅ **Our Solution** |
|---------------------------|---------------------|
| **Information Overload**<br/>500,000+ trials make research slow. | **Evidence-Based Recommendations**<br/>Automated retrieval of proven trial designs. |
| **Rare Disease Complexity**<br/>Limited data on successful trial patterns in T-cell lymphoma (TCL). | **Orphan Drug Expertise**<br/>Fallback logic for similar rare diseases. |
| **Regulatory Uncertainty**<br/>Unclear reasons for past FDA/EMA outcomes. | **Regulatory Intelligence**<br/>Extracts specific objections and success factors. |
| **Statistical Rigor**<br/>Complex specialized expertise required. | **Statistical Guidance**<br/>Automated sample size and power calculations. |

**How It Works (Simple Example)**:
```
Clinician types: "TCL"

Chatbot returns:
→ Recommended trial design parameters
→ Sample size: 100-120 patients
→ Primary endpoint: ORR
→ Comparator: Physician's choice
→ Evidence: Based on NCT01482962, NCT00901147
```

---

## 2. Architecture Overview

### High-Level Architecture

**System Flow**: User → AI Agent → Web Search → ClinicalTrials.gov/PubMed → AI Analysis → Recommendations

```mermaid
flowchart LR
    classDef large font-size:22px,stroke-width:2px;

    A[User Question]:::large --> B[Claude<br/>with Web Search]:::large
    B <--> C[Real-time<br/>Web Search]:::large
    C <--> D[ClinicalTrials.gov<br/>+ PubMed]:::large
    B --> E[Structured<br/>Recommendations]:::large
```

**High-Level Processing Flow**:
1. **User Query** → Natural language question about trial design.
2. **Session Memory** → Retrieve conversation history from `session_state`.
3. **Claude AI** → Interprets intent with full context and constructs search strategy.
4. **Web Search** → Claude uses native web search to query ClinicalTrials.gov and medical literature.
5. **Analysis** → AI analyzes retrieved data using the trial design rules in the system prompt (Appendix B).
6. **Output** → Formatted report with cited evidence and design parameters.
7. **Memory Update** → Store response in session for follow-up questions.

---

### Detailed Architecture

```mermaid
flowchart TD
    A[User Query] -->|1. Submit question| B[Session Memory Check]
    B -->|2. Load context| C[Claude + System Prompt]
    C -->|3. Web search queries| D[ClinicalTrials.gov<br/>+ PubMed]
    D -->|4. Return search results| C
    C -->|5. Analyze with guardrails| E{Validation}
    E -->|Pass| F[Generate Recommendations]
    E -->|Fail: No results| G[Orphan Drug Fallback]
    E -->|Fail: Invalid data| H[Error Handler]
    G -->|Retry with related diseases| D
    H -->|Retry or show error| I[User Notification]
    F -->|6. Format output| J[Streamlit Display]
    J -->|7. Store in session| K[Session Memory]
    K -->|8. Follow-up question?| A
    K -->|Done| L[End]
    
    style C fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9
    style J fill:#f3e5f5
```

---

### Step-by-Step Processing Details

#### **Step 1: User Query Submission**
```
Input: "Design a trial for R/R TCL"
↓
Streamlit captures query + checks session state
```

---

#### **Step 2: Session Memory Check**

**What happens**: Check if there's previous conversation history
- **If yes**: Load the last 5 messages to maintain context
- **If no**: Start a fresh conversation

**Purpose**: Enables follow-up questions like "What about sample size?"
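The memory check can be sketched as two small helpers. This is a minimal sketch, not the app's actual code: `load_context`, `remember`, and the `"messages"` key are hypothetical names, and `state` stands in for `st.session_state` (a plain dict behaves the same way here). Dropping a leading assistant turn is an added assumption, made so the trimmed window always starts with a user message.

```python
from typing import Dict, List, MutableMapping

MAX_CONTEXT_MESSAGES = 5  # "last 5 messages" from Step 2


def remember(state: MutableMapping, role: str, content: str) -> None:
    """Append one turn to the session history, creating it on first use."""
    if "messages" not in state:
        state["messages"] = []
    state["messages"].append({"role": role, "content": content})


def load_context(state: MutableMapping) -> List[Dict[str, str]]:
    """Return up to the last 5 messages, or an empty list for a fresh session."""
    if "messages" not in state:
        state["messages"] = []
    context = list(state["messages"][-MAX_CONTEXT_MESSAGES:])
    # Assumption: the window should begin with a user turn, so drop any
    # assistant message that the cut-off left at the front.
    while context and context[0]["role"] != "user":
        context.pop(0)
    return context
```

In the Streamlit app, `load_context(st.session_state)` would run before each Claude call, and `remember(st.session_state, ...)` after each user query and each response.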

---

#### **Step 3: Claude + System Prompt**
```python
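# SYSTEM_PROMPT (see "System Prompt Template") includes:
#   - Role definition
#   - Disease hierarchy (Appendix A)
#   - Trial design rules (Appendix B)
#   - Anti-hallucination rules
#   - Output format template
SYSTEM_PROMPT = "..."  # placeholder; full text in "System Prompt Template"

conversation_history = []  # prior {"role", "content"} turns from session memory
new_query = "Design a trial for R/R TCL"

# The Anthropic Messages API takes the system prompt as a top-level
# `system` parameter rather than as a "system" message role.
request = {
    "system": SYSTEM_PROMPT,
    "messages": conversation_history + [{"role": "user", "content": new_query}],
}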
```

**Claude's Internal Process**:
1. Parse user intent (disease, trial phase, endpoint preference)
2. Construct search queries for the web search tool
3. Call the web search tool with the relevant filters (the `search_trials()` step in the Processing Pipeline diagram)
4. Receive trial data (NCT IDs, protocols, outcomes)
5. Apply guardrails from system prompt
6. Generate recommendations

---

#### **Step 4: Clinical Trials Connector**

**What happens**: Claude uses its native web search tool to search ClinicalTrials.gov and retrieve trial data (NCT IDs, eligibility criteria, endpoints, sample sizes, outcomes).

*See Appendix C for web search details.*
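One way to wire the web search tool into the request is sketched below. `build_request` is a hypothetical helper, and the `max_uses` cap, domain allow-list, `max_tokens` value, and model alias are illustrative assumptions rather than settings taken from this plan. The function only builds the keyword arguments, so it can be inspected without an API key.

```python
from typing import Any, Dict, List

# Tool definition for Claude's native web search. The domain list and
# the max_uses cap are assumptions chosen for illustration.
WEB_SEARCH_TOOL: Dict[str, Any] = {
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,
    "allowed_domains": ["clinicaltrials.gov", "pubmed.ncbi.nlm.nih.gov"],
}


def build_request(
    system_prompt: str,
    history: List[Dict[str, str]],
    query: str,
    model: str = "claude-3-5-sonnet-latest",  # assumed model alias
) -> Dict[str, Any]:
    """Assemble keyword arguments for a Messages API call with web search."""
    return {
        "model": model,
        "max_tokens": 4096,
        "system": system_prompt,
        "tools": [WEB_SEARCH_TOOL],
        "messages": history + [{"role": "user", "content": query}],
    }
```

With the `anthropic` package installed and `ANTHROPIC_API_KEY` set, the result would be passed as `anthropic.Anthropic().messages.create(**build_request(...))`.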

---

#### **Step 5: Validation with Guardrails**

**Validation Checks**:

1. **Results found?**
   - If no trials found → Trigger orphan drug fallback
   - Search related diseases (AITL, ALCL-ALK-)

2. **NCT IDs valid?**
   - Check that each cited trial ID follows the format NCT + 8 digits
   - Flag any invalid IDs

3. **Disease match?**
   - Verify response mentions TCL or lymphoma
   - Flag if disease mismatch detected

4. **Realistic values?**
   - ORR must be between 0% and 100%
   - Sample size must be reasonable (not >10,000)
   - Flag unrealistic numbers

5. **Apply trial design rules**
   - Check against Appendix B:
   - Prior lines: 1 vs 2+
   - Comparator: Salvage combination chemo (GDP/DHAP/ICE), single-agent novel, BBv, or Investigator's choice
   - Endpoints: ORR/CRR/PFS valid for R/R TCL

**Guardrail Actions**:
- **Pass** → Proceed to recommendations
- **Fail (no results)** → Orphan drug fallback
- **Fail (invalid data)** → Error handler
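Checks 1 to 4 lend themselves to plain code outside the model. The sketch below is one possible shape, with hypothetical names (`validate`, `ValidationResult`). It assumes the ORR and sample size have already been extracted from the response, and it treats a response with no NCT IDs as the "no results" case. Check 5 (the Appendix B rules) is left to the system prompt.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

NCT_TOKEN = re.compile(r"\bNCT\d+\b")   # anything that looks like an NCT ID
NCT_VALID = re.compile(r"NCT\d{8}")     # NCT + exactly 8 digits
DISEASE_TERMS = ("tcl", "lymphoma")
MAX_SAMPLE_SIZE = 10_000


@dataclass
class ValidationResult:
    status: str                          # "pass", "no_results", or "invalid"
    issues: List[str] = field(default_factory=list)


def validate(
    text: str,
    orr_percent: Optional[float] = None,
    sample_size: Optional[int] = None,
) -> ValidationResult:
    """Run the format, disease-match, and range checks from Step 5."""
    ids = NCT_TOKEN.findall(text)
    if not ids:
        return ValidationResult("no_results", ["no NCT IDs cited"])
    issues = [f"malformed NCT ID: {i}" for i in ids if not NCT_VALID.fullmatch(i)]
    if not any(term in text.lower() for term in DISEASE_TERMS):
        issues.append("disease mismatch")
    if orr_percent is not None and not 0 <= orr_percent <= 100:
        issues.append(f"ORR out of range: {orr_percent}")
    if sample_size is not None and not 0 < sample_size <= MAX_SAMPLE_SIZE:
        issues.append(f"unrealistic sample size: {sample_size}")
    return ValidationResult("invalid" if issues else "pass", issues)
```

The three status values map onto the guardrail actions: `"pass"` proceeds to recommendations, `"no_results"` triggers the orphan drug fallback, and `"invalid"` goes to the error handler.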

---

#### **Step 6: Orphan Drug Fallback**
```
If no TCL trials found:
1. Search disease hierarchy (Appendix A)
2. Try related diseases:
   - AITL (most common nodal TCL)
   - ALCL-ALK- (CD30+)
   - TFH-TCL (PI3K responsive)
3. Notify user: "No exact TCL trials. Showing related: AITL"
```
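The fallback above can be sketched as a loop over the related subtypes. `search_with_fallback` is a hypothetical helper, and the `search` callable is an assumed stand-in for whatever retrieval step the app uses (here it only needs to return a list of NCT IDs).

```python
from typing import Callable, List, Optional, Tuple

# Related diseases tried in order, as listed in Step 6.
RELATED_DISEASES = ["AITL", "ALCL-ALK-", "TFH-TCL"]


def search_with_fallback(
    disease: str,
    search: Callable[[str], List[str]],
) -> Tuple[List[str], Optional[str]]:
    """Return (results, user_notice); notice is None on an exact match."""
    results = search(disease)
    if results:
        return results, None
    for related in RELATED_DISEASES:
        results = search(related)
        if results:
            notice = f"No exact {disease} trials. Showing related: {related}"
            return results, notice
    return [], "Unable to find trials"
```

Injecting `search` keeps the fallback order testable without network access.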

---

#### **Step 7: Generate Recommendations**

**What happens**: Claude synthesizes the trial data and generates a structured recommendation including inclusion/exclusion criteria, primary endpoint with rationale, sample size with power basis, comparator selection, and NCT citations for evidence.

*See User Input/Output Flow section for example output format.*

---

#### **Step 8: Error Handling**
```
Error Types:
1. API timeout → Retry 3x (1s, 2s, 4s backoff)
2. Rate limit → Wait + auto-retry
3. Connector down → Show cached example
4. Invalid response → Log + retry option
5. No results after fallback → "Unable to find trials"
```
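The timeout policy (three retries at 1s, 2s, 4s) can be sketched as a small wrapper. `call_with_retry` is a hypothetical name, and `TimeoutError` stands in for whatever exception the real client raises on timeout. The `sleep` parameter is injectable so the delays can be checked in tests without waiting.

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    delays: Sequence[float] = (1.0, 2.0, 4.0),
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a retryable error, wait and retry once per delay."""
    for delay in delays:
        try:
            return fn()
        except retry_on:
            sleep(delay)  # exponential backoff: 1s, then 2s, then 4s
    # Final attempt after the last delay; any exception now propagates
    # so the caller can show the error or a cached example.
    return fn()
```

Rate-limit handling would follow the same pattern with a different exception type and delay schedule.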

---

#### **Step 9: Streamlit Display**

**What happens**:
- Display the response in the chat interface
- Add the response to session history for context
- User sees formatted recommendations with NCT citations

---

#### **Step 10: Follow-up Loop**
```
User can ask:
- "What about safety endpoints?"
- "Compare to BBv trial"
- "Show me AITL-specific trials"

→ Loop back to Step 1 with full context
```

---

### Key Architecture Principles

| Principle | Implementation |
|-----------|----------------|
| **Simplicity** | Claude handles all logic; no custom parsers |
| **Guardrails** | Embedded in system prompt + validation step |
| **Conversational** | Session memory enables follow-up questions |
| **Evidence-based** | All recommendations cite NCT IDs |
| **Fallback logic** | Orphan drug search for rare diseases |
| **Error resilience** | Retry logic + graceful degradation |

---

### Technology Stack

| Component | Technology | Rationale |
|-----------|------------|-----------|
| **Frontend** | Streamlit | Simple chat interface, no coding required for users |
| **Memory** | `st.session_state` | Maintains conversation history across turns |
| **AI Engine** | Claude 3.5 Sonnet | Best-in-class medical text understanding |
| **Data Access** | Native Web Search | Claude's built-in web search tool (`web_search_20250305`) |
| **Sources** | ClinicalTrials.gov + PubMed | Real-time search of clinical trial registries and literature |
| **Deployment** | Docker on HuggingFace Spaces | Free hosting, web-accessible |

---

### Processing Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant App as Streamlit/Orchestrator
    participant AI as Claude AI
    participant Data as ClinicalTrials.gov

    User->>App: Enter disease "TCL"
    App->>AI: Process query
    AI->>Data: search_trials(TCL)
    Data-->>AI: Trial list (NCT IDs)
    AI->>Data: Get detailed protocols
    Data-->>AI: Protocol documents
    AI->>AI: Analyze & Compare
    AI->>AI: Generate recommendations
    AI-->>App: Results
    App-->>User: Display report
```

---

### Project Structure

```
Clinical-Trial-Design-Assistant/
├── app.py                  # Streamlit main entry + Claude integration + system prompt
├── requirements.txt        # Python dependencies
├── Dockerfile              # Container configuration
├── README.md               # Documentation
├── CHANGELOG.md            # Version history
└── .gitignore              # Git ignore file
```

**Note**: Claude handles all parsing, extraction, and analysis via web search. Single-file architecture for simplicity.

---

### System Prompt Template

```python
SYSTEM_PROMPT = """
You are a Clinical Trial Design Assistant specializing in R/R TCL.

ROLE:
- Help researchers design superiority trials
- Provide evidence-based recommendations from ClinicalTrials.gov
- Cite NCT IDs for all claims

RULES (NEVER VIOLATE):
1. NEVER invent trial data - only cite real NCT IDs
2. If no trials found, say "No matching trials found" and suggest related searches
3. Use orphan drug fallback for rare diseases with <5 results
4. Apply disease hierarchy (AITL > ALCL-ALK- > PTCL-NOS)
5. Validate against R/R TCL framework for all recommendations
6. If prior treatment lines not specified, ASK user before providing recommendations
7. If regulatory region (FDA/EMA) not specified, ASK user for target submission region

SCOPE:
- Focus: Nodal and extranodal TCL subtypes
- Out of scope: Cutaneous T-cell lymphomas (Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, etc.) - redirect to CTCL-specific resources

DISEASE HIERARCHY:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K-δ/γ inhibitor), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- Analyze separately: ALCL-ALK+ (best prognosis)
- Analyze separately: NKTCL/EATL/MEITL (rare, distinct biology)

COMPARATOR CATEGORIES:
- Salvage combination chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab vedotin + bendamustine)
- Investigator's choice

OUTPUT FORMAT:
- Recommended inclusion/exclusion criteria
- Primary endpoint with justification
- Sample size with power calculation basis
- Comparator with rationale
- NCT citations for evidence
"""
```

---

### Error Handling

| Error | Handling |
|-------|----------|
| **API timeout** | Retry 3x with exponential backoff (1s, 2s, 4s) |
| **Rate limit** | Show "Please wait" message, auto-retry after delay |
| **Connector unavailable** | Display cached example response + "Try again later" |
| **No results** | Trigger orphan drug fallback → search related diseases |
| **Invalid response** | Log error, show "Unable to process" + retry option |

---

### Response Validation

Before displaying results, validate:
- [ ] All cited NCT IDs are well-formed (NCT + 8 digits); a format check alone does not confirm that a trial exists
- [ ] Disease in response matches user query
- [ ] ORR/endpoint values are within realistic ranges
- [ ] No hallucinated drug names (cross-check against known TCL treatments)

---

### User Input/Output Flow

**Input**:
```
User enters: "TCL" (disease name)
Optional: Drug of interest, trial type filter
```

**Output**:

**1. Trial Discovery Report**
- List of relevant trials (NCT IDs, sponsors, status)
- Trial outcomes summary

**2. Feature Analysis**
- Common inclusion/exclusion patterns in successful trials
- Endpoint selection trends
- Sample size ranges by trial phase
- Comparator arm choices

**3. Recommendations**
```
┌─────────────────────────────────────────────────────────────┐
│ RECOMMENDED TRIAL DESIGN PARAMETERS FOR TCL                │
├─────────────────────────────────────────────────────────────┤
│ Inclusion Criteria:                                         │
│   ✓ Relapsed/refractory TCL (≥1 prior therapy)            │
│   ✓ ECOG PS 0-2                                            │
│   ✓ Measurable disease per Lugano criteria                 │
│                                                             │
│ Exclusion Criteria:                                         │
│   ✗ Prior allogeneic transplant                            │
│   ✗ Active CNS involvement                                 │
│                                                             │
│ Primary Endpoint: ORR (recommended based on precedent)      │
│ Sample Size: 100-120 (based on similar trials)              │
│ Comparator: Physician's choice (pralatrexate, belinostat,   │
│             romidepsin, or gemcitabine-based)               │
│                                                             │
│ Evidence: Based on NCT01482962, NCT00901147, NCT01280526    │
└─────────────────────────────────────────────────────────────┘
```

---

## 3. Implementation Phases

### Phase 1: Setup & Configuration
- [x] Clinical Trials connector verified and available
- [ ] Obtain Anthropic API key
- [ ] Set up project structure (Streamlit + Python 3.11)
- [ ] Configure Docker environment

### Phase 2: Core Development
- [ ] Implement conversation memory with `st.session_state`
- [ ] Implement anti-hallucination system prompt
- [ ] Build orphan drug fallback logic
- [ ] Develop trial data extraction pipeline
- [ ] Add error handling for API failures/timeouts

### Phase 3: Testing & Refinement
- [ ] Test with TCL/Pralatrexate/Belinostat cases
- [ ] Validate anti-hallucination rules
- [ ] Verify statistical recommendations
- [ ] Clinical researcher user testing

### Phase 4: Deployment
- [ ] Deploy to HuggingFace Spaces
- [ ] User documentation
- [ ] Handoff to clinical team

---

## 4. Post-MVP Enhancements

### 4.1 Competitive Landscape Tool

| Aspect | Details |
|--------|---------|
| **Problem** | Need to compare a candidate drug against competitors in efficacy/safety |
| **Solution** | Drug comparison feature with structured competitive analysis |
| **Requested By** | Vittoria |

**Implementation Tasks**:
- [ ] Add drug comparison feature
- [ ] Extract efficacy metrics (ORR, CRR, PFS, OS)
- [ ] Extract safety profiles (AEs, SAEs, discontinuation rates)
- [ ] Generate comparative analysis report with visualizations

---

### 4.2 Hybrid Retrieval Architecture

| Aspect | Details |
|--------|---------|
| **Problem** | Web search returns only 5-10 results; PTCL has 1,000+ NCT IDs |
| **Solution** | Combine ClinicalTrials.gov API + Web Search for complete coverage |

**Architecture**:
```
User Query
     │
     ├──▶ ClinicalTrials.gov API (complete, filtered, reproducible)
     │
     └──▶ Claude Web Search (news, publications, context)
             │
             ▼
      Claude Synthesizes from BOTH
```

**Implementation Tasks**:
- [ ] Integrate ClinicalTrials.gov API v2
- [ ] Add query filters (phase, status, condition)
- [ ] Implement result pagination
- [ ] Merge API results with web search context
- [ ] Add audit logging for retrieved NCT IDs
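The API integration and pagination tasks might look like the sketch below. The helper names (`build_url`, `iter_studies`, `nct_id`) are hypothetical, and `fetch_json` is an assumed callable that performs the HTTP GET and returns parsed JSON. The query parameters (`query.cond`, `filter.overallStatus`, `pageSize`, `pageToken`) and response fields (`studies`, `nextPageToken`) follow the ClinicalTrials.gov API v2. Phase filtering is left out of this sketch.

```python
from typing import Any, Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_url(
    condition: str,
    status: Optional[str] = None,
    page_size: int = 100,
    page_token: Optional[str] = None,
) -> str:
    """Build a studies query URL for one page of results."""
    params: Dict[str, Any] = {"query.cond": condition, "pageSize": page_size}
    if status:
        params["filter.overallStatus"] = status  # e.g. "COMPLETED"
    if page_token:
        params["pageToken"] = page_token
    return f"{BASE_URL}?{urlencode(params)}"


def iter_studies(
    condition: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    status: Optional[str] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every matching study, following nextPageToken until exhausted."""
    token: Optional[str] = None
    while True:
        page = fetch_json(build_url(condition, status, page_token=token))
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            return


def nct_id(study: Dict[str, Any]) -> str:
    """Extract the NCT ID from a study record (useful for audit logging)."""
    return study["protocolSection"]["identificationModule"]["nctId"]
```

Passing `fetch_json` in as a parameter keeps the pagination logic independent of the HTTP client and makes it straightforward to log every retrieved NCT ID in one place.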

**Benefits**:
| MVP | Hybrid |
|-----|--------|
| ~10 trials | All matching trials |
| Web-ranked | Clinically-filtered |
| Non-reproducible | Fully auditable |

---

## 5. Next Steps

### Immediate Actions
1. Obtain Anthropic API key
2. Set up project structure
3. Begin Phase 1 development

---

## Appendix A: Disease Hierarchy

```
TCL (Parent)
├── Nodal: AITL, ALCL-ALK+/-, PTCL-NOS, TFH-TCL
├── Extranodal: NKTCL, EATL, MEITL
└── Cutaneous (CTCL): Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, Lymphomatoid Papulosis, Subcutaneous Panniculitis-like TCL, Primary Cutaneous Gamma-Delta TCL
```

**Subtype-Specific Notes**:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- ALCL-ALK+: Best prognosis (analyze separately if needed)
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K-δ/γ inhibitor), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- NKTCL/EATL/MEITL: Rare; distinct biology

**Exclusions**: Solid tumors mimicking TCL

---

## Appendix B: R/R TCL Trial Design Rules

### Patient Population
- Prior lines: 1 vs 2+ (changes prognosis significantly)
- Refractory definition: PD within 1 month of treatment end
- Relapsed vs primary refractory: Separate analysis (OS 1.97 vs 0.89 years)
- Transplant eligibility: Age, comorbidities, performance status, organ function

### Transplant Strategy
- Transplant-eligible: Goal = salvage → bridge to allo-SCT (curative intent)
- Transplant-ineligible: Salvage is potentially palliative
- Conversion endpoint: % converting from ineligible → eligible

### Comparator Categories
- Salvage combination chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab vedotin + bendamustine)
- Investigator's choice

### Endpoints
- **Primary**: ORR (rapid), CRR (stronger signal), PFS (durability), TFS (transplant-ineligible), OS
- **Secondary**: DoR, TTR, transplant conversion, post-transplant outcomes, TTNT
- ORR threshold: 50% or higher vs 30% historical = superiority
- ICR preferred for regulatory; investigator-assessed OK for exploratory

### Biomarkers
- TET2, DNMT3A, RHOA: These biomarkers predict response to HDAC inhibitors (common in AITL/TFH-TCL)
- TP53: Poor prognosis
- KMT2D: AITL-enriched; epigenetic modifier sensitivity
- CD30: Expression level correlates with targeted therapy response
- Early PET-CT (2-4 cycles): Metabolic response, identify progressors

### Safety Considerations
- Acceptable toxicity for frail population
- Dose modifications for elderly/comorbid
- Cumulative organ toxicity monitoring (cardiac/renal/hepatic)
- Grade 3/4 cytopenia management
- Prophylactic antimicrobials (antibiotics/antivirals/antifungals)

### Regulatory Pathways
- Accelerated approval: ORR primary + clinical benefit demonstration
- Traditional approval: PFS/OS primary
- Breakthrough therapy designation for orphan drugs
- Confirmatory trial typically required post-accelerated approval

### Sample Size
- Historical controls: Use established benchmarks for each comparator category
- Power for: ORR of 50% or higher vs 30% (absolute difference of 20 percentage points or more)
- Dropout: Expect 5-10% screen failures in R/R population
- Consider subtype-stratified power (AITL vs ALCL-ALK- separately)
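As a worked illustration of the 50% vs 30% target, the sketch below uses the standard normal-approximation formula for comparing two independent proportions (unpooled variance). The two-sided alpha of 0.05, the 80% power, and the two-arm randomized design are assumptions not stated in this plan; a single-arm comparison against a historical control would need a different formula. The attrition inflation reuses the upper end of the 5-10% figure above.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(
    p_treat: float,
    p_control: float,
    alpha: float = 0.05,   # two-sided significance level (assumed)
    power: float = 0.80,   # assumed target power
) -> int:
    """Patients per arm to detect p_treat vs p_control (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_treat * (1 - p_treat) + p_control * (1 - p_control)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treat - p_control) ** 2)


def inflate_for_attrition(n: int, rate: float) -> int:
    """Enroll enough patients that n remain after the given attrition rate."""
    return ceil(n / (1 - rate))


base = n_per_arm(0.50, 0.30)              # 91 per arm
print(base, inflate_for_attrition(base, 0.10))  # prints "91 102"
```

Any figure produced this way is a starting point for discussion with a trial statistician, not a replacement for a protocol-specific power analysis.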

---

## Appendix C: Web Search Details

**Tool Used**: `web_search_20250305` (Claude's native web search)

**How It Works**:
- Claude autonomously constructs search queries based on user intent
- Searches ClinicalTrials.gov, PubMed, and other medical sources
- Returns structured results with URLs and content
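- Page content comes back in an `encrypted_content` field: the application can read result titles and URLs, while the page text itself is readable only by Claude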

**Data Extracted**:
- Trial metadata (NCT ID, sponsor, phase, status)
- Eligibility criteria
- Endpoints (primary/secondary)
- Sample sizes
- Published outcomes (ORR, PFS, OS)

---

**Document Version**: 1.0  
**Status**: 🔄 In Progress