Hesham Gibriel

Clinical Trial Design Assistant

Implementation Plan v1.0


Executive Summary

The Clinical Trial Design Assistant is an AI-powered chatbot that helps clinical researchers design superior clinical trials by analyzing successful examples from ClinicalTrials.gov. It provides evidence-based recommendations on trial parameters and regulatory strategies, with a specific focus on rare diseases and orphan drug development.

Primary Users: Clinical researchers evaluating drug partnerships and designing superiority trials.


1. Problem & Solution Overview

| 🚨 Current Challenges | ✅ Our Solution |
|---|---|
| **Information Overload:** 500,000+ trials make research slow. | **Evidence-Based Recommendations:** Automated retrieval of proven trial designs. |
| **Rare Disease Complexity:** Limited T-cell lymphoma (TCL) data on successful patterns. | **Orphan Drug Expertise:** Fallback logic for similar rare diseases. |
| **Regulatory Uncertainty:** Unclear reasons for past FDA/EMA outcomes. | **Regulatory Intelligence:** Extracts specific objections and success factors. |
| **Statistical Rigor:** Requires complex specialized expertise. | **Statistical Guidance:** Automated sample size and power calculations. |

How It Works (Simple Example):

```text
Clinician types: "TCL"

Chatbot returns:
→ Recommended trial design parameters
→ Sample size: 100-120 patients
→ Primary endpoint: ORR
→ Comparator: Physician's choice
→ Evidence: Based on NCT01482962, NCT00901147
```

2. Architecture Overview

High-Level Architecture

System Flow: User → AI Agent → Web Search → ClinicalTrials.gov/PubMed → AI Analysis → Recommendations

```mermaid
flowchart LR
    classDef large font-size:22px,stroke-width:2px;

    A[User Question]:::large --> B[Claude<br/>with Web Search]:::large
    B <--> C[Real-time<br/>Web Search]:::large
    C <--> D[ClinicalTrials.gov<br/>+ PubMed]:::large
    B --> E[Structured<br/>Recommendations]:::large
```

High-Level Processing Flow:

  1. User Query → Natural language question about trial design.
  2. Session Memory → Retrieve conversation history from session_state.
  3. Claude AI → Interprets intent with full context and constructs search strategy.
  4. Web Search → Claude uses native web search to query ClinicalTrials.gov and medical literature.
  5. Analysis → AI analyzes retrieved data using training rules.
  6. Output → Formatted report with cited evidence and design parameters.
  7. Memory Update → Store response in session for follow-up questions.

Detailed Architecture

```mermaid
flowchart TD
    A[User Query] -->|1. Submit question| B[Session Memory Check]
    B -->|2. Load context| C[Claude + System Prompt]
    C -->|3. Web search queries| D[ClinicalTrials.gov<br/>+ PubMed]
    D -->|4. Return search results| C
    C -->|5. Analyze with guardrails| E{Validation}
    E -->|Pass| F[Generate Recommendations]
    E -->|Fail: No results| G[Orphan Drug Fallback]
    E -->|Fail: Invalid data| H[Error Handler]
    G -->|Retry with related diseases| D
    H -->|Retry or show error| I[User Notification]
    F -->|6. Format output| J[Streamlit Display]
    J -->|7. Store in session| K[Session Memory]
    K -->|8. Follow-up question?| A
    K -->|Done| L[End]

    style C fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9
    style J fill:#f3e5f5
```

Step-by-Step Processing Details

Step 1: User Query Submission

```text
Input: "Design a trial for R/R TCL"
        ↓
Streamlit captures query + checks session state
```

Step 2: Session Memory Check

What happens: Check if there's previous conversation history

  • If yes: Load the last 5 messages to maintain context
  • If no: Start a fresh conversation

Purpose: Enables follow-up questions like "What about sample size?"
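The context-window trimming described above can be sketched as a small helper. This is illustrative only (the name `build_context` and the message shape are assumptions, not from the codebase); in the app, `history` would live in `st.session_state`, but a plain list keeps the logic testable:

```python
def build_context(history, new_query, max_turns=5):
    """Return the last `max_turns` stored messages plus the new user query.

    `history` is a list of {"role": ..., "content": ...} dicts; in the
    Streamlit app it would be read from st.session_state.
    """
    recent = history[-max_turns:] if history else []
    return recent + [{"role": "user", "content": new_query}]
```

With an empty history this simply starts a fresh conversation containing only the new query.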


Step 3: Claude + System Prompt

```python
# The system prompt bundles:
# - Role definition
# - Disease hierarchy (Appendix A)
# - Trial design rules (Appendix B)
# - Anti-hallucination rules
# - Output format template

# Claude receives it via the Anthropic Messages API; note that the
# system prompt is a top-level parameter, not a message role:
response = client.messages.create(
    model=MODEL,
    system=SYSTEM_PROMPT,
    messages=conversation_history + [
        {"role": "user", "content": new_query},
    ],
)
```

Claude's Internal Process:

  1. Parse user intent (disease, trial phase, endpoint preference)
  2. Construct search query for connector
  3. Call search_trials() tool with filters
  4. Receive trial data (NCT IDs, protocols, outcomes)
  5. Apply guardrails from system prompt
  6. Generate recommendations

Step 4: Clinical Trials Connector

What happens: Claude uses its native web search tool to search ClinicalTrials.gov and retrieve trial data (NCT IDs, eligibility criteria, endpoints, sample sizes, outcomes).

See Appendix C for web search details.


Step 5: Validation with Guardrails

Validation Checks:

  1. Results found?

    • If no trials found → Trigger orphan drug fallback
    • Search related diseases (AITL, ALCL-ALK-)
  2. NCT IDs valid?

    • Check each cited trial ID follows format: NCT + 8 digits
    • Flag any invalid IDs
  3. Disease match?

    • Verify response mentions TCL or lymphoma
    • Flag if disease mismatch detected
  4. Realistic values?

    • ORR must be between 0-100%
    • Sample size must be reasonable (not >10,000)
    • Flag unrealistic numbers
  5. Apply trial design rules

    • Check against Appendix B:
    • Prior lines: 1 vs 2+
    • Comparator: Single-agent salvage chemo, Single-agent novel, BBv, or Investigator's choice
    • Endpoints: ORR/CRR/PFS valid for R/R TCL

Guardrail Actions:

  • Pass → Proceed to recommendations
  • Fail (no results) → Orphan drug fallback
  • Fail (invalid data) → Error handler

Step 6: Orphan Drug Fallback

```text
If no TCL trials found:
1. Search disease hierarchy (Appendix A)
2. Try related diseases:
   - AITL (most common nodal TCL)
   - ALCL-ALK- (CD30+)
   - TFH-TCL (PI3K responsive)
3. Notify user: "No exact TCL trials. Showing related: AITL"
```
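A minimal sketch of this fallback logic, with a stub hierarchy and an injected search function (all names here are hypothetical, for illustration):

```python
DISEASE_HIERARCHY = {
    # Related subtypes tried in order when the parent search is sparse
    # (subset of Appendix A, for illustration only).
    "TCL": ["AITL", "ALCL-ALK-", "TFH-TCL"],
}

def search_with_fallback(disease, search_fn, min_results=5):
    """Search the requested disease; fall back to related subtypes if sparse.

    Returns (disease_actually_used, results, used_fallback).
    """
    results = search_fn(disease)
    if len(results) >= min_results:
        return disease, results, False
    for related in DISEASE_HIERARCHY.get(disease, []):
        related_results = search_fn(related)
        if related_results:
            return related, related_results, True
    return disease, results, True
```

When `used_fallback` is True, the app would surface the notification from step 3 above (e.g., "No exact TCL trials. Showing related: AITL").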

Step 7: Generate Recommendations

What happens: Claude synthesizes the trial data and generates a structured recommendation including inclusion/exclusion criteria, primary endpoint with rationale, sample size with power basis, comparator selection, and NCT citations for evidence.

See User Input/Output Flow section for example output format.


Step 8: Error Handling

```text
Error Types:
1. API timeout → Retry 3x (1s, 2s, 4s backoff)
2. Rate limit → Wait + auto-retry
3. Connector down → Show cached example
4. Invalid response → Log + retry option
5. No results after fallback → "Unable to find trials"
```
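The timeout/backoff policy in item 1 could be implemented roughly as follows (a sketch; `TimeoutError` stands in for whatever exception the actual API client raises):

```python
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    """Call fn(); on timeout, retry up to max_retries times with
    exponential backoff (1s, 2s, 4s at the defaults above)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # exhausted retries; surface to the error handler
            time.sleep(base_delay * (2 ** attempt))
```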

Step 9: Streamlit Display

What happens:

  • Display the response in the chat interface
  • Add the response to session history for context
  • User sees formatted recommendations with NCT citations

Step 10: Follow-up Loop

```text
User can ask:
- "What about safety endpoints?"
- "Compare to BBv trial"
- "Show me AITL-specific trials"

→ Loop back to Step 1 with full context
```

Key Architecture Principles

| Principle | Implementation |
|---|---|
| Simplicity | Claude handles all logic; no custom parsers |
| Guardrails | Embedded in system prompt + validation step |
| Conversational | Session memory enables follow-up questions |
| Evidence-based | All recommendations cite NCT IDs |
| Fallback logic | Orphan drug search for rare diseases |
| Error resilience | Retry logic + graceful degradation |

Technology Stack

| Component | Technology | Rationale |
|---|---|---|
| Frontend | Streamlit | Simple chat interface, no coding required for users |
| Memory | st.session_state | Maintains conversation history across turns |
| AI Engine | Claude 3.5 Sonnet | Best-in-class medical text understanding |
| Data Access | Native Web Search | Claude's built-in web search tool (web_search_20250305) |
| Sources | ClinicalTrials.gov + PubMed | Real-time search of clinical trial registries and literature |
| Deployment | Docker on HuggingFace Spaces | Free hosting, web-accessible |

Processing Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant App as Streamlit/Orchestrator
    participant AI as Claude AI
    participant Data as ClinicalTrials.gov

    User->>App: Enter disease "TCL"
    App->>AI: Process query
    AI->>Data: search_trials(TCL)
    Data-->>AI: Trial list (NCT IDs)
    AI->>Data: Get detailed protocols
    Data-->>AI: Protocol documents
    AI->>AI: Analyze & Compare
    AI->>AI: Generate recommendations
    AI-->>App: Results
    App-->>User: Display report
```

Project Structure

```text
Clinical-Trial-Design-Assistant/
├── app.py                  # Streamlit main entry + Claude integration + system prompt
├── requirements.txt        # Python dependencies
├── Dockerfile              # Container configuration
├── README.md               # Documentation
├── CHANGELOG.md            # Version history
└── .gitignore              # Git ignore file
```

Note: Claude handles all parsing, extraction, and analysis via web search. Single-file architecture for simplicity.


System Prompt Template

```python
SYSTEM_PROMPT = """
You are a Clinical Trial Design Assistant specializing in R/R TCL.

ROLE:
- Help researchers design superiority trials
- Provide evidence-based recommendations from ClinicalTrials.gov
- Cite NCT IDs for all claims

RULES (NEVER VIOLATE):
1. NEVER invent trial data - only cite real NCT IDs
2. If no trials found, say "No matching trials found" and suggest related searches
3. Use orphan drug fallback for rare diseases with <5 results
4. Apply disease hierarchy (AITL > ALCL-ALK- > PTCL-NOS)
5. Validate against R/R TCL framework for all recommendations
6. If prior treatment lines not specified, ASK user before providing recommendations
7. If regulatory region (FDA/EMA) not specified, ASK user for target submission region

SCOPE:
- Focus: Nodal and extranodal TCL subtypes
- Out of scope: Cutaneous T-cell lymphomas (Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, etc.) - redirect to CTCL-specific resources

DISEASE HIERARCHY:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- Analyze separately: ALCL-ALK+ (best prognosis)
- Analyze separately: NKTCL/EATL/MEITL (rare, distinct biology)

COMPARATOR CATEGORIES:
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice

OUTPUT FORMAT:
- Recommended inclusion/exclusion criteria
- Primary endpoint with justification
- Sample size with power calculation basis
- Comparator with rationale
- NCT citations for evidence
"""
```

Error Handling

| Error | Handling |
|---|---|
| API timeout | Retry 3x with exponential backoff (1s, 2s, 4s) |
| Rate limit | Show "Please wait" message, auto-retry after delay |
| Connector unavailable | Display cached example response + "Try again later" |
| No results | Trigger orphan drug fallback → search related diseases |
| Invalid response | Log error, show "Unable to process" + retry option |

Response Validation

Before displaying results, validate:

  • All NCT IDs cited exist (format check: NCT + 8 digits)
  • Disease in response matches user query
  • ORR/endpoint values are within realistic ranges
  • No hallucinated drug names (cross-check against known TCL treatments)
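The first three checks can be expressed as a small helper (a sketch; the function name and return shape are illustrative). The drug-name cross-check would need a curated list of known TCL treatments, so it is omitted here:

```python
import re

NCT_ID_RE = re.compile(r"^NCT\d{8}$")  # format check: NCT + 8 digits

def validate_response(nct_ids, orr_percent, sample_size):
    """Return a list of validation issues; an empty list means pass."""
    issues = [f"invalid NCT ID: {nct}" for nct in nct_ids
              if not NCT_ID_RE.match(nct)]
    if not 0 <= orr_percent <= 100:
        issues.append(f"unrealistic ORR: {orr_percent}%")
    if not 0 < sample_size <= 10_000:
        issues.append(f"unrealistic sample size: {sample_size}")
    return issues
```

Note that the format check only catches malformed IDs; confirming that a well-formed NCT ID actually exists would require a registry lookup (see the Hybrid Retrieval enhancement).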

User Input/Output Flow

Input:

User enters: "TCL" (disease name)
Optional: Drug of interest, trial type filter

Output:

1. Trial Discovery Report

  • List of relevant trials (NCT IDs, sponsors, status)
  • Trial outcomes summary

2. Feature Analysis

  • Common inclusion/exclusion patterns in successful trials
  • Endpoint selection trends
  • Sample size ranges by trial phase
  • Comparator arm choices

3. Recommendations

```text
┌─────────────────────────────────────────────────────────────┐
│ RECOMMENDED TRIAL DESIGN PARAMETERS FOR TCL                 │
├─────────────────────────────────────────────────────────────┤
│ Inclusion Criteria:                                         │
│   ✓ Relapsed/refractory TCL (≥1 prior therapy)              │
│   ✓ ECOG PS 0-2                                             │
│   ✓ Measurable disease per Lugano criteria                  │
│                                                             │
│ Exclusion Criteria:                                         │
│   ✗ Prior allogeneic transplant                             │
│   ✗ Active CNS involvement                                  │
│                                                             │
│ Primary Endpoint: ORR (recommended based on precedent)      │
│ Sample Size: 100-120 (based on similar trials)              │
│ Comparator: Physician's choice (pralatrexate, belinostat,   │
│             romidepsin, or gemcitabine-based)               │
│                                                             │
│ Evidence: Based on NCT01482962, NCT00901147, NCT01280526    │
└─────────────────────────────────────────────────────────────┘
```

3. Implementation Phases

Phase 1: Setup & Configuration

  • Verify Clinical Trials connector availability
  • Obtain Anthropic API key
  • Set up project structure (Streamlit + Python 3.11)
  • Configure Docker environment
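One possible minimal Dockerfile for the Streamlit app, assuming the conventional HuggingFace Spaces port 7860 and the file names from the Project Structure section (a sketch, not the project's actual Dockerfile):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
```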

Phase 2: Core Development

  • Implement conversation memory with st.session_state
  • Implement anti-hallucination system prompt
  • Build orphan drug fallback logic
  • Develop trial data extraction pipeline
  • Add error handling for API failures/timeouts

Phase 3: Testing & Refinement

  • Test with TCL/Pralatrexate/Belinostat cases
  • Validate anti-hallucination rules
  • Verify statistical recommendations
  • Clinical researcher user testing

Phase 4: Deployment

  • Deploy to HuggingFace Spaces
  • User documentation
  • Handoff to clinical team

4. Post-MVP Enhancements

4.1 Competitive Landscape Tool

| Aspect | Details |
|---|---|
| Problem | Need to compare a candidate drug against competitors on efficacy/safety |
| Solution | Drug comparison feature with structured competitive analysis |
| Requested By | Vittoria |

Implementation Tasks:

  • Add drug comparison feature
  • Extract efficacy metrics (ORR, CRR, PFS, OS)
  • Extract safety profiles (AEs, SAEs, discontinuation rates)
  • Generate comparative analysis report with visualizations

4.2 Hybrid Retrieval Architecture

| Aspect | Details |
|---|---|
| Problem | Web search returns only 5-10 results; PTCL has 1,000+ NCT IDs |
| Solution | Combine ClinicalTrials.gov API + Web Search for complete coverage |

Architecture:

```text
User Query
     │
     ├──▶ ClinicalTrials.gov API (complete, filtered, reproducible)
     │
     └──▶ Claude Web Search (news, publications, context)
             │
             ▼
      Claude Synthesizes from BOTH
```

Implementation Tasks:

  • Integrate ClinicalTrials.gov API v2
  • Add query filters (phase, status, condition)
  • Implement result pagination
  • Merge API results with web search context
  • Add audit logging for retrieved NCT IDs
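A sketch of building a v2 API query (the endpoint and parameter names follow the public ClinicalTrials.gov API v2; verify them against the current API docs before relying on this):

```python
import urllib.parse

CTGOV_V2 = "https://clinicaltrials.gov/api/v2/studies"

def build_ctgov_query(condition, status="COMPLETED", page_size=50):
    """Build a ClinicalTrials.gov API v2 study-search URL."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": status,
        "pageSize": page_size,
    }
    return CTGOV_V2 + "?" + urllib.parse.urlencode(params)
```

The URL would then be fetched with an HTTP client, and pagination handled via the `pageToken` the API returns; both are omitted here for brevity.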

Benefits:

| MVP | Hybrid |
|---|---|
| ~10 trials | All matching trials |
| Web-ranked | Clinically filtered |
| Non-reproducible | Fully auditable |

5. Next Steps

Immediate Actions

  1. Obtain Anthropic API key
  2. Set up project structure
  3. Begin Phase 1 development

Appendix A: Disease Hierarchy

```text
TCL (Parent)
├── Nodal: AITL, ALCL-ALK+/-, PTCL-NOS, TFH-TCL
├── Extranodal: NKTCL, EATL, MEITL
└── Cutaneous (CTCL): Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, Lymphomatoid Papulosis, Subcutaneous Panniculitis-like TCL, Primary Cutaneous Gamma-Delta TCL
```

Subtype-Specific Notes:

  • AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
  • ALCL-ALK-: CD30+; responds to brentuximab vedotin
  • ALCL-ALK+: Best prognosis (analyze separately if needed)
  • PTCL-NOS: Heterogeneous; worst R/R outcomes
  • TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
  • NKTCL/EATL/MEITL: Rare; distinct biology

Exclusions: Solid tumors mimicking TCL


Appendix B: R/R TCL Trial Design Rules

Patient Population

  • Prior lines: 1 vs 2+ (changes prognosis significantly)
  • Refractory definition: PD within 1 month of treatment end
  • Relapsed vs primary refractory: Separate analysis (OS 1.97 vs 0.89 years)
  • Transplant eligibility: Age, comorbidities, performance status, organ function

Transplant Strategy

  • Transplant-eligible: Goal = salvage → bridge to allo-SCT (curative intent)
  • Transplant-ineligible: Salvage is potentially palliative
  • Conversion endpoint: % converting from ineligible → eligible

Comparator Categories

  • Single-agent salvage chemo (GDP/DHAP/ICE)
  • Single-agent novel (pralatrexate/romidepsin)
  • BBv (brentuximab + bendamustine)
  • Investigator's choice

Endpoints

  • Primary: ORR (rapid), CRR (stronger signal), PFS (durability), TFS (transplant-ineligible), OS
  • Secondary: DoR, TTR, transplant conversion, post-transplant outcomes, TTNT
  • ORR threshold: 50% or higher vs 30% historical = superiority
  • ICR preferred for regulatory; investigator-assessed OK for exploratory

Biomarkers

  • TET2, DNMT3A, RHOA: Predict response to HDAC inhibitors (common in AITL/TFH-TCL)
  • TP53: Poor prognosis
  • KMT2D: AITL-enriched; epigenetic modifier sensitivity
  • CD30: Expression level correlates with targeted therapy response
  • Early PET-CT (2-4 cycles): Metabolic response, identify progressors

Safety Considerations

  • Acceptable toxicity for frail population
  • Dose modifications for elderly/comorbid
  • Cumulative organ toxicity monitoring (cardiac/renal/hepatic)
  • Grade 3/4 cytopenia management
  • Prophylactic antimicrobials (antibiotics/antivirals/antifungals)

Regulatory Pathways

  • Accelerated approval: ORR primary + clinical benefit demonstration
  • Traditional approval: PFS/OS primary
  • Breakthrough therapy designation for orphan drugs
  • Confirmatory trial typically required post-accelerated approval

Sample Size

  • Historical controls: Use established benchmarks for each comparator category
  • Power for: 50% or higher vs 30% (20% or greater difference)
  • Dropout: Expect 5-10% screen failures in R/R population
  • Consider subtype-stratified power (AITL vs ALCL-ALK- separately)
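As a sanity check on the "power for 50% vs 30%" rule, a standard two-proportion calculation (arcsine approximation / Cohen's effect size h, 80% power, two-sided alpha = 0.05) can be done with the Python stdlib. This gives a rough planning number only, not a substitute for a statistician's design:

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for comparing two proportions
    via the arcsine transformation (Cohen's effect size h)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    h = 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))
    return math.ceil(((z_alpha + z_beta) / abs(h)) ** 2)

n = n_per_arm(0.50, 0.30)  # ~47 per arm, ~94 total before dropout
```

Inflating ~94 total for the 5-10% dropout noted above lands close to the 100-120 patient range cited earlier in the document.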

Appendix C: Web Search Details

Tool Used: web_search_20250305 (Claude's native web search)

How It Works:

  • Claude autonomously constructs search queries based on user intent
  • Searches ClinicalTrials.gov, PubMed, and other medical sources
  • Returns structured results with URLs and content
  • Content is encrypted (only Claude can read it internally)
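The tool is enabled by passing a tool definition to the Messages API. A hedged sketch of that definition follows; `max_uses` and `allowed_domains` are optional fields per Anthropic's web search tool documentation, and the tool version string should be confirmed against the current docs:

```python
# Passed as tools=[WEB_SEARCH_TOOL] to anthropic.Anthropic().messages.create(...);
# shown standalone here.
WEB_SEARCH_TOOL = {
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # cap the number of searches per request
    "allowed_domains": [  # restrict results to trusted clinical sources
        "clinicaltrials.gov",
        "pubmed.ncbi.nlm.nih.gov",
    ],
}
```

Restricting `allowed_domains` keeps retrieval focused on the two sources named in the Technology Stack and reduces the chance of citing non-registry material.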

Data Extracted:

  • Trial metadata (NCT ID, sponsor, phase, status)
  • Eligibility criteria
  • Endpoints (primary/secondary)
  • Sample sizes
  • Published outcomes (ORR, PFS, OS)

Document Version: 1.0
Status: 🔄 In Progress