Hesham Gibriel

Clinical Trial Design Assistant

Implementation Plan v1.0


Executive Summary

The Clinical Trial Design Assistant is an AI-powered chatbot that helps clinical researchers design superior clinical trials by analyzing successful examples from ClinicalTrials.gov. It provides evidence-based recommendations on trial parameters and regulatory strategies, with a specific focus on rare diseases and orphan drug development.

Primary Users: Clinical researchers evaluating drug partnerships and designing superiority trials.


1. Problem & Solution Overview

| 🚨 Current Challenges | ✅ Our Solution |
|---|---|
| **Information Overload:** 500,000+ trials make research slow. | **Evidence-Based Recommendations:** Automated retrieval of proven trial designs. |
| **Rare Disease Complexity:** Limited T-cell lymphoma (TCL) data on successful patterns. | **Orphan Drug Expertise:** Fallback logic for similar rare diseases. |
| **Regulatory Uncertainty:** Unclear reasons for past FDA/EMA outcomes. | **Regulatory Intelligence:** Extracts specific objections and success factors. |
| **Statistical Rigor:** Requires complex specialized expertise. | **Statistical Guidance:** Automated sample size and power calculations. |

How It Works (Simple Example):

```text
Clinician types: "TCL"

Chatbot returns:
→ Recommended trial design parameters
→ Sample size: 100-120 patients
→ Primary endpoint: ORR
→ Comparator: Physician's choice
→ Evidence: Based on NCT01482962, NCT00901147
```

2. Architecture Overview

High-Level Architecture

System Flow: User → AI Agent → Web Search → ClinicalTrials.gov/PubMed → AI Analysis → Recommendations

```mermaid
flowchart LR
    classDef large font-size:22px,stroke-width:2px;

    A[User Question]:::large --> B[Claude<br/>with Web Search]:::large
    B <--> C[Real-time<br/>Web Search]:::large
    C <--> D[ClinicalTrials.gov<br/>+ PubMed]:::large
    B --> E[Structured<br/>Recommendations]:::large
```

High-Level Processing Flow:

  1. User Query → Natural language question about trial design.
  2. Session Memory → Retrieve conversation history from session_state.
  3. Claude AI → Interprets intent with full context and constructs search strategy.
  4. Web Search → Claude uses native web search to query ClinicalTrials.gov and medical literature.
  5. Analysis → AI analyzes retrieved data using training rules.
  6. Output → Formatted report with cited evidence and design parameters.
  7. Memory Update → Store response in session for follow-up questions.

Detailed Architecture

```mermaid
flowchart TD
    A[User Query] -->|1. Submit question| B[Session Memory Check]
    B -->|2. Load context| C[Claude + System Prompt]
    C -->|3. Web search queries| D[ClinicalTrials.gov<br/>+ PubMed]
    D -->|4. Return search results| C
    C -->|5. Analyze with guardrails| E{Validation}
    E -->|Pass| F[Generate Recommendations]
    E -->|Fail: No results| G[Orphan Drug Fallback]
    E -->|Fail: Invalid data| H[Error Handler]
    G -->|Retry with related diseases| D
    H -->|Retry or show error| I[User Notification]
    F -->|6. Format output| J[Streamlit Display]
    J -->|7. Store in session| K[Session Memory]
    K -->|8. Follow-up question?| A
    K -->|Done| L[End]

    style C fill:#e1f5ff
    style E fill:#fff4e1
    style F fill:#e8f5e9
    style J fill:#f3e5f5
```

Step-by-Step Processing Details

Step 1: User Query Submission

```text
Input: "Design a trial for R/R TCL"
        ↓
Streamlit captures query + checks session state
```

Step 2: Session Memory Check

What happens: Check if there's previous conversation history

  • If yes: Load the last 5 messages to maintain context
  • If no: Start a fresh conversation

Purpose: Enables follow-up questions like "What about sample size?"
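The context-window trimming described above can be sketched as a small helper. This is illustrative only (the name `build_context` and the message shape are assumptions, not from the codebase); in the app, `history` would live in `st.session_state`, but a plain list keeps the logic testable:

```python
def build_context(history, new_query, max_turns=5):
    """Return the last `max_turns` stored messages plus the new user query.

    `history` is a list of {"role": ..., "content": ...} dicts; in the
    Streamlit app it would be read from st.session_state.
    """
    recent = history[-max_turns:] if history else []
    return recent + [{"role": "user", "content": new_query}]
```

With an empty history this simply starts a fresh conversation containing only the new query.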


Step 3: Claude + System Prompt

```python
# The system prompt bundles:
# - Role definition
# - Disease hierarchy (Appendix A)
# - Trial design rules (Appendix B)
# - Anti-hallucination rules
# - Output format template

# Claude receives it via the Anthropic Messages API; note that the
# system prompt is a top-level parameter, not a message role:
response = client.messages.create(
    model=MODEL,
    system=SYSTEM_PROMPT,
    messages=conversation_history + [
        {"role": "user", "content": new_query},
    ],
)
```

Claude's Internal Process:

  1. Parse user intent (disease, trial phase, endpoint preference)
  2. Construct search query for connector
  3. Call search_trials() tool with filters
  4. Receive trial data (NCT IDs, protocols, outcomes)
  5. Apply guardrails from system prompt
  6. Generate recommendations

Step 4: Clinical Trials Connector

What happens: Claude uses its native web search tool to search ClinicalTrials.gov and retrieve trial data (NCT IDs, eligibility criteria, endpoints, sample sizes, outcomes).

See Appendix C for web search details.


Step 5: Validation with Guardrails

Validation Checks:

  1. Results found?

    • If no trials found → Trigger orphan drug fallback
    • Search related diseases (AITL, ALCL-ALK-)
  2. NCT IDs valid?

    • Check each cited trial ID follows format: NCT + 8 digits
    • Flag any invalid IDs
  3. Disease match?

    • Verify response mentions TCL or lymphoma
    • Flag if disease mismatch detected
  4. Realistic values?

    • ORR must be between 0-100%
    • Sample size must be reasonable (not >10,000)
    • Flag unrealistic numbers
  5. Apply trial design rules

    • Check against Appendix B:
    • Prior lines: 1 vs 2+
    • Comparator: Single-agent salvage chemo, Single-agent novel, BBv, or Investigator's choice
    • Endpoints: ORR/CRR/PFS valid for R/R TCL

Guardrail Actions:

  • Pass → Proceed to recommendations
  • Fail (no results) → Orphan drug fallback
  • Fail (invalid data) → Error handler

Step 6: Orphan Drug Fallback

```text
If no TCL trials found:
1. Search disease hierarchy (Appendix A)
2. Try related diseases:
   - AITL (most common nodal TCL)
   - ALCL-ALK- (CD30+)
   - TFH-TCL (PI3K responsive)
3. Notify user: "No exact TCL trials. Showing related: AITL"
```
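A minimal sketch of this fallback logic, with a stub hierarchy and an injected search function (all names here are hypothetical, for illustration):

```python
DISEASE_HIERARCHY = {
    # Related subtypes tried in order when the parent search is sparse
    # (subset of Appendix A, for illustration only).
    "TCL": ["AITL", "ALCL-ALK-", "TFH-TCL"],
}

def search_with_fallback(disease, search_fn, min_results=5):
    """Search the requested disease; fall back to related subtypes if sparse.

    Returns (disease_actually_used, results, used_fallback).
    """
    results = search_fn(disease)
    if len(results) >= min_results:
        return disease, results, False
    for related in DISEASE_HIERARCHY.get(disease, []):
        related_results = search_fn(related)
        if related_results:
            return related, related_results, True
    return disease, results, True
```

When `used_fallback` is True, the app would surface the notification from step 3 above (e.g., "No exact TCL trials. Showing related: AITL").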

Step 7: Generate Recommendations

What happens: Claude synthesizes the trial data and generates a structured recommendation including inclusion/exclusion criteria, primary endpoint with rationale, sample size with power basis, comparator selection, and NCT citations for evidence.

See User Input/Output Flow section for example output format.


Step 8: Error Handling

```text
Error Types:
1. API timeout → Retry 3x (1s, 2s, 4s backoff)
2. Rate limit → Wait + auto-retry
3. Connector down → Show cached example
4. Invalid response → Log + retry option
5. No results after fallback → "Unable to find trials"
```
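The timeout/backoff policy in item 1 could be implemented roughly as follows (a sketch; `TimeoutError` stands in for whatever exception the actual API client raises):

```python
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    """Call fn(); on timeout, retry up to max_retries times with
    exponential backoff (1s, 2s, 4s at the defaults above)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # exhausted retries; surface to the error handler
            time.sleep(base_delay * (2 ** attempt))
```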

Step 9: Streamlit Display

What happens:

  • Display the response in the chat interface
  • Add the response to session history for context
  • User sees formatted recommendations with NCT citations

Step 10: Follow-up Loop

```text
User can ask:
- "What about safety endpoints?"
- "Compare to BBv trial"
- "Show me AITL-specific trials"

→ Loop back to Step 1 with full context
```

Key Architecture Principles

| Principle | Implementation |
|---|---|
| Simplicity | Claude handles all logic; no custom parsers |
| Guardrails | Embedded in system prompt + validation step |
| Conversational | Session memory enables follow-up questions |
| Evidence-based | All recommendations cite NCT IDs |
| Fallback logic | Orphan drug search for rare diseases |
| Error resilience | Retry logic + graceful degradation |

Technology Stack

| Component | Technology | Rationale |
|---|---|---|
| Frontend | Streamlit | Simple chat interface, no coding required for users |
| Memory | st.session_state | Maintains conversation history across turns |
| AI Engine | Claude 3.5 Sonnet | Best-in-class medical text understanding |
| Data Access | Native Web Search | Claude's built-in web search tool (web_search_20250305) |
| Sources | ClinicalTrials.gov + PubMed | Real-time search of clinical trial registries and literature |
| Deployment | Docker on HuggingFace Spaces | Free hosting, web-accessible |

Processing Pipeline

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant App as Streamlit/Orchestrator
    participant AI as Claude AI
    participant Data as ClinicalTrials.gov

    User->>App: Enter disease "TCL"
    App->>AI: Process query
    AI->>Data: search_trials(TCL)
    Data-->>AI: Trial list (NCT IDs)
    AI->>Data: Get detailed protocols
    Data-->>AI: Protocol documents
    AI->>AI: Analyze & Compare
    AI->>AI: Generate recommendations
    AI-->>App: Results
    App-->>User: Display report
```

Project Structure

```text
Clinical-Trial-Design-Assistant/
├── app.py                  # Streamlit main entry + Claude integration + system prompt
├── requirements.txt        # Python dependencies
├── Dockerfile              # Container configuration
├── README.md               # Documentation
├── CHANGELOG.md            # Version history
└── .gitignore              # Git ignore file
```

Note: Claude handles all parsing, extraction, and analysis via web search. Single-file architecture for simplicity.


System Prompt Template

```python
SYSTEM_PROMPT = """
You are a Clinical Trial Design Assistant specializing in R/R TCL.

ROLE:
- Help researchers design superiority trials
- Provide evidence-based recommendations from ClinicalTrials.gov
- Cite NCT IDs for all claims

RULES (NEVER VIOLATE):
1. NEVER invent trial data - only cite real NCT IDs
2. If no trials found, say "No matching trials found" and suggest related searches
3. Use orphan drug fallback for rare diseases with <5 results
4. Apply disease hierarchy (AITL > ALCL-ALK- > PTCL-NOS)
5. Validate against R/R TCL framework for all recommendations
6. If prior treatment lines not specified, ASK user before providing recommendations
7. If regulatory region (FDA/EMA) not specified, ASK user for target submission region

SCOPE:
- Focus: Nodal and extranodal TCL subtypes
- Out of scope: Cutaneous T-cell lymphomas (Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, etc.) - redirect to CTCL-specific resources

DISEASE HIERARCHY:
- AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- ALCL-ALK-: CD30+; responds to brentuximab vedotin
- PTCL-NOS: Heterogeneous; worst R/R outcomes
- TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
- Analyze separately: ALCL-ALK+ (best prognosis)
- Analyze separately: NKTCL/EATL/MEITL (rare, distinct biology)

COMPARATOR CATEGORIES:
- Single-agent salvage chemo (GDP/DHAP/ICE)
- Single-agent novel (pralatrexate/romidepsin)
- BBv (brentuximab + bendamustine)
- Investigator's choice

OUTPUT FORMAT:
- Recommended inclusion/exclusion criteria
- Primary endpoint with justification
- Sample size with power calculation basis
- Comparator with rationale
- NCT citations for evidence
"""
```

Error Handling

| Error | Handling |
|---|---|
| API timeout | Retry 3x with exponential backoff (1s, 2s, 4s) |
| Rate limit | Show "Please wait" message, auto-retry after delay |
| Connector unavailable | Display cached example response + "Try again later" |
| No results | Trigger orphan drug fallback → search related diseases |
| Invalid response | Log error, show "Unable to process" + retry option |

Response Validation

Before displaying results, validate:

  • All NCT IDs cited exist (format check: NCT + 8 digits)
  • Disease in response matches user query
  • ORR/endpoint values are within realistic ranges
  • No hallucinated drug names (cross-check against known TCL treatments)
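The first three checks can be expressed as a small helper (a sketch; the function name and return shape are illustrative). The drug-name cross-check would need a curated list of known TCL treatments, so it is omitted here:

```python
import re

NCT_ID_RE = re.compile(r"^NCT\d{8}$")  # format check: NCT + 8 digits

def validate_response(nct_ids, orr_percent, sample_size):
    """Return a list of validation issues; an empty list means pass."""
    issues = [f"invalid NCT ID: {nct}" for nct in nct_ids
              if not NCT_ID_RE.match(nct)]
    if not 0 <= orr_percent <= 100:
        issues.append(f"unrealistic ORR: {orr_percent}%")
    if not 0 < sample_size <= 10_000:
        issues.append(f"unrealistic sample size: {sample_size}")
    return issues
```

Note that the format check only catches malformed IDs; confirming that a well-formed NCT ID actually exists would require a registry lookup (see the Hybrid Retrieval enhancement).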

User Input/Output Flow

Input:

User enters: "TCL" (disease name)
Optional: Drug of interest, trial type filter

Output:

1. Trial Discovery Report

  • List of relevant trials (NCT IDs, sponsors, status)
  • Trial outcomes summary

2. Feature Analysis

  • Common inclusion/exclusion patterns in successful trials
  • Endpoint selection trends
  • Sample size ranges by trial phase
  • Comparator arm choices

3. Recommendations

```text
┌─────────────────────────────────────────────────────────────┐
│ RECOMMENDED TRIAL DESIGN PARAMETERS FOR TCL                 │
├─────────────────────────────────────────────────────────────┤
│ Inclusion Criteria:                                         │
│   ✓ Relapsed/refractory TCL (≥1 prior therapy)              │
│   ✓ ECOG PS 0-2                                             │
│   ✓ Measurable disease per Lugano criteria                  │
│                                                             │
│ Exclusion Criteria:                                         │
│   ✗ Prior allogeneic transplant                             │
│   ✗ Active CNS involvement                                  │
│                                                             │
│ Primary Endpoint: ORR (recommended based on precedent)      │
│ Sample Size: 100-120 (based on similar trials)              │
│ Comparator: Physician's choice (pralatrexate, belinostat,   │
│             romidepsin, or gemcitabine-based)               │
│                                                             │
│ Evidence: Based on NCT01482962, NCT00901147, NCT01280526    │
└─────────────────────────────────────────────────────────────┘
```

3. Implementation Phases

Phase 1: Setup & Configuration

  • Verify Clinical Trials connector availability
  • Obtain Anthropic API key
  • Set up project structure (Streamlit + Python 3.11)
  • Configure Docker environment
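One possible minimal Dockerfile for the Streamlit app, assuming the conventional HuggingFace Spaces port 7860 and the file names from the Project Structure section (a sketch, not the project's actual Dockerfile):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["streamlit", "run", "app.py", "--server.port=7860", "--server.address=0.0.0.0"]
```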

Phase 2: Core Development

  • Implement conversation memory with st.session_state
  • Implement anti-hallucination system prompt
  • Build orphan drug fallback logic
  • Develop trial data extraction pipeline
  • Add error handling for API failures/timeouts

Phase 3: Testing & Refinement

  • Test with TCL/Pralatrexate/Belinostat cases
  • Validate anti-hallucination rules
  • Verify statistical recommendations
  • Clinical researcher user testing

Phase 4: Deployment

  • Deploy to HuggingFace Spaces
  • User documentation
  • Handoff to clinical team

4. Post-MVP Enhancements

4.1 Competitive Landscape Tool

| Aspect | Details |
|---|---|
| Problem | Need to compare a candidate drug against competitors on efficacy/safety |
| Solution | Drug comparison feature with structured competitive analysis |
| Requested By | Vittoria |

Implementation Tasks:

  • Add drug comparison feature
  • Extract efficacy metrics (ORR, CRR, PFS, OS)
  • Extract safety profiles (AEs, SAEs, discontinuation rates)
  • Generate comparative analysis report with visualizations

4.2 Hybrid Retrieval Architecture

| Aspect | Details |
|---|---|
| Problem | Web search returns only 5-10 results; PTCL has 1,000+ NCT IDs |
| Solution | Combine ClinicalTrials.gov API + Web Search for complete coverage |

Architecture:

```text
User Query
     │
     ├──▶ ClinicalTrials.gov API (complete, filtered, reproducible)
     │
     └──▶ Claude Web Search (news, publications, context)
             │
             ▼
      Claude Synthesizes from BOTH
```

Implementation Tasks:

  • Integrate ClinicalTrials.gov API v2
  • Add query filters (phase, status, condition)
  • Implement result pagination
  • Merge API results with web search context
  • Add audit logging for retrieved NCT IDs
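A sketch of building a v2 API query (the endpoint and parameter names follow the public ClinicalTrials.gov API v2; verify them against the current API docs before relying on this):

```python
import urllib.parse

CTGOV_V2 = "https://clinicaltrials.gov/api/v2/studies"

def build_ctgov_query(condition, status="COMPLETED", page_size=50):
    """Build a ClinicalTrials.gov API v2 study-search URL."""
    params = {
        "query.cond": condition,
        "filter.overallStatus": status,
        "pageSize": page_size,
    }
    return CTGOV_V2 + "?" + urllib.parse.urlencode(params)
```

The URL would then be fetched with an HTTP client, and pagination handled via the `pageToken` the API returns; both are omitted here for brevity.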

Benefits:

| MVP | Hybrid |
|---|---|
| ~10 trials | All matching trials |
| Web-ranked | Clinically filtered |
| Non-reproducible | Fully auditable |

5. Next Steps

Immediate Actions

  1. Obtain Anthropic API key
  2. Set up project structure
  3. Begin Phase 1 development

Appendix A: Disease Hierarchy

```text
TCL (Parent)
├── Nodal: AITL, ALCL-ALK+/-, PTCL-NOS, TFH-TCL
├── Extranodal: NKTCL, EATL, MEITL
└── Cutaneous (CTCL): Mycosis Fungoides, Sézary Syndrome, Primary Cutaneous ALCL, Lymphomatoid Papulosis, Subcutaneous Panniculitis-like TCL, Primary Cutaneous Gamma-Delta TCL
```

Subtype-Specific Notes:

  • AITL: TET2/KMT2D mutations; responds to HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
  • ALCL-ALK-: CD30+; responds to brentuximab vedotin
  • ALCL-ALK+: Best prognosis (analyze separately if needed)
  • PTCL-NOS: Heterogeneous; worst R/R outcomes
  • TFH-TCL: Responds to duvelisib (PI3K/delta), HDAC inhibitors, EZH2 inhibitors, checkpoint inhibitors, and other targeted agents
  • NKTCL/EATL/MEITL: Rare; distinct biology

Exclusions: Solid tumors mimicking TCL


Appendix B: R/R TCL Trial Design Rules

Patient Population

  • Prior lines: 1 vs 2+ (changes prognosis significantly)
  • Refractory definition: PD within 1 month of treatment end
  • Relapsed vs primary refractory: Separate analysis (OS 1.97 vs 0.89 years)
  • Transplant eligibility: Age, comorbidities, performance status, organ function

Transplant Strategy

  • Transplant-eligible: Goal = salvage → bridge to allo-SCT (curative intent)
  • Transplant-ineligible: Salvage is potentially palliative
  • Conversion endpoint: % converting from ineligible → eligible

Comparator Categories

  • Single-agent salvage chemo (GDP/DHAP/ICE)
  • Single-agent novel (pralatrexate/romidepsin)
  • BBv (brentuximab + bendamustine)
  • Investigator's choice

Endpoints

  • Primary: ORR (rapid), CRR (stronger signal), PFS (durability), TFS (transplant-ineligible), OS
  • Secondary: DoR, TTR, transplant conversion, post-transplant outcomes, TTNT
  • ORR threshold: 50% or higher vs 30% historical = superiority
  • ICR preferred for regulatory; investigator-assessed OK for exploratory

Biomarkers

  • TET2, DNMT3A, RHOA: Predict response to HDAC inhibitors (common in AITL/TFH-TCL)
  • TP53: Poor prognosis
  • KMT2D: AITL-enriched; epigenetic modifier sensitivity
  • CD30: Expression level correlates with targeted therapy response
  • Early PET-CT (2-4 cycles): Metabolic response, identify progressors

Safety Considerations

  • Acceptable toxicity for frail population
  • Dose modifications for elderly/comorbid
  • Cumulative organ toxicity monitoring (cardiac/renal/hepatic)
  • Grade 3/4 cytopenia management
  • Prophylactic antimicrobials (antibiotics/antivirals/antifungals)

Regulatory Pathways

  • Accelerated approval: ORR primary + clinical benefit demonstration
  • Traditional approval: PFS/OS primary
  • Breakthrough therapy designation for orphan drugs
  • Confirmatory trial typically required post-accelerated approval

Sample Size

  • Historical controls: Use established benchmarks for each comparator category
  • Power for: 50% or higher vs 30% (20% or greater difference)
  • Dropout: Expect 5-10% screen failures in R/R population
  • Consider subtype-stratified power (AITL vs ALCL-ALK- separately)
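As a sanity check on the "power for 50% vs 30%" rule, a standard two-proportion calculation (arcsine approximation / Cohen's effect size h, 80% power, two-sided alpha = 0.05) can be done with the Python stdlib. This gives a rough planning number only, not a substitute for a statistician's design:

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for comparing two proportions
    via the arcsine transformation (Cohen's effect size h)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    h = 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))
    return math.ceil(((z_alpha + z_beta) / abs(h)) ** 2)

n = n_per_arm(0.50, 0.30)  # ~47 per arm, ~94 total before dropout
```

Inflating ~94 total for the 5-10% dropout noted above lands close to the 100-120 patient range cited earlier in the document.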

Appendix C: Web Search Details

Tool Used: web_search_20250305 (Claude's native web search)

How It Works:

  • Claude autonomously constructs search queries based on user intent
  • Searches ClinicalTrials.gov, PubMed, and other medical sources
  • Returns structured results with URLs and content
  • Content is encrypted (only Claude can read it internally)
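The tool is enabled by passing a tool definition to the Messages API. A hedged sketch of that definition follows; `max_uses` and `allowed_domains` are optional fields per Anthropic's web search tool documentation, and the tool version string should be confirmed against the current docs:

```python
# Passed as tools=[WEB_SEARCH_TOOL] to anthropic.Anthropic().messages.create(...);
# shown standalone here.
WEB_SEARCH_TOOL = {
    "type": "web_search_20250305",
    "name": "web_search",
    "max_uses": 5,  # cap the number of searches per request
    "allowed_domains": [  # restrict results to trusted clinical sources
        "clinicaltrials.gov",
        "pubmed.ncbi.nlm.nih.gov",
    ],
}
```

Restricting `allowed_domains` keeps retrieval focused on the two sources named in the Technology Stack and reduces the chance of citing non-registry material.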

Data Extracted:

  • Trial metadata (NCT ID, sponsor, phase, status)
  • Eligibility criteria
  • Endpoints (primary/secondary)
  • Sample sizes
  • Published outcomes (ORR, PFS, OS)

Document Version: 1.0
Status: 🔄 In Progress