DeepBoner / docs /brainstorming /02_CLINICALTRIALS_IMPROVEMENTS.md
VibecoderMcSwaggins's picture
feat: add roadmap summary and detailed improvement plans for data sources
9286db5

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

ClinicalTrials.gov Tool: Current State & Future Improvements

Status: Currently Implemented Priority: High (Core Data Source for Drug Repurposing)


Current Implementation

What We Have (src/tools/clinicaltrials.py)

  • V2 API search via clinicaltrials.gov/api/v2/studies
  • Filters: INTERVENTIONAL study type, RECRUITING status
  • Returns: NCT ID, title, conditions, interventions, phase, status
  • Query preprocessing via shared query_utils.py

Current Strengths

  1. Good Filtering: Already filtering for interventional + recruiting
  2. V2 API: Using the modern API (v1 deprecated)
  3. Phase Info: Extracting trial phases for drug development context

Current Limitations

  1. No Outcome Data: Missing primary/secondary outcomes
  2. No Eligibility Criteria: Missing inclusion/exclusion details
  3. No Sponsor Info: Missing who's running the trial
  4. No Result Data: For completed trials, no efficacy data
  5. Limited Drug Mapping: No integration with drug databases

API Capabilities We're Not Using

Fields We Could Request

# Current fields
fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]

# Additional valuable fields
additional_fields = [
    "PrimaryOutcomeMeasure",      # What are they measuring?
    "SecondaryOutcomeMeasure",    # Secondary endpoints
    "EligibilityCriteria",        # Who can participate?
    "LeadSponsorName",            # Who's funding?
    "ResultsFirstPostDate",       # Has results?
    "StudyFirstPostDate",         # When started?
    "CompletionDate",             # When finished?
    "EnrollmentCount",            # Sample size
    "InterventionDescription",    # Drug details
    "ArmGroupLabel",              # Treatment arms
    "InterventionOtherName",      # Drug aliases
]

Filter Enhancements

# Current
aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"

# Could add
"status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED"  # Include completed for results
"phase:PHASE2,PHASE3"  # Only later-stage trials
"resultsFirstPostDateRange:2020-01-01_"  # Trials with posted results

Recommended Improvements

Phase 1: Richer Metadata

EXTENDED_FIELDS = [
    "NCTId",
    "BriefTitle",
    "OfficialTitle",
    "Condition",
    "InterventionName",
    "InterventionDescription",
    "InterventionOtherName",  # Drug synonyms!
    "Phase",
    "OverallStatus",
    "PrimaryOutcomeMeasure",
    "EnrollmentCount",
    "LeadSponsorName",
    "StudyFirstPostDate",
]

Phase 2: Results Retrieval

For completed trials, we can get actual efficacy data:

async def get_trial_results(nct_id: str) -> dict | None:
    """Fetch results for completed trials."""
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    params = {
        "fields": "ResultsSection",
    }
    # Returns outcome measures and statistics

Phase 3: Drug Name Normalization

Map intervention names to standard identifiers:

# Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
# Solution: Use RxNorm or DrugBank for normalization

async def normalize_drug_name(intervention: str) -> str:
    """Normalize drug name via RxNorm API."""
    url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
    # Returns standardized RxCUI

Integration Opportunities

With PubMed

Cross-reference trials with publications:

# ClinicalTrials.gov provides PMID links
# Can correlate trial results with published papers

With DrugBank/ChEMBL

Map interventions to:

  • Mechanism of action
  • Known targets
  • Adverse effects
  • Drug-drug interactions

Python Libraries to Consider

Library Purpose Notes
pytrials CT.gov wrapper V2 API support unclear
clinicaltrials Data tracking More for analysis
drugbank-downloader Drug mapping Requires license

API Quirks & Gotchas

  1. Rate Limiting: Undocumented, be conservative
  2. Pagination: Max 1000 results per request
  3. Field Names: Case-sensitive, camelCase
  4. Empty Results: Some fields may be null even if requested
  5. Status Changes: Trials change status frequently

Example Enhanced Query

async def search_drug_repurposing_trials(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
) -> list[Evidence]:
    """Search for trials repurposing a drug for a new condition."""

    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        statuses.append("COMPLETED")

    params = {
        "query.intr": drug_name,
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "filter.studyType": "INTERVENTIONAL",
        "fields": ",".join(EXTENDED_FIELDS),
        "pageSize": 50,
    }

Sources