# ClinicalTrials.gov Tool: Current State & Future Improvements

**Status**: Currently Implemented
**Priority**: High (Core Data Source for Drug Repurposing)

---

## Current Implementation

### What We Have (`src/tools/clinicaltrials.py`)

- V2 API search via `clinicaltrials.gov/api/v2/studies`
- Filters: `INTERVENTIONAL` study type, `RECRUITING` status
- Returns: NCT ID, title, conditions, interventions, phase, status
- Query preprocessing via shared `query_utils.py`

### Current Strengths

1. **Good Filtering**: Already filters for interventional, recruiting trials
2. **V2 API**: Uses the modern API (the legacy v1 API has been retired)
3. **Phase Info**: Extracts trial phases for drug-development context

### Current Limitations

1. **No Outcome Data**: Primary/secondary outcomes are not retrieved
2. **No Eligibility Criteria**: Inclusion/exclusion details are not retrieved
3. **No Sponsor Info**: The trial's lead sponsor is not captured
4. **No Results Data**: Completed trials are filtered out, so no efficacy data is available
5. **Limited Drug Mapping**: Intervention names are not linked to any drug database

---

## API Capabilities We're Not Using

### Fields We Could Request

```python
# Current fields (V2 piece names, passed via the `fields` parameter)
fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]

# Additional valuable fields
additional_fields = [
    "PrimaryOutcomeMeasure",    # What is being measured?
    "SecondaryOutcomeMeasure",  # Secondary endpoints
    "EligibilityCriteria",      # Who can participate?
    "LeadSponsorName",          # Who is running/funding the trial?
    "ResultsFirstPostDate",     # When results were first posted (absent if none)
    "StudyFirstPostDate",       # When the record was first posted (not the start date)
    "StartDate",                # When the study started (actual or anticipated)
    "CompletionDate",           # When it finished (actual or anticipated)
    "EnrollmentCount",          # Sample size
    "InterventionDescription",  # Drug details
    "ArmGroupLabel",            # Treatment arms
    "InterventionOtherName",    # Drug aliases
]
```

### Filter Enhancements

```python
# Current
aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"

# Could add (V2 query parameters; filter.advanced takes Essie expression syntax)
extra_params = {
    # Include active and completed trials, so results can be retrieved
    "filter.overallStatus": "RECRUITING|ACTIVE_NOT_RECRUITING|COMPLETED",
    # Only later-stage trials (Phase 2/3) that posted results since 2020
    "filter.advanced": (
        "(AREA[Phase]PHASE2 OR AREA[Phase]PHASE3) "
        "AND AREA[ResultsFirstPostDate]RANGE[2020-01-01,MAX]"
    ),
}
```

---

## Recommended Improvements

### Phase 1: Richer Metadata

```python
EXTENDED_FIELDS = [
    "NCTId",
    "BriefTitle",
    "OfficialTitle",
    "Condition",
    "InterventionName",
    "InterventionDescription",
    "InterventionOtherName",  # Drug synonyms!
    "Phase",
    "OverallStatus",
    "PrimaryOutcomeMeasure",
    "EnrollmentCount",
    "LeadSponsorName",
    "StudyFirstPostDate",
]
```
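
Requesting these fields is only half the work: the V2 API returns nested camelCase JSON rather than flat piece names, so each record has to be flattened before it becomes evidence. A minimal sketch, assuming the `protocolSection` paths from the study data structure documentation; `dig` and `parse_study` are hypothetical helpers:

```python
from typing import Any


def dig(obj: Any, *path: str, default: Any = None) -> Any:
    """Walk nested dicts by key; return `default` if a key is missing or null."""
    for key in path:
        if not isinstance(obj, dict) or obj.get(key) is None:
            return default
        obj = obj[key]
    return obj


def parse_study(study: dict[str, Any]) -> dict[str, Any]:
    """Flatten one V2 study record into a flat dict covering most of EXTENDED_FIELDS."""
    proto = dig(study, "protocolSection", default={})
    interventions = dig(proto, "armsInterventionsModule", "interventions", default=[])
    outcomes = dig(proto, "outcomesModule", "primaryOutcomes", default=[])
    return {
        "nct_id": dig(proto, "identificationModule", "nctId"),
        "title": dig(proto, "identificationModule", "briefTitle"),
        "official_title": dig(proto, "identificationModule", "officialTitle"),
        "conditions": dig(proto, "conditionsModule", "conditions", default=[]),
        "interventions": [i.get("name") for i in interventions],
        # otherNames carries brand names and codes: useful drug synonyms
        "aliases": [n for i in interventions for n in (i.get("otherNames") or [])],
        "phases": dig(proto, "designModule", "phases", default=[]),
        "status": dig(proto, "statusModule", "overallStatus"),
        "enrollment": dig(proto, "designModule", "enrollmentInfo", "count"),
        "sponsor": dig(proto, "sponsorCollaboratorsModule", "leadSponsor", "name"),
        "primary_outcomes": [o.get("measure") for o in outcomes],
        "first_posted": dig(proto, "statusModule", "studyFirstPostDateStruct", "date"),
    }
```

Routing every lookup through `dig` means a missing or null module yields `None` or an empty list instead of a `KeyError`.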
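
### Phase 2: Results Retrieval

For completed trials that have posted results, we can retrieve actual results data (outcome measures with statistics, adverse events):

```python
import httpx  # assumed HTTP client; any async client works


async def get_trial_results(nct_id: str) -> dict | None:
    """Fetch the results section for a completed trial (None if no results are posted)."""
    url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
    params = {"fields": "ResultsSection"}
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(url, params=params)
        resp.raise_for_status()
    # resultsSection holds outcomeMeasuresModule, adverseEventsModule, etc.
    return resp.json().get("resultsSection")
```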
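
### Phase 3: Drug Name Normalization

Map intervention names to standard identifiers:

```python
# Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
# Solution: Use RxNorm or DrugBank for normalization
import httpx

RXNORM_URL = "https://rxnav.nlm.nih.gov/REST/rxcui.json"


async def normalize_drug_name(intervention: str) -> str | None:
    """Look up the RxCUI for a drug name via the RxNorm API (None if no match)."""
    async with httpx.AsyncClient(timeout=10.0) as client:
        # Pass the name as a query parameter so it is URL-encoded
        resp = await client.get(RXNORM_URL, params={"name": intervention})
        resp.raise_for_status()
    # Matches are listed under idGroup.rxnormId; the key is absent when nothing matches
    ids = resp.json().get("idGroup", {}).get("rxnormId", [])
    return ids[0] if ids else None
```

Note that a brand name such as "Glucophage" resolves to its own brand-name RxCUI, not to the ingredient's; mapping it to metformin takes a second lookup of related concepts with term type `IN` (`/REST/rxcui/{rxcui}/related.json?tty=IN`).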

---

## Integration Opportunities

### With PubMed

Cross-reference trials with publications:

```python
# Study records can list linked publications with PubMed IDs
# (protocolSection.referencesModule.references[].pmid), which lets us
# correlate registered trials and their results with published papers
```

### With DrugBank/ChEMBL

Map interventions to:
- Mechanism of action
- Known targets
- Adverse effects
- Drug-drug interactions
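
ChEMBL's public REST web services expose mechanism-of-action and target data per molecule (e.g. `https://www.ebi.ac.uk/chembl/api/data/mechanism.json?molecule_chembl_id=<chembl_id>`). A minimal parsing sketch, assuming the response lists entries under `mechanisms` with `mechanism_of_action`, `action_type`, and `target_chembl_id` fields (check against the ChEMBL web services documentation); resolving an intervention name to a ChEMBL ID is a separate step not shown, and `extract_mechanisms` is hypothetical:

```python
def extract_mechanisms(payload: dict) -> list[dict]:
    """Reduce a ChEMBL mechanism.json response to mechanism/action/target records."""
    seen: set[tuple] = set()
    records = []
    for m in payload.get("mechanisms") or []:
        key = (m.get("mechanism_of_action"), m.get("action_type"), m.get("target_chembl_id"))
        if key in seen:  # skip duplicate entries, if any
            continue
        seen.add(key)
        records.append({"mechanism": key[0], "action_type": key[1], "target": key[2]})
    return records
```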

---

## Python Libraries to Consider

| Library | Purpose | Notes |
|---------|---------|-------|
| [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
| [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | Analysis project rather than an API client |
| [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires a DrugBank license |
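
---

## API Quirks & Gotchas

1. **Rate Limiting**: Limits are not clearly documented; throttle requests and back off if requests are rejected (e.g., HTTP 429)
2. **Pagination**: `pageSize` caps at 1000 results per request; fetch further pages with the `nextPageToken` returned in each response (sent back as `pageToken`)
3. **Field Names**: Requests use piece names (e.g., `NCTId`), while the response JSON uses nested camelCase paths (e.g., `protocolSection.identificationModule.nctId`)
4. **Empty Results**: Requested fields may be missing or null in the response, so parse defensively
5. **Status Changes**: Trial statuses change frequently, so cached results go stale quickly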
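
Pagination and conservative throttling can be handled in one place. A minimal sketch, assuming `fetch_page` is a caller-supplied coroutine that performs one `GET /api/v2/studies` request and returns the parsed JSON; `iter_studies`, `max_pages`, and the one-second default delay are hypothetical choices, not documented limits:

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable

# A caller-supplied coroutine: takes query params, returns one page of parsed JSON
FetchPage = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def iter_studies(
    fetch_page: FetchPage,
    params: dict[str, Any],
    max_pages: int = 10,
    delay: float = 1.0,
) -> AsyncIterator[dict[str, Any]]:
    """Yield studies page by page, following nextPageToken until it runs out."""
    # The API caps pageSize at 1000
    page_params = {**params, "pageSize": min(int(params.get("pageSize", 100)), 1000)}
    for _ in range(max_pages):
        data = await fetch_page(page_params)
        for study in data.get("studies", []):
            yield study
        token = data.get("nextPageToken")
        if not token:  # no token: this was the last page
            return
        page_params = {**page_params, "pageToken": token}
        await asyncio.sleep(delay)  # be conservative between requests
```

Injecting `fetch_page` keeps the paging logic independent of the HTTP client (in the tool it could wrap an `httpx.AsyncClient.get` call followed by `.json()`), and it makes the loop easy to test with canned responses.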
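
---

## Example Enhanced Query

```python
import httpx  # assumed HTTP client

CTGOV_URL = "https://clinicaltrials.gov/api/v2/studies"


async def search_drug_repurposing_trials(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
) -> list[dict]:
    """Search for trials repurposing a drug for a new condition.

    Returns raw V2 study records; converting them to `Evidence` objects
    is left to the tool's existing parsing code.
    """
    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        statuses.append("COMPLETED")

    params = {
        "query.intr": drug_name,  # intervention/treatment search
        "query.cond": condition,  # condition/disease search
        "filter.overallStatus": ",".join(statuses),
        # Study type is restricted with an Essie expression
        "filter.advanced": "AREA[StudyType]INTERVENTIONAL",
        "fields": ",".join(EXTENDED_FIELDS),  # defined in Phase 1 above
        "pageSize": 50,
    }
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.get(CTGOV_URL, params=params)
        resp.raise_for_status()
    return resp.json().get("studies", [])
```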

---

## Sources

- [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
- [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
- [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)