| # Phase 10 Implementation Spec: ClinicalTrials.gov Integration |
|
|
| **Goal**: Add clinical trial search for drug repurposing evidence. |
| **Philosophy**: "Clinical trials are the bridge from hypothesis to therapy." |
| **Prerequisite**: Phase 9 complete (DuckDuckGo removed) |
| **Estimated Time**: 2-3 hours |
|
|
| --- |
|
|
| ## 1. Why ClinicalTrials.gov? |
|
|
| ### Scientific Value |
|
|
| | Feature | Value for Drug Repurposing | |
| |---------|---------------------------| |
| | **400,000+ studies** | Massive evidence base | |
| | **Trial phase data** | Phase I/II/III = evidence strength | |
| | **Intervention details** | Exact drug + dosing | |
| | **Outcome measures** | What was measured | |
| | **Status tracking** | Completed vs recruiting | |
| | **Free API** | No cost, no key required | |
|
|
| ### Example Query Response |
|
|
| Query: "metformin Alzheimer's" |
|
|
| ```json |
| { |
| "studies": [ |
| { |
| "nctId": "NCT04098666", |
| "briefTitle": "Metformin in Alzheimer's Dementia Prevention", |
| "phase": "Phase 2", |
| "status": "Recruiting", |
| "conditions": ["Alzheimer Disease"], |
| "interventions": ["Drug: Metformin"] |
| } |
| ] |
| } |
| ``` |
|
|
| **This is GOLD for drug repurposing** - actual trials testing the hypothesis! |
|
|
| --- |
|
|
| ## 2. API Specification |
|
|
| ### Endpoint |
|
|
| ``` |
| Base URL: https://clinicaltrials.gov/api/v2/studies |
| ``` |
|
|
| ### Key Parameters |
|
|
| | Parameter | Description | Example | |
| |-----------|-------------|---------| |
| | `query.cond` | Condition/disease | `Alzheimer` | |
| | `query.intr` | Intervention/drug | `Metformin` | |
| | `query.term` | General search | `metformin alzheimer` | |
| | `pageSize` | Results per page | `20` | |
| | `fields` | Fields to return | See below | |
|
|
| ### Fields We Need |
|
|
| ``` |
| NCTId, BriefTitle, Phase, OverallStatus, Condition, |
| InterventionName, StartDate, CompletionDate, BriefSummary |
| ``` |
|
|
| ### Rate Limits |
|
|
| - ~50 requests/minute per IP |
| - No authentication required |
| - Paginated (100 results max per call) |
|
|
| ### Documentation |
|
|
| - [API v2 Docs](https://clinicaltrials.gov/data-api/api) |
| - [Migration Guide](https://www.nlm.nih.gov/pubs/techbull/ma24/ma24_clinicaltrials_api.html) |
|
|
| --- |
|
|
| ## 3. Data Model |
|
|
| ### 3.1 Update Citation Source Type (`src/utils/models.py`) |
|
|
| ```python |
| # BEFORE |
| source: Literal["pubmed", "web"] |
| |
| # AFTER |
| source: Literal["pubmed", "clinicaltrials", "europepmc"] |
| ``` |
|
|
| ### 3.2 Evidence from Clinical Trials |
|
|
| Clinical trial data maps to our existing `Evidence` model: |
|
|
| ```python |
| Evidence( |
| content=f"{brief_summary}. Phase: {phase}. Status: {status}.", |
| citation=Citation( |
| source="clinicaltrials", |
| title=brief_title, |
| url=f"https://clinicaltrials.gov/study/{nct_id}", |
| date=start_date or "Unknown", |
| authors=[] # Trials don't have authors in the same way |
| ), |
| relevance=0.8 # Trials are highly relevant for repurposing |
| ) |
| ``` |
|
|
| --- |
|
|
| ## 4. Implementation |
|
|
| ### 4.0 Important: HTTP Client Selection |
|
|
| **ClinicalTrials.gov's WAF blocks `httpx`'s TLS fingerprint.** Use `requests` instead. |
|
|
| | Library | Status | Notes | |
| |---------|--------|-------| |
| | `httpx` | ❌ 403 Blocked | TLS/JA3 fingerprint flagged | |
| | `httpx[http2]` | ❌ 403 Blocked | HTTP/2 doesn't help | |
| | `requests` | ✅ Works | Industry standard, not blocked | |
| | `urllib` | ✅ Works | Stdlib alternative | |
|
|
| We use `requests` wrapped in `asyncio.to_thread()` for async compatibility. |
|
|
| ### 4.1 ClinicalTrials Tool (`src/tools/clinicaltrials.py`) |
|
|
| ```python |
| """ClinicalTrials.gov search tool using API v2.""" |
| |
| import asyncio |
| from typing import Any, ClassVar |
| |
| import requests |
| from tenacity import retry, stop_after_attempt, wait_exponential |
| |
| from src.utils.exceptions import SearchError |
| from src.utils.models import Citation, Evidence |
| |
| |
| class ClinicalTrialsTool: |
| """Search tool for ClinicalTrials.gov. |
| |
| Note: Uses `requests` library instead of `httpx` because ClinicalTrials.gov's |
| WAF blocks httpx's TLS fingerprint. The `requests` library is not blocked. |
| """ |
| |
| BASE_URL = "https://clinicaltrials.gov/api/v2/studies" |
| FIELDS: ClassVar[list[str]] = [ |
| "NCTId", |
| "BriefTitle", |
| "Phase", |
| "OverallStatus", |
| "Condition", |
| "InterventionName", |
| "StartDate", |
| "BriefSummary", |
| ] |
| |
| @property |
| def name(self) -> str: |
| return "clinicaltrials" |
| |
| @retry( |
| stop=stop_after_attempt(3), |
| wait=wait_exponential(multiplier=1, min=1, max=10), |
| reraise=True, |
| ) |
| async def search(self, query: str, max_results: int = 10) -> list[Evidence]: |
| """Search ClinicalTrials.gov for studies.""" |
| params = { |
| "query.term": query, |
| "pageSize": min(max_results, 100), |
| "fields": "|".join(self.FIELDS), |
| } |
| |
| try: |
| # Run blocking requests.get in a separate thread for async compatibility |
| response = await asyncio.to_thread( |
| requests.get, |
| self.BASE_URL, |
| params=params, |
| headers={"User-Agent": "DeepBoner-Research-Agent/1.0"}, |
| timeout=30, |
| ) |
| response.raise_for_status() |
| |
| data = response.json() |
| studies = data.get("studies", []) |
| return [self._study_to_evidence(study) for study in studies[:max_results]] |
| |
| except requests.HTTPError as e: |
| raise SearchError(f"ClinicalTrials.gov API error: {e}") from e |
| except requests.RequestException as e: |
| raise SearchError(f"ClinicalTrials.gov request failed: {e}") from e |
| |
| def _study_to_evidence(self, study: dict) -> Evidence: |
| """Convert a clinical trial study to Evidence.""" |
| # Navigate nested structure |
| protocol = study.get("protocolSection", {}) |
| id_module = protocol.get("identificationModule", {}) |
| status_module = protocol.get("statusModule", {}) |
| desc_module = protocol.get("descriptionModule", {}) |
| design_module = protocol.get("designModule", {}) |
| conditions_module = protocol.get("conditionsModule", {}) |
| arms_module = protocol.get("armsInterventionsModule", {}) |
| |
| nct_id = id_module.get("nctId", "Unknown") |
| title = id_module.get("briefTitle", "Untitled Study") |
| status = status_module.get("overallStatus", "Unknown") |
| start_date = status_module.get("startDateStruct", {}).get("date", "Unknown") |
| |
| # Get phase (might be a list) |
| phases = design_module.get("phases", []) |
| phase = phases[0] if phases else "Not Applicable" |
| |
| # Get conditions |
| conditions = conditions_module.get("conditions", []) |
| conditions_str = ", ".join(conditions[:3]) if conditions else "Unknown" |
| |
| # Get interventions |
| interventions = arms_module.get("interventions", []) |
| intervention_names = [i.get("name", "") for i in interventions[:3]] |
| interventions_str = ", ".join(intervention_names) if intervention_names else "Unknown" |
| |
| # Get summary |
| summary = desc_module.get("briefSummary", "No summary available.") |
| |
| # Build content with key trial info |
| content = ( |
| f"{summary[:500]}... " |
| f"Trial Phase: {phase}. " |
| f"Status: {status}. " |
| f"Conditions: {conditions_str}. " |
| f"Interventions: {interventions_str}." |
| ) |
| |
| return Evidence( |
| content=content[:2000], |
| citation=Citation( |
| source="clinicaltrials", |
| title=title[:500], |
| url=f"https://clinicaltrials.gov/study/{nct_id}", |
| date=start_date, |
| authors=[], # Trials don't have traditional authors |
| ), |
| relevance=0.85, # Trials are highly relevant for repurposing |
| ) |
| ``` |
|
|
| --- |
|
|
| ## 5. TDD Test Suite |
|
|
| ### 5.1 Unit Tests (`tests/unit/tools/test_clinicaltrials.py`) |
| |
| Uses `unittest.mock.patch` to mock `requests.get` (not `respx` since we're not using `httpx`). |
| |
| ```python |
| """Unit tests for ClinicalTrials.gov tool.""" |
| |
| from unittest.mock import MagicMock, patch |
| |
| import pytest |
| import requests |
| |
| from src.tools.clinicaltrials import ClinicalTrialsTool |
| from src.utils.exceptions import SearchError |
| from src.utils.models import Evidence |
| |
| |
| @pytest.fixture |
| def mock_clinicaltrials_response() -> dict: |
| """Mock ClinicalTrials.gov API response.""" |
| return { |
| "studies": [ |
| { |
| "protocolSection": { |
| "identificationModule": { |
| "nctId": "NCT04098666", |
| "briefTitle": "Metformin in Alzheimer's Dementia Prevention", |
| }, |
| "statusModule": { |
| "overallStatus": "Recruiting", |
| "startDateStruct": {"date": "2020-01-15"}, |
| }, |
| "descriptionModule": { |
| "briefSummary": "This study evaluates metformin for Alzheimer's prevention." |
| }, |
| "designModule": {"phases": ["PHASE2"]}, |
| "conditionsModule": {"conditions": ["Alzheimer Disease", "Dementia"]}, |
| "armsInterventionsModule": { |
| "interventions": [{"name": "Metformin", "type": "Drug"}] |
| }, |
| } |
| } |
| ] |
| } |
| |
| |
| class TestClinicalTrialsTool: |
| """Tests for ClinicalTrialsTool.""" |
| |
| def test_tool_name(self) -> None: |
| """Tool should have correct name.""" |
| tool = ClinicalTrialsTool() |
| assert tool.name == "clinicaltrials" |
| |
| @pytest.mark.asyncio |
| async def test_search_returns_evidence( |
| self, mock_clinicaltrials_response: dict |
| ) -> None: |
| """Search should return Evidence objects.""" |
| with patch("src.tools.clinicaltrials.requests.get") as mock_get: |
| mock_response = MagicMock() |
| mock_response.json.return_value = mock_clinicaltrials_response |
| mock_response.raise_for_status = MagicMock() |
| mock_get.return_value = mock_response |
| |
| tool = ClinicalTrialsTool() |
| results = await tool.search("metformin alzheimer", max_results=5) |
| |
| assert len(results) == 1 |
| assert isinstance(results[0], Evidence) |
| assert results[0].citation.source == "clinicaltrials" |
| assert "NCT04098666" in results[0].citation.url |
| assert "Metformin" in results[0].citation.title |
| |
| @pytest.mark.asyncio |
| async def test_search_api_error(self) -> None: |
| """Search should raise SearchError on API failure.""" |
| with patch("src.tools.clinicaltrials.requests.get") as mock_get: |
| mock_response = MagicMock() |
| mock_response.raise_for_status.side_effect = requests.HTTPError( |
| "500 Server Error" |
| ) |
| mock_get.return_value = mock_response |
| |
| tool = ClinicalTrialsTool() |
| |
| with pytest.raises(SearchError): |
| await tool.search("metformin alzheimer") |
| |
|
|
| class TestClinicalTrialsIntegration: |
| """Integration tests (marked for separate run).""" |
| |
| @pytest.mark.integration |
| @pytest.mark.asyncio |
| async def test_real_api_call(self) -> None: |
| """Test actual API call (requires network).""" |
| tool = ClinicalTrialsTool() |
| results = await tool.search("metformin diabetes", max_results=3) |
| |
| assert len(results) > 0 |
| assert all(isinstance(r, Evidence) for r in results) |
| assert all(r.citation.source == "clinicaltrials" for r in results) |
| ``` |
| |
| --- |
|
|
| ## 6. Integration with SearchHandler |
|
|
| ### 6.1 Update Example Files |
|
|
| ```python |
| # examples/search_demo/run_search.py |
| from src.tools.clinicaltrials import ClinicalTrialsTool |
| from src.tools.pubmed import PubMedTool |
| from src.tools.search_handler import SearchHandler |
| |
| search_handler = SearchHandler( |
| tools=[PubMedTool(), ClinicalTrialsTool()], |
| timeout=30.0 |
| ) |
| ``` |
|
|
| ### 6.2 Update SearchResult Type |
|
|
| ```python |
| # src/utils/models.py |
| sources_searched: list[Literal["pubmed", "clinicaltrials"]] |
| ``` |
|
|
| --- |
|
|
| ## 7. Definition of Done |
|
|
| Phase 10 is **COMPLETE** when: |
|
|
| - [ ] `src/tools/clinicaltrials.py` implemented |
| - [ ] Unit tests in `tests/unit/tools/test_clinicaltrials.py` |
| - [ ] Integration test marked with `@pytest.mark.integration` |
| - [ ] SearchHandler updated to include ClinicalTrialsTool |
| - [ ] Type definitions updated in models.py |
| - [ ] Example files updated |
| - [ ] All unit tests pass |
| - [ ] Lints pass |
| - [ ] Manual verification with real API |
|
|
| --- |
|
|
| ## 8. Verification Commands |
|
|
| ```bash |
| # 1. Run unit tests |
| uv run pytest tests/unit/tools/test_clinicaltrials.py -v |
| |
| # 2. Run integration test (requires network) |
| uv run pytest tests/unit/tools/test_clinicaltrials.py -v -m integration |
| |
| # 3. Run full test suite |
| uv run pytest tests/unit/ -v |
| |
| # 4. Run example |
| source .env && uv run python examples/search_demo/run_search.py "metformin alzheimer" |
| # Should show results from BOTH PubMed AND ClinicalTrials.gov |
| ``` |
|
|
| --- |
|
|
| ## 9. Value Delivered |
|
|
| | Before | After | |
| |--------|-------| |
| | Papers only | Papers + Clinical Trials | |
| | "Drug X might help" | "Drug X is in Phase II trial" | |
| | No trial status | Recruiting/Completed/Terminated | |
| | No phase info | Phase I/II/III evidence strength | |
|
|
| **Demo pitch addition**: |
| > "DeepBoner searches PubMed for peer-reviewed evidence AND ClinicalTrials.gov for 400,000+ clinical trials." |
|
|