# Phase 11 Implementation Spec: Europe PMC Integration

> **Status**: COMPLETE
> **Implemented**: `src/tools/europepmc.py`
> **Tests**: `tests/unit/tools/test_europepmc.py`

## Overview

Europe PMC provides access to preprints and peer-reviewed literature through a single, well-designed REST API. This replaces the originally planned bioRxiv integration due to bioRxiv's API limitations (no keyword search).

## Why Europe PMC Over bioRxiv?

### bioRxiv API Limitations (Why We Abandoned It)

- The bioRxiv API does NOT support keyword search
- It only supports date-range queries, which return every paper in the range
- Searching would require downloading entire date ranges and filtering client-side
- This is inefficient and impractical for our use case

### Europe PMC Advantages

1. **Full keyword search** - Query by any term
2. **Aggregates preprints** - Includes bioRxiv, medRxiv, and ChemRxiv content
3. **No authentication required** - Free, open API
4. **34+ preprint servers indexed** - Not just bioRxiv
5. **REST API with JSON** - Easy integration

## API Reference

**Base URL**: `https://www.ebi.ac.uk/europepmc/webservices/rest/search`

**Documentation**: https://europepmc.org/RestfulWebService

### Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `query` | string | Search keywords |
| `resultType` | `core` | Full metadata, including abstracts |
| `pageSize` | 1-100 | Results per page |
| `format` | `json` | Response format |

### Example Request

```
GET https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
```

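The example request is just the base URL plus the four URL-encoded parameters. A minimal standard-library sketch that reproduces it; `build_search_url` is a hypothetical helper for illustration, not a function in `src/tools/europepmc.py`:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a Europe PMC search URL (hypothetical helper for illustration)."""
    params = {
        "query": query,
        "resultType": "core",  # full metadata, including abstracts
        "pageSize": page_size,
        "format": "json",
    }
    # urlencode quotes values with quote_plus, so spaces become "+"
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("metformin alzheimer"))
# https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
```

Building the query string with `urlencode` rather than by hand keeps multi-word queries and special characters correctly escaped.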
## Implementation

### EuropePMCTool (`src/tools/europepmc.py`)

```python
from __future__ import annotations  # keeps the Evidence annotation unevaluated here

# Evidence is the project's evidence model; its import is omitted in this excerpt.


class EuropePMCTool:
    """
    Search Europe PMC for papers and preprints.

    Europe PMC indexes:
    - PubMed/MEDLINE articles
    - PMC full-text articles
    - Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
    - Patents and clinical guidelines
    """

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC for papers matching query."""
        # Implementation with retry logic and error handling (elided in this spec)
        ...
```

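The `search` stub mentions retry logic, which the Key Features list pins down as 3 attempts with exponential backoff via `tenacity`. To show that schedule without extra dependencies, here is a stdlib-only `asyncio` sketch of the same idea. It is not the project's tenacity configuration: `retry_with_backoff` and `base_delay` are hypothetical names, and the one-second default base delay is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    func: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await func(), retrying on any exception with exponential backoff.

    The wait after failed attempt n (0-based) is base_delay * 2**n,
    i.e. 1s, then 2s, for the defaults.
    """
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("attempts must be >= 1")
```

In the real tool the wrapped callable would be the HTTP request; only the final failure propagates to the caller, so transient network errors are absorbed silently.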
### Key Features

1. **Preprint Detection**: Automatically identifies preprints via `pubTypeList`
2. **Preprint Marking**: Adds a `[PREPRINT - Not peer-reviewed]` prefix to the content
3. **Relevance Scoring**: Preprints receive a relevance score of 0.75; peer-reviewed articles receive 0.9
4. **URL Resolution**: DOI → PubMed → Europe PMC fallback chain
5. **Retry Logic**: 3 attempts with exponential backoff via `tenacity`

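Features 1 to 4 can be sketched as plain functions over one Europe PMC result record (a dict from the JSON response). This is a minimal illustration under stated assumptions, not the code in `src/tools/europepmc.py`: the function names are hypothetical, a preprint is assumed to list `"Preprint"` in `pubTypeList.pubType`, and the three fallback URL patterns are assumptions.

```python
PREPRINT_PREFIX = "[PREPRINT - Not peer-reviewed]"
PREPRINT_SCORE = 0.75
PEER_REVIEWED_SCORE = 0.9


def is_preprint(result: dict) -> bool:
    """Assumption: preprints list "Preprint" in pubTypeList.pubType."""
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    if isinstance(pub_types, str):  # tolerate a single string instead of a list
        pub_types = [pub_types]
    return any(p.lower() == "preprint" for p in pub_types)


def mark_content(result: dict) -> str:
    """Prefix preprint abstracts so readers know they are not peer-reviewed."""
    abstract = result.get("abstractText", "")
    return f"{PREPRINT_PREFIX} {abstract}" if is_preprint(result) else abstract


def relevance(result: dict) -> float:
    """Fixed scores from the spec: 0.75 for preprints, 0.9 otherwise."""
    return PREPRINT_SCORE if is_preprint(result) else PEER_REVIEWED_SCORE


def resolve_url(result: dict) -> str:
    """DOI -> PubMed -> Europe PMC fallback chain (URL patterns assumed)."""
    if doi := result.get("doi"):
        return f"https://doi.org/{doi}"
    if pmid := result.get("pmid"):
        return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
    source = result.get("source", "MED")
    return f"https://europepmc.org/article/{source}/{result.get('id', '')}"
```

Keeping detection in a single predicate means marking and scoring can never disagree about whether a record is a preprint.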
### Response Mapping

| Europe PMC Field | Evidence Field |
|------------------|----------------|
| `title` | `citation.title` |
| `abstractText` | `content` |
| `doi` | Used for URL |
| `pubYear` | `citation.date` |
| `authorList.author` | `citation.authors` |
| `pubTypeList.pubType` | Determines `citation.source` (`"preprint"` or `"europepmc"`) |

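To make the mapping concrete, here is a sketch that reads the result records from a JSON response body and applies the table row by row. The `Citation` and `Evidence` dataclasses are hypothetical stand-ins for the project's models (only the fields named in the table); the `resultList.result` nesting and the per-author `fullName` key are assumptions about the Europe PMC JSON layout to verify against the API documentation; and URL resolution from `doi` is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Citation:  # hypothetical stand-in for the project's citation model
    title: str
    date: str
    authors: list[str] = field(default_factory=list)
    source: str = "europepmc"


@dataclass
class Evidence:  # hypothetical stand-in for the project's Evidence model
    content: str
    citation: Citation


def map_result(result: dict) -> Evidence:
    """Apply the Response Mapping table to one Europe PMC result record."""
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    preprint = "Preprint" in pub_types  # assumed preprint marker
    authors = [a.get("fullName", "") for a in result.get("authorList", {}).get("author", [])]
    return Evidence(
        content=result.get("abstractText", ""),
        citation=Citation(
            title=result.get("title", ""),
            date=str(result.get("pubYear", "")),
            authors=authors,
            source="preprint" if preprint else "europepmc",
        ),
    )


def parse_response(body: str) -> list[Evidence]:
    """Assumes results are nested under resultList.result in the JSON body."""
    data = json.loads(body)
    return [map_result(r) for r in data.get("resultList", {}).get("result", [])]
```

Every lookup uses `.get` with a default, so records missing an abstract or author list still map cleanly instead of raising `KeyError`.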
## Unit Tests

### Test Coverage (`tests/unit/tools/test_europepmc.py`)

| Test | Description |
|------|-------------|
| `test_tool_name` | Verifies the tool name is `"europepmc"` |
| `test_search_returns_evidence` | A basic search returns `Evidence` objects |
| `test_search_marks_preprints` | Preprints carry the `[PREPRINT - Not peer-reviewed]` marker and `source="preprint"` |
| `test_search_empty_results` | Handles empty results gracefully |

### Integration Test

```python
import pytest

from src.tools.europepmc import EuropePMCTool


@pytest.mark.integration
@pytest.mark.asyncio  # needs pytest-asyncio; redundant if asyncio_mode = "auto"
async def test_real_api_call():
    """Check that the live API returns at least one result."""
    tool = EuropePMCTool()
    results = await tool.search("long covid treatment", max_results=3)
    assert len(results) > 0
```

## SearchHandler Integration

Europe PMC is included in `src/tools/search_handler.py` alongside PubMed and ClinicalTrials:

```python
from src.tools.europepmc import EuropePMCTool

# PubMedTool and ClinicalTrialsTool are imported from their own modules in src/tools/ (omitted here)


class SearchHandler:
    def __init__(self) -> None:
        self.tools = [
            PubMedTool(),
            ClinicalTrialsTool(),
            EuropePMCTool(),  # Preprints + peer-reviewed
        ]
```

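The spec lists the tools but not how `SearchHandler` queries them. One plausible pattern, sketched here as an assumption rather than the project's code, is a shared async interface plus a concurrent fan-out with `asyncio.gather(..., return_exceptions=True)`, so one failing source does not sink the whole search. `SearchTool`, `search_all`, and `FakeTool` are hypothetical names, and results are plain strings to keep the sketch self-contained.

```python
import asyncio
from typing import Protocol


class SearchTool(Protocol):
    """Interface each tool is assumed to share: a name plus an async search."""

    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int = 10) -> list[str]: ...


async def search_all(
    tools: list[SearchTool], query: str, max_results: int = 10
) -> dict[str, list[str]]:
    """Query every tool concurrently; a tool that raises contributes no results."""
    outcomes = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,
    )
    return {
        tool.name: ([] if isinstance(outcome, BaseException) else outcome)
        for tool, outcome in zip(tools, outcomes)
    }


class FakeTool:
    """Hypothetical stand-in used only to demonstrate search_all."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self._name = name
        self._fail = fail

    @property
    def name(self) -> str:
        return self._name

    async def search(self, query: str, max_results: int = 10) -> list[str]:
        if self._fail:
            raise ConnectionError(f"{self._name} unavailable")
        return [f"{self._name}: {query} #{i}" for i in range(max_results)]


results = asyncio.run(search_all([FakeTool("pubmed"), FakeTool("europepmc", fail=True)], "metformin", 2))
print(results)  # {'pubmed': ['pubmed: metformin #0', 'pubmed: metformin #1'], 'europepmc': []}
```

Because every tool exposes the same `name`/`search` shape, adding `EuropePMCTool` to the list is the only change the handler needs.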
## MCP Tools Integration

Europe PMC is exposed via MCP in `src/mcp_tools.py`:

```python
from src.tools.europepmc import EuropePMCTool

_europepmc = EuropePMCTool()  # module-level tool instance shared by MCP calls


async def search_europepmc(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
    results = await _europepmc.search(query, max_results)
    # Format and return (formatting details elided in this spec)
    ...
```

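The MCP wrapper's formatting step is not specified. One possible approach, a hypothetical sketch rather than the contents of `src/mcp_tools.py`, renders each result as a numbered Markdown entry with truncated content. `Hit` and `format_results` are invented names, and `Hit` is an assumed flattened view of one result.

```python
from dataclasses import dataclass


@dataclass
class Hit:  # hypothetical flattened view of one search result
    title: str
    date: str
    url: str
    content: str


def format_results(query: str, hits: list[Hit], max_chars: int = 300) -> str:
    """Render hits as numbered Markdown entries with truncated content."""
    if not hits:
        return f"No Europe PMC results found for: {query}"
    lines = [f"## Europe PMC results for: {query}", ""]
    for i, hit in enumerate(hits, start=1):
        if len(hit.content) <= max_chars:
            snippet = hit.content
        else:
            snippet = hit.content[:max_chars] + "..."
        lines += [f"{i}. **{hit.title}** ({hit.date})", f"   {hit.url}", f"   {snippet}", ""]
    return "\n".join(lines).rstrip()
```

Returning plain Markdown text suits MCP clients, which typically pass tool output straight to a language model.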
## Verification

```bash
# Run unit tests
uv run pytest tests/unit/tools/test_europepmc.py -v

# Run only the integration test (calls the real API)
uv run pytest tests/unit/tools/test_europepmc.py -v -m integration
```

## Completion Checklist

- [x] `src/tools/europepmc.py` implemented
- [x] Unit tests in `tests/unit/tools/test_europepmc.py`
- [x] Integration test with real API
- [x] SearchHandler includes EuropePMCTool
- [x] MCP wrapper in `src/mcp_tools.py`
- [x] Preprint detection and marking
- [x] Retry logic with exponential backoff

## Architecture Diagram

```
┌────────────────────────────────────────────────────────┐
│                     SearchHandler                      │
├────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ PubMedTool  │  │ClinicalTrials│  │ EuropePMCTool │  │
│  │             │  │     Tool     │  │               │  │
│  │ Peer-review │  │    Trials    │  │  Preprints +  │  │
│  │  articles   │  │     data     │  │  peer-review  │  │
│  └──────┬──────┘  └──────┬───────┘  └───────┬───────┘  │
│         │                │                  │          │
│         ▼                ▼                  ▼          │
│    ┌─────────────────────────────────────────────┐     │
│    │                Evidence List                │     │
│    │   (deduplicated, scored, with citations)    │     │
│    └─────────────────────────────────────────────┘     │
└────────────────────────────────────────────────────────┘
```

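The diagram's final stage merges all tool outputs into one deduplicated, scored evidence list. The spec does not say how duplicates are detected. A minimal sketch, assuming duplicates share a URL (for example, the same article returned by both PubMed and Europe PMC) and that the higher-scored copy wins, could look like this. `ScoredEvidence` and `merge_evidence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredEvidence:  # hypothetical minimal evidence record
    url: str
    title: str
    source: str
    relevance: float


def merge_evidence(*tool_results: list[ScoredEvidence]) -> list[ScoredEvidence]:
    """Deduplicate by URL (keeping the highest relevance) and sort best-first."""
    best: dict[str, ScoredEvidence] = {}
    for results in tool_results:
        for ev in results:
            key = ev.url.rstrip("/")  # ignore a trailing slash when comparing URLs
            if key not in best or ev.relevance > best[key].relevance:
                best[key] = ev
    # sorted() is stable, so ties keep the order in which tools reported them
    return sorted(best.values(), key=lambda ev: ev.relevance, reverse=True)
```

Keying on the resolved URL works well with the DOI-first URL resolution, since the same DOI yields the same link regardless of which tool found the article.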