# Phase 11 Implementation Spec: Europe PMC Integration
> **Status**: COMPLETE
> **Implemented**: `src/tools/europepmc.py`
> **Tests**: `tests/unit/tools/test_europepmc.py`
## Overview
Europe PMC provides access to preprints and peer-reviewed literature through a single, well-designed REST API. This replaces the originally planned bioRxiv integration due to bioRxiv's API limitations (no keyword search).
## Why Europe PMC Over bioRxiv?
### bioRxiv API Limitations (Why We Abandoned It)
- bioRxiv API does NOT support keyword search
- Only supports date-range queries returning all papers
- Would require downloading entire date ranges and filtering client-side
- Inefficient and impractical for our use case
### Europe PMC Advantages
1. **Full keyword search** - Query by any term
2. **Aggregates preprints** - Includes bioRxiv, medRxiv, ChemRxiv content
3. **No authentication required** - Free, open API
4. **34+ preprint servers indexed** - Not just bioRxiv
5. **REST API with JSON** - Easy integration
## API Reference
**Base URL**: `https://www.ebi.ac.uk/europepmc/webservices/rest/search`
**Documentation**: https://europepmc.org/RestfulWebService
### Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| `query` | string | Search keywords |
| `resultType` | `core` | Full metadata including abstracts |
| `pageSize` | 1-1000 | Results per page (API default: 25) |
| `format` | `json` | Response format |
### Example Request
```
GET https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
```
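As a minimal sketch, the same request can be assembled from the parameter table with the standard library. The helper name `build_search_url` is hypothetical and not part of `src/tools/europepmc.py`; the real tool may build its request differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a Europe PMC search URL (hypothetical helper, for illustration)."""
    params = {
        "query": query,
        "resultType": "core",  # full metadata including abstracts
        "pageSize": page_size,
        "format": "json",
    }
    # urlencode escapes spaces as "+", matching the example request above.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("metformin alzheimer")
print(url)
```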
## Implementation
### EuropePMCTool (`src/tools/europepmc.py`)
```python
class EuropePMCTool:
    """
    Search Europe PMC for papers and preprints.

    Europe PMC indexes:
    - PubMed/MEDLINE articles
    - PMC full-text articles
    - Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
    - Patents and clinical guidelines
    """

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC for papers matching query."""
        # Implementation with retry logic, error handling
        ...
```
### Key Features
1. **Preprint Detection**: Automatically identifies preprints via `pubTypeList`
2. **Preprint Marking**: Adds `[PREPRINT - Not peer-reviewed]` prefix to content
3. **Relevance Scoring**: Preprints get 0.75, peer-reviewed get 0.9
4. **URL Resolution**: DOI → PubMed → Europe PMC fallback chain
5. **Retry Logic**: 3 attempts with exponential backoff via tenacity
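Features 1 through 4 can be sketched as small pure functions over one search result. This is an illustration under assumptions, not the code in `src/tools/europepmc.py`: the function names are hypothetical, the case-insensitive match on `pubTypeList` is a guess at how detection works, and the `pmid`, `id`, and `source` field names and the URL patterns are assumptions about the response shape and link format.

```python
from typing import Any

PREPRINT_PREFIX = "[PREPRINT - Not peer-reviewed] "


def is_preprint(result: dict[str, Any]) -> bool:
    """True if any pubTypeList.pubType entry mentions 'preprint' (assumed rule)."""
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    return any("preprint" in str(t).lower() for t in pub_types)


def mark_content(result: dict[str, Any]) -> str:
    """Prefix the abstract with the preprint marker when applicable."""
    abstract = result.get("abstractText", "")
    return PREPRINT_PREFIX + abstract if is_preprint(result) else abstract


def relevance(result: dict[str, Any]) -> float:
    """Preprints score 0.75, peer-reviewed records 0.9."""
    return 0.75 if is_preprint(result) else 0.9


def resolve_url(result: dict[str, Any]) -> str:
    """DOI first, then PubMed, then a Europe PMC article page (assumed patterns)."""
    if result.get("doi"):
        return f"https://doi.org/{result['doi']}"
    if result.get("pmid"):
        return f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/"
    # Assumption: records carry `source` and `id`, e.g. source "PPR" for preprints.
    return f"https://europepmc.org/article/{result.get('source', 'MED')}/{result.get('id', '')}"


sample = {
    "id": "PPR1",
    "source": "PPR",
    "abstractText": "Example abstract.",
    "pubTypeList": {"pubType": ["Preprint"]},
}
print(mark_content(sample), relevance(sample), resolve_url(sample))
```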
### Response Mapping
| Europe PMC Field | Evidence Field |
|------------------|----------------|
| `title` | `citation.title` |
| `abstractText` | `content` |
| `doi` | Used for URL |
| `pubYear` | `citation.date` |
| `authorList.author` | `citation.authors` |
| `pubTypeList.pubType` | Determines `citation.source` ("preprint" or "europepmc") |
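The mapping above might look roughly like the following. The `Citation` and `Evidence` dataclasses here are hypothetical stand-ins for the project's own models, and reading each author's `fullName` key is an assumption about the response shape; treat this as a sketch rather than the actual conversion code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Citation:  # hypothetical stand-in for the project's citation model
    title: str
    date: str
    source: str
    authors: list[str] = field(default_factory=list)


@dataclass
class Evidence:  # hypothetical stand-in for the project's evidence model
    content: str
    citation: Citation


def to_evidence(result: dict[str, Any]) -> Evidence:
    """Map one Europe PMC result dict onto the Evidence fields in the table."""
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    preprint = any("preprint" in str(t).lower() for t in pub_types)
    # Assumption: each author entry exposes a `fullName` key; skip entries without it.
    authors = [
        a["fullName"]
        for a in result.get("authorList", {}).get("author", [])
        if a.get("fullName")
    ]
    return Evidence(
        content=result.get("abstractText", ""),
        citation=Citation(
            title=result.get("title", ""),
            date=str(result.get("pubYear", "")),
            source="preprint" if preprint else "europepmc",
            authors=authors,
        ),
    )


record = {
    "title": "Example title",
    "abstractText": "Example abstract.",
    "pubYear": "2023",
    "authorList": {"author": [{"fullName": "Doe J"}, {"lastName": "Roe"}]},
    "pubTypeList": {"pubType": ["Preprint"]},
}
evidence = to_evidence(record)
print(evidence.citation.source, evidence.citation.authors)
```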
## Unit Tests
### Test Coverage (`tests/unit/tools/test_europepmc.py`)
| Test | Description |
|------|-------------|
| `test_tool_name` | Verifies tool name is "europepmc" |
| `test_search_returns_evidence` | Basic search returns Evidence objects |
| `test_search_marks_preprints` | Preprints have [PREPRINT] marker and source="preprint" |
| `test_search_empty_results` | Handles empty results gracefully |
### Integration Test
```python
import pytest

from src.tools.europepmc import EuropePMCTool


@pytest.mark.integration
async def test_real_api_call():
    """Test actual API returns relevant results."""
    tool = EuropePMCTool()
    results = await tool.search("long covid treatment", max_results=3)
    assert len(results) > 0
```
## SearchHandler Integration
Europe PMC is included in `src/tools/search_handler.py` alongside PubMed and ClinicalTrials:
```python
from src.tools.europepmc import EuropePMCTool


class SearchHandler:
    def __init__(self):
        self.tools = [
            PubMedTool(),
            ClinicalTrialsTool(),
            EuropePMCTool(),  # Preprints + peer-reviewed
        ]
```
## MCP Tools Integration
Europe PMC is exposed via MCP in `src/mcp_tools.py`:
```python
async def search_europepmc(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
    results = await _europepmc.search(query, max_results)
    # Format and return
```
## Verification
```bash
# Run unit tests
uv run pytest tests/unit/tools/test_europepmc.py -v
# Run integration test (real API)
uv run pytest tests/unit/tools/test_europepmc.py -v -m integration
```
## Completion Checklist
- [x] `src/tools/europepmc.py` implemented
- [x] Unit tests in `tests/unit/tools/test_europepmc.py`
- [x] Integration test with real API
- [x] SearchHandler includes EuropePMCTool
- [x] MCP wrapper in `src/mcp_tools.py`
- [x] Preprint detection and marking
- [x] Retry logic with exponential backoff
## Architecture Diagram
```
┌─────────────────────────────────────────────────────────┐
│                      SearchHandler                      │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐   │
│  │ PubMedTool  │  │ClinicalTrials│  │ EuropePMCTool │   │
│  │             │  │     Tool     │  │               │   │
│  │ Peer-review │  │    Trials    │  │  Preprints +  │   │
│  │  articles   │  │     data     │  │  peer-review  │   │
│  └──────┬──────┘  └──────┬───────┘  └───────┬───────┘   │
│         │                │                  │           │
│         ▼                ▼                  ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │                  Evidence List                   │   │
│  │      (deduplicated, scored, with citations)      │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
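The fan-out in the diagram could be sketched as below: every tool is queried concurrently and the results are merged into one deduplicated, scored list. Everything here is hypothetical (`Hit`, `StubTool`, `search_all`, deduplication by URL, and skipping tools that raise); the real `SearchHandler` may merge results differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:  # hypothetical, simplified stand-in for an Evidence item
    url: str
    source: str
    relevance: float


class StubTool:
    """Stand-in for PubMedTool, ClinicalTrialsTool, or EuropePMCTool."""

    def __init__(self, name: str, hits: list[Hit]) -> None:
        self.name = name
        self._hits = hits

    async def search(self, query: str, max_results: int = 10) -> list[Hit]:
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        return self._hits[:max_results]


async def search_all(tools: list[StubTool], query: str, max_results: int = 10) -> list[Hit]:
    """Query all tools concurrently, then deduplicate by URL (assumed key)."""
    batches = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,
    )
    best: dict[str, Hit] = {}
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # assumption: one failing tool should not sink the search
        for hit in batch:
            current = best.get(hit.url)
            if current is None or hit.relevance > current.relevance:
                best[hit.url] = hit
    # Highest relevance first; ties keep first-seen order.
    return sorted(best.values(), key=lambda h: h.relevance, reverse=True)


tools = [
    StubTool("pubmed", [Hit("https://doi.org/10.1000/a", "pubmed", 0.9)]),
    StubTool(
        "europepmc",
        [
            Hit("https://doi.org/10.1000/a", "europepmc", 0.9),
            Hit("https://doi.org/10.1000/b", "preprint", 0.75),
        ],
    ),
]
merged = asyncio.run(search_all(tools, "metformin alzheimer"))
print([(h.url, h.source) for h in merged])
```

Deduplicating on the resolved URL works in this sketch because the DOI-first fallback chain gives the same paper the same link regardless of which tool returned it.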