# Phase 11 Implementation Spec: Europe PMC Integration

> **Status**: ✅ COMPLETE
> **Implemented**: `src/tools/europepmc.py`
> **Tests**: `tests/unit/tools/test_europepmc.py`

## Overview

Europe PMC provides access to preprints and peer-reviewed literature through a single REST API. This integration replaces the originally planned bioRxiv integration, because the bioRxiv API does not support keyword search.

## Why Europe PMC Over bioRxiv?

### bioRxiv API Limitations (Why We Abandoned It)
- bioRxiv API does NOT support keyword search
- Only supports date-range queries returning all papers
- Would require downloading entire date ranges and filtering client-side
- Inefficient and impractical for our use case

### Europe PMC Advantages
1. **Full keyword search** - Query by any term
2. **Aggregates preprints** - Includes bioRxiv, medRxiv, ChemRxiv content
3. **No authentication required** - Free, open API
4. **34+ preprint servers indexed** - Not just bioRxiv
5. **REST API with JSON** - Easy integration

## API Reference

**Base URL**: `https://www.ebi.ac.uk/europepmc/webservices/rest/search`

**Documentation**: https://europepmc.org/RestfulWebService

### Parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `query` | string | Search keywords |
| `resultType` | `core` | Full metadata including abstracts |
| `pageSize` | 1-100 | Results per page |
| `format` | `json` | Response format |

### Example Request

```
GET https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
```
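As a minimal sketch, the request above can be assembled from the parameter table using only the standard library. The helper name `build_search_url` is hypothetical and not taken from `src/tools/europepmc.py`; the real tool may build its request differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Return a search URL for `query` (hypothetical helper, for illustration)."""
    params = {
        "query": query,
        "resultType": "core",  # full metadata including abstracts
        "pageSize": page_size,
        "format": "json",
    }
    # urlencode escapes spaces as "+", matching the example request.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_search_url("metformin alzheimer"))
```

The printed URL should match the example request, which makes the helper easy to check without calling the live API.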

## Implementation

### EuropePMCTool (`src/tools/europepmc.py`)

```python
class EuropePMCTool:
    """
    Search Europe PMC for papers and preprints.

    Europe PMC indexes:
    - PubMed/MEDLINE articles
    - PMC full-text articles
    - Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
    - Patents and clinical guidelines
    """

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC for papers matching query."""
        # Implementation with retry logic, error handling
```
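The skeleton leaves the retry logic as a comment, and this spec states that the real tool uses tenacity with 3 attempts and exponential backoff. The sketch below is a standard-library stand-in that shows the same idea, not the tenacity-based code; `retry_with_backoff`, its parameters, and the delay values are assumptions made for illustration.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Await `call()` up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2**attempt)
    raise RuntimeError("attempts must be at least 1")


async def demo() -> tuple[str, int]:
    """Simulate a call that fails twice and succeeds on the third attempt."""
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("simulated transient failure")
        return "ok"

    result = await retry_with_backoff(flaky, base_delay=0.01)
    return result, calls


print(asyncio.run(demo()))  # ('ok', 3)
```

A production version would likely retry only on transport errors rather than on every exception; the broad `except` here keeps the sketch short.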

### Key Features

1. **Preprint Detection**: Automatically identifies preprints via `pubTypeList`
2. **Preprint Marking**: Adds `[PREPRINT - Not peer-reviewed]` prefix to content
3. **Relevance Scoring**: Preprints get 0.75, peer-reviewed get 0.9
4. **URL Resolution**: DOI → PubMed → Europe PMC fallback chain
5. **Retry Logic**: 3 attempts with exponential backoff via tenacity
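
Features 1 to 4 can be sketched on a single raw result dictionary as follows. This is an illustration under stated assumptions, not the code in `src/tools/europepmc.py`: the helper names are hypothetical, the preprint check assumes a `pubType` value of `"preprint"` (compared case-insensitively), and the `pmid`, `source`, and `id` fields and the URL patterns are assumptions about the response and the link targets.

```python
PREPRINT_PREFIX = "[PREPRINT - Not peer-reviewed] "


def is_preprint(result: dict) -> bool:
    """Detect a preprint from `pubTypeList.pubType` (assumed value: "preprint")."""
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    if isinstance(pub_types, str):  # tolerate a single string instead of a list
        pub_types = [pub_types]
    return any(str(p).lower() == "preprint" for p in pub_types)


def mark_and_score(result: dict) -> tuple[str, float]:
    """Return (content, relevance): preprints are prefixed and scored lower."""
    abstract = result.get("abstractText", "")
    if is_preprint(result):
        return PREPRINT_PREFIX + abstract, 0.75
    return abstract, 0.9


def resolve_url(result: dict) -> str:
    """Prefer the DOI link, then PubMed, then the Europe PMC record page."""
    if result.get("doi"):
        return f"https://doi.org/{result['doi']}"
    if result.get("pmid"):  # assumed field name
        return f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/"
    # Assumed fallback pattern built from the record's source and id.
    return f"https://europepmc.org/article/{result.get('source', 'MED')}/{result.get('id', '')}"


sample = {"abstractText": "Results.", "pubTypeList": {"pubType": ["Preprint"]}, "doi": "10.1101/x"}
print(mark_and_score(sample))
print(resolve_url(sample))
```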

### Response Mapping

| Europe PMC Field | Evidence Field |
|------------------|----------------|
| `title` | `citation.title` |
| `abstractText` | `content` |
| `doi` | Used for URL |
| `pubYear` | `citation.date` |
| `authorList.author` | `citation.authors` |
| `pubTypeList.pubType` | Determines `citation.source` ("preprint" or "europepmc") |
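
The mapping in the table can be illustrated with a small conversion function. The `Evidence` and `Citation` dataclasses below are simplified stand-ins, since the project's real models are not shown in this spec, and reading author names from a `fullName` key is an assumption about the response shape.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:  # simplified stand-in for the project's citation model
    title: str
    date: str
    authors: list[str] = field(default_factory=list)
    source: str = "europepmc"


@dataclass
class Evidence:  # simplified stand-in for the project's Evidence model
    content: str
    citation: Citation


def to_evidence(result: dict) -> Evidence:
    """Map one raw Europe PMC result onto the fields listed in the table."""
    authors = [
        a.get("fullName", "")  # assumed key for the author's display name
        for a in result.get("authorList", {}).get("author", [])
    ]
    pub_types = result.get("pubTypeList", {}).get("pubType", [])
    preprint = any(str(p).lower() == "preprint" for p in pub_types)
    return Evidence(
        content=result.get("abstractText", ""),
        citation=Citation(
            title=result.get("title", ""),
            date=str(result.get("pubYear", "")),
            authors=authors,
            source="preprint" if preprint else "europepmc",
        ),
    )


raw = {
    "title": "Example title",
    "abstractText": "Example abstract.",
    "pubYear": "2023",
    "authorList": {"author": [{"fullName": "Doe J"}]},
    "pubTypeList": {"pubType": ["Preprint"]},
}
print(to_evidence(raw))
```

Using `.get` with defaults throughout keeps the conversion tolerant of results that lack an abstract or author list.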

## Unit Tests

### Test Coverage (`tests/unit/tools/test_europepmc.py`)

| Test | Description |
|------|-------------|
| `test_tool_name` | Verifies tool name is "europepmc" |
| `test_search_returns_evidence` | Basic search returns Evidence objects |
| `test_search_marks_preprints` | Preprints have [PREPRINT] marker and source="preprint" |
| `test_search_empty_results` | Handles empty results gracefully |

### Integration Test

```python
@pytest.mark.integration
async def test_real_api_call():
    """Test actual API returns relevant results."""
    tool = EuropePMCTool()
    results = await tool.search("long covid treatment", max_results=3)
    assert len(results) > 0
```

## SearchHandler Integration

Europe PMC is included in `src/tools/search_handler.py` alongside PubMed and ClinicalTrials:

```python
from src.tools.europepmc import EuropePMCTool

class SearchHandler:
    def __init__(self):
        self.tools = [
            PubMedTool(),
            ClinicalTrialsTool(),
            EuropePMCTool(),  # Preprints + peer-reviewed
        ]
```

## MCP Tools Integration

Europe PMC is exposed via MCP in `src/mcp_tools.py`:

```python
async def search_europepmc(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
    results = await _europepmc.search(query, max_results)
    # Format and return
```

## Verification

```bash
# Run unit tests
uv run pytest tests/unit/tools/test_europepmc.py -v

# Run integration test (real API)
uv run pytest tests/unit/tools/test_europepmc.py -v -m integration
```

## Completion Checklist

- [x] `src/tools/europepmc.py` implemented
- [x] Unit tests in `tests/unit/tools/test_europepmc.py`
- [x] Integration test with real API
- [x] SearchHandler includes EuropePMCTool
- [x] MCP wrapper in `src/mcp_tools.py`
- [x] Preprint detection and marking
- [x] Retry logic with exponential backoff

## Architecture Diagram

```
┌────────────────────────────────────────────────────────┐
│                    SearchHandler                       │
├────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ PubMedTool  │  │ClinicalTrials│  │ EuropePMCTool │  │
│  │             │  │    Tool      │  │               │  │
│  │ Peer-review │  │   Trials     │  │  Preprints +  │  │
│  │  articles   │  │   data       │  │  peer-review  │  │
│  └──────┬──────┘  └──────┬───────┘  └───────┬───────┘  │
│         │                │                  │          │
│         ▼                ▼                  ▼          │
│    ┌─────────────────────────────────────────────┐     │
│    │              Evidence List                  │     │
│    │  (deduplicated, scored, with citations)     │     │
│    └─────────────────────────────────────────────┘     │
└────────────────────────────────────────────────────────┘
```
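
The flow in the diagram, where several tools feed one deduplicated and scored list, might look roughly like the sketch below. It is not the code in `src/tools/search_handler.py`: `Hit`, `StubTool`, and `search_all` are hypothetical, and deduplicating by URL while keeping the highest score is an assumption, since the spec does not say how duplicates are resolved.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:  # hypothetical stand-in for Evidence
    url: str
    score: float


class StubTool:
    """Stand-in for a search tool that returns canned results."""

    def __init__(self, name: str, hits: list[Hit]) -> None:
        self.name = name
        self._hits = hits

    async def search(self, query: str, max_results: int = 10) -> list[Hit]:
        return self._hits[:max_results]


async def search_all(tools: list[StubTool], query: str, max_results: int = 10) -> list[Hit]:
    """Query every tool concurrently, then merge, deduplicate, and sort."""
    batches = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,
    )
    best: dict[str, Hit] = {}
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # assumption: one failing source should not sink the rest
        for hit in batch:
            # Assumption: identical URLs are duplicates; keep the higher score.
            if hit.url not in best or hit.score > best[hit.url].score:
                best[hit.url] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


tools = [
    StubTool("pubmed", [Hit("https://doi.org/10.1/a", 0.9)]),
    StubTool("europepmc", [Hit("https://doi.org/10.1/a", 0.75), Hit("https://doi.org/10.1/b", 0.75)]),
]
print(asyncio.run(search_all(tools, "long covid treatment")))
```

Running the tools with `asyncio.gather` means total latency is bounded by the slowest source rather than the sum of all three.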