Phase 11 Implementation Spec: Europe PMC Integration
Status: COMPLETE
Implemented: src/tools/europepmc.py
Tests: tests/unit/tools/test_europepmc.py
Overview
Europe PMC provides access to preprints and peer-reviewed literature through a single, well-designed REST API. This replaces the originally planned bioRxiv integration due to bioRxiv's API limitations (no keyword search).
Why Europe PMC Over bioRxiv?
bioRxiv API Limitations (Why We Abandoned It)
- bioRxiv API does NOT support keyword search
- Only supports date-range queries, which return every paper in the range
- Would require downloading entire date ranges and filtering client-side
- Inefficient and impractical for our use case
Europe PMC Advantages
- Full keyword search - Query by any term
- Aggregates preprints - Includes bioRxiv, medRxiv, ChemRxiv content
- No authentication required - Free, open API
- 34+ preprint servers indexed - Not just bioRxiv
- REST API with JSON - Easy integration
API Reference
Base URL: https://www.ebi.ac.uk/europepmc/webservices/rest/search
Documentation: https://europepmc.org/RestfulWebService
Parameters
| Parameter | Value | Description |
|---|---|---|
| query | string | Search keywords |
| resultType | core | Full metadata including abstracts |
| pageSize | 1-100 | Results per page |
| format | json | Response format |
Example Request
```
GET https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=metformin+alzheimer&resultType=core&pageSize=10&format=json
```
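For reference, the same request can be built and sent with only the Python standard library. This is a minimal synchronous sketch, not the project's implementation, which is async and adds retries. The helper names build_search_url and fetch_results are hypothetical. The sketch also assumes that matching records sit under resultList.result in the JSON response.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the API reference above.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL from the parameters in the table above (hypothetical helper)."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    params = {
        "query": query,
        "resultType": "core",  # full metadata, including abstracts
        "pageSize": page_size,
        "format": "json",
    }
    # urlencode uses quote_plus, so spaces in the query become "+".
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_results(query: str, page_size: int = 10, timeout: float = 30.0) -> list[dict]:
    """Run the search synchronously and return the raw result records."""
    url = build_search_url(query, page_size)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumed response shape: records are listed under resultList.result.
    return payload.get("resultList", {}).get("result", [])
```

Calling build_search_url("metformin alzheimer") reproduces the example request URL exactly.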
Implementation
EuropePMCTool (src/tools/europepmc.py)
```python
from __future__ import annotations  # Evidence (the project's result model) is not imported in this excerpt


class EuropePMCTool:
    """
    Search Europe PMC for papers and preprints.

    Europe PMC indexes:
    - PubMed/MEDLINE articles
    - PMC full-text articles
    - Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
    - Patents and clinical guidelines
    """

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC for papers matching query."""
        # Implementation with retry logic, error handling
        ...
```
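The spec delegates retries to tenacity: 3 attempts with exponential backoff. The decorator below illustrates that policy without the dependency, using only asyncio. retry_with_backoff is a hypothetical name. The 1 s base delay, the 10 s cap, and the injectable sleep argument are assumptions, not values taken from src/tools/europepmc.py.

```python
import asyncio
import functools
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 10.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
):
    """Retry an async function, doubling the delay after each failed attempt."""

    def decorator(func: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(func)
        async def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    # Delays of 1 s, 2 s, 4 s, ... capped at max_delay
                    await sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
            raise AssertionError("unreachable when attempts >= 1")

        return wrapper

    return decorator
```

With tenacity, a comparable policy is usually written as @retry(stop=stop_after_attempt(3), wait=wait_exponential(...), reraise=True). The injectable sleep here only exists so the sketch can be tested without real delays.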
Key Features
- Preprint Detection: Automatically identifies preprints via pubTypeList
- Preprint Marking: Adds a [PREPRINT - Not peer-reviewed] prefix to the content
- Relevance Scoring: Preprints get 0.75, peer-reviewed articles get 0.9
- URL Resolution: DOI → PubMed → Europe PMC fallback chain
- Retry Logic: 3 attempts with exponential backoff via tenacity
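The DOI → PubMed → Europe PMC fallback chain might look like the sketch below. resolve_url is a hypothetical name. The sketch assumes the Europe PMC record fields doi, pmid, source and id. It also assumes these link patterns:

- doi.org/<doi>
- pubmed.ncbi.nlm.nih.gov/<pmid>/
- europepmc.org/article/<source>/<id>

```python
def resolve_url(record: dict) -> str:
    """Pick a link for a Europe PMC record: DOI first, then PubMed, then Europe PMC."""
    doi = record.get("doi")
    if doi:
        return f"https://doi.org/{doi}"
    pmid = record.get("pmid")
    if pmid:
        return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
    # Assumed: records carry a source code (e.g. "MED", "PPR") and an id within it.
    source, record_id = record.get("source"), record.get("id")
    if source and record_id:
        return f"https://europepmc.org/article/{source}/{record_id}"
    return "https://europepmc.org/"  # last resort: the site root
```

Preferring the DOI gives the most stable link. The Europe PMC article page works for records, including many preprints, that have neither a DOI nor a PMID.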
Response Mapping
| Europe PMC Field | Evidence Field |
|---|---|
| title | citation.title |
| abstractText | content |
| doi | Used for URL |
| pubYear | citation.date |
| authorList.author | citation.authors |
| pubTypeList.pubType | Determines citation.source ("preprint" or "europepmc") |
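The sketch below applies this mapping together with the preprint marking and the 0.75/0.9 scores from Key Features. Citation and Evidence here are hypothetical stand-ins for the project's own models, whose real fields are not shown in this spec. Several details are also assumptions:

- the relevance attribute name
- the space after the prefix
- reading author names from each entry's fullName

```python
from dataclasses import dataclass, field

PREPRINT_PREFIX = "[PREPRINT - Not peer-reviewed] "  # trailing space is an assumption


@dataclass
class Citation:  # hypothetical stand-in for the project's citation model
    title: str
    date: str
    authors: list[str] = field(default_factory=list)
    source: str = "europepmc"


@dataclass
class Evidence:  # hypothetical stand-in for the project's Evidence model
    content: str
    citation: Citation
    relevance: float


def is_preprint(record: dict) -> bool:
    """True if any entry in pubTypeList.pubType mentions 'preprint' (case-insensitive)."""
    pub_types = record.get("pubTypeList", {}).get("pubType", [])
    if isinstance(pub_types, str):  # tolerate a single string instead of a list
        pub_types = [pub_types]
    return any("preprint" in p.lower() for p in pub_types)


def to_evidence(record: dict) -> Evidence:
    """Map one Europe PMC result record onto the Evidence fields in the table."""
    preprint = is_preprint(record)
    abstract = record.get("abstractText", "")
    authors = [a.get("fullName", "") for a in record.get("authorList", {}).get("author", [])]
    return Evidence(
        content=(PREPRINT_PREFIX + abstract) if preprint else abstract,
        citation=Citation(
            title=record.get("title", ""),
            date=str(record.get("pubYear", "")),
            authors=authors,
            source="preprint" if preprint else "europepmc",
        ),
        relevance=0.75 if preprint else 0.9,  # scores from the Key Features list
    )
```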
Unit Tests
Test Coverage (tests/unit/tools/test_europepmc.py)
| Test | Description |
|---|---|
| test_tool_name | Verifies tool name is "europepmc" |
| test_search_returns_evidence | Basic search returns Evidence objects |
| test_search_marks_preprints | Preprints have [PREPRINT] marker and source="preprint" |
| test_search_empty_results | Handles empty results gracefully |
Integration Test
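```python
import pytest

from src.tools.europepmc import EuropePMCTool


@pytest.mark.integration
@pytest.mark.asyncio  # needed unless pytest-asyncio runs in auto mode
async def test_real_api_call():
    """Test actual API returns relevant results."""
    tool = EuropePMCTool()
    results = await tool.search("long covid treatment", max_results=3)
    assert len(results) > 0
```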
SearchHandler Integration
Europe PMC is included in src/tools/search_handler.py alongside PubMed and ClinicalTrials:
```python
from src.tools.europepmc import EuropePMCTool
# (PubMedTool and ClinicalTrialsTool imports omitted)


class SearchHandler:
    def __init__(self):
        self.tools = [
            PubMedTool(),
            ClinicalTrialsTool(),
            EuropePMCTool(),  # Preprints + peer-reviewed
        ]
```
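The spec does not say how SearchHandler runs its tools. One common pattern is to query all tools concurrently with asyncio.gather(..., return_exceptions=True). That way a failing source, such as a Europe PMC outage, does not discard results from the others. search_all and the SearchTool protocol below are hypothetical names for that idea.

```python
import asyncio
from typing import Any, Protocol


class SearchTool(Protocol):  # shape shared by PubMedTool, ClinicalTrialsTool, EuropePMCTool
    @property
    def name(self) -> str: ...

    async def search(self, query: str, max_results: int = 10) -> list[Any]: ...


async def search_all(
    tools: list[SearchTool], query: str, max_results: int = 10
) -> tuple[list[Any], dict[str, str]]:
    """Query every tool concurrently; collect results and per-tool errors separately."""
    outcomes = await asyncio.gather(
        *(tool.search(query, max_results) for tool in tools),
        return_exceptions=True,  # exceptions come back as values instead of propagating
    )
    evidence: list[Any] = []
    errors: dict[str, str] = {}
    for tool, outcome in zip(tools, outcomes):
        if isinstance(outcome, BaseException):
            errors[tool.name] = repr(outcome)
        else:
            evidence.extend(outcome)
    return evidence, errors
```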
MCP Tools Integration
Europe PMC is exposed via MCP in src/mcp_tools.py:
```python
async def search_europepmc(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers."""
    results = await _europepmc.search(query, max_results)
    # Format and return
```
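The formatting step is elided above. A hypothetical formatter could render the results as Markdown text that an MCP client can display as-is. SearchHit is an assumed flattened view of an Evidence object. The layout and the 300-character snippet length are illustrative choices only.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:  # hypothetical flattened view of an Evidence object
    title: str
    url: str
    year: str
    source: str  # "preprint" or "europepmc"
    content: str


def format_results(query: str, hits: list[SearchHit], snippet_chars: int = 300) -> str:
    """Render hits as a numbered Markdown list, flagging preprints."""
    if not hits:
        return f"No Europe PMC results found for: {query}"
    lines = [f"## Europe PMC results for: {query}", ""]
    for i, hit in enumerate(hits, start=1):
        label = " (preprint)" if hit.source == "preprint" else ""
        snippet = hit.content[:snippet_chars].rstrip()
        if len(hit.content) > snippet_chars:
            snippet += "..."  # mark truncated abstracts
        lines.append(f"{i}. **{hit.title}**{label} ({hit.year})")
        lines.append(f"   {hit.url}")
        lines.append(f"   {snippet}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```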
Verification
```bash
# Run unit tests
uv run pytest tests/unit/tools/test_europepmc.py -v

# Run integration test (real API)
uv run pytest tests/unit/tools/test_europepmc.py -v -m integration
```
Completion Checklist
- src/tools/europepmc.py implemented
- Unit tests in tests/unit/tools/test_europepmc.py
- Integration test with real API
- SearchHandler includes EuropePMCTool
- MCP wrapper in src/mcp_tools.py
- Preprint detection and marking
- Retry logic with exponential backoff
Architecture Diagram
```
┌─────────────────────────────────────────────────────────┐
│                      SearchHandler                      │
├─────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌───────────────┐   │
│  │ PubMedTool  │  │ClinicalTrials│  │ EuropePMCTool │   │
│  │             │  │     Tool     │  │               │   │
│  │ Peer-review │  │    Trials    │  │  Preprints +  │   │
│  │  articles   │  │     data     │  │  peer-review  │   │
│  └──────┬──────┘  └──────┬───────┘  └───────┬───────┘   │
│         │                │                  │           │
│         ▼                ▼                  ▼           │
│  ┌──────────────────────────────────────────────────┐   │
│  │                  Evidence List                   │   │
│  │      (deduplicated, scored, with citations)      │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
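The diagram's last box says the merged list is deduplicated and scored but does not say how. The sketch below assumes duplicates are detected by normalized URL, as when the same article is returned by both PubMedTool and EuropePMCTool, and that the copy with the higher relevance wins. Item and merge_evidence are hypothetical names, and URL-based deduplication is an assumption rather than the documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:  # hypothetical minimal evidence record
    url: str
    title: str
    relevance: float


def merge_evidence(batches: list[list[Item]]) -> list[Item]:
    """Flatten per-tool batches, keep the best-scored copy per URL, sort by score."""
    best: dict[str, Item] = {}
    for batch in batches:
        for item in batch:
            # Normalize: ignore case, surrounding whitespace and a trailing slash.
            key = item.url.strip().lower().rstrip("/")
            if key not in best or item.relevance > best[key].relevance:
                best[key] = item
    return sorted(best.values(), key=lambda it: it.relevance, reverse=True)
```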