Spaces:

MCP-1st-Birthday
/

DeepBoner

Running

# Rate limiting (lines 47-50)
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

storage = MemoryStorage()
limiter = MovingWindowRateLimiter(storage)
rate_limit = parse("3/second")  # NCBI allows 3/sec without API key, 10/sec with

# Full-text via BioC format (lines 108-120)
def _get_fulltext(pmid: int) -> dict[str, Any] | None:
    pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
    # Returns structured JSON with full text for open-access papers

# Figure retrieval via Europe PMC (lines 123-149)
def _get_figures(pmcid: str) -> dict[str, str]:
    suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
    # Returns base64-encoded images from supplementary materials

Recommended Improvements

Phase 1: Rate Limiting (Critical)

# Add to src/tools/pubmed.py
from limits import parse
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

storage = MemoryStorage()
limiter = MovingWindowRateLimiter(storage)

# With NCBI_API_KEY: 10/sec, without: 3/sec
def get_rate_limit():
    if settings.ncbi_api_key:
        return parse("10/second")
    return parse("3/second")

Dependencies: pip install limits

Phase 2: Full-Text Retrieval

async def get_fulltext(pmid: str) -> str | None:
    """Get full text for open-access papers via BioC API."""
    url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
    # Only works for PMC papers (open access)

Phase 3: PMC ID Resolution

async def get_pmc_id(pmid: str) -> str | None:
    """Convert PMID to PMCID for full-text access."""
    url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"

Python Libraries to Consider

Library	Purpose	Notes
Biopython	`Bio.Entrez` module	Official, well-maintained
PyMed	PubMed wrapper	Simpler API, less control
metapub	Full-featured	Tested on 1/3 of PubMed
limits	Rate limiting	Used by reference repo

API Endpoints Reference

Endpoint	Purpose	Rate Limit
`esearch.fcgi`	Search for PMIDs	3/sec (10 with key)
`efetch.fcgi`	Fetch metadata	3/sec (10 with key)
`esummary.fcgi`	Quick metadata	3/sec (10 with key)
`pmcoa.cgi/BioC_json`	Full text (PMC only)	Unknown
`idconv/v1.0`	PMID ↔ PMCID	Unknown

PubMed Tool: Current State & Future Improvements

Current Implementation

What We Have (src/tools/pubmed.py)

Current Limitations

Reference Implementation (DeepBoner Reference Repo)

Features We're Missing

Recommended Improvements

Phase 1: Rate Limiting (Critical)

Phase 2: Full-Text Retrieval

Phase 3: PMC ID Resolution

Python Libraries to Consider

API Endpoints Reference

Sources

What We Have (`src/tools/pubmed.py`)