File size: 3,506 Bytes
2ac49c3 92ee4ec 2ac49c3 f9576ce 2ac49c3 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 |
# SPEC-20: PubMed JSON Parsing Fix
**Status:** COMPLETED
**Priority:** P2 (Critical - causes crashes)
**Effort:** 15 minutes
**PR Scope:** Single file fix
---
## Problem Statement
The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens **outside** the try/except block.
**File:** `src/tools/pubmed.py:88`
**Crash Type:** `json.JSONDecodeError`
**User Impact:** Entire research workflow crashes
---
## Current Code (BROKEN)
```python
# src/tools/pubmed.py - Lines ~80-95
try:
search_resp = await client.get(
f"{NCBI_BASE_URL}/esearch.fcgi",
params=search_params,
)
search_resp.raise_for_status()
except httpx.HTTPStatusError as e:
logger.warning("PubMed search failed", status=e.response.status_code)
return []
# βββ THIS IS OUTSIDE THE TRY BLOCK βββ
search_data = search_resp.json() # CRASHES HERE on maintenance pages
pmids = search_data.get("esearchresult", {}).get("idlist", [])
```
---
## Required Fix
Move JSON parsing inside try/except and add `JSONDecodeError` handling:
```python
# src/tools/pubmed.py - Fixed version
try:
search_resp = await client.get(
f"{NCBI_BASE_URL}/esearch.fcgi",
params=search_params,
)
search_resp.raise_for_status()
search_data = search_resp.json() # β MOVED INSIDE TRY
except httpx.HTTPStatusError as e:
logger.warning("PubMed search failed", status=e.response.status_code)
return []
except json.JSONDecodeError as e:
logger.warning(
"PubMed returned invalid JSON (possible maintenance page)",
error=str(e),
response_preview=search_resp.text[:200] if search_resp else "N/A",
)
return []
pmids = search_data.get("esearchresult", {}).get("idlist", [])
```
---
## Implementation Checklist
- [x] Add `import json` at top of file (if not present)
- [x] Move `search_resp.json()` inside try block (line ~88)
- [x] Add `except json.JSONDecodeError` handler
- [x] Log warning with response preview for debugging
- [x] Return empty list (graceful degradation)
- [x] Write unit test: mock response with HTML content
- [x] Run `make check` (lint + typecheck + test)
---
## Test Case
```python
# tests/unit/tools/test_pubmed.py
import pytest
from unittest.mock import AsyncMock, MagicMock
from src.tools.pubmed import search_pubmed
@pytest.mark.asyncio
async def test_pubmed_handles_maintenance_page():
"""PubMed should gracefully handle non-JSON responses."""
# Mock httpx client returning HTML maintenance page
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.text = "<html><body>Service Temporarily Unavailable</body></html>"
mock_response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
mock_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get.return_value = mock_response
# Should return empty list, not crash
result = await search_pubmed("test query", client=mock_client)
assert result == []
```
---
## Acceptance Criteria
1. `search_pubmed()` returns `[]` when API returns HTML
2. Warning logged with response preview
3. No `JSONDecodeError` propagates to caller
4. All existing tests pass
5. `make check` passes
---
## Dependencies
None. This is a standalone fix.
---
## Notes
- This same pattern may exist in `clinicaltrials.py` and `europepmc.py` - check after this fix
- Do NOT over-engineer. Single fix, single PR.
|