File size: 3,506 Bytes
2ac49c3
 
92ee4ec
2ac49c3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
f9576ce
 
 
 
 
 
 
2ac49c3
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
# SPEC-20: PubMed JSON Parsing Fix

**Status:** COMPLETED
**Priority:** P2 (Critical - causes crashes)
**Effort:** 15 minutes
**PR Scope:** Single file fix

---

## Problem Statement

The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens **outside** the try/except block.

**File:** `src/tools/pubmed.py:88`
**Crash Type:** `json.JSONDecodeError`
**User Impact:** Entire research workflow crashes

---

## Current Code (BROKEN)

```python
# src/tools/pubmed.py - Lines ~80-95
try:
    search_resp = await client.get(
        f"{NCBI_BASE_URL}/esearch.fcgi",
        params=search_params,
    )
    search_resp.raise_for_status()
except httpx.HTTPStatusError as e:
    logger.warning("PubMed search failed", status=e.response.status_code)
    return []

# ↓↓↓ THIS IS OUTSIDE THE TRY BLOCK ↓↓↓
search_data = search_resp.json()  # CRASHES HERE on maintenance pages
pmids = search_data.get("esearchresult", {}).get("idlist", [])
```

---

## Required Fix

Move JSON parsing inside try/except and add `JSONDecodeError` handling:

```python
# src/tools/pubmed.py - Fixed version
try:
    search_resp = await client.get(
        f"{NCBI_BASE_URL}/esearch.fcgi",
        params=search_params,
    )
    search_resp.raise_for_status()
    search_data = search_resp.json()  # ← MOVED INSIDE TRY
except httpx.HTTPStatusError as e:
    logger.warning("PubMed search failed", status=e.response.status_code)
    return []
except json.JSONDecodeError as e:
    logger.warning(
        "PubMed returned invalid JSON (possible maintenance page)",
        error=str(e),
        response_preview=search_resp.text[:200] if search_resp else "N/A",
    )
    return []

pmids = search_data.get("esearchresult", {}).get("idlist", [])
```

---

## Implementation Checklist

- [x] Add `import json` at top of file (if not present)
- [x] Move `search_resp.json()` inside try block (line ~88)
- [x] Add `except json.JSONDecodeError` handler
- [x] Log warning with response preview for debugging
- [x] Return empty list (graceful degradation)
- [x] Write unit test: mock response with HTML content
- [x] Run `make check` (lint + typecheck + test)

---

## Test Case

```python
# tests/unit/tools/test_pubmed.py
import pytest
from unittest.mock import AsyncMock, MagicMock
from src.tools.pubmed import search_pubmed

@pytest.mark.asyncio
async def test_pubmed_handles_maintenance_page():
    """PubMed should gracefully handle non-JSON responses."""
    # Mock httpx client returning HTML maintenance page
    mock_response = MagicMock()
    mock_response.status_code = 200
    mock_response.text = "<html><body>Service Temporarily Unavailable</body></html>"
    mock_response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
    mock_response.raise_for_status = MagicMock()

    mock_client = AsyncMock()
    mock_client.get.return_value = mock_response

    # Should return empty list, not crash
    result = await search_pubmed("test query", client=mock_client)
    assert result == []
```

---

## Acceptance Criteria

1. `search_pubmed()` returns `[]` when API returns HTML
2. Warning logged with response preview
3. No `JSONDecodeError` propagates to caller
4. All existing tests pass
5. `make check` passes

---

## Dependencies

None. This is a standalone fix.

---

## Notes

- This same pattern may exist in `clinicaltrials.py` and `europepmc.py` - check after this fix
- Do NOT over-engineer. Single fix, single PR.