# MCP Download Issue - Fix Documentation

## Problem Summary

The `download_paper` tool of the MCP arXiv client completed successfully on the remote MCP server, but the downloaded PDF files never appeared in the client's local `data/mcp_papers/` directory.

### Root Cause

The issue stems from the **client-server architecture** of MCP (Model Context Protocol):

1. **MCP Server** runs as a separate process (possibly remote)
2. **Server downloads PDFs** to its own storage location
3. **Server returns** `{"status": "success"}` without a file path
4. **Client expects files** in its local `data/mcp_papers/` directory
5. **No file transfer mechanism** exists between server and client storage

This is fundamentally a **storage path mismatch** between what the server uses and what the client expects.

## Solution Implemented

### 1. Tool Discovery (Diagnostic)

Added automatic tool discovery when connecting to MCP server:
- Lists all available MCP tools at session initialization
- Logs tool names, descriptions, and schemas
- Helps diagnose what capabilities the server provides

**Location:** `utils/mcp_arxiv_client.py:88-112` (`_discover_tools` method)
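
The discovery step can be sketched roughly as follows. This is not the code from `_discover_tools`; it assumes a session object with an async `list_tools()` call, modeled on the MCP Python SDK's `ClientSession`, whose result carries a `.tools` list with `name`, `description`, and `inputSchema` attributes. `discover_tools` and `FakeSession` are hypothetical names used so the sketch runs without a server.

```python
import asyncio
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("utils.mcp_arxiv_client")


async def discover_tools(session):
    """Log and return the names of the tools an MCP session exposes.

    Assumption: `session.list_tools()` is awaitable and returns an object
    with a `.tools` list of items having `name` and `description`.
    """
    result = await session.list_tools()
    tools = list(result.tools)
    logger.info("MCP server provides %d tools:", len(tools))
    for tool in tools:
        logger.info("  - %s: %s", tool.name, tool.description)
        # Schemas are verbose, so they are logged at DEBUG level only.
        logger.debug("    schema: %s", getattr(tool, "inputSchema", None))
    return [tool.name for tool in tools]


class FakeSession:
    """Hypothetical stand-in for a real MCP session (no server required)."""

    async def list_tools(self):
        return SimpleNamespace(tools=[
            SimpleNamespace(name="search_papers",
                            description="Search arXiv for papers",
                            inputSchema={}),
            SimpleNamespace(name="download_paper",
                            description="Download paper PDF",
                            inputSchema={}),
        ])


tool_names = asyncio.run(discover_tools(FakeSession()))
```

Returning the tool names lets the caller check up front whether a tool such as `download_paper` is available before relying on it.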

### 2. Direct Download Fallback

Implemented a fallback mechanism that downloads PDFs directly from arXiv when MCP download fails:
- Detects when MCP download completes but file is not accessible
- Downloads PDF directly from `https://arxiv.org/pdf/{paper_id}.pdf`
- Writes file to client's local storage directory
- Maintains same retry logic and error handling

**Location:** `utils/mcp_arxiv_client.py:114-152` (`_download_from_arxiv_direct` method)
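
A minimal sketch of such a direct download is shown here, using only the standard library. It is not the actual `_download_from_arxiv_direct` method: the function name, the injectable `fetch` parameter, the retry count of three, and the `%PDF` header check are all assumptions made for illustration.

```python
import tempfile
import urllib.request
from pathlib import Path


def _fetch(url, timeout=30):
    """Default fetcher: a plain HTTP GET via the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_from_arxiv_direct(paper_id, storage_dir, fetch=_fetch, retries=3):
    """Download a PDF from arXiv into storage_dir; return its path or None."""
    url = f"https://arxiv.org/pdf/{paper_id}.pdf"
    target = Path(storage_dir) / f"{paper_id}.pdf"
    target.parent.mkdir(parents=True, exist_ok=True)
    for _attempt in range(retries):
        try:
            data = fetch(url)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses: retry.
            continue
        # Assumed sanity check: accept only responses that look like a PDF.
        if data.startswith(b"%PDF"):
            target.write_bytes(data)
            return target
    return None


# Demonstration with a fake fetcher that fails once, then succeeds,
# so the sketch can be exercised without network access.
calls = []


def fake_fetch(url):
    calls.append(url)
    if len(calls) == 1:
        raise OSError("simulated network failure")
    return b"%PDF-1.5 fake body"


with tempfile.TemporaryDirectory() as tmp:
    saved = download_from_arxiv_direct("1706.03762", tmp, fetch=fake_fetch)
    saved_name = saved.name
    saved_bytes = saved.read_bytes()
```

Passing the fetcher as a parameter keeps the retry logic testable offline; production code would likely also wait between attempts.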

### 3. Enhanced Error Handling

Updated `download_paper_async` to:
- Try MCP download first (preserves existing functionality)
- Check multiple possible file locations
- Fall back to direct download if MCP fails
- Provide detailed logging at each step

**Location:** `utils/mcp_arxiv_client.py:462-479` (updated error handling)

## How It Works Now

### Download Flow

```
1. Check if file already exists locally → Return if found
2. Call MCP server's download_paper tool
3. Check if file appeared in expected locations:
   a. Expected path: data/mcp_papers/{paper_id}.pdf
   b. MCP-returned path (if provided in response)
   c. Any file in storage matching paper_id
4. If file not found → Fall back to direct arXiv download
5. Download PDF directly to client storage
6. Return path to downloaded file
```
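
The location checks in step 3 could look something like the sketch below. `locate_paper` is a hypothetical helper, not a function from `utils/mcp_arxiv_client.py`, and the prefix-based glob pattern for matching versioned filenames is an assumption.

```python
import tempfile
from pathlib import Path


def locate_paper(storage_dir, paper_id, returned_path=None):
    """Return the first existing PDF for paper_id, or None if none is found."""
    storage = Path(storage_dir)
    # (a) The expected path inside the client's storage directory.
    candidates = [storage / f"{paper_id}.pdf"]
    # (b) The path reported by the MCP server, if the response included one.
    if returned_path:
        candidates.append(Path(returned_path))
    # (c) Any file in storage whose name starts with the paper ID,
    #     which catches versioned names such as 1706.03762v7.pdf.
    candidates.extend(sorted(storage.glob(f"{paper_id}*.pdf")))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # caller should fall back to the direct arXiv download


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1706.03762v7.pdf").write_bytes(b"%PDF-1.5")
    found = locate_paper(tmp, "1706.03762")
    found_name = found.name if found else None
    missing = locate_paper(tmp, "2203.08975")
```

A `None` result corresponds to step 4 of the flow: nothing was found in any candidate location, so the direct download takes over.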

### Benefits

- **Zero breaking changes**: Existing MCP functionality preserved
- **Automatic fallback**: Works even with remote MCP servers
- **Better diagnostics**: Tool discovery helps troubleshoot issues
- **More reliable downloads**: Direct fallback retrieves files when the MCP path does not
- **Client-side storage**: Files always accessible to client process

## Using the Fix

### Running the Application

No changes needed! The fix is automatic:

```bash
# Set environment variables (optional - defaults work)
export USE_MCP_ARXIV=true
export MCP_ARXIV_STORAGE_PATH=data/mcp_papers

# Run the application
python app.py
```

The system will:
1. Try MCP download first
2. Automatically fall back to direct download if needed
3. Log which method succeeded
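
As an illustration of how the two environment variables might be read, here is a small sketch. `load_mcp_settings` is a hypothetical helper, and the defaults it uses (`true` and `data/mcp_papers`) simply mirror the values shown above; the application's real defaults and parsing rules may differ.

```python
import os
from pathlib import Path


def load_mcp_settings(env=os.environ):
    """Read the MCP toggle and storage path from an environment mapping."""
    # Assumed default: MCP enabled unless explicitly switched off.
    raw_flag = env.get("USE_MCP_ARXIV", "true")
    use_mcp = raw_flag.strip().lower() in {"1", "true", "yes"}
    # Assumed default: the storage path used throughout this document.
    storage = Path(env.get("MCP_ARXIV_STORAGE_PATH", "data/mcp_papers"))
    return use_mcp, storage


# A plain dict stands in for os.environ so the example is deterministic.
settings = load_mcp_settings({"USE_MCP_ARXIV": "false"})
```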

### Running Diagnostics

Use the diagnostic script to test your MCP setup:

```bash
python test_mcp_diagnostic.py
```

This will:
- Check environment configuration
- Verify storage directory setup
- List available MCP tools
- Test search functionality
- Test download with detailed logging
- Show file system state before/after

**Expected Output:**

```
================================================================================
MCP arXiv Client Diagnostic Test
================================================================================

[1] Environment Configuration:
  USE_MCP_ARXIV: true
  MCP_ARXIV_STORAGE_PATH: data/mcp_papers

[2] Storage Directory:
  Path: /path/to/data/mcp_papers
  Exists: True
  Contains 0 PDF files

[3] Initializing MCP Client:
  ✓ Client initialized successfully

[4] Testing Search Functionality:
  ✓ Search successful, found 2 papers
  First paper: Attention Is All You Need...
  Paper ID: 1706.03762

[5] Testing Download Functionality:
  Attempting to download: 1706.03762
  PDF URL: https://arxiv.org/pdf/1706.03762.pdf
  ✓ Download successful!
  File path: data/mcp_papers/1706.03762v7.pdf
  File exists: True
  File size: 2,215,520 bytes (2.11 MB)

[6] Storage Directory After Download:
  Contains 1 PDF files
  Files: ['1706.03762v7.pdf']

[7] Cleaning Up:
  ✓ MCP session closed

================================================================================
Diagnostic Test Complete
================================================================================
```

## Interpreting Logs

### Successful MCP Download

If MCP server works correctly, you'll see:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Downloading paper 2203.08975v2 via MCP
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - MCP download_paper response type: <class 'dict'>
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Successfully downloaded paper to data/mcp_papers/2203.08975v2.pdf
```

### Fallback to Direct Download

If MCP fails but direct download succeeds:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - File not found at expected path
2025-11-12 01:50:27 - utils.mcp_arxiv_client - ERROR - MCP download call completed but file not found
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - Falling back to direct arXiv download...
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Attempting direct download from arXiv for 2203.08975v2
2025-11-12 01:50:28 - utils.mcp_arxiv_client - INFO - Successfully downloaded 1234567 bytes to data/mcp_papers/2203.08975v2.pdf
```

### Tool Discovery

At session initialization:

```
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - MCP server provides 3 tools:
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - search_papers: Search arXiv for papers
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - download_paper: Download paper PDF
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - list_papers: List cached papers
```

## Troubleshooting

### Issue: MCP server not found

**Symptom:** Error during initialization: `command not found: arxiv-mcp-server`

**Solution:**
- Ensure MCP server is installed and in PATH
- Check server configuration in your MCP settings
- Try using direct ArxivClient instead: `export USE_MCP_ARXIV=false`

### Issue: Files still not downloading

**Symptom:** Both MCP and direct download fail

**Possible causes:**
1. Network connectivity issues
2. arXiv API rate limiting
3. Invalid paper IDs
4. Storage directory permissions

**Debugging steps:**
```bash
# Check network connectivity
curl https://arxiv.org/pdf/1706.03762.pdf -o test.pdf

# Check storage permissions
ls -la data/mcp_papers/
touch data/mcp_papers/test.txt

# Run diagnostic script
python test_mcp_diagnostic.py
```

### Issue: MCP server uses different storage path

**Symptom:** MCP downloads succeed but client can't find files

**Current solution:** Direct download fallback handles this automatically

**Future enhancement:** Could add file transfer mechanism if MCP provides retrieval tools

## Technical Details

### Architecture Decision: Why Fallback Instead of File Transfer?

We chose direct download fallback over implementing a file transfer mechanism because:

1. **Server is third-party**: Cannot modify MCP server to add file retrieval tools
2. **Simpler implementation**: Direct download is straightforward and reliable
3. **Better performance**: Avoids two-step download (server → client transfer)
4. **Same result**: Client gets PDFs either way
5. **Fail-safe**: Works even if MCP server is completely unavailable

### Performance Impact

- **MCP successful**: No performance change (same as before)
- **MCP fails**: Extra ~2-5 seconds for direct download
- **Network overhead**: Same (one download either way)
- **Storage**: Client-side only (no redundant server storage)

### Comparison with Direct ArxivClient

| Feature | MCPArxivClient (with fallback) | Direct ArxivClient |
|---------|-------------------------------|-------------------|
| Search via MCP | ✓ | ✗ |
| Download via MCP | Tries first | ✗ |
| Direct download | Fallback | Primary |
| Remote MCP server | ✓ | N/A |
| File storage | Client-side | Client-side |
| Reliability | High (dual method) | High |

## Future Enhancements

If MCP server capabilities expand, possible improvements:

1. **File retrieval tool**: MCP server adds `get_file(paper_id)` tool
2. **Streaming transfer**: MCP response includes base64-encoded PDF
3. **Shared storage**: Configure MCP server to write to shared filesystem
4. **Batch downloads**: Optimize multi-paper downloads

For now, the fallback solution provides robust, reliable downloads without requiring MCP server changes.

## Files Modified

1. `utils/mcp_arxiv_client.py` - Core client with fallback logic
2. `test_mcp_diagnostic.py` - New diagnostic script
3. `MCP_FIX_DOCUMENTATION.md` - This document

## Testing

Run the test suite to verify the fix:

```bash
# Test MCP client
pytest tests/test_mcp_arxiv_client.py -v

# Run diagnostic
python test_mcp_diagnostic.py

# Full integration test
python app.py
# Then use the Gradio UI to analyze papers with MCP enabled
```

## Summary

The fix ensures **reliable PDF downloads** by combining MCP capabilities with direct arXiv fallback:

- ✅ **Preserves MCP functionality** for servers that work correctly
- ✅ **Automatic fallback** when MCP fails or files aren't accessible
- ✅ **No configuration changes** required
- ✅ **Better diagnostics** via tool discovery
- ✅ **Comprehensive logging** for troubleshooting
- ✅ **Zero breaking changes** to existing code

The system now works reliably with **remote MCP servers**, **local servers**, or **no MCP at all**.