# MCP Download Issue - Fix Documentation

## Problem Summary

The MCP arXiv client had an issue: the `download_paper` tool completed successfully on the remote MCP server, but the downloaded PDF files did not appear in the client's local `data/mcp_papers/` directory.

### Root Cause

The issue stems from the **client-server architecture** of MCP (Model Context Protocol):

1. **MCP server** runs as a separate process (possibly remote)
2. **Server downloads PDFs** to its own storage location
3. **Server returns** `{"status": "success"}` without a file path
4. **Client expects files** in its local `data/mcp_papers/` directory
5. **No file transfer mechanism** exists between server and client storage

This is fundamentally a **storage path mismatch** between where the server writes files and where the client looks for them.
## Solution Implemented

### 1. Tool Discovery (Diagnostic)

Added automatic tool discovery when connecting to the MCP server:

- Lists all available MCP tools at session initialization
- Logs tool names, descriptions, and schemas
- Helps diagnose which capabilities the server provides

**Location:** `utils/mcp_arxiv_client.py:88-112` (`_discover_tools` method)
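Below is a minimal sketch of what tool discovery can look like. It assumes the session object follows the official MCP Python SDK's `ClientSession`, where `await session.list_tools()` returns a result whose `.tools` entries carry `name`, `description`, and `inputSchema`. The function `discover_tools` and the `FakeSession` stand-in are hypothetical illustrations, not the code in `_discover_tools`. The fake tool list mirrors the tool-discovery log shown later in this document.

```python
import asyncio
import logging
from types import SimpleNamespace
from typing import Any, List

logger = logging.getLogger("utils.mcp_arxiv_client")


async def discover_tools(session: Any) -> List[str]:
    """Log every tool the MCP server advertises and return the tool names.

    Assumes `session` behaves like the MCP SDK's ClientSession: `list_tools()`
    is a coroutine whose result has a `.tools` list of objects exposing
    `name`, `description`, and `inputSchema`.
    """
    result = await session.list_tools()
    tools = list(result.tools)
    logger.info("MCP server provides %d tools:", len(tools))
    for tool in tools:
        logger.info("  - %s: %s", tool.name, tool.description)
        logger.debug("    input schema: %s", tool.inputSchema)
    return [tool.name for tool in tools]


class FakeSession:
    """Hypothetical stand-in for a connected session, for offline testing."""

    async def list_tools(self) -> SimpleNamespace:
        return SimpleNamespace(tools=[
            SimpleNamespace(name="search_papers", description="Search arXiv for papers", inputSchema={}),
            SimpleNamespace(name="download_paper", description="Download paper PDF", inputSchema={}),
            SimpleNamespace(name="list_papers", description="List cached papers", inputSchema={}),
        ])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(discover_tools(FakeSession())))
```

With a real SDK session, call it once right after `await session.initialize()`. Because it only logs and returns names, it cannot break the download path.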
### 2. Direct Download Fallback

Implemented a fallback mechanism that downloads PDFs directly from arXiv when the MCP download fails:

- Detects when the MCP download completes but the file is not accessible
- Downloads the PDF directly from `https://arxiv.org/pdf/{paper_id}.pdf`
- Writes the file to the client's local storage directory
- Keeps the same retry logic and error handling

**Location:** `utils/mcp_arxiv_client.py:114-152` (`_download_from_arxiv_direct` method)
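The fallback can be sketched with only the standard library. This illustration is not the body of `_download_from_arxiv_direct`, and several of its choices are hypothetical:

- the retry count and the linear backoff
- the `%PDF` header check
- the write-then-rename step
- the `fetch` parameter, added so the function can be exercised without network access

```python
import logging
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("utils.mcp_arxiv_client")

ARXIV_PDF_URL = "https://arxiv.org/pdf/{paper_id}.pdf"


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL; urllib's default opener follows HTTP redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": "mcp-arxiv-client-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_from_arxiv_direct(
    paper_id: str,
    storage_dir: Path,
    fetch: Callable[[str], bytes] = http_get,
    max_retries: int = 3,
    backoff_seconds: float = 2.0,
) -> Path:
    """Download an arXiv PDF straight into client storage, retrying on failure."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    url = ARXIV_PDF_URL.format(paper_id=paper_id)
    # Old-style IDs such as "hep-th/9901001" contain a slash; keep the file name flat.
    dest = storage_dir / f"{paper_id.replace('/', '_')}.pdf"
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            logger.info("Attempting direct download from arXiv for %s", paper_id)
            data = fetch(url)
            if not data.startswith(b"%PDF"):
                raise ValueError(f"response for {paper_id} is not a PDF")
            partial = dest.with_name(dest.name + ".part")
            partial.write_bytes(data)
            partial.replace(dest)  # rename into place so readers never see a half-written file
            logger.info("Successfully downloaded %d bytes to %s", len(data), dest)
            return dest
        except Exception as exc:  # network errors, non-PDF bodies, disk errors
            last_error = exc
            logger.warning("Direct download attempt %d/%d failed: %s", attempt, max_retries, exc)
            if attempt < max_retries:
                time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"direct download failed for {paper_id}") from last_error
```

With network access, `download_from_arxiv_direct("1706.03762", Path("data/mcp_papers"))` uses the default `fetch`. In tests, pass a lambda that returns canned bytes.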
### 3. Enhanced Error Handling

Updated `download_paper_async` to:

- Try the MCP download first (preserves existing functionality)
- Check multiple possible file locations
- Fall back to direct download if MCP fails
- Provide detailed logging at each step

**Location:** `utils/mcp_arxiv_client.py:462-479` (updated error handling)
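Checking multiple possible file locations could be sketched as follows. Two parts of it are assumptions:

- The response keys `path` and `file_path` are guesses, since the server described above returns only `{"status": "success"}`.
- The `{paper_id}v*.pdf` pattern is one narrow reading of "any file matching paper_id".

The function name is hypothetical.

```python
from pathlib import Path
from typing import Any, List, Optional


def find_downloaded_pdf(paper_id: str, storage_dir: Path, mcp_response: Any = None) -> Optional[Path]:
    """Return the first existing local PDF for paper_id, or None."""
    safe_id = paper_id.replace("/", "_")
    candidates: List[Path] = [storage_dir / f"{safe_id}.pdf"]  # (a) expected path
    if isinstance(mcp_response, dict):
        # (b) hypothetical keys; the server described here returns no path at all
        for key in ("path", "file_path"):
            value = mcp_response.get(key)
            if isinstance(value, str) and value:
                candidates.append(Path(value))
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # (c) versioned copies such as 1706.03762v7.pdf
    if storage_dir.is_dir():
        matches = sorted(storage_dir.glob(f"{safe_id}v*.pdf"))
        if matches:
            return matches[0]
    return None
```

A `None` result is the signal to fall back to the direct download.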
## How It Works Now

### Download Flow

```
1. Check if file already exists locally → Return if found
2. Call MCP server's download_paper tool
3. Check if file appeared in expected locations:
   a. Expected path: data/mcp_papers/{paper_id}.pdf
   b. MCP-returned path (if provided in response)
   c. Any file in storage matching paper_id
4. If file not found → Fall back to direct arXiv download
5. Download PDF directly to client storage
6. Return path to downloaded file
```
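The six steps above can be sketched as a small orchestration function. To keep it self-contained and testable, it is synchronous and takes three collaborators as parameters:

- the MCP call
- the file locator
- the direct downloader

The real `download_paper_async` is asynchronous, and these parameter names are hypothetical.

```python
import logging
from pathlib import Path
from typing import Any, Callable, Optional

logger = logging.getLogger("utils.mcp_arxiv_client")


def download_paper(
    paper_id: str,
    storage_dir: Path,
    call_mcp_download: Callable[[str], Any],
    locate: Callable[[str, Path, Any], Optional[Path]],
    download_direct: Callable[[str, Path], Path],
) -> Path:
    """Synchronous sketch of the download flow; collaborators are injected."""
    # Step 1: reuse a local copy if one already exists.
    existing = locate(paper_id, storage_dir, None)
    if existing is not None:
        return existing

    # Step 2: ask the MCP server to download the paper; a failed call is not fatal.
    response: Any = None
    try:
        logger.info("Downloading paper %s via MCP", paper_id)
        response = call_mcp_download(paper_id)
    except Exception as exc:
        logger.warning("MCP download_paper call failed: %s", exc)

    # Step 3: look in the expected path, any path the response reports, and storage.
    found = locate(paper_id, storage_dir, response)
    if found is not None:
        logger.info("Successfully downloaded paper to %s", found)
        return found

    # Steps 4-6: fall back to a direct arXiv download into client storage.
    logger.warning("Falling back to direct arXiv download...")
    return download_direct(paper_id, storage_dir)
```

Injecting the collaborators keeps the fallback decision in one place. In the actual client, they would correspond roughly to the MCP `download_paper` tool call, the location check, and `_download_from_arxiv_direct`.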
### Benefits

- **Zero breaking changes**: Existing MCP functionality is preserved
- **Automatic fallback**: Works even with remote MCP servers
- **Better diagnostics**: Tool discovery helps troubleshoot issues
- **More reliable downloads**: The direct fallback retrieves files whenever arXiv itself is reachable
- **Client-side storage**: Files are always accessible to the client process

## Using the Fix

### Running the Application

No changes are needed; the fix is automatic:

```bash
# Set environment variables (optional; defaults work)
export USE_MCP_ARXIV=true
export MCP_ARXIV_STORAGE_PATH=data/mcp_papers

# Run the application
python app.py
```
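Below is a sketch of how the two environment variables might be read. Several details are assumptions, not the application's confirmed behavior:

- the fallback value `false` for `USE_MCP_ARXIV`
- the fallback value `data/mcp_papers` for the storage path
- the set of accepted truthy spellings
- the `ArxivClientConfig` type

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Assumed defaults; check the application's own settings for the real values.
DEFAULT_USE_MCP = "false"
DEFAULT_STORAGE_PATH = "data/mcp_papers"
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ArxivClientConfig:
    use_mcp: bool
    storage_path: Path


def load_config(env: Optional[Mapping[str, str]] = None) -> ArxivClientConfig:
    """Read USE_MCP_ARXIV and MCP_ARXIV_STORAGE_PATH (default source: os.environ)."""
    source = os.environ if env is None else env
    use_mcp = source.get("USE_MCP_ARXIV", DEFAULT_USE_MCP).strip().lower() in TRUTHY
    storage_path = Path(source.get("MCP_ARXIV_STORAGE_PATH", DEFAULT_STORAGE_PATH))
    return ArxivClientConfig(use_mcp=use_mcp, storage_path=storage_path)
```

Passing an explicit mapping, as in `load_config({"USE_MCP_ARXIV": "true"})`, makes the parsing easy to test without touching the process environment.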
The system will:

1. Try the MCP download first
2. Automatically fall back to direct download if needed
3. Log which method succeeded

### Running Diagnostics

Use the diagnostic script to test your MCP setup:

```bash
python test_mcp_diagnostic.py
```

This will:

- Check environment configuration
- Verify storage directory setup
- List available MCP tools
- Test search functionality
- Test download with detailed logging
- Show file system state before and after the download
**Expected Output:**

```
================================================================================
MCP arXiv Client Diagnostic Test
================================================================================
[1] Environment Configuration:
USE_MCP_ARXIV: true
MCP_ARXIV_STORAGE_PATH: data/mcp_papers
[2] Storage Directory:
Path: /path/to/data/mcp_papers
Exists: True
Contains 0 PDF files
[3] Initializing MCP Client:
✓ Client initialized successfully
[4] Testing Search Functionality:
✓ Search successful, found 2 papers
First paper: Attention Is All You Need...
Paper ID: 1706.03762
[5] Testing Download Functionality:
Attempting to download: 1706.03762
PDF URL: https://arxiv.org/pdf/1706.03762.pdf
✓ Download successful!
File path: data/mcp_papers/1706.03762v7.pdf
File exists: True
File size: 2,215,520 bytes (2.11 MB)
[6] Storage Directory After Download:
Contains 1 PDF files
Files: ['1706.03762v7.pdf']
[7] Cleaning Up:
✓ MCP session closed
================================================================================
Diagnostic Test Complete
================================================================================
```
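Sections [2] and [6] of the output compare the storage directory before and after the download. Below is a minimal standard-library sketch of that comparison. The helper names are hypothetical and may not match what `test_mcp_diagnostic.py` does internally.

```python
from pathlib import Path
from typing import List


def list_pdfs(storage_dir: Path) -> List[str]:
    """Sorted names of PDF files directly inside storage_dir ([] if it is missing)."""
    if not storage_dir.is_dir():
        return []
    return sorted(p.name for p in storage_dir.glob("*.pdf") if p.is_file())


def new_pdfs(before: List[str], after: List[str]) -> List[str]:
    """PDF names present after the download but not before it."""
    return sorted(set(after) - set(before))
```

Take a snapshot with `list_pdfs` before calling the client and another afterwards. A non-empty `new_pdfs(before, after)` confirms that the file landed in client storage, whichever method fetched it.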
## Interpreting Logs

### Successful MCP Download

If the MCP server works correctly, you'll see:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Downloading paper 2203.08975v2 via MCP
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - MCP download_paper response type: <class 'dict'>
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Successfully downloaded paper to data/mcp_papers/2203.08975v2.pdf
```
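The log lines in this section share the shape `timestamp - logger name - level - message`. Assume the client uses the standard `logging` module with `logging.getLogger(__name__)`, which would yield the name `utils.mcp_arxiv_client`. Under that assumption, the configuration sketched below produces the same shape. The exact format string the application uses is not shown in this document.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level: int = logging.INFO) -> None:
    """Install a root handler whose lines match the examples in this section."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("utils.mcp_arxiv_client").info("Downloading paper %s via MCP", "2203.08975v2")
```

Passing `level=logging.DEBUG` would also surface debug-level detail, such as tool schemas, if the client logs them at that level.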
### Fallback to Direct Download

If MCP fails but the direct download succeeds:

```
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - File not found at expected path
2025-11-12 01:50:27 - utils.mcp_arxiv_client - ERROR - MCP download call completed but file not found
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - Falling back to direct arXiv download...
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Attempting direct download from arXiv for 2203.08975v2
2025-11-12 01:50:28 - utils.mcp_arxiv_client - INFO - Successfully downloaded 1234567 bytes to data/mcp_papers/2203.08975v2.pdf
```
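In a long log, a small helper can tell which path a download took by looking for the messages shown in the two examples above. The marker strings are copied from those examples. If the client's wording changes, this hypothetical helper (it is not part of the client) would need to change with it.

```python
from typing import Iterable

FALLBACK_MARKER = "Falling back to direct arXiv download"
MCP_SUCCESS_PREFIX = "Successfully downloaded paper to "


def classify_download(log_lines: Iterable[str]) -> str:
    """Return 'mcp', 'direct', 'fallback-failed', or 'unknown' for one download's log lines."""
    fell_back = False
    for line in log_lines:
        # "<timestamp> - <logger> - <level> - <message>": keep only the message part.
        message = line.split(" - ", 3)[-1]
        if FALLBACK_MARKER in message:
            fell_back = True
        elif not fell_back and message.startswith(MCP_SUCCESS_PREFIX):
            return "mcp"
        elif fell_back and message.startswith("Successfully downloaded ") and " bytes to " in message:
            return "direct"
    return "fallback-failed" if fell_back else "unknown"
```

Feed it the lines for a single paper. Interleaved logs from concurrent downloads would need to be filtered by paper ID first.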
### Tool Discovery

At session initialization:

```
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - MCP server provides 3 tools:
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - - search_papers: Search arXiv for papers
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - - download_paper: Download paper PDF
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - - list_papers: List cached papers
```
## Troubleshooting

### Issue: MCP server not found

**Symptom:** Error during initialization: `command not found: arxiv-mcp-server`

**Solution:**

- Ensure the MCP server is installed and on your PATH
- Check the server configuration in your MCP settings
- Try using the direct ArxivClient instead: `export USE_MCP_ARXIV=false`

### Issue: Files still not downloading

**Symptom:** Both MCP and direct download fail

**Possible causes:**

1. Network connectivity issues
2. arXiv API rate limiting
3. Invalid paper IDs
4. Storage directory permissions
**Debugging steps:**

```bash
# Check network connectivity (-L follows any redirects)
curl -L https://arxiv.org/pdf/1706.03762.pdf -o test.pdf

# Check storage permissions (creates and removes a test file)
ls -la data/mcp_papers/
touch data/mcp_papers/test.txt && rm data/mcp_papers/test.txt

# Run diagnostic script
python test_mcp_diagnostic.py
```
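The `ls`/`touch` permission check can also be done from Python, which is convenient inside a startup check. Below is a sketch using only the standard library. The helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def storage_is_writable(storage_dir: Path) -> bool:
    """Create storage_dir if needed and confirm a file can be written inside it."""
    try:
        storage_dir.mkdir(parents=True, exist_ok=True)
        # The temporary file is deleted automatically when the context exits.
        with tempfile.NamedTemporaryFile(dir=storage_dir, prefix=".write-test-"):
            pass
        return True
    except OSError:
        return False
```

For example, `storage_is_writable(Path("data/mcp_papers"))` returns `False` when the directory cannot be created or written to. This separates cause 4 (storage directory permissions) from network problems.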
### Issue: MCP server uses a different storage path

**Symptom:** MCP downloads succeed, but the client can't find the files

**Current solution:** The direct download fallback handles this automatically

**Future enhancement:** Add a file transfer mechanism if the MCP server provides retrieval tools

## Technical Details

### Architecture Decision: Why Fallback Instead of File Transfer?

We chose a direct download fallback over a file transfer mechanism because:

1. **Server is third-party**: We cannot modify the MCP server to add file retrieval tools
2. **Simpler implementation**: Direct download is straightforward and reliable
3. **Better performance**: Avoids a two-step download (arXiv → server → client)
4. **Same result**: The client gets the PDFs either way
5. **Fail-safe**: Downloads still work even if the MCP server is completely unavailable
### Performance Impact

- **MCP succeeds**: No performance change (same as before)
- **MCP fails**: An extra ~2-5 seconds for the direct download
- **Client network overhead**: One PDF download either way (in the fallback case, the copy the server already fetched goes unused)
- **Storage**: The client reads files only from its own storage (the MCP server may still keep its own copy)
### Comparison with Direct ArxivClient

| Feature | MCPArxivClient (with fallback) | Direct ArxivClient |
|---------|-------------------------------|-------------------|
| Search via MCP | ✓ | ✗ |
| Download via MCP | Tries first | ✗ |
| Direct download | Fallback | Primary |
| Remote MCP server | ✓ | N/A |
| File storage | Client-side | Client-side |
| Reliability | High (dual method) | High |
## Future Enhancements

If the MCP server's capabilities expand, possible improvements include:

1. **File retrieval tool**: The MCP server adds a `get_file(paper_id)` tool
2. **Inline transfer**: The MCP response includes the PDF as base64-encoded content
3. **Shared storage**: Configure the MCP server to write to a shared filesystem
4. **Batch downloads**: Optimize multi-paper downloads
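If the server ever returned the PDF inline, the client side would mostly be a decode-and-write step. This sketch is speculative. The current server does not send file contents, and how such a payload would be wrapped in an MCP response is unknown. The hypothetical function therefore takes only the base64 string itself.

```python
import base64
import binascii
from pathlib import Path


def save_inline_pdf(encoded: str, dest: Path) -> Path:
    """Decode a base64 PDF payload and write it to dest."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        data = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if not data.startswith(b"%PDF"):
        raise ValueError("decoded payload is not a PDF")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

Base64 inflates a payload by about a third. Sending large PDFs inline would therefore cost more bandwidth than the direct download it replaces.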
For now, the fallback provides reliable downloads without requiring any MCP server changes.

## Files Modified

1. `utils/mcp_arxiv_client.py` - Core client with fallback logic
2. `test_mcp_diagnostic.py` - New diagnostic script
3. `MCP_FIX_DOCUMENTATION.md` - This document

## Testing

Run the test suite to verify the fix:

```bash
# Test MCP client
pytest tests/test_mcp_arxiv_client.py -v

# Run diagnostic
python test_mcp_diagnostic.py

# Full integration test
python app.py
# Then use the Gradio UI to analyze papers with MCP enabled
```

## Summary

The fix ensures **reliable PDF downloads** by combining MCP capabilities with a direct arXiv fallback:

- ✓ **Preserves MCP functionality** for servers that work correctly
- ✓ **Automatic fallback** when MCP fails or files aren't accessible
- ✓ **No configuration changes** required
- ✓ **Better diagnostics** via tool discovery
- ✓ **Comprehensive logging** for troubleshooting
- ✓ **Zero breaking changes** to existing code

The system now works reliably with **remote MCP servers**, **local servers**, or **no MCP at all**.