MCP Download Issue - Fix Documentation

Problem Summary

The download_paper tool completed successfully on the remote MCP server, but the downloaded PDF files never appeared in the client's local data/mcp_papers/ directory.

Root Cause

The issue stems from the client-server architecture of MCP (Model Context Protocol):

  1. MCP Server runs as a separate process (possibly remote)
  2. Server downloads PDFs to its own storage location
  3. Server returns {"status": "success"} without file path
  4. Client expects files in its local data/mcp_papers/ directory
  5. No file transfer mechanism exists between server and client storage

This is fundamentally a storage path mismatch between what the server uses and what the client expects.

Solution Implemented

1. Tool Discovery (Diagnostic)

Added automatic tool discovery when connecting to MCP server:

  • Lists all available MCP tools at session initialization
  • Logs tool names, descriptions, and schemas
  • Helps diagnose what capabilities the server provides

Location: utils/mcp_arxiv_client.py:88-112 (_discover_tools method)
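
The discovery step can be sketched roughly as follows. This is a minimal illustration, not the actual _discover_tools code: it assumes the session exposes an awaitable list_tools() whose result carries a tools list with name, description, and inputSchema attributes, as the MCP Python SDK's ClientSession does. FakeSession is a hypothetical stand-in so the sketch runs without a server.

```python
import asyncio
import logging
from types import SimpleNamespace

logger = logging.getLogger("mcp_tool_discovery_sketch")


async def discover_tools(session):
    """Log every tool the MCP server advertises and return the tool names."""
    result = await session.list_tools()
    logger.info("MCP server provides %d tools:", len(result.tools))
    for tool in result.tools:
        logger.info("  - %s: %s", tool.name, tool.description)
        # The input schema shows which arguments each tool accepts.
        logger.debug("    schema: %s", tool.inputSchema)
    return [tool.name for tool in result.tools]


class FakeSession:
    """Hypothetical stand-in for a connected MCP session (illustration only)."""

    async def list_tools(self):
        schema = {"type": "object"}
        tools = [
            SimpleNamespace(name="search_papers", description="Search arXiv for papers", inputSchema=schema),
            SimpleNamespace(name="download_paper", description="Download paper PDF", inputSchema=schema),
            SimpleNamespace(name="list_papers", description="List cached papers", inputSchema=schema),
        ]
        return SimpleNamespace(tools=tools)


tool_names = asyncio.run(discover_tools(FakeSession()))
```

If no tool in the returned list can hand file contents back to the client, the storage mismatch described under Root Cause cannot be solved through MCP alone.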

2. Direct Download Fallback

Implemented a fallback mechanism that downloads PDFs directly from arXiv when MCP download fails:

  • Detects when MCP download completes but file is not accessible
  • Downloads PDF directly from https://arxiv.org/pdf/{paper_id}.pdf
  • Writes file to client's local storage directory
  • Maintains same retry logic and error handling

Location: utils/mcp_arxiv_client.py:114-152 (_download_from_arxiv_direct method)
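
A direct download with retries might look like the sketch below. It is not the actual _download_from_arxiv_direct method; the function names, the retry count, the backoff, and the %PDF header check are all assumptions made for illustration. The fetch function is injectable so the logic can be exercised without network access.

```python
import time
import urllib.request
from pathlib import Path

ARXIV_PDF_URL = "https://arxiv.org/pdf/{paper_id}.pdf"


def fetch_bytes(url, timeout=30.0):
    """Fetch a URL and return the response body (urllib follows redirects)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_from_arxiv_direct(paper_id, storage_dir, fetch=fetch_bytes,
                               retries=3, backoff=1.0):
    """Download a PDF straight from arXiv into the client's storage directory."""
    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    target = storage_dir / f"{paper_id}.pdf"
    url = ARXIV_PDF_URL.format(paper_id=paper_id)

    last_error = None
    for attempt in range(1, retries + 1):
        try:
            data = fetch(url)
            # Assumed sanity check: an error page would not start with %PDF.
            if not data.startswith(b"%PDF"):
                raise ValueError("response is not a PDF")
            target.write_bytes(data)
            return target
        except (OSError, ValueError) as exc:  # URLError and timeouts are OSErrors
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)  # wait a little longer each time
    raise RuntimeError(f"direct download failed for {paper_id}") from last_error
```

Calling download_from_arxiv_direct("1706.03762", "data/mcp_papers") would write data/mcp_papers/1706.03762.pdf on success and raise RuntimeError after the final failed attempt.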

3. Enhanced Error Handling

Updated download_paper_async to:

  • Try MCP download first (preserves existing functionality)
  • Check multiple possible file locations
  • Fall back to direct download if MCP fails
  • Provide detailed logging at each step

Location: utils/mcp_arxiv_client.py:462-479 (updated error handling)
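
The try-MCP-then-fall-back behaviour can be sketched as below. This is a simplified illustration rather than the real download_paper_async: download_with_fallback and its two callable parameters are hypothetical names, and only the expected path is checked here.

```python
import asyncio
import logging
from pathlib import Path

logger = logging.getLogger("mcp_fallback_sketch")


async def download_with_fallback(paper_id, storage_dir, mcp_download, direct_download):
    """Return (path, method), where method is 'cached', 'mcp', or 'direct'."""
    target = Path(storage_dir) / f"{paper_id}.pdf"
    if target.exists():
        return target, "cached"

    try:
        logger.info("Downloading paper %s via MCP", paper_id)
        await mcp_download(paper_id)
    except Exception as exc:  # an MCP failure must not block the fallback
        logger.warning("MCP download raised %r", exc)

    if target.exists():
        return target, "mcp"

    # The MCP call finished (or failed) without producing a local file.
    logger.warning("Falling back to direct arXiv download...")
    await direct_download(paper_id, target)
    return target, "direct"
```

Returning the method alongside the path is a design choice of this sketch; it makes it easy to log which route succeeded.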

How It Works Now

Download Flow

1. Check if file already exists locally → Return if found
2. Call MCP server's download_paper tool
3. Check if file appeared in expected locations:
   a. Expected path: data/mcp_papers/{paper_id}.pdf
   b. MCP-returned path (if provided in response)
   c. Any file in storage matching paper_id
4. If file not found → Fall back to direct arXiv download
5. Download PDF directly to client storage
6. Return path to downloaded file
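
Step 3 of the flow, checking the three candidate locations, could be written along these lines. The helper name find_downloaded_pdf and the "path" response key are assumptions; the source does not say which key, if any, the server uses to report a file path.

```python
from pathlib import Path


def find_downloaded_pdf(storage_dir, paper_id, mcp_response=None):
    """Return the first existing candidate file, or None if nothing is found."""
    storage_dir = Path(storage_dir)

    # (a) Expected path inside the client's storage directory.
    expected = storage_dir / f"{paper_id}.pdf"
    if expected.is_file():
        return expected

    # (b) Path reported by the MCP server; the "path" key is an assumption.
    if isinstance(mcp_response, dict):
        returned = mcp_response.get("path")
        if returned and Path(returned).is_file():
            return Path(returned)

    # (c) Any PDF in storage whose name contains the paper ID,
    #     for example a versioned file such as 1706.03762v7.pdf.
    matches = sorted(p for p in storage_dir.glob("*.pdf") if paper_id in p.name)
    return matches[0] if matches else None
```

A None result is the signal to move on to step 4 and download directly from arXiv.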

Benefits

  • Zero breaking changes: Existing MCP functionality preserved
  • Automatic fallback: Works even with remote MCP servers
  • Better diagnostics: Tool discovery helps troubleshoot issues
  • More reliable downloads: Direct fallback retrieves the file whenever arXiv itself is reachable
  • Client-side storage: Files always accessible to client process

Using the Fix

Running the Application

No changes needed! The fix is automatic:

# Set environment variables (optional - defaults work)
export USE_MCP_ARXIV=true
export MCP_ARXIV_STORAGE_PATH=data/mcp_papers

# Run the application
python app.py

The system will:

  1. Try MCP download first
  2. Automatically fall back to direct download if needed
  3. Log which method succeeded
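
For reference, the two environment variables could be read as in the sketch below. load_mcp_settings is a hypothetical helper, and the default of "false" for USE_MCP_ARXIV is an assumption; only the data/mcp_papers storage default appears in this document.

```python
import os
from pathlib import Path


def load_mcp_settings(environ=None):
    """Return (use_mcp, storage_path) from environment variables."""
    env = os.environ if environ is None else environ
    # Assumed default: MCP is off unless explicitly enabled.
    raw_flag = env.get("USE_MCP_ARXIV", "false")
    use_mcp = raw_flag.strip().lower() in {"1", "true", "yes"}
    storage_path = Path(env.get("MCP_ARXIV_STORAGE_PATH", "data/mcp_papers"))
    return use_mcp, storage_path


use_mcp, storage_path = load_mcp_settings(
    {"USE_MCP_ARXIV": "true", "MCP_ARXIV_STORAGE_PATH": "data/mcp_papers"}
)
```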

Running Diagnostics

Use the diagnostic script to test your MCP setup:

python test_mcp_diagnostic.py

This will:

  • Check environment configuration
  • Verify storage directory setup
  • List available MCP tools
  • Test search functionality
  • Test download with detailed logging
  • Show file system state before/after

Expected Output:

================================================================================
MCP arXiv Client Diagnostic Test
================================================================================

[1] Environment Configuration:
  USE_MCP_ARXIV: true
  MCP_ARXIV_STORAGE_PATH: data/mcp_papers

[2] Storage Directory:
  Path: /path/to/data/mcp_papers
  Exists: True
  Contains 0 PDF files

[3] Initializing MCP Client:
  ✓ Client initialized successfully

[4] Testing Search Functionality:
  ✓ Search successful, found 2 papers
  First paper: Attention Is All You Need...
  Paper ID: 1706.03762

[5] Testing Download Functionality:
  Attempting to download: 1706.03762
  PDF URL: https://arxiv.org/pdf/1706.03762.pdf
  ✓ Download successful!
  File path: data/mcp_papers/1706.03762v7.pdf
  File exists: True
  File size: 2,215,520 bytes (2.11 MB)

[6] Storage Directory After Download:
  Contains 1 PDF files
  Files: ['1706.03762v7.pdf']

[7] Cleaning Up:
  ✓ MCP session closed

================================================================================
Diagnostic Test Complete
================================================================================

Interpreting Logs

Successful MCP Download

If MCP server works correctly, you'll see:

2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Downloading paper 2203.08975v2 via MCP
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - MCP download_paper response type: <class 'dict'>
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Successfully downloaded paper to data/mcp_papers/2203.08975v2.pdf

Fallback to Direct Download

If MCP fails but direct download succeeds:

2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - File not found at expected path
2025-11-12 01:50:27 - utils.mcp_arxiv_client - ERROR - MCP download call completed but file not found
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - Falling back to direct arXiv download...
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Attempting direct download from arXiv for 2203.08975v2
2025-11-12 01:50:28 - utils.mcp_arxiv_client - INFO - Successfully downloaded 1234567 bytes to data/mcp_papers/2203.08975v2.pdf

Tool Discovery

At session initialization:

2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - MCP server provides 3 tools:
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - search_papers: Search arXiv for papers
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - download_paper: Download paper PDF
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - list_papers: List cached papers
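
The log lines in this section follow the pattern "timestamp - logger name - level - message". The project's actual logging configuration is not shown here, so the format strings below are inferred from the sample output; a setup along these lines would produce matching lines.

```python
import logging

# Inferred from the sample log lines above, not taken from the project code.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level=logging.INFO):
    """Apply the inferred format to the root logger."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)


# Format one record by hand to show the resulting line shape.
formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
record = logging.LogRecord(
    name="utils.mcp_arxiv_client",
    level=logging.WARNING,
    pathname="sketch.py",
    lineno=0,
    msg="Falling back to direct arXiv download...",
    args=(),
    exc_info=None,
)
line = formatter.format(record)
```

Setting the level to logging.DEBUG in configure_logging would also surface any debug-level messages the client emits.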

Troubleshooting

Issue: MCP server not found

Symptom: Error during initialization: command not found: arxiv-mcp-server

Solution:

  • Ensure MCP server is installed and in PATH
  • Check server configuration in your MCP settings
  • Try using direct ArxivClient instead: export USE_MCP_ARXIV=false

Issue: Files still not downloading

Symptom: Both MCP and direct download fail

Possible causes:

  1. Network connectivity issues
  2. arXiv API rate limiting
  3. Invalid paper IDs
  4. Storage directory permissions

Debugging steps:

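# Check network connectivity (-f: fail on HTTP errors, -L: follow redirects)
curl -fL https://arxiv.org/pdf/1706.03762.pdf -o test.pdf

# Check storage permissions (create a test file, then remove it)
ls -la data/mcp_papers/
touch data/mcp_papers/test.txt && rm data/mcp_papers/test.txt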

# Run diagnostic script
python test_mcp_diagnostic.py
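
Two of the possible causes above, invalid paper IDs and storage permissions, can also be checked from Python before attempting a download. The sketch below is not part of test_mcp_diagnostic.py; both helpers are hypothetical, and the regular expressions only approximate the new-style (1706.03762, 2203.08975v2) and old-style (hep-th/9901001) arXiv identifier formats.

```python
import re
import tempfile
from pathlib import Path

# New-style IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
NEW_STYLE_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# Old-style IDs: archive[.SUBJECT]/YYMMNNN, optionally followed by vN.
OLD_STYLE_ID = re.compile(r"^[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?$")


def is_valid_arxiv_id(paper_id):
    """Return True if paper_id looks like an arXiv identifier."""
    return bool(NEW_STYLE_ID.match(paper_id) or OLD_STYLE_ID.match(paper_id))


def is_writable_dir(path):
    """Return True if path is an existing directory we can create files in."""
    path = Path(path)
    if not path.is_dir():
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False
```

For example, is_valid_arxiv_id("1706.03762") passes while a malformed ID such as "17.03762" is rejected before any network request is made.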

Issue: MCP server uses different storage path

Symptom: MCP downloads succeed but client can't find files

Current solution: Direct download fallback handles this automatically

Future enhancement: Could add file transfer mechanism if MCP provides retrieval tools

Technical Details

Architecture Decision: Why Fallback Instead of File Transfer?

We chose direct download fallback over implementing a file transfer mechanism because:

  1. Server is third-party: Cannot modify MCP server to add file retrieval tools
  2. Simpler implementation: Direct download is straightforward and reliable
  3. Better performance: Avoids two-step download (server → client transfer)
  4. Same result: Client gets PDFs either way
  5. Fail-safe: Works even if MCP server is completely unavailable

Performance Impact

  • MCP successful: No performance change (same as before)
  • MCP fails: Extra ~2-5 seconds for direct download
  • Network overhead: One download to the client either way
  • Storage: Client reads only from client-side storage (no dependence on server storage)

Comparison with Direct ArxivClient

| Feature | MCPArxivClient (with fallback) | Direct ArxivClient |
|---|---|---|
| Search via MCP | ✓ | ✗ |
| Download via MCP | Tries first | ✗ |
| Direct download | Fallback | Primary |
| Remote MCP server | ✓ | N/A |
| File storage | Client-side | Client-side |
| Reliability | High (dual method) | High |

Future Enhancements

If MCP server capabilities expand, possible improvements:

  1. File retrieval tool: MCP server adds get_file(paper_id) tool
  2. Streaming transfer: MCP response includes base64-encoded PDF
  3. Shared storage: Configure MCP server to write to shared filesystem
  4. Batch downloads: Optimize multi-paper downloads

For now, the fallback solution provides robust, reliable downloads without requiring MCP server changes.
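
To make the streaming-transfer idea (item 2) concrete, a client-side handler might look like the sketch below. It is purely hypothetical: no MCP server is known to return such a payload, and the "pdf_base64" field name is invented for illustration.

```python
import base64
from pathlib import Path


def save_pdf_from_response(response, target):
    """Decode a base64 PDF from a (hypothetical) response and write it to target.

    Returns the written path, or None when the response carries no payload.
    """
    encoded = response.get("pdf_base64")  # hypothetical field name
    if not encoded:
        return None
    # validate=True rejects characters outside the base64 alphabet.
    data = base64.b64decode(encoded, validate=True)
    if not data.startswith(b"%PDF"):
        raise ValueError("decoded payload is not a PDF")
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

A response without the field returns None, so the existing direct-download fallback could remain the last resort.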

Files Modified

  1. utils/mcp_arxiv_client.py - Core client with fallback logic
  2. test_mcp_diagnostic.py - New diagnostic script
  3. MCP_FIX_DOCUMENTATION.md - This document

Testing

Run the test suite to verify the fix:

# Test MCP client
pytest tests/test_mcp_arxiv_client.py -v

# Run diagnostic
python test_mcp_diagnostic.py

# Full integration test
python app.py
# Then use the Gradio UI to analyze papers with MCP enabled

Summary

The fix ensures reliable PDF downloads by combining MCP capabilities with direct arXiv fallback:

  • ✅ Preserves MCP functionality for servers that work correctly
  • ✅ Automatic fallback when MCP fails or files aren't accessible
  • ✅ No configuration changes required
  • ✅ Better diagnostics via tool discovery
  • ✅ Comprehensive logging for troubleshooting
  • ✅ Zero breaking changes to existing code

The system now works reliably with remote MCP servers, local servers, or no MCP at all.