MCP Download Issue - Fix Documentation

Problem Summary

The MCP arXiv client's download_paper tool completed successfully on the remote MCP server, but the downloaded PDF files never appeared in the client's local data/mcp_papers/ directory.

Root Cause

The issue stems from the client-server architecture of MCP (Model Context Protocol):

MCP Server runs as a separate process (possibly remote)
Server downloads PDFs to its own storage location
Server returns {"status": "success"} without file path
Client expects files in its local data/mcp_papers/ directory
No file transfer mechanism exists between server and client storage

This is fundamentally a storage path mismatch between what the server uses and what the client expects.

Solution Implemented

1. Tool Discovery (Diagnostic)

Added automatic tool discovery when connecting to MCP server:

Lists all available MCP tools at session initialization
Logs tool names, descriptions, and schemas
Helps diagnose what capabilities the server provides

Location: utils/mcp_arxiv_client.py:88-112 (_discover_tools method)
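
The discovery step could look roughly like the sketch below. It is not the actual _discover_tools method. It assumes the session object exposes an async list_tools() whose result carries a .tools list with name, description, and inputSchema attributes, as the MCP Python SDK's ClientSession does; the function name discover_tools is hypothetical.

```python
import logging

logger = logging.getLogger("utils.mcp_arxiv_client")


async def discover_tools(session):
    """Log the tools an MCP session advertises and return their names.

    Assumption: `session.list_tools()` is awaitable and returns an object
    with a `.tools` list, as the MCP Python SDK's ClientSession does.
    """
    result = await session.list_tools()
    tools = list(result.tools)
    logger.info("MCP server provides %d tools:", len(tools))
    for tool in tools:
        # Descriptions are optional, so fall back to a placeholder.
        logger.info("  - %s: %s", tool.name, tool.description or "(no description)")
        logger.debug("    schema: %s", getattr(tool, "inputSchema", None))
    return [tool.name for tool in tools]
```

Returning the names lets the caller confirm that download_paper is actually offered before relying on it.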

2. Direct Download Fallback

Implemented a fallback mechanism that downloads PDFs directly from arXiv when MCP download fails:

Detects when MCP download completes but file is not accessible
Downloads PDF directly from https://arxiv.org/pdf/{paper_id}.pdf
Writes file to client's local storage directory
Maintains same retry logic and error handling

Location: utils/mcp_arxiv_client.py:114-152 (_download_from_arxiv_direct method)
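
A minimal sketch of such a fallback follows, using only the standard library. It is an illustration, not the real _download_from_arxiv_direct method: the function names, the retry count, the backoff, and the User-Agent string are all assumptions, since the document does not describe the actual retry policy or HTTP library.

```python
import time
import urllib.request
from pathlib import Path


def fetch_url(url, timeout=30):
    """Fetch raw bytes from a URL with the standard library."""
    request = urllib.request.Request(url, headers={"User-Agent": "mcp-arxiv-client-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_from_arxiv_direct(paper_id, storage_dir, fetch=fetch_url, retries=3, backoff=1.0):
    """Download a PDF from arXiv into storage_dir and return its path.

    `retries` and `backoff` are hypothetical values chosen for this sketch.
    """
    url = f"https://arxiv.org/pdf/{paper_id}.pdf"
    target = Path(storage_dir) / f"{paper_id}.pdf"
    target.parent.mkdir(parents=True, exist_ok=True)

    last_error = None
    for attempt in range(1, retries + 1):
        try:
            data = fetch(url)
            # Reject HTML error pages and other non-PDF payloads.
            if not data.startswith(b"%PDF"):
                raise ValueError("response does not look like a PDF")
            target.write_bytes(data)
            return target
        except (OSError, ValueError) as exc:  # URLError is a subclass of OSError
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)
    raise RuntimeError(f"direct download failed for {paper_id}") from last_error
```

Passing the fetch function as a parameter keeps the sketch testable without network access.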

3. Enhanced Error Handling

Updated download_paper_async to:

Try MCP download first (preserves existing functionality)
Check multiple possible file locations
Fall back to direct download if MCP fails
Provide detailed logging at each step

Location: utils/mcp_arxiv_client.py:462-479 (updated error handling)
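
The "check multiple possible file locations" step might be sketched as below. The helper name locate_downloaded_pdf and the "path" response key are hypothetical; the document only says a server-returned path is used if the response provides one.

```python
from pathlib import Path


def locate_downloaded_pdf(paper_id, storage_dir, mcp_response=None):
    """Return the first existing PDF for paper_id, or None if nothing is found."""
    storage = Path(storage_dir)
    candidates = [storage / f"{paper_id}.pdf"]

    # "path" is an assumed key; a real server may name it differently or omit it.
    if isinstance(mcp_response, dict) and mcp_response.get("path"):
        candidates.append(Path(mcp_response["path"]))

    for candidate in candidates:
        if candidate.is_file():
            return candidate

    # Last resort: versioned file names such as 1706.03762v7.pdf.
    if storage.is_dir():
        matches = sorted(storage.glob(f"{paper_id}v*.pdf"))
        if matches:
            return matches[0]
    return None
```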

How It Works Now

Download Flow

1. Check if file already exists locally → Return if found
2. Call MCP server's download_paper tool
3. Check if file appeared in expected locations:
   a. Expected path: data/mcp_papers/{paper_id}.pdf
   b. MCP-returned path (if provided in response)
   c. Any file in storage matching paper_id
4. If file not found → Fall back to direct arXiv download
5. Download PDF directly to client storage
6. Return path to downloaded file
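
The flow above can be sketched as a small orchestration function. This version is synchronous for brevity, whereas the real download_paper_async is a coroutine. The three callables (mcp_download, direct_download, locate) are hypothetical stand-ins for the client's own methods, not its actual API.

```python
import logging
from pathlib import Path

logger = logging.getLogger("utils.mcp_arxiv_client")


def download_paper(paper_id, storage_dir, mcp_download, direct_download, locate):
    """Return a local path for paper_id, trying MCP first and arXiv second.

    Assumed callables: mcp_download(paper_id) -> response,
    direct_download(paper_id, storage_dir) -> path,
    locate(paper_id, storage_dir, response) -> path or None.
    """
    # Step 1: reuse a file that is already present locally.
    existing = locate(paper_id, storage_dir, None)
    if existing is not None:
        return Path(existing)

    # Step 2: ask the MCP server; any failure just leads to the fallback.
    response = None
    try:
        response = mcp_download(paper_id)
    except Exception as exc:
        logger.warning("MCP download failed for %s: %s", paper_id, exc)

    # Step 3: look for the file in the locations the client can see.
    found = locate(paper_id, storage_dir, response)
    if found is not None:
        logger.info("Successfully downloaded paper to %s", found)
        return Path(found)

    # Steps 4-6: fetch directly into client storage and return that path.
    logger.warning("Falling back to direct arXiv download...")
    return Path(direct_download(paper_id, storage_dir))
```

Keeping the MCP call and the direct download behind plain callables makes each branch easy to exercise in isolation.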

Benefits

Zero breaking changes: Existing MCP functionality preserved
Automatic fallback: Works even with remote MCP servers
Better diagnostics: Tool discovery helps troubleshoot issues
Reliable downloads: Direct fallback retrieves the file whenever arXiv is reachable and local storage is writable
Client-side storage: Files always accessible to client process

Using the Fix

Running the Application

No changes needed! The fix is automatic:

# Set environment variables (optional - defaults work)
export USE_MCP_ARXIV=true
export MCP_ARXIV_STORAGE_PATH=data/mcp_papers

# Run the application
python app.py

The system will:

Try MCP download first
Automatically fall back to direct download if needed
Log which method succeeded
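
As an illustration of how the two environment variables might be read, here is a small sketch. The ArxivClientConfig class, the load_config function, and the default values are assumptions made for this example; the application's real defaults are not stated in this document.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArxivClientConfig:
    use_mcp: bool
    storage_path: Path


def load_config(env=None):
    """Build a config from environment variables (defaults are assumed)."""
    env = os.environ if env is None else env
    raw_flag = env.get("USE_MCP_ARXIV", "true")  # assumed default
    use_mcp = raw_flag.strip().lower() in {"1", "true", "yes"}
    storage_path = Path(env.get("MCP_ARXIV_STORAGE_PATH", "data/mcp_papers"))  # assumed default
    return ArxivClientConfig(use_mcp=use_mcp, storage_path=storage_path)
```

Accepting an optional mapping instead of reading os.environ directly makes the parsing easy to check without touching the real environment.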

Running Diagnostics

Use the diagnostic script to test your MCP setup:

python test_mcp_diagnostic.py

This will:

Check environment configuration
Verify storage directory setup
List available MCP tools
Test search functionality
Test download with detailed logging
Show file system state before/after
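
The storage checks in that list (directory exists, PDF count, before/after comparison) could be sketched as follows. This is not the contents of test_mcp_diagnostic.py; the helper names describe_storage and new_files are hypothetical.

```python
from pathlib import Path


def describe_storage(storage_dir):
    """Snapshot the storage directory: resolved path, existence, PDF names."""
    path = Path(storage_dir).resolve()
    pdfs = sorted(p.name for p in path.glob("*.pdf")) if path.is_dir() else []
    return {"path": str(path), "exists": path.is_dir(), "pdf_files": pdfs}


def new_files(before, after):
    """Return PDF names present in the `after` snapshot but not in `before`."""
    return sorted(set(after["pdf_files"]) - set(before["pdf_files"]))
```

Taking one snapshot before the download and one after shows exactly which file, if any, the download produced, including versioned names such as 1706.03762v7.pdf.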

Expected Output:

================================================================================
MCP arXiv Client Diagnostic Test
================================================================================

[1] Environment Configuration:
  USE_MCP_ARXIV: true
  MCP_ARXIV_STORAGE_PATH: data/mcp_papers

[2] Storage Directory:
  Path: /path/to/data/mcp_papers
  Exists: True
  Contains 0 PDF files

[3] Initializing MCP Client:
  ✓ Client initialized successfully

[4] Testing Search Functionality:
  ✓ Search successful, found 2 papers
  First paper: Attention Is All You Need...
  Paper ID: 1706.03762

[5] Testing Download Functionality:
  Attempting to download: 1706.03762
  PDF URL: https://arxiv.org/pdf/1706.03762.pdf
  ✓ Download successful!
  File path: data/mcp_papers/1706.03762v7.pdf
  File exists: True
  File size: 2,215,520 bytes (2.11 MB)

[6] Storage Directory After Download:
  Contains 1 PDF files
  Files: ['1706.03762v7.pdf']

[7] Cleaning Up:
  ✓ MCP session closed

================================================================================
Diagnostic Test Complete
================================================================================

Interpreting Logs

Successful MCP Download

If the MCP server works correctly and writes to storage the client can read, you'll see:

2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Downloading paper 2203.08975v2 via MCP
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - MCP download_paper response type: <class 'dict'>
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Successfully downloaded paper to data/mcp_papers/2203.08975v2.pdf

Fallback to Direct Download

If MCP fails but direct download succeeds:

2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - File not found at expected path
2025-11-12 01:50:27 - utils.mcp_arxiv_client - ERROR - MCP download call completed but file not found
2025-11-12 01:50:27 - utils.mcp_arxiv_client - WARNING - Falling back to direct arXiv download...
2025-11-12 01:50:27 - utils.mcp_arxiv_client - INFO - Attempting direct download from arXiv for 2203.08975v2
2025-11-12 01:50:28 - utils.mcp_arxiv_client - INFO - Successfully downloaded 1234567 bytes to data/mcp_papers/2203.08975v2.pdf

Tool Discovery

At session initialization:

2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO - MCP server provides 3 tools:
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - search_papers: Search arXiv for papers
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - download_paper: Download paper PDF
2025-11-12 01:50:26 - utils.mcp_arxiv_client - INFO -   - list_papers: List cached papers
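
The log samples in this section share one layout: timestamp, logger name, level, message. A logging setup along these lines would produce that layout. The constant and function names are assumptions, and the application's actual logging configuration may differ.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # drops the default millisecond suffix


def configure_logging(level=logging.INFO):
    """Configure root logging and return the client's logger."""
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT)
    return logging.getLogger("utils.mcp_arxiv_client")
```

Setting the level to logging.DEBUG would also surface any debug-level messages the client emits.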

Troubleshooting

Issue: MCP server not found

Symptom: Error during initialization: command not found: arxiv-mcp-server

Solution:

Ensure MCP server is installed and in PATH
Check server configuration in your MCP settings
Try using direct ArxivClient instead: export USE_MCP_ARXIV=false

Issue: Files still not downloading

Symptom: Both MCP and direct download fail

Possible causes:

Network connectivity issues
arXiv API rate limiting
Invalid paper IDs
Storage directory permissions

Debugging steps:

# Check network connectivity
curl https://arxiv.org/pdf/1706.03762.pdf -o test.pdf

# Check storage permissions
ls -la data/mcp_papers/
touch data/mcp_papers/test.txt

# Run diagnostic script
python test_mcp_diagnostic.py
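
Invalid paper IDs are one of the possible causes listed above, and they can be ruled out before any network call. The sketch below checks the two arXiv identifier schemes (new style such as 1706.03762 or 2203.08975v2, old style such as hep-th/9901001). The function name is hypothetical and is not part of the client.

```python
import re

# New style: YYMM.NNNN or YYMM.NNNNN, with an optional version suffix.
_NEW_STYLE = re.compile(r"\d{4}\.\d{4,5}(v\d+)?")
# Old style: archive[.SUBJECT]/YYMMNNN, with an optional version suffix.
_OLD_STYLE = re.compile(r"[a-z\-]+(\.[A-Z]{2})?/\d{7}(v\d+)?")


def is_valid_arxiv_id(paper_id):
    """Return True if paper_id is syntactically a plausible arXiv identifier."""
    return bool(_NEW_STYLE.fullmatch(paper_id) or _OLD_STYLE.fullmatch(paper_id))
```

This only validates the syntax; an ID can be well formed and still not exist on arXiv.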

Issue: MCP server uses different storage path

Symptom: MCP downloads succeed but client can't find files

Current solution: Direct download fallback handles this automatically

Future enhancement: Could add file transfer mechanism if MCP provides retrieval tools

Technical Details

Architecture Decision: Why Fallback Instead of File Transfer?

We chose direct download fallback over implementing a file transfer mechanism because:

Server is third-party: Cannot modify MCP server to add file retrieval tools
Simpler implementation: Direct download is straightforward and reliable
Better performance: Avoids two-step download (server → client transfer)
Same result: Client gets PDFs either way
Fail-safe: Works even if MCP server is completely unavailable

Performance Impact

MCP successful: No performance change (same as before)
MCP fails: Extra ~2-5 seconds for direct download
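Network overhead: One download when the MCP file is locally readable; when the fallback triggers, the PDF is fetched twice (once by the server, once by the client)
Storage: The client reads only from its own directory; a remote server may still keep its own copy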

Comparison with Direct ArxivClient

Feature	MCPArxivClient (with fallback)	Direct ArxivClient
Search via MCP	✓	✗
Download via MCP	Tries first	✗
Direct download	Fallback	Primary
Remote MCP server	✓	N/A
File storage	Client-side	Client-side
Reliability	High (dual method)	High

Future Enhancements

If the MCP server's capabilities expand, possible improvements include:

File retrieval tool: MCP server adds get_file(paper_id) tool
Streaming transfer: MCP response includes base64-encoded PDF
Shared storage: Configure MCP server to write to shared filesystem
Batch downloads: Optimize multi-paper downloads
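
To make the streaming-transfer idea concrete, here is a sketch of how a client could handle a response that embeds the PDF as base64. This is purely hypothetical: no such response exists today, and the field name pdf_base64 and the function name are invented for illustration.

```python
import base64
import binascii
from pathlib import Path


def save_pdf_from_response(response, paper_id, storage_dir, field="pdf_base64"):
    """Decode a base64 PDF from a (hypothetical) MCP response and save it.

    Returns the written path, or None if the field is missing or invalid.
    """
    encoded = response.get(field)
    if not encoded:
        return None
    try:
        data = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Only accept payloads that start with the PDF magic bytes.
    if not data.startswith(b"%PDF"):
        return None
    target = Path(storage_dir) / f"{paper_id}.pdf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Returning None on any problem would let the caller continue to the direct arXiv fallback unchanged.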

For now, the fallback solution provides robust, reliable downloads without requiring MCP server changes.

Files Modified

utils/mcp_arxiv_client.py - Core client with fallback logic
test_mcp_diagnostic.py - New diagnostic script
MCP_FIX_DOCUMENTATION.md - This document

Testing

Run the test suite to verify the fix:

# Test MCP client
pytest tests/test_mcp_arxiv_client.py -v

# Run diagnostic
python test_mcp_diagnostic.py

# Full integration test
python app.py
# Then use the Gradio UI to analyze papers with MCP enabled

Summary

The fix ensures reliable PDF downloads by combining MCP capabilities with direct arXiv fallback:

✅ Preserves MCP functionality for servers that work correctly
✅ Automatic fallback when MCP fails or files aren't accessible
✅ No configuration changes required
✅ Better diagnostics via tool discovery
✅ Comprehensive logging for troubleshooting
✅ Zero breaking changes to existing code

The system now works reliably with remote MCP servers, local servers, or no MCP at all.