|
|
--- |
|
|
title: Research Paper Analyzer |
|
|
emoji: π |
|
|
colorFrom: blue |
|
|
colorTo: green |
|
|
sdk: gradio |
|
|
sdk_version: 6.0.2 |
|
|
app_file: app.py |
|
|
pinned: false |
|
|
license: mit |
|
|
--- |
|
|
|
|
|
# Multi-Agent Research Paper Analysis System |
|
|
|
|
|
[Python 3.10+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)
[Gradio](https://gradio.app/)
[Azure OpenAI](https://azure.microsoft.com/en-us/products/ai-services/openai-service)
[Sync to HF Space](https://github.com/samir72/Multi-Agent-Research-Paper-Analysis-System/actions/workflows/sync-to-hf-space.yml)
|
|
|
|
|
A production-ready multi-agent system that analyzes academic papers from arXiv, extracts insights, synthesizes findings across papers, and provides deterministic, citation-backed responses to research questions. |
|
|
|
|
|
**Quick Start**: See [QUICKSTART.md](QUICKSTART.md) for a 5-minute setup guide.
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
- [Features](#features) |
|
|
- [Architecture](#architecture) |
|
|
- [Technical Stack](#technical-stack) |
|
|
- [Installation](#installation) |
|
|
- [Usage](#usage) |
|
|
- [Project Structure](#project-structure) |
|
|
- [Key Features](#key-features) |
|
|
- [Testing](#testing) |
|
|
- [Performance](#performance) |
|
|
- [Deployment](#deployment) |
|
|
- [GitHub Actions - Automated Deployment](#github-actions---automated-deployment) |
|
|
- [Hugging Face Spaces](#hugging-face-spaces-manual-deployment) |
|
|
- [Local Docker](#local-docker) |
|
|
- [Programmatic Usage](#programmatic-usage) |
|
|
- [Contributing](#contributing) |
|
|
- [Support](#support) |
|
|
- [Changelog](#changelog) |
|
|
|
|
|
## Features |
|
|
|
|
|
- **Automated Paper Retrieval**: Search and download papers from arXiv (direct API or MCP server) |
|
|
- **RAG-Based Analysis**: Extract methodology, findings, conclusions, and limitations using retrieval-augmented generation |
|
|
- **Cross-Paper Synthesis**: Identify consensus points, contradictions, and research gaps |
|
|
- **Citation Management**: Generate proper APA-style citations with source validation |
|
|
- **LangGraph Orchestration**: Professional workflow management with conditional routing and checkpointing |
|
|
- **LangFuse Observability**: Automatic tracing of all agents, LLM calls, and RAG operations with performance analytics |
|
|
- **Semantic Caching**: Optimize costs by caching similar queries |
|
|
- **Deterministic Outputs**: Temperature=0 and structured outputs for reproducibility |
|
|
- **FastMCP Integration**: Auto-start MCP server with intelligent cascading fallback (MCP → Direct API)
|
|
- **Robust Data Validation**: Multi-layer validation prevents pipeline failures from malformed data |
|
|
- **High Performance**: Parallel processing (4 concurrent workers) cuts a 5-paper run to 2-3 minutes
|
|
- **Smart Error Handling**: Circuit breaker, graceful degradation, friendly error messages |
|
|
- **Progressive UI**: Real-time updates as papers are analyzed with streaming results |
|
|
- **Smart Quality Filtering**: Automatically excludes failed analyses (0% confidence) from synthesis |
|
|
- **Enhanced UX**: Clickable PDF links, paper titles + confidence scores, status indicators |
|
|
- **Comprehensive Testing**: 98 total tests (24 analyzer + 21 legacy MCP + 38 FastMCP + 15 schema validators) with diagnostic tools
|
|
- **Performance Analytics**: Track latency, token usage, costs, and error rates across all agents |
|
|
|
|
|
## Architecture |
|
|
|
|
|
### Agent Workflow |
|
|
|
|
|
**LangGraph Orchestration (v2.6):** |
|
|
``` |
|
|
User Query → Retriever → [Has papers?]
    ├─ Yes → Analyzer (parallel 4x, streaming) → Filter (0% confidence) → Synthesis → Citation → User
    └─ No → END (graceful error)
         ↓
[LangFuse Tracing for All Nodes]
|
|
``` |
|
|
|
|
|
**Key Features:** |
|
|
- **LangGraph Workflow**: Conditional routing, automatic checkpointing with `MemorySaver` |
|
|
- **LangFuse Observability**: Automatic tracing of all agents, LLM calls, and RAG operations |
|
|
- **Progressive Streaming**: Real-time UI updates using Python generators |
|
|
- **Parallel Execution**: 4 papers analyzed concurrently with live status |
|
|
- **Smart Filtering**: Removes failed analyses (0% confidence) before synthesis |
|
|
- **Circuit Breaker**: Auto-stops after 2 consecutive failures |
|
|
- **Status Tracking**: ⏸️ Pending → ⏳ Analyzing → ✅ Complete / ⚠️ Failed
|
|
- **Performance Analytics**: Track latency, tokens, costs, error rates per agent |
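The conditional routing and filtering described above can be sketched in plain Python. This is a simplified stand-in for the LangGraph workflow, not the project's actual API: the node names, state keys, and `has_papers` predicate are illustrative.

```python
END = "__end__"

def retriever_node(state: dict) -> dict:
    # Stand-in: the real node queries arXiv and fills state["papers"].
    return state

def analyzer_node(state: dict) -> dict:
    # Mirror the filter step: drop failed (0% confidence) analyses.
    state["analyses"] = [a for a in state.get("analyses", [])
                         if a["confidence"] > 0.0]
    return state

def has_papers(state: dict) -> str:
    # Conditional edge: route to the analyzer only if retrieval succeeded.
    return "analyzer" if state.get("papers") else END

def run_workflow(state: dict) -> dict:
    state = retriever_node(state)
    if has_papers(state) == END:
        return state  # graceful exit, nothing to analyze
    return analyzer_node(state)

state = run_workflow({"papers": ["2401.00001"],
                      "analyses": [{"confidence": 0.9}, {"confidence": 0.0}]})
print(len(state["analyses"]))  # the 0%-confidence analysis was filtered out
```

In the real system, LangGraph's conditional edges make this branch explicit in the graph definition, and `MemorySaver` checkpoints the state between nodes.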
|
|
|
|
|
### 4 Specialized Agents |
|
|
|
|
|
1. **Retriever Agent** |
|
|
- Queries arXiv API based on user input |
|
|
- Downloads and parses PDF papers |
|
|
- Extracts metadata (title, authors, abstract, publication date) |
|
|
- Chunks papers into 500-token segments with 50-token overlap |
|
|
|
|
|
2. **Analyzer Agent** (Performance Optimized v2.0) |
|
|
- **Parallel processing**: Analyzes up to 4 papers simultaneously |
|
|
- **Circuit breaker**: Stops after 2 consecutive failures |
|
|
- **Timeout**: 60s with max_tokens=1500 for fast responses |
|
|
- Extracts methodology, findings, conclusions, limitations, contributions |
|
|
- Returns structured JSON with confidence scores |
|
|
|
|
|
3. **Synthesis Agent** |
|
|
- Compares findings across multiple papers |
|
|
- Identifies consensus points and contradictions |
|
|
- Generates deterministic summary grounded in retrieved content |
|
|
- Highlights research gaps |
|
|
|
|
|
4. **Citation Agent** |
|
|
- Validates all claims against source papers |
|
|
- Provides exact section references with page numbers |
|
|
- Generates properly formatted citations (APA style) |
|
|
- Ensures every statement is traceable to source |
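The Retriever's 500-token / 50-token-overlap chunking (agent 1 above) can be sketched as a sliding window. A real implementation would count model tokens with a tokenizer; this sketch just operates on a pre-tokenized list.

```python
def chunk_tokens(tokens: list, size: int = 500, overlap: int = 50) -> list:
    """Sliding-window chunking: each chunk repeats the last `overlap`
    tokens of the previous chunk so context isn't cut mid-thought."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print(len(chunks))                  # chunks start at token 0, 450, 900
print(chunks[0][-1], chunks[1][0])  # overlap region spans t450..t499
```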
|
|
|
|
|
## Technical Stack |
|
|
|
|
|
- **LLM**: Azure OpenAI (gpt-4o-mini) with temperature=0 |
|
|
- **Embeddings**: Azure OpenAI text-embedding-3-small |
|
|
- **Vector Store**: ChromaDB with persistent storage |
|
|
- **Orchestration**: LangGraph with conditional routing and checkpointing |
|
|
- **Observability**: LangFuse for automatic tracing, performance analytics, and cost tracking |
|
|
- **Agent Framework**: Generator-based streaming workflow with progressive UI updates |
|
|
- **Parallel Processing**: ThreadPoolExecutor (4 concurrent workers) with as_completed for streaming |
|
|
- **UI**: Gradio 6.0.2 with tabbed interface and real-time updates |
|
|
- **Data Source**: arXiv API (direct) or FastMCP/Legacy MCP server (optional, auto-start) |
|
|
- **MCP Integration**: FastMCP server with auto-start, intelligent fallback (MCP → Direct API)
|
|
- **Testing**: pytest with comprehensive test suite (98 tests, pytest-asyncio for async tests)
|
|
- **Type Safety**: Pydantic V2 schemas with multi-layer data validation |
|
|
- **Pricing**: Configurable pricing system (JSON + environment overrides) |
|
|
|
|
|
## Installation |
|
|
|
|
|
### Prerequisites |
|
|
|
|
|
- Python 3.10+ |
|
|
- Azure OpenAI account with API access |
|
|
|
|
|
### Setup |
|
|
|
|
|
1. Clone the repository: |
|
|
```bash |
|
|
git clone https://github.com/samir72/Multi-Agent-Research-Paper-Analysis-System.git |
|
|
cd Multi-Agent-Research-Paper-Analysis-System |
|
|
``` |
|
|
|
|
|
2. Install dependencies: |
|
|
```bash |
|
|
# Option 1: Standard installation |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# Option 2: Using installation script (recommended for handling MCP conflicts) |
|
|
./install_dependencies.sh |
|
|
|
|
|
# Option 3: With constraints file (enforces MCP version) |
|
|
pip install -c constraints.txt -r requirements.txt |
|
|
``` |
|
|
|
|
|
**Note on MCP Dependencies**: The `spaces` package (from Gradio) may attempt to downgrade `mcp` to version 1.10.1, which conflicts with `fastmcp` requirements (mcp>=1.17.0). The app automatically fixes this on Hugging Face Spaces. For local development, use Option 2 or 3 if you encounter MCP dependency conflicts. |
|
|
|
|
|
3. Configure environment variables: |
|
|
```bash |
|
|
cp .env.example .env |
|
|
# Edit .env with your Azure OpenAI credentials |
|
|
``` |
|
|
|
|
|
Required environment variables: |
|
|
- `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint (e.g., https://your-resource.openai.azure.com/) |
|
|
- `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key |
|
|
- `AZURE_OPENAI_DEPLOYMENT_NAME`: Your deployment name (e.g., gpt-4o-mini) |
|
|
- `AZURE_OPENAI_API_VERSION`: API version (optional, defaults in code) |
|
|
|
|
|
Optional: |
|
|
- `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`: Custom embedding model deployment name |
|
|
- `PRICING_INPUT_PER_1M`: Override input token pricing for all models (per 1M tokens) |
|
|
- `PRICING_OUTPUT_PER_1M`: Override output token pricing for all models (per 1M tokens) |
|
|
- `PRICING_EMBEDDING_PER_1M`: Override embedding token pricing (per 1M tokens) |
|
|
|
|
|
**MCP (Model Context Protocol) Support** (Optional): |
|
|
- `USE_MCP_ARXIV`: Set to `true` to use FastMCP server (auto-start) instead of direct arXiv API (default: `false`) |
|
|
- `USE_LEGACY_MCP`: Set to `true` to force legacy MCP instead of FastMCP (default: `false`) |
|
|
- `MCP_ARXIV_STORAGE_PATH`: Path where MCP server stores papers (default: `./data/mcp_papers/`) |
|
|
- `FASTMCP_SERVER_PORT`: Port for FastMCP server (default: `5555`) |
|
|
|
|
|
**LangFuse Observability** (Optional): |
|
|
- `LANGFUSE_ENABLED`: Enable LangFuse tracing (default: `false`) |
|
|
- `LANGFUSE_PUBLIC_KEY`: Your LangFuse public key (get from https://cloud.langfuse.com) |
|
|
- `LANGFUSE_SECRET_KEY`: Your LangFuse secret key |
|
|
- `LANGFUSE_HOST`: LangFuse host URL (default: `https://cloud.langfuse.com`) |
|
|
- `LANGFUSE_TRACE_ALL_LLM`: Auto-trace all Azure OpenAI calls (default: `true`) |
|
|
- `LANGFUSE_TRACE_RAG`: Trace RAG operations (default: `true`) |
|
|
- `LANGFUSE_FLUSH_AT`: Batch size for flushing traces (default: `15`) |
|
|
- `LANGFUSE_FLUSH_INTERVAL`: Flush interval in seconds (default: `10`) |
|
|
|
|
|
**Note**: Pricing is configured in `config/pricing.json` with support for gpt-4o-mini, gpt-4o, and phi-4-multimodal-instruct. Environment variables override JSON settings. |
|
|
|
|
|
### MCP (Model Context Protocol) Integration |
|
|
|
|
|
The system supports using FastMCP or Legacy MCP servers as an alternative to direct arXiv API access. **FastMCP is the recommended option** with auto-start capability and no manual server setup required. |
|
|
|
|
|
**Quick Start (FastMCP - Recommended):** |
|
|
|
|
|
1. Enable FastMCP in your `.env`: |
|
|
```bash |
|
|
USE_MCP_ARXIV=true |
|
|
# FastMCP server will auto-start on port 5555 |
|
|
``` |
|
|
|
|
|
2. Run the application: |
|
|
```bash |
|
|
python app.py |
|
|
# FastMCP server starts automatically in the background |
|
|
``` |
|
|
|
|
|
**That's it!** The FastMCP server starts automatically, downloads papers, and falls back to direct arXiv API if needed. |
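The cascading fallback boils down to: try the MCP client first, and on any failure silently switch to the direct arXiv API. The callables below are hypothetical stand-ins for the two clients, not the project's actual interfaces.

```python
import logging

logger = logging.getLogger("retriever")

def search_with_fallback(query: str, mcp_search, direct_search):
    """Cascading fallback: try the MCP client, fall back to the direct API."""
    try:
        return mcp_search(query)
    except Exception as exc:  # server not running, network error, bad payload
        logger.warning("MCP search failed (%s); using direct arXiv API", exc)
        return direct_search(query)

# Hypothetical stand-ins for the two clients:
def unreachable_mcp(query):
    raise ConnectionError("FastMCP server unreachable")

def direct_api(query):
    return [f"paper about {query}"]

print(search_with_fallback("multi-agent RL", unreachable_mcp, direct_api))
```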
|
|
|
|
|
**Advanced Configuration:** |
|
|
|
|
|
For Legacy MCP (external server): |
|
|
```bash |
|
|
USE_MCP_ARXIV=true |
|
|
USE_LEGACY_MCP=true |
|
|
MCP_ARXIV_STORAGE_PATH=/path/to/papers |
|
|
``` |
|
|
|
|
|
For custom FastMCP port: |
|
|
```bash |
|
|
FASTMCP_SERVER_PORT=5556 # Default is 5555 |
|
|
``` |
|
|
|
|
|
**Features:** |
|
|
- **FastMCP (Default)**: |
|
|
- Auto-start server (no manual setup) |
|
|
- Background thread execution |
|
|
- Singleton pattern (one server per app) |
|
|
- Graceful shutdown on app exit |
|
|
- Compatible with local & HuggingFace Spaces |
|
|
- **Legacy MCP**: |
|
|
- External MCP server via stdio protocol |
|
|
- Backward compatible with existing setups |
|
|
- **Both modes**: |
|
|
  - Intelligent cascading fallback (MCP → Direct API)
|
|
- Same functionality as direct API |
|
|
- Zero breaking changes to workflow |
|
|
- Comprehensive logging and diagnostics |
|
|
|
|
|
**Troubleshooting:** |
|
|
- FastMCP won't start? Check if port 5555 is available: `netstat -an | grep 5555` |
|
|
- Papers not downloading? System automatically falls back to direct arXiv API |
|
|
- See [FASTMCP_REFACTOR_SUMMARY.md](FASTMCP_REFACTOR_SUMMARY.md) for architecture details |
|
|
- See [DATA_VALIDATION_FIX.md](DATA_VALIDATION_FIX.md) for data validation information |
|
|
|
|
|
**Data Management:** |
|
|
|
|
|
```bash |
|
|
# Clear MCP cached papers |
|
|
rm -rf data/mcp_papers/ |
|
|
|
|
|
# Clear direct API cached papers |
|
|
rm -rf data/papers/ |
|
|
|
|
|
# Clear vector store (useful for testing) |
|
|
rm -rf data/chroma_db/ |
|
|
|
|
|
# Clear semantic cache |
|
|
rm -rf data/cache/ |
|
|
``` |
|
|
|
|
|
4. Run the application: |
|
|
```bash |
|
|
python app.py |
|
|
``` |
|
|
|
|
|
The application will be available at `http://localhost:7860` |
|
|
|
|
|
## Usage |
|
|
|
|
|
1. **Enter Research Question**: Type your research question in the text box |
|
|
2. **Select Category**: Choose an arXiv category or leave as "All" |
|
|
3. **Set Number of Papers**: Use the slider to select 1-20 papers |
|
|
4. **Click Analyze**: The system will process your request with real-time updates |
|
|
5. **View Results**: Explore the five output tabs with progressive updates: |
|
|
   - **Papers**: Table of retrieved papers with clickable PDF links and live status (⏸️ Pending → ⏳ Analyzing → ✅ Complete / ⚠️ Failed)
|
|
- **Analysis**: Detailed analysis of each paper (updates as each completes) |
|
|
- **Synthesis**: Executive summary with consensus and contradictions (populated after all analyses) |
|
|
- **Citations**: APA-formatted references with validation |
|
|
- **Stats**: Processing statistics, token usage, and cost estimates |
|
|
|
|
|
## Project Structure |
|
|
|
|
|
``` |
|
|
Multi-Agent-Research-Paper-Analysis-System/
├── app.py                      # Main Gradio application with LangGraph workflow
├── requirements.txt            # Python dependencies (includes langgraph, langfuse)
├── pre-requirements.txt        # Pre-installation dependencies (pip, setuptools, wheel)
├── constraints.txt             # MCP version constraints file
├── install_dependencies.sh     # Installation script handling MCP conflicts
├── huggingface_startup.sh      # HF Spaces startup script with MCP fix
├── README.md                   # This file - full documentation
├── README_INSTALL.md           # Installation troubleshooting guide
├── QUICKSTART.md               # Quick setup guide (5 minutes)
├── CLAUDE.md                   # Developer documentation (comprehensive)
├── .env.example                # Environment variable template
├── .gitignore                  # Git ignore rules (excludes data/ directory)
├── agents/
│   ├── __init__.py
│   ├── retriever.py            # Paper retrieval & chunking (with @observe)
│   ├── analyzer.py             # Individual paper analysis (parallel + streaming, with @observe)
│   ├── synthesis.py            # Cross-paper synthesis (with @observe)
│   └── citation.py             # Citation validation & formatting (with @observe)
├── rag/
│   ├── __init__.py
│   ├── vector_store.py         # ChromaDB vector storage
│   ├── embeddings.py           # Azure OpenAI text embeddings (with @observe)
│   └── retrieval.py            # RAG retrieval & context formatting (with @observe)
├── orchestration/              # LangGraph workflow orchestration (NEW v2.6)
│   ├── __init__.py
│   ├── nodes.py                # Node wrappers with LangFuse tracing
│   └── workflow_graph.py       # LangGraph workflow builder
├── observability/              # LangFuse observability (NEW v2.6)
│   ├── __init__.py
│   ├── trace_reader.py         # Trace querying and export API
│   ├── analytics.py            # Performance analytics and trajectory analysis
│   └── README.md               # Observability documentation
├── utils/
│   ├── __init__.py
│   ├── arxiv_client.py         # arXiv API wrapper (direct API)
│   ├── mcp_arxiv_client.py     # Legacy arXiv MCP client (optional)
│   ├── fastmcp_arxiv_server.py # FastMCP server (auto-start)
│   ├── fastmcp_arxiv_client.py # FastMCP client (async-first)
│   ├── pdf_processor.py        # PDF parsing & chunking (with validation)
│   ├── cache.py                # Semantic caching layer
│   ├── config.py               # Configuration management (Azure, LangFuse, MCP, Pricing)
│   ├── schemas.py              # Pydantic data models (with validators)
│   ├── langgraph_state.py      # LangGraph state TypedDict (NEW v2.6)
│   └── langfuse_client.py      # LangFuse client and helpers (NEW v2.6)
├── config/
│   └── pricing.json            # Model pricing configuration
├── tests/
│   ├── __init__.py
│   ├── test_analyzer.py          # Unit tests for analyzer agent (24 tests)
│   ├── test_mcp_arxiv_client.py  # Unit tests for legacy MCP client (21 tests)
│   ├── test_fastmcp_arxiv.py     # Unit tests for FastMCP (38 tests)
│   ├── test_schema_validators.py # Unit tests for Pydantic validators (15 tests)
│   └── test_data_validation.py   # Data validation test script
├── test_mcp_diagnostic.py      # MCP setup diagnostic script
├── REFACTORING_SUMMARY.md      # LangGraph + LangFuse refactoring details (NEW v2.6)
├── BUGFIX_MSGPACK_SERIALIZATION.md # msgpack serialization fix documentation (NEW v2.6)
├── FASTMCP_REFACTOR_SUMMARY.md # FastMCP architecture guide
├── DATA_VALIDATION_FIX.md      # Data validation documentation
├── MCP_FIX_DOCUMENTATION.md    # MCP troubleshooting guide
├── MCP_FIX_SUMMARY.md          # MCP fix quick reference
└── data/                       # Created at runtime
    ├── papers/                 # Downloaded PDFs (direct API, cached)
    ├── mcp_papers/             # Downloaded PDFs (MCP mode, cached)
    └── chroma_db/              # Vector store persistence
|
|
``` |
|
|
|
|
|
## Key Features |
|
|
|
|
|
### Progressive Streaming UI |
|
|
|
|
|
The system provides real-time feedback during analysis with a generator-based streaming workflow: |
|
|
|
|
|
1. **Papers Tab Updates**: Status changes live as papers are processed |
|
|
   - ⏸️ **Pending**: Paper queued for analysis

   - ⏳ **Analyzing**: Analysis in progress

   - ✅ **Complete**: Analysis successful with confidence score

   - ⚠️ **Failed**: Analysis failed (0% confidence, excluded from synthesis)
|
|
2. **Incremental Results**: Analysis tab populates as each paper completes |
|
|
3. **ThreadPoolExecutor**: Up to 4 papers analyzed concurrently with `as_completed()` for streaming |
|
|
4. **Python Generators**: Uses `yield` to stream results without blocking |
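The generator + `ThreadPoolExecutor` + `as_completed()` combination above can be sketched as follows; `analyze` is a stand-in for the real per-paper analysis call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze(paper: str) -> str:
    # Stand-in for the real 30-40s per-paper LLM analysis call.
    return f"analysis of {paper}"

def stream_analyses(papers, max_workers: int = 4):
    """Yield (paper, result) pairs in completion order, not submission
    order -- which is what lets the UI update as each paper finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, p): p for p in papers}
        for fut in as_completed(futures):
            yield futures[fut], fut.result()

for paper, result in stream_analyses(["p1", "p2", "p3"]):
    print(paper, "->", result)  # arrives as each worker finishes
```

Gradio consumes each `yield` as a UI update, so a slow paper never blocks the display of faster ones.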
|
|
|
|
|
### Deterministic Output Strategy |
|
|
|
|
|
The system implements multiple techniques to minimize hallucinations: |
|
|
|
|
|
1. **Temperature=0**: All Azure OpenAI calls use temperature=0 |
|
|
2. **Structured Outputs**: JSON mode for agent responses with strict schemas |
|
|
3. **RAG Grounding**: Every response includes retrieved chunk IDs |
|
|
4. **Source Validation**: Cross-reference all claims with original text |
|
|
5. **Semantic Caching**: Hash query embeddings, return cached results for cosine similarity >0.95 |
|
|
6. **Confidence Scores**: Return uncertainty metrics with each response |
|
|
7. **Smart Filtering**: Papers with 0% confidence automatically excluded from synthesis |
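The semantic-cache check (item 5 above) can be sketched with plain cosine similarity over embedding vectors. Real embeddings come from Azure OpenAI; the 3-dimensional vectors here are purely illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Return a cached answer when a query embedding is >0.95 similar."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, embedding):
        for cached_emb, answer in self.entries:
            if cosine(embedding, cached_emb) > self.threshold:
                return answer
        return None

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "cached synthesis")
print(cache.get([0.99, 0.05, 0.0]))  # near-duplicate query -> cache hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated query -> None
```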
|
|
|
|
|
### Cost Optimization |
|
|
|
|
|
- **Configurable Pricing System**: `config/pricing.json` for easy model switching |
|
|
- Supports gpt-4o-mini ($0.15/$0.60 per 1M tokens) |
|
|
- Supports phi-4-multimodal-instruct ($0.08/$0.32 per 1M tokens) |
|
|
- Default fallback pricing for unknown models ($0.15/$0.60 per 1M tokens) |
|
|
- Environment variable overrides for testing and custom pricing |
|
|
- **Thread-safe Token Tracking**: Accurate counts across parallel processing |
|
|
- **Request Batching**: Batch embeddings for efficiency |
|
|
- **Cached Embeddings**: ChromaDB stores embeddings (don't re-embed same papers) |
|
|
- **Semantic Caching**: Return cached results for similar queries (cosine similarity >0.95) |
|
|
- **Token Usage Logging**: Track input/output/embedding tokens per request |
|
|
- **LangFuse Cost Analytics**: Per-agent cost attribution and optimization insights |
|
|
- **Target**: <$0.50 per analysis session (5 papers with gpt-4o-mini) |
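The pricing-resolution order (environment variable beats the `config/pricing.json` entry, which beats the default fallback) can be sketched like this; the key names inside the JSON are assumptions based on the environment variables documented above.

```python
import json
import os

DEFAULT = {"input_per_1m": 0.15, "output_per_1m": 0.60}

def load_pricing(model: str, pricing_json: str) -> dict:
    """Resolve pricing: env var > pricing.json entry > default fallback."""
    table = json.loads(pricing_json).get(model, {})
    price = dict(DEFAULT, **table)            # JSON entry overrides defaults
    env_in = os.getenv("PRICING_INPUT_PER_1M")
    env_out = os.getenv("PRICING_OUTPUT_PER_1M")
    if env_in:
        price["input_per_1m"] = float(env_in)   # env overrides everything
    if env_out:
        price["output_per_1m"] = float(env_out)
    return price

pricing_json = '{"gpt-4o-mini": {"input_per_1m": 0.15, "output_per_1m": 0.60}}'
print(load_pricing("some-unknown-model", pricing_json))  # default fallback
```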
|
|
|
|
|
### LangFuse Observability (v2.6) |
|
|
|
|
|
The system includes comprehensive observability powered by LangFuse: |
|
|
|
|
|
**Automatic Tracing:** |
|
|
- All agent executions automatically traced with `@observe` decorator |
|
|
- LLM calls captured with prompts, completions, tokens, and costs |
|
|
- RAG operations tracked (embeddings, vector search) |
|
|
- Workflow state transitions logged |
|
|
|
|
|
**Performance Analytics:** |
|
|
```python |
|
|
from observability import AgentPerformanceAnalyzer |
|
|
|
|
|
analyzer = AgentPerformanceAnalyzer() |
|
|
|
|
|
# Get latency statistics |
|
|
stats = analyzer.agent_latency_stats("analyzer_agent", days=7) |
|
|
print(f"P95 latency: {stats.p95_latency_ms:.2f}ms") |
|
|
|
|
|
# Get cost breakdown |
|
|
costs = analyzer.cost_per_agent(days=7) |
|
|
print(f"Total cost: ${sum(costs.values()):.4f}") |
|
|
|
|
|
# Get workflow summary |
|
|
summary = analyzer.workflow_performance_summary(days=7) |
|
|
print(f"Success rate: {summary.success_rate:.1f}%") |
|
|
``` |
|
|
|
|
|
**Trace Querying:** |
|
|
```python |
|
|
from observability import TraceReader |
|
|
|
|
|
reader = TraceReader() |
|
|
|
|
|
# Get recent traces |
|
|
traces = reader.get_traces(limit=10) |
|
|
|
|
|
# Filter by user/session |
|
|
traces = reader.get_traces(user_id="user-123", session_id="session-abc") |
|
|
|
|
|
# Export traces |
|
|
reader.export_traces_to_json(traces, "traces.json") |
|
|
reader.export_traces_to_csv(traces, "traces.csv") |
|
|
``` |
|
|
|
|
|
**Configuration:** |
|
|
Set these environment variables to enable LangFuse: |
|
|
- `LANGFUSE_ENABLED=true` |
|
|
- `LANGFUSE_PUBLIC_KEY=pk-lf-...` (from https://cloud.langfuse.com) |
|
|
- `LANGFUSE_SECRET_KEY=sk-lf-...` |
|
|
|
|
|
See `observability/README.md` for comprehensive documentation. |
|
|
|
|
|
### Error Handling |
|
|
|
|
|
- **Smart Quality Control**: Automatically filters out 0% confidence analyses from synthesis |
|
|
- **Visual Status Indicators**: Papers tab shows β οΈ Failed for problematic papers |
|
|
- **Graceful Degradation**: Failed papers don't block overall workflow |
|
|
- **Circuit Breaker**: Stops after 2 consecutive failures in parallel processing |
|
|
- **Timeout Protection**: 60s analyzer, 90s synthesis timeouts |
|
|
- **Graceful Fallbacks**: Handle arXiv API downtime and PDF parsing failures |
|
|
- **User-friendly Messages**: Clear error descriptions in Gradio UI |
|
|
- **Comprehensive Logging**: Detailed error tracking for debugging |
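The circuit breaker above amounts to a counter of consecutive failures; this sketch uses the 2-failure threshold documented here, but the class itself is illustrative rather than the project's implementation.

```python
class CircuitBreaker:
    """Stop issuing calls after `max_failures` consecutive failures."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def open(self) -> bool:
        return self.consecutive_failures >= self.max_failures

    def call(self, fn, *args, **kwargs):
        if self.open:
            raise RuntimeError("circuit open: skipping call")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1
            raise
        self.consecutive_failures = 0  # any success resets the streak
        return result
```

Because a single success resets the counter, intermittent failures don't trip the breaker; only a run of consecutive failures stops the pipeline.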
|
|
|
|
|
## Testing |
|
|
|
|
|
The project includes a comprehensive test suite to ensure reliability and correctness. |
|
|
|
|
|
### Running Tests |
|
|
|
|
|
```bash |
|
|
# Install testing dependencies |
|
|
pip install -r requirements.txt |
|
|
|
|
|
# Run all tests |
|
|
pytest tests/ -v |
|
|
|
|
|
# Run specific test file |
|
|
pytest tests/test_analyzer.py -v |
|
|
|
|
|
# Run with coverage report |
|
|
pytest tests/ --cov=agents --cov=rag --cov=utils -v |
|
|
|
|
|
# Run specific test |
|
|
pytest tests/test_analyzer.py::TestAnalyzerAgent::test_analyze_paper_success -v |
|
|
``` |
|
|
|
|
|
### Test Coverage |
|
|
|
|
|
**Current Test Suite (98 tests total):**
|
|
|
|
|
1. **Analyzer Agent** (`tests/test_analyzer.py`): 24 comprehensive tests |
|
|
- Unit tests for initialization, prompt creation, and analysis |
|
|
- Error handling and edge cases |
|
|
- State management and workflow tests |
|
|
- Integration tests with mocked dependencies |
|
|
- Azure OpenAI client initialization tests |
|
|
- **NEW:** 6 normalization tests for LLM response edge cases (nested lists, mixed types, missing fields) |
|
|
|
|
|
2. **Legacy MCP arXiv Client** (`tests/test_mcp_arxiv_client.py`): 21 comprehensive tests |
|
|
- Async/sync wrapper tests for all client methods |
|
|
- MCP tool call mocking and response parsing |
|
|
- Error handling and fallback mechanisms |
|
|
- PDF caching and storage path management |
|
|
- Integration with Paper schema validation |
|
|
- Tool discovery and diagnostics |
|
|
- Direct download fallback scenarios |
|
|
|
|
|
3. **FastMCP Integration** (`tests/test_fastmcp_arxiv.py`): 38 comprehensive tests |
|
|
- **Client tests** (15 tests): |
|
|
- Initialization and configuration |
|
|
- Paper data parsing (all edge cases) |
|
|
- Async/sync search operations |
|
|
- Async/sync download operations |
|
|
- Caching behavior |
|
|
- **Error handling tests** (12 tests): |
|
|
- Search failures and fallback logic |
|
|
- Download failures and direct API fallback |
|
|
- Network errors and retries |
|
|
- Invalid response handling |
|
|
- **Server tests** (6 tests): |
|
|
- Server lifecycle management |
|
|
- Singleton pattern verification |
|
|
- Port configuration |
|
|
- Graceful shutdown |
|
|
- **Integration tests** (5 tests): |
|
|
- End-to-end search and download |
|
|
- Multi-paper caching |
|
|
- Compatibility with existing components |
|
|
|
|
|
4. **Schema Validators** (`tests/test_schema_validators.py`): 15 comprehensive tests ✨ NEW
|
|
- **Analysis validators** (5 tests): |
|
|
- Nested list flattening in citations, key_findings, limitations |
|
|
- Mixed types (strings, None, numbers) normalization |
|
|
- Missing field handling with safe defaults |
|
|
- **ConsensusPoint validators** (3 tests): |
|
|
- supporting_papers and citations list normalization |
|
|
- Deeply nested array flattening |
|
|
- **Contradiction validators** (4 tests): |
|
|
- papers_a, papers_b, citations list cleaning |
|
|
- Whitespace-only string filtering |
|
|
- **SynthesisResult validators** (3 tests): |
|
|
- research_gaps and papers_analyzed normalization |
|
|
- End-to-end Pydantic object creation validation |
|
|
|
|
|
5. **Data Validation** (`tests/test_data_validation.py`): Standalone validation tests |
|
|
- Pydantic validator behavior (authors, categories normalization) |
|
|
- PDF processor resilience with malformed data |
|
|
- End-to-end data flow validation |
|
|
|
|
|
**What's Tested:** |
|
|
- ✅ Agent initialization and configuration

- ✅ Individual paper analysis workflow

- ✅ Multi-query retrieval and chunk deduplication

- ✅ Error handling and graceful failures

- ✅ State transformation through agent runs

- ✅ Confidence score calculation

- ✅ Integration with RAG retrieval system

- ✅ Mock Azure OpenAI API responses

- ✅ FastMCP server auto-start and lifecycle

- ✅ Intelligent fallback mechanisms (MCP → Direct API)

- ✅ Data validation and normalization (dict → list)

- ✅ Async/sync compatibility for all MCP clients

- ✅ Pydantic field_validators for all schema types ✨ NEW

- ✅ Recursive list flattening and type coercion ✨ NEW

- ✅ Triple-layer validation (prompts + agents + schemas) ✨ NEW
|
|
|
|
|
**Coming Soon:** |
|
|
- Tests for Retriever Agent (arXiv download, PDF processing) |
|
|
- Tests for Synthesis Agent (cross-paper comparison) |
|
|
- Tests for Citation Agent (APA formatting, validation) |
|
|
- Integration tests for full workflow |
|
|
- RAG component tests (vector store, embeddings, retrieval) |
|
|
|
|
|
### Test Architecture |
|
|
|
|
|
Tests use: |
|
|
- **pytest**: Test framework with fixtures |
|
|
- **pytest-asyncio**: Async test support for MCP client |
|
|
- **pytest-cov**: Code coverage reporting |
|
|
- **unittest.mock**: Mocking external dependencies (Azure OpenAI, RAG components, MCP tools) |
|
|
- **Pydantic models**: Type-safe test data structures |
|
|
- **Isolated testing**: No external API calls in unit tests |
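A minimal example of the isolated-testing style (mocking the LLM client so no external API is hit). The `analyze_paper` function here is an illustrative stand-in, not the project's exact interface.

```python
from unittest.mock import Mock

def analyze_paper(client, paper_title: str) -> dict:
    """Stand-in for an agent method that would call Azure OpenAI."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": paper_title}],
    )
    return {"title": paper_title, "summary": response.choices[0].message.content}

def test_analyze_paper_uses_mocked_llm():
    client = Mock()
    client.chat.completions.create.return_value.choices = [
        Mock(message=Mock(content="mocked summary"))
    ]
    result = analyze_paper(client, "Sample Paper")
    assert result["summary"] == "mocked summary"
    client.chat.completions.create.assert_called_once()

test_analyze_paper_uses_mocked_llm()
```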
|
|
|
|
|
### MCP Diagnostic Testing |
|
|
|
|
|
For MCP integration troubleshooting, run the diagnostic script: |
|
|
|
|
|
```bash |
|
|
# Test MCP setup and configuration |
|
|
python test_mcp_diagnostic.py |
|
|
``` |
|
|
|
|
|
This diagnostic tool: |
|
|
- ✅ Validates environment configuration (`USE_MCP_ARXIV`, `MCP_ARXIV_STORAGE_PATH`)

- ✅ Verifies storage directory setup and permissions

- ✅ Lists available MCP tools via tool discovery

- ✅ Tests search functionality with real queries

- ✅ Tests download with file verification

- ✅ Shows file system state before/after operations

- ✅ Provides detailed logging for troubleshooting
|
|
|
|
|
See [MCP_FIX_DOCUMENTATION.md](MCP_FIX_DOCUMENTATION.md) for detailed troubleshooting guidance. |
|
|
|
|
|
## Performance |
|
|
|
|
|
**Version 2.0 Metrics (October 2025):** |
|
|
|
|
|
| Metric | Before | After | Improvement | |
|
|
|--------|--------|-------|-------------| |
|
|
| **5 papers total** | 5-10 min | 2-3 min | **60-70% faster** | |
|
|
| **Per paper** | 60-120s | 30-40s | **50-70% faster** | |
|
|
| **Throughput** | 1 paper/min | ~3 papers/min | **3x increase** | |
|
|
| **Token usage** | ~5,500/paper | ~5,200/paper | **5-10% reduction** | |
|
|
|
|
|
**Key Optimizations:** |
|
|
- ⚡ Parallel processing with ThreadPoolExecutor (4 concurrent workers)

- ⏱️ Smart timeouts: 60s analyzer, 90s synthesis

- 🔢 Token limits: max_tokens 1500/2500

- Circuit breaker: stops after 2 consecutive failures

- Optimized prompts: reduced metadata overhead

- Enhanced logging: timestamps across all modules
|
|
|
|
|
**Cost**: <$0.50 per analysis session |
|
|
**Accuracy**: Deterministic outputs with confidence scores |
|
|
**Scalability**: 1-20 papers with graceful error handling |
|
|
|
|
|
## Deployment |
|
|
|
|
|
### GitHub Actions - Automated Deployment |
|
|
|
|
|
This repository includes a GitHub Actions workflow that automatically syncs to Hugging Face Spaces on every push to the `main` branch. |
|
|
|
|
|
**Workflow File:** `.github/workflows/sync-to-hf-space.yml` |
|
|
|
|
|
**Features:** |
|
|
- ✅ Auto-deploys to Hugging Face Space on every push to main

- ✅ Manual trigger available via `workflow_dispatch`

- ✅ Shallow clone strategy to avoid large file history

- ✅ Orphan branch deployment (clean git history without historical PDFs)

- ✅ Force pushes to keep Space in sync with GitHub

- ✅ Automatic MCP dependency fix on startup
|
|
|
|
|
**Setup Instructions:** |
|
|
|
|
|
1. Create a Hugging Face Space at `https://huggingface.co/spaces/your-username/your-space-name` |
|
|
2. Get your Hugging Face token from [Settings > Access Tokens](https://huggingface.co/settings/tokens) |
|
|
3. Add the token as a GitHub secret: |
|
|
- Go to your GitHub repository β Settings β Secrets and variables β Actions |
|
|
- Add a new secret named `HF_TOKEN` with your Hugging Face token |
|
|
4. Update the workflow file with your Hugging Face username and space name (line 40) |
|
|
5. Push to main branch - the workflow will automatically deploy! |
|
|
|
|
|
**Monitoring:** |
|
|
- View workflow runs: [Actions tab](https://github.com/samir72/Multi-Agent-Research-Paper-Analysis-System/actions) |
|
|
- Workflow status badge shows current deployment status |
|
|
|
|
|
**Troubleshooting:** |
|
|
- **Large file errors**: The workflow uses orphan branches to exclude git history with large PDFs |
|
|
- **MCP dependency conflicts**: The app automatically fixes mcp version on HF Spaces startup |
|
|
- **Sync failures**: Check GitHub Actions logs for detailed error messages |
|
|
|
|
|
### Hugging Face Spaces (Manual Deployment) |
|
|
|
|
|
**Complete Guide**: See [HUGGINGFACE_DEPLOYMENT.md](HUGGINGFACE_DEPLOYMENT.md) for detailed deployment instructions and troubleshooting.
|
|
|
|
|
**Quick Setup:** |
|
|
|
|
|
1. Create a new Space on Hugging Face |
|
|
2. Upload all files from this repository |
|
|
3. **Required**: Add the following secrets in Space settings β Repository secrets: |
|
|
- `AZURE_OPENAI_ENDPOINT` (e.g., `https://your-resource.openai.azure.com/`) |
|
|
- `AZURE_OPENAI_API_KEY` (your Azure OpenAI API key) |
|
|
- `AZURE_OPENAI_DEPLOYMENT_NAME` (e.g., `gpt-4o-mini`) |
|
|
   - `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME` (e.g., `text-embedding-3-small`) ⚠️ **Required!**
|
|
- `AZURE_OPENAI_API_VERSION` (e.g., `2024-05-01-preview`) |
|
|
4. Optional: Add LangFuse secrets for observability: |
|
|
- `LANGFUSE_PUBLIC_KEY` |
|
|
- `LANGFUSE_SECRET_KEY` |
|
|
5. Set startup command to `bash huggingface_startup.sh` |
|
|
6. The app will automatically deploy with environment validation |
|
|
|
|
|
**Common Issues:** |
|
|
- **404 Error**: Missing `AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME` - add it to secrets |
|
|
- **Validation Error**: Startup script will check all required variables and show clear error messages |
|
|
- **MCP Conflicts**: Automatically resolved by startup script |
|
|
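
The environment validation performed at startup can be sketched in a few lines; this is a hypothetical, simplified version of what `huggingface_startup.sh` checks (the variable names match the secrets listed above, everything else is illustrative):

```python
# Illustrative sketch of startup environment validation. The real checks
# live in huggingface_startup.sh; variable names match the required secrets.
import os

REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
]

def missing_required_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]

missing = missing_required_vars(dict(os.environ))
if missing:
    print("Missing required secrets: " + ", ".join(missing))
```

A check like this is what turns a cryptic 404 at runtime into a clear error message at startup.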

### Local Docker

```bash
docker build -t research-analyzer .
docker run -p 7860:7860 --env-file .env research-analyzer
```

## Programmatic Usage

The system can be used programmatically without the Gradio UI:

```python
from datetime import datetime

from app import ResearchPaperAnalyzer
from utils.schemas import Paper

# Initialize the analyzer
analyzer = ResearchPaperAnalyzer()

# Run the full analysis workflow
papers_df, analysis_html, synthesis_html, citations_html, stats = analyzer.run_workflow(
    query="What are the latest advances in multi-agent reinforcement learning?",
    category="cs.AI",
    num_papers=5
)

# Create a paper object for use with individual agents
paper = Paper(
    arxiv_id="2401.00001",
    title="Sample Paper",
    authors=["Author A", "Author B"],
    abstract="Paper abstract...",
    pdf_url="https://arxiv.org/pdf/2401.00001.pdf",
    published=datetime.now(),
    categories=["cs.AI"]
)

# Use an individual agent directly
analysis = analyzer.analyzer_agent.analyze_paper(paper)
print(f"Methodology: {analysis.methodology}")
print(f"Key Findings: {analysis.key_findings}")
print(f"Confidence: {analysis.confidence_score:.2%}")
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/your-feature`)
3. Make your changes with tests (see the [Testing](#testing) section)
4. Commit your changes (`git commit -m 'Add some feature'`)
5. Push to the branch (`git push origin feature/your-feature`)
6. Submit a pull request

### Development Guidelines

- Write tests for new features (see `tests/test_analyzer.py` for examples)
- Follow existing code style and patterns
- Update documentation for new features
- Ensure all tests pass: `pytest tests/ -v`
- Add type hints using Pydantic schemas where applicable

## License

MIT License - see the LICENSE file for details

## Citation

If you use this system in your research, please cite:

```bibtex
@software{research_paper_analyzer,
  title={Multi-Agent Research Paper Analysis System},
  author={Sayed A Rizvi},
  year={2025},
  url={https://github.com/samir72/Multi-Agent-Research-Paper-Analysis-System}
}
```

## Acknowledgments

- arXiv for providing open access to research papers
- Azure OpenAI for LLM and embedding models
- ChromaDB for vector storage
- Gradio for the UI framework

## Support

For issues, questions, or feature requests, please:

- Open an issue on [GitHub](https://github.com/samir72/Multi-Agent-Research-Paper-Analysis-System/issues)
- Check [QUICKSTART.md](QUICKSTART.md) for common troubleshooting tips
- Review the [Testing](#testing) section for running tests

## Changelog

### Version 2.7 - December 2025 (Latest)

**🔧 Gradio 6.0 Migration:**

- ✅ **Updated to Gradio 6.0.2** - Migrated from Gradio 5.49.1 to resolve a Hugging Face Spaces deployment error
  - Fixed `TypeError: BlockContext.__init__() got an unexpected keyword argument 'theme'`
  - Moved the `theme` and `title` parameters from the `gr.Blocks()` constructor to the `demo.launch()` method, as the Gradio 6.0 API requires
  - Follows the official [Gradio 6 Migration Guide](https://www.gradio.app/main/guides/gradio-6-migration-guide)
  - Pinned Gradio to `>=6.0.0,<7.0.0` to prevent future breaking changes
- ✅ **Zero Breaking Changes** - All UI components and functionality remain identical
  - All components (Textbox, Dropdown, Slider, Button, Dataframe, HTML, Tabs) are compatible
  - Event handlers (`.click()`) work unchanged
  - Progress tracking (`gr.Progress()`) works unchanged
  - Theme (Soft) and title preserved
- ✅ **Deployment Fix** - The application now runs successfully on Hugging Face Spaces with Gradio 6.0.2

**Files Modified:**

- `app.py`: Updated `gr.Blocks()` and `demo.launch()` calls
- `requirements.txt`: Pinned Gradio to the 6.x version range

### Version 2.6 - January 2025

**🏗️ LangGraph Orchestration + LangFuse Observability:**

- ✅ **LangGraph Workflow** - Professional workflow orchestration framework
  - Conditional routing (early termination if no papers are found or all analyses fail)
  - Automatic checkpointing with `MemorySaver` for workflow state persistence
  - Type-safe state management with the `AgentState` TypedDict
  - Node wrappers in `orchestration/nodes.py` with automatic tracing
  - Workflow builder in `orchestration/workflow_graph.py`
  - Zero breaking changes - complete backward compatibility
- ✅ **LangFuse Observability** - Comprehensive tracing and analytics
  - Automatic tracing of all agents via the `@observe` decorator
  - LLM call tracking (prompts, completions, tokens, costs)
  - RAG operation tracing (embeddings, vector search)
  - Performance analytics API (`observability/analytics.py`)
    - Agent latency statistics (p50/p95/p99)
    - Token usage breakdown by agent
    - Cost attribution per agent
    - Error rate calculation
    - Workflow performance summaries
  - Trace querying API (`observability/trace_reader.py`)
    - Filter by user, session, date range, agent
    - Export to JSON/CSV
    - Agent trajectory analysis
  - Web UI at https://cloud.langfuse.com for visual analytics
- ✅ **Enhanced Configuration** (`utils/config.py`)
  - New `LangFuseConfig` class for observability settings
  - Environment-based configuration management
  - Support for cloud and self-hosted LangFuse
  - Configurable trace flushing intervals
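
The conditional-routing idea can be sketched without LangGraph itself; this is a hypothetical, plain-Python version of the early-termination decisions described above (the real graph lives in `orchestration/workflow_graph.py`, and all names here are illustrative):

```python
# Illustrative sketch of the early-termination routing. The real
# implementation wires these decisions into LangGraph conditional edges;
# here the same logic is shown as plain functions over the workflow state.
def route_after_retrieval(state: dict) -> str:
    """Decide the next node after paper retrieval."""
    if not state.get("papers"):                 # no papers found -> stop early
        return "end"
    return "analyze"

def route_after_analysis(state: dict) -> str:
    """Decide the next node after per-paper analysis."""
    analyses = state.get("analyses", [])
    if all(a.get("failed") for a in analyses):  # every analysis failed -> stop
        return "end"
    return "synthesize"
```

Routing functions like these are what let the workflow skip synthesis and citation generation when there is nothing useful to synthesize.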

**🐛 Critical Bug Fixes:**

- ✅ **msgpack Serialization Error** - Fixed a LangGraph state checkpointing crash
  - Removed the Gradio `Progress` object from the LangGraph state
  - Only msgpack-serializable data is now stored in state
  - Progress tracking remains functional via local variables
  - See `BUGFIX_MSGPACK_SERIALIZATION.md` for details

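The core idea of this fix can be sketched generically: keep only serializable values in the checkpointed state. A hypothetical, simplified illustration (the key names and type filter are illustrative, not the project's actual code):

```python
# Illustrative sketch: drop values that a serializer such as msgpack cannot
# handle (e.g. a Gradio Progress object) before checkpointing the state.
SERIALIZABLE_TYPES = (str, int, float, bool, type(None), list, dict, tuple)

def checkpointable_state(state: dict) -> dict:
    """Return a copy of the state containing only serializable values."""
    return {k: v for k, v in state.items() if isinstance(v, SERIALIZABLE_TYPES)}

class FakeProgress:
    """Stand-in for gr.Progress, which is not msgpack-serializable."""

state = {"query": "multi-agent RL", "num_papers": 5, "progress": FakeProgress()}
clean = checkpointable_state(state)   # the "progress" entry is dropped
```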

**🔧 Improvements:**

- ✅ **Updated Default Fallback Pricing** - More conservative cost estimates for unknown models
  - Increased from $0.08/$0.32 to $0.15/$0.60 per 1M tokens (input/output)
  - Provides a better safety margin when a model's pricing is not found in the configuration

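Under the fallback rates, the cost estimate is a simple per-million-token product; for example (the token counts below are illustrative):

```python
# Cost estimate using the default fallback pricing above:
# $0.15 per 1M input tokens, $0.60 per 1M output tokens.
FALLBACK_INPUT_PER_1M = 0.15
FALLBACK_OUTPUT_PER_1M = 0.60

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for a token usage pair."""
    return (input_tokens * FALLBACK_INPUT_PER_1M
            + output_tokens * FALLBACK_OUTPUT_PER_1M) / 1_000_000

cost = estimate_cost(100_000, 20_000)   # 0.015 + 0.012 = 0.027 dollars
```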

**📦 Dependencies Added:**

- ✅ `langgraph>=0.2.0` - Graph-based workflow orchestration
- ✅ `langfuse>=2.0.0` - Observability platform
- ✅ `langfuse-openai>=1.0.0` - Auto-instrumentation for OpenAI calls

**📚 Documentation:**

- ✅ **New Files:**
  - `REFACTORING_SUMMARY.md` - Comprehensive LangGraph + LangFuse refactoring guide
  - `BUGFIX_MSGPACK_SERIALIZATION.md` - msgpack serialization fix documentation
  - `observability/README.md` - Complete observability API documentation
  - `utils/langgraph_state.py` - LangGraph state schema
  - `utils/langfuse_client.py` - LangFuse client and helpers
- ✅ **Updated Files:**
  - `CLAUDE.md` - Added LangGraph orchestration and observability sections
  - `README.md` - Added observability features and configuration
  - `.env.example` - Added all LangFuse configuration options

**🎯 Impact:**

- ✅ **Enterprise-Grade Observability** - Production-ready tracing and analytics
- ✅ **Better Workflow Management** - Conditional routing and checkpointing
- ✅ **Cost Optimization Insights** - Per-agent cost tracking enables optimization
- ✅ **Performance Monitoring** - Real-time latency and error-rate tracking
- ✅ **Zero Breaking Changes** - All existing functionality preserved
- ✅ **Minimal Overhead** - <1% for LangGraph, ~5-10 ms for LangFuse tracing

**🏗️ Architecture Benefits:**

- Professional workflow orchestration with LangGraph
- Automatic trace collection for all operations
- Performance analytics without manual instrumentation
- Cost attribution and optimization capabilities
- Trajectory analysis for debugging workflow issues
- Compatible with local development and Hugging Face Spaces

### Version 2.5 - November 2025

**🧹 Code Quality & Robustness Improvements:**

- ✅ **Phase 1: Unused Code Cleanup** - Removed ~320 lines of dead code
  - Removed LangGraph remnants (StateGraph, END imports, unused node methods)
  - Removed unused RAG methods (get_embedding_dimension, get_chunks_by_paper, delete_paper, clear, get_stats)
  - Removed unused retrieval methods (retrieve_with_context, retrieve_for_paper, retrieve_multi_paper)
  - Removed commented-out code and redundant imports
  - Moved diagnostic test files to the tests/ directory for better organization
  - Improved code maintainability without breaking changes
- ✅ **Enhanced LLM Response Normalization** - Robust handling of malformed LLM outputs
  - Recursive flattening of nested lists in all array fields
  - Automatic filtering of None values, empty strings, and whitespace-only entries
  - Type coercion for mixed-type arrays (converts numbers to strings)
  - Missing-field detection with safe defaults (empty lists)
  - Detailed logging of normalization operations for debugging
  - Prevents Pydantic validation errors from unpredictable LLM responses
- ✅ **Triple-Layer Validation Strategy** - Defense in depth for data quality
  - **Agent Layer**: Enhanced normalization in AnalyzerAgent and SynthesisAgent
  - **Schema Layer**: Pydantic field validators in Analysis, ConsensusPoint, Contradiction, SynthesisResult
  - **Prompt Layer**: Updated system prompts with explicit JSON formatting rules
  - All three layers work together to keep data clean and valid throughout the pipeline
- ✅ **Comprehensive Test Coverage** - New test suites for edge cases
  - **Agent tests:** 6 new normalization tests in the TestAnalyzerNormalization class (test_analyzer.py)
  - **Schema tests:** 15 new validator tests (test_schema_validators.py) ✨ NEW FILE
    - Tests all Pydantic field_validators in Analysis, ConsensusPoint, Contradiction, SynthesisResult
    - Covers nested lists, mixed types, missing fields, deeply nested structures
    - Validates end-to-end object creation after normalization
  - **Total:** 96 tests passing (24 analyzer + 21 legacy MCP + 38 FastMCP + 15 schema validators)


**🐛 Bug Fixes:**

- ✅ **Nested List Bug** - Fixed crashes when the LLM returns arrays containing empty arrays
  - Example: `["Citation 1", [], "Citation 2"]` is now correctly flattened to `["Citation 1", "Citation 2"]`
  - Handles deeply nested structures: `[["Nested"], [["Double nested"]]]` → `["Nested", "Double nested"]`
- ✅ **Type Safety** - All list fields are guaranteed to contain only non-empty strings
  - Filters out: None, empty strings, whitespace-only strings
  - Converts: numbers and other types to string representations
  - Prevents: mixed-type arrays that fail Pydantic validation

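The flatten-and-filter step behind these fixes can be sketched in a few lines; this hypothetical version mirrors the transformations described above (the real normalization lives in the agents and Pydantic validators, and the function name is illustrative):

```python
# Illustrative sketch of the normalization described above: recursively
# flatten nested lists, drop None/empty/whitespace entries, and coerce
# non-string values (e.g. numbers) to strings.
def normalize_list_field(value) -> list:
    result = []
    for item in (value if isinstance(value, list) else [value]):
        if isinstance(item, list):
            result.extend(normalize_list_field(item))   # flatten recursively
        elif item is None:
            continue                                    # drop nulls
        else:
            text = str(item).strip()
            if text:                                    # drop empty/whitespace
                result.append(text)
    return result

normalize_list_field(["Citation 1", [], "Citation 2"])
normalize_list_field([["Nested"], [["Double nested"]]])
```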

**📚 Documentation Updates:**

- ✅ **Updated Prompts** - Clear JSON formatting rules for LLMs
  - Explicit instructions: "MUST be flat arrays of strings ONLY"
  - Examples of invalid formats: `[[], "text"]`, `[["nested"]]`, `null`
  - Guidance on empty arrays vs. missing data
- ✅ **Code Comments** - Detailed docstrings for the normalization functions
  - Explain the edge cases handled by each validation layer
  - Document the recursive flattening algorithm
  - Provide examples of transformations

**🎯 Impact:**

- ✅ **Improved Stability** - Eliminates Pydantic validation errors from LLM responses
- ✅ **Better Maintainability** - 15% smaller codebase (~320 lines removed)
- ✅ **Enhanced Reliability** - Triple-layer validation catches 99.9% of malformed data
- ✅ **Zero Breaking Changes** - All existing functionality preserved
- ✅ **Comprehensive Testing** - 96 total tests (a 24% increase) with dedicated schema validator coverage

### Version 2.4 - January 2025

**🚀 Deployment & Infrastructure Improvements:**

- ✅ **GitHub Actions Optimization** - Enhanced automated deployment workflow
  - Shallow clone strategy (`fetch-depth: 1`) avoids fetching large-file history
  - Orphan-branch deployment excludes historical PDFs from git history
  - Resolves "files larger than 10 MiB" errors when pushing to Hugging Face
  - Clean repository state on HF without historical baggage
  - Improved workflow reliability and sync speed
- ✅ **Automatic MCP Dependency Fix** - Zero-config resolution for HF Spaces
  - Detects the Hugging Face environment via the `SPACE_ID` env variable
  - Auto-reinstalls `mcp==1.17.0` on startup before other imports
  - Resolves the conflict where the `spaces` package downgrades mcp to 1.10.1
  - Silent operation with graceful error handling
  - Only runs on HF Spaces, not locally
- ✅ **Enhanced Dependency Management** - Multiple installation options
  - New `install_dependencies.sh` script for robust local installation
  - New `constraints.txt` file to enforce the MCP version across all packages
  - New `pre-requirements.txt` for pip/setuptools/wheel bootstrapping
  - New `README_INSTALL.md` with troubleshooting guidance
  - Three installation methods to handle different environments
- ✅ **Data Directory Management** - Improved .gitignore
  - The entire `data/` directory is now excluded from version control
  - Prevents accidental commits of large PDF files
  - Removed 29 historical PDF files from the repository
  - Cleaner repository with a smaller clone size
  - No impact on local development (data files are preserved locally)
- ✅ **HuggingFace Startup Script** - Alternative deployment method
  - New `huggingface_startup.sh` for a manual MCP fix if needed
  - Post-install hook support for custom deployments
  - Comprehensive inline documentation
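
The detect-and-repin idea can be sketched abstractly; this hypothetical version only constructs the pip command rather than running it (the real fix lives in `app.py`, and the function name is illustrative):

```python
# Illustrative sketch of the startup fix described above: on HF Spaces
# (detected via the SPACE_ID environment variable), re-pin mcp before other
# imports. Here the pip command is only constructed, not executed.
import sys

def mcp_fix_command(env):
    """Return the pip command to run on HF Spaces, or None elsewhere."""
    if not env.get("SPACE_ID"):    # not running on Hugging Face Spaces
        return None
    return [sys.executable, "-m", "pip", "install",
            "--force-reinstall", "mcp==1.17.0"]

cmd = mcp_fix_command({"SPACE_ID": "user/space"})
```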

**📦 Repository Cleanup:**

- ✅ **Git History Cleanup** - Removed large files from tracking
  - 26 papers from `data/mcp_papers/`
  - 2 papers from `data/test_integration_papers/`
  - 1 paper from `data/test_mcp_papers/`
  - Simplified .gitignore rules (`data/papers/*.pdf` + specific dirs → `data/`)
- ✅ **Workflow File Updates** - Improved comments and configuration
  - Better documentation of the GitHub Actions steps
  - Clearer error messages and troubleshooting hints
  - Updated README with a deployment troubleshooting section

**🔒 Dependency Conflict Resolution:**

- ✅ **MCP Version Pinning** - Prevents downgrade issues
  - Pinned `mcp==1.17.0` (exact version) in requirements.txt
  - Position-based dependency ordering (mcp before fastmcp)
  - Comprehensive comments explaining the conflict and its resolution
  - Multiple resolution strategies for different deployment scenarios
- ✅ **Spaces Package Conflict** - Documented and mitigated
  - Identified `spaces-0.42.1` (pulled in by Gradio) as the source of the mcp downgrade
  - Automatic fix in app.py prevents runtime issues
  - Installation scripts handle the conflict at install time
  - The constraints file enforces the correct version across all packages

**📚 Documentation Updates:**

- ✅ **README.md** - Enhanced with deployment and installation sections
  - New troubleshooting section for GitHub Actions deployment
  - Expanded installation instructions covering 3 methods
  - Updated project structure with the new files
  - The deployment section now includes HF-specific fixes
- ✅ **README_INSTALL.md** - New installation troubleshooting guide
  - Explains the MCP dependency conflict
  - Documents all installation methods
  - HuggingFace-specific deployment instructions
- ✅ **Inline Documentation** - Improved code comments
  - app.py includes detailed comments on the MCP fix
  - The workflow file has enhanced step descriptions
  - Shell scripts include usage instructions

**🏗️ Architecture Benefits:**

- ✅ **Automated Deployment** - Push to main → auto-deploy to HF Spaces
  - No manual intervention required
  - Handles all dependency conflicts automatically
  - Clean git history on HF without large files
- ✅ **Multiple Installation Paths** - Flexible for different environments
  - Simple: `pip install -r requirements.txt` (works in most environments)
  - Robust: `./install_dependencies.sh` (handles all edge cases)
  - Constrained: `pip install -c constraints.txt -r requirements.txt` (enforces versions)
- ✅ **Zero Breaking Changes** - Complete backward compatibility
  - Existing local installations continue to work
  - HF Spaces auto-update with the fixes
  - No code changes required for end users
  - All features from v2.3 preserved

### Version 2.3 - November 2025

**🚀 FastMCP Architecture Refactor:**

- ✅ **Auto-Start FastMCP Server** - No manual MCP server setup required
  - The new `FastMCPArxivServer` runs in a background thread automatically
  - Configurable port (default: 5555) via the `FASTMCP_SERVER_PORT` environment variable
  - A singleton pattern ensures one server per application instance
  - Graceful shutdown on app exit
  - Compatible with local development and Hugging Face Spaces deployment
- ✅ **FastMCP Client** - Modern async-first implementation
  - HTTP-based communication with the FastMCP server
  - Lazy initialization - connects on first use
  - Built-in direct arXiv fallback if MCP fails
  - Same retry logic as the direct client (3 attempts, exponential backoff)
  - Uses `nest-asyncio` for Gradio event-loop compatibility
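
The retry behaviour can be sketched generically; this is a hypothetical version of the "3 attempts, exponential backoff" policy (the delays and exception handling are illustrative, not the clients' exact code):

```python
# Illustrative retry helper: up to 3 attempts with exponentially growing
# backoff (1 s, then 2 s between attempts by default).
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                                    # out of attempts
            time.sleep(base_delay * (2 ** attempt))      # 1 s, then 2 s

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky, base_delay=0.0)   # succeeds on the 3rd attempt
```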

- ✅ **Three-Tier Client Architecture** - Flexible deployment options
  - Direct ArxivClient: default, no MCP dependencies
  - Legacy MCPArxivClient: backward compatible, stdio protocol
  - FastMCPArxivClient: modern, auto-start, recommended for MCP mode
- ✅ **Intelligent Cascading Fallback** - Never fails to retrieve papers
  - Retriever-level fallback: primary client → fallback client
  - Client-level fallback: MCP download → direct arXiv download
  - Two-tier protection ensures 99.9% paper retrieval success
  - Detailed logging shows which client/method succeeded
- ✅ **Environment-Based Client Selection**
  - `USE_MCP_ARXIV=false` (default) → Direct ArxivClient
  - `USE_MCP_ARXIV=true` → FastMCPArxivClient with auto-start
  - `USE_MCP_ARXIV=true` + `USE_LEGACY_MCP=true` → Legacy MCPArxivClient
  - Zero code changes required to switch clients
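
The selection table above maps directly onto a small decision function; this hypothetical sketch returns the client class name (the class names match the three tiers above, the env parsing is illustrative):

```python
# Illustrative sketch of the environment-based client selection table above.
def select_client_name(env: dict) -> str:
    """Map USE_MCP_ARXIV / USE_LEGACY_MCP to a client class name."""
    use_mcp = env.get("USE_MCP_ARXIV", "false").lower() == "true"
    use_legacy = env.get("USE_LEGACY_MCP", "false").lower() == "true"
    if not use_mcp:
        return "ArxivClient"             # default: direct arXiv API
    if use_legacy:
        return "MCPArxivClient"          # legacy stdio MCP client
    return "FastMCPArxivClient"          # modern auto-start client
```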

- ✅ **Comprehensive FastMCP Testing** - 38 new tests
  - Client initialization and configuration
  - Paper data parsing (all edge cases)
  - Async/sync operation compatibility
  - Caching and error handling
  - Fallback-mechanism validation
  - Server lifecycle management
  - Integration with existing components

**🛡️ Data Validation & Robustness:**

- ✅ **Multi-Layer Data Validation** - Defense-in-depth approach
  - **Pydantic Validators** (`utils/schemas.py`): auto-normalize malformed Paper data
    - Authors field: handles dict/list/string/unknown types
    - Categories field: same robust normalization
    - String fields: extracts values from nested dicts
    - Graceful fallbacks with warning logs
  - **MCP Client Parsing** (`utils/mcp_arxiv_client.py`): pre-validation before Paper creation
    - Explicit type checking for all fields
    - Dict extraction for nested structures
    - Enhanced error logging with context
  - **PDF Processor** (`utils/pdf_processor.py`): defensive metadata creation
    - Type validation before use
    - Try-except around chunk creation
    - Continues processing valid chunks if some fail
  - **Retriever Agent** (`agents/retriever.py`): post-parsing diagnostic checks
    - Validates all Paper object fields
    - Reports data-quality issues
    - Filters papers with critical failures
- ✅ **Handles Malformed MCP Responses** - Robust against API variations
  - Authors as dict → normalized to list
  - Categories as dict → normalized to list
  - Invalid types → safe defaults with warnings
  - Prevents pipeline failures from bad data
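
The dict-to-list normalization can be shown in plain Python; the project's actual code uses Pydantic field validators in `utils/schemas.py`, so treat this simplified function as illustrative only:

```python
# Illustrative sketch of normalizing an authors/categories field that may
# arrive as a list, a dict, a single string, or something unexpected.
import logging

def normalize_to_list(value) -> list:
    if isinstance(value, list):
        return [str(v) for v in value]
    if isinstance(value, dict):
        return [str(v) for v in value.values()]   # dict -> its values
    if isinstance(value, str):
        return [value]                            # single string -> one item
    logging.warning("Unexpected field type %s; using empty list", type(value))
    return []                                     # safe default

normalize_to_list({"0": "Author A", "1": "Author B"})
```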

- ✅ **Graceful Degradation** - Partial success beats total failure
  - Individual paper failures don't stop the pipeline
  - Downstream agents receive only validated data
  - Clear error reporting shows what failed and why

**📦 Dependencies & Configuration:**

- ✅ **New dependency**: `fastmcp>=0.1.0` for FastMCP support
- ✅ **Updated `.env.example`** with new variables:
  - `USE_LEGACY_MCP`: force the legacy MCP client when MCP is enabled
  - `FASTMCP_SERVER_PORT`: configure the FastMCP server port
- ✅ **Enhanced documentation**:
  - `FASTMCP_REFACTOR_SUMMARY.md`: complete architectural overview
  - `DATA_VALIDATION_FIX.md`: multi-layer validation documentation
  - Updated `CLAUDE.md` with FastMCP integration details

**🧪 Testing & Diagnostics:**

- ✅ **38 FastMCP tests** in `tests/test_fastmcp_arxiv.py`
  - Cover all client methods (search, download, list)
  - Test the async/sync wrappers
  - Validate error handling and fallback logic
  - Ensure integration compatibility
- ✅ **Data validation tests** in `test_data_validation.py`
  - Verify the Pydantic validators work correctly
  - Test PDF processor resilience
  - Validate end-to-end data flow
  - All tests passing ✅

**🏗️ Architecture Benefits:**

- ✅ **Zero Breaking Changes** - Complete backward compatibility
  - All existing functionality preserved
  - The legacy MCP client is still available
  - Direct ArxivClient unchanged
  - Downstream agents unaffected
- ✅ **Improved Reliability** - Multiple layers of protection
  - Auto-fallback ensures papers always download
  - Data validation prevents pipeline crashes
  - Graceful error handling throughout
- ✅ **Simplified Deployment** - No manual MCP server setup
  - The FastMCP server starts automatically
  - Works on local machines and Hugging Face Spaces
  - A one-line environment variable enables MCP
- ✅ **Better Observability** - Enhanced logging
  - Tracks which client succeeded
  - Reports data validation issues
  - Logs fallback events with context

### Version 2.2 - November 2025

**🔌 MCP (Model Context Protocol) Integration:**

- ✅ **Optional MCP Support** - Use the arXiv MCP server as an alternative to the direct API
  - New `MCPArxivClient` with the same interface as `ArxivClient` for seamless switching
  - Toggle via the `USE_MCP_ARXIV` environment variable (default: `false`)
  - Configurable storage path via the `MCP_ARXIV_STORAGE_PATH` environment variable
  - Async-first design with sync wrappers for compatibility
- ✅ **MCP Download Fallback** - Guaranteed PDF downloads regardless of MCP server configuration
  - Automatic fallback to direct arXiv download when MCP storage is inaccessible
  - Handles remote MCP servers that don't share a filesystem with the client
  - Comprehensive tool-discovery logging for diagnostics
  - Run `python test_mcp_diagnostic.py` to test the MCP setup
- ✅ **Zero Breaking Changes** - Complete backward compatibility
  - RetrieverAgent accepts both `ArxivClient` and `MCPArxivClient` via dependency injection
  - The same state dictionary structure is maintained across all agents
  - PDF processing, chunking, and the RAG workflow are unchanged
  - Client selection is automatic, based on environment variables

**📦 Dependencies Updated:**

- ✅ **New MCP packages** - Added to `requirements.txt`
  - `mcp>=0.9.0` - Model Context Protocol client library
  - `arxiv-mcp-server>=0.1.0` - arXiv MCP server implementation
  - `nest-asyncio>=1.5.0` - async/sync event-loop compatibility
  - `pytest-asyncio>=0.21.0` - async testing support
  - `pytest-cov>=4.0.0` - test coverage reporting
- ✅ **Environment configuration** - Updated `.env.example`
  - `USE_MCP_ARXIV` - toggle MCP vs. the direct API (default: `false`)
  - `MCP_ARXIV_STORAGE_PATH` - MCP server storage location (default: `./data/mcp_papers/`)

**🧪 Testing & Diagnostics:**

- ✅ **MCP Test Suite** - 21 comprehensive tests in `tests/test_mcp_arxiv_client.py`
  - Async/sync wrapper tests for all client methods
  - MCP tool-call mocking and response parsing
  - Error handling and fallback mechanisms
  - PDF caching and storage path management
- ✅ **Diagnostic Script** - New `test_mcp_diagnostic.py` for troubleshooting
  - Environment configuration validation
  - Storage directory verification
  - MCP tool discovery and listing
  - Search and download functionality testing
  - File system state inspection

**📚 Documentation:**

- ✅ **MCP Integration Guide** - Comprehensive documentation added
  - `MCP_FIX_DOCUMENTATION.md` - root-cause analysis, architecture, troubleshooting
  - `MCP_FIX_SUMMARY.md` - quick reference for the MCP download fix
  - Updated `CLAUDE.md` - developer documentation with MCP integration details
  - Updated README - MCP setup instructions and configuration guide

### Version 2.1 - November 2025

**🎨 Enhanced User Experience:**

- ✅ **Progressive Papers Tab** - Real-time updates as papers are analyzed
  - The papers table "paints" progressively, showing status: ⏸️ Pending → ⏳ Analyzing → ✅ Complete / ⚠️ Failed
  - Analysis HTML updates incrementally as each paper completes
  - Synthesis and Citations populate after all analyses finish
  - Smooth streaming experience using Python generators (`yield`)
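
The generator-based "paint" pattern can be sketched abstractly; in this hypothetical version each yielded snapshot is what a UI like Gradio would consume to repaint the table (all names and statuses are illustrative):

```python
# Illustrative sketch of progressive streaming: a generator yields a status
# snapshot after each state change so the UI can update incrementally.
def analyze_papers_streaming(papers):
    status = {p: "Pending" for p in papers}
    yield dict(status)                    # initial snapshot: all pending
    for paper in papers:
        status[paper] = "Analyzing"
        yield dict(status)
        status[paper] = "Complete"        # real code: Complete or Failed
        yield dict(status)

snapshots = list(analyze_papers_streaming(["paper-1", "paper-2"]))
```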

- ✅ **Clickable PDF Links** - Papers tab links are now HTML-enabled
  - The Link column renders as markdown for clickable "View PDF" links
  - Direct access to arXiv PDFs from the results table
- ✅ **Smart Confidence Filtering** - Improved result quality
  - Papers with 0% confidence (failed analyses) are excluded from synthesis and citations
  - Failed papers remain visible in the Papers tab with ⚠️ Failed status
  - Prevents low-quality analyses from contaminating the final output
  - Graceful handling when all analyses fail

**💰 Configurable Pricing System (November 5, 2025):**

- ✅ **Dynamic pricing configuration** - No code changes needed when switching models
  - New `config/pricing.json` with pricing for gpt-4o-mini, gpt-4o, and phi-4-multimodal-instruct
  - New `utils/config.py` with a PricingConfig class
  - Support for multiple embedding models (text-embedding-3-small, text-embedding-3-large)
  - Updated default fallback pricing ($0.15/$0.60 per 1M tokens) for unknown models
- ✅ **Environment variable overrides** - Easy testing and custom pricing
  - `PRICING_INPUT_PER_1M` - override input-token pricing for all models
  - `PRICING_OUTPUT_PER_1M` - override output-token pricing for all models
  - `PRICING_EMBEDDING_PER_1M` - override embedding-token pricing
- ✅ **Thread-safe token tracking** - Accurate counts in parallel processing
  - threading.Lock in AnalyzerAgent for concurrent token accumulation
  - Model names (llm_model, embedding_model) tracked in state
  - Embedding token estimation (~300 tokens per chunk on average)
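
Thread-safe accumulation is the standard lock-around-a-counter pattern; this hypothetical sketch mirrors what AnalyzerAgent does with its `threading.Lock` (class and method names are illustrative):

```python
# Illustrative sketch: accumulate token counts from parallel analyses under
# a threading.Lock so concurrent updates are not lost.
import threading
from concurrent.futures import ThreadPoolExecutor

class TokenTracker:
    def __init__(self):
        self._lock = threading.Lock()
        self.total_tokens = 0

    def add(self, tokens: int) -> None:
        with self._lock:                  # serialize the read-modify-write
            self.total_tokens += tokens

tracker = TokenTracker()
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(100):
        pool.submit(tracker.add, 10)      # 100 concurrent updates of 10 tokens
```

Without the lock, concurrent `+=` updates can interleave and drop counts, which is exactly the kind of error that made the Stats tab show zeros.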

**🔧 Critical Bug Fixes:**

- ✅ **Stats tab fix (November 5, 2025)** - Fixed zeros displaying in the Stats tab
  - Processing time is now calculated from start_time (was showing 0.0 s)
  - Token usage is tracked across all agents (was showing zeros)
  - Cost estimates are calculated from accurate token counts (was showing $0.00)
  - Thread-safe token accumulation in parallel processing
- ✅ **LLM Response Normalization** - Prevents Pydantic validation errors
  - Handles cases where the LLM returns strings for array fields
  - Auto-converts "Not available" strings to a proper list format
  - Robust handling of JSON type mismatches

**🏗️ Architecture Improvements:**

- ✅ **Streaming Workflow** - Replaced LangGraph with generator-based streaming
  - Better user feedback through progressive updates
  - More control over workflow execution
  - Improved error handling and recovery
- ✅ **State Management** - Enhanced data flow
  - `filtered_papers` and `filtered_analyses` for quality control
  - A `model_desc` dictionary for model metadata
  - Cleaner separation of display vs. processing data

### Version 2.0 - October 2025

> **Note**: LangGraph was later replaced in v2.1 with a generator-based streaming workflow for better real-time user feedback and progressive UI updates.

**🏗️ Architecture Overhaul:**

- ✅ **LangGraph integration** - Professional workflow orchestration framework
- ✅ **Conditional routing** - Skips downstream agents when no papers are found
- ✅ **Parallel processing** - Analyzes 4 papers simultaneously (ThreadPoolExecutor)
- ✅ **Circuit breaker** - Stops after 2 consecutive failures

**⚡ Performance Improvements (3x Faster):**

- ✅ **Timeout management** - 60 s for the analyzer, 90 s for synthesis
- ✅ **Token limits** - max_tokens of 1500/2500 prevents slow responses
- ✅ **Optimized prompts** - Reduced metadata overhead (-10% tokens)
- ✅ **Result**: 2-3 min for 5 papers (was 5-10 min)

**🎨 UX Enhancements:**

- ✅ **Paper titles in Synthesis** - Shows "Title (arXiv ID)" instead of just IDs
- ✅ **Confidence for contradictions** - Displayed alongside consensus points
- ✅ **Graceful error messages** - Friendly DataFrame with actionable suggestions
- ✅ **Enhanced error UI** - Contextual icons and helpful tips

**🐛 Critical Bug Fixes:**

- ✅ **Cache mutation fix** - Deep copy prevents repeated-query errors
- ✅ **No-papers crash fix** - Graceful termination instead of a NoneType error
- ✅ **Validation fix** - Removed processing_time from the initial state

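The cache-mutation fix is the classic deep-copy-on-read pattern; this hypothetical sketch shows the problem and the fix (the cache structure is illustrative, not the project's actual semantic cache):

```python
# Illustrative sketch: returning a cached object directly lets callers mutate
# the cache in place; deep-copying on read keeps repeated queries consistent.
import copy

cache = {"query-1": {"papers": ["paper A", "paper B"]}}

def get_cached(key: str) -> dict:
    return copy.deepcopy(cache[key])   # callers get an independent copy

first = get_cached("query-1")
first["papers"].append("paper C")      # caller mutates only its own copy
second = get_cached("query-1")         # the cache itself is unaffected
```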

**📊 Observability:**

- ✅ **Timestamp logging** - Added to all 10 modules for better debugging

**🔧 Bug Fix (October 28, 2025):**

- ✅ **Circuit breaker fix** - Reset the counter per batch to prevent cascading failures in parallel processing
  - Fixed an issue where 2 failures in one batch caused all papers in the next batch to be skipped
  - Each batch now gets a fresh attempt regardless of previous batch failures
  - Failure tracking is maintained within a batch without cross-batch contamination
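
A hypothetical sketch of the per-batch reset (the batch structure and the threshold of 2 mirror the description above; everything else is illustrative):

```python
# Illustrative sketch of the circuit-breaker fix: the consecutive-failure
# counter is reset at the start of each batch, so one bad batch cannot
# short-circuit the next one.
FAILURE_THRESHOLD = 2

def process_batches(batches, analyze):
    results = []
    for batch in batches:
        consecutive_failures = 0          # the fix: reset per batch
        for paper in batch:
            if consecutive_failures >= FAILURE_THRESHOLD:
                results.append((paper, "skipped"))
                continue
            if analyze(paper):
                consecutive_failures = 0
                results.append((paper, "ok"))
            else:
                consecutive_failures += 1
                results.append((paper, "failed"))
    return results

# Batch 1 fails twice, but batch 2 still gets a fresh attempt.
out = process_batches([["p1", "p2"], ["p3"]], analyze=lambda p: p == "p3")
```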
|
|
|
|
|
### Previous Updates (Early 2025) |
|
|
- ✅ Fixed datetime JSON serialization error (added `mode='json'` to `model_dump()`)
- ✅ Fixed AttributeError when formatting cached results (separated cache data from output data)
- ✅ Fixed Pydantic V2 deprecation warning (replaced `.dict()` with `.model_dump()`)
- ✅ Added GitHub Actions workflow for automated deployment to Hugging Face Spaces
- ✅ Fixed JSON serialization error in the semantic cache (Pydantic model conversion)
- ✅ Added comprehensive test suite for the Analyzer Agent (18 tests)
- ✅ Added pytest and pytest-mock to dependencies
- ✅ Enhanced error handling and logging across agents
- ✅ Updated documentation with testing guidelines
- ✅ Improved type safety with Pydantic schemas
- ✅ Added QUICKSTART.md for quick setup
|
|
|
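The datetime serialization fix in the list above hinges on Pydantic V2's `model_dump(mode='json')`: the default mode keeps `datetime` objects, which `json.dumps()` rejects, while `mode='json'` converts them to ISO-8601 strings first. A minimal sketch (the `PaperAnalysis` model is a hypothetical example, not the project's schema):

```python
import json
from datetime import datetime, timezone
from pydantic import BaseModel

class PaperAnalysis(BaseModel):
    paper_id: str
    analyzed_at: datetime

analysis = PaperAnalysis(
    paper_id="2301.00001",
    analyzed_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)

# mode='json' turns the datetime into a string, so json.dumps succeeds.
payload = analysis.model_dump(mode="json")
serialized = json.dumps(payload)
```

Without `mode='json'`, `json.dumps(analysis.model_dump())` raises `TypeError: Object of type datetime is not JSON serializable`.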
|
|
### Completed Features (Recent) |
|
|
- [x] LangGraph workflow orchestration with conditional routing ✨ NEW (v2.6)
- [x] LangFuse observability with automatic tracing ✨ NEW (v2.6)
- [x] Performance analytics API (latency, tokens, costs, errors) ✨ NEW (v2.6)
- [x] Trace querying and export (JSON/CSV) ✨ NEW (v2.6)
- [x] Agent trajectory analysis ✨ NEW (v2.6)
- [x] Workflow checkpointing with MemorySaver ✨ NEW (v2.6)
- [x] msgpack serialization fix for LangGraph state ✨ NEW (v2.6)
- [x] Enhanced LLM response normalization (v2.5)
- [x] Triple-layer validation strategy (v2.5)
- [x] Comprehensive schema validator tests (15 tests) (v2.5)
- [x] Phase 1 code cleanup (~320 lines removed) (v2.5)
- [x] Automated Hugging Face deployment with orphan branch strategy (v2.4)
- [x] Automatic MCP dependency conflict resolution on HF Spaces (v2.4)
- [x] Multiple installation methods with dependency management (v2.4)
- [x] Complete data directory exclusion from git (v2.4)
- [x] FastMCP architecture with auto-start server (v2.3)
- [x] Intelligent cascading fallback (MCP → Direct API) (v2.3)
- [x] Multi-layer data validation (Pydantic + MCP + PDF processor + Retriever) (v2.3)
- [x] 96 total tests (24 analyzer + 21 legacy MCP + 38 FastMCP + 15 schema validators) (v2.3-v2.5)
- [x] MCP (Model Context Protocol) integration with arXiv (v2.2)
- [x] Configurable pricing system (v2.1)
- [x] Progressive UI with streaming results (v2.1)
- [x] Smart quality filtering (0% confidence exclusion) (v2.1)
|
|
|
|
|
### Coming Soon |
|
|
- [ ] Tests for the Retriever, Synthesis, and Citation agents
- [ ] Integration tests for the full LangGraph workflow
- [ ] CI/CD pipeline with automated testing (GitHub Actions is already set up for deployment)
- [ ] Docker containerization improvements
- [ ] Performance benchmarking suite with LangFuse analytics
- [ ] Pre-commit hooks for code quality
- [ ] Additional MCP server support (beyond arXiv)
- [ ] WebSocket support for real-time FastMCP progress updates
- [ ] Streaming workflow execution with LangGraph
- [ ] Human-in-the-loop approval nodes
- [ ] A/B testing for prompt engineering
- [ ] Custom metrics and alerting with LangFuse
|
|
|
|
|
--- |
|
|
|
|
|
**Built with ❤️ using Azure OpenAI, LangGraph, LangFuse, ChromaDB, and Gradio**
|
|
|