# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Core Architecture

This is a **multi-agent RAG system** for analyzing academic papers from arXiv. The system uses **LangGraph** for workflow orchestration and **LangFuse** for comprehensive observability.

### Agent Pipeline Flow

```
User Query → Retriever → Analyzer → Filter → Synthesis → Citation → Output
                 ↓           ↓        ↓          ↓          ↓
            [LangFuse Tracing for All Nodes]
```

**Orchestration**: The workflow is managed by LangGraph (`orchestration/workflow_graph.py`):
- Conditional routing (early termination if no papers found or all analyses fail)
- Automatic checkpointing with `MemorySaver`
- State management with type-safe `AgentState` TypedDict
- Node wrappers in `orchestration/nodes.py` with automatic tracing

**State Dictionary** (`utils/langgraph_state.py`): All agents operate on a shared state dictionary that flows through the pipeline:
- `query`: User's research question
- `category`: Optional arXiv category filter
- `num_papers`: Number of papers to analyze
- `papers`: List of Paper objects (populated by Retriever)
- `chunks`: List of PaperChunk objects (populated by Retriever)
- `analyses`: List of Analysis objects (populated by Analyzer)
- `synthesis`: SynthesisResult object (populated by Synthesis)
- `validated_output`: ValidatedOutput object (populated by Citation)
- `errors`: List of error messages accumulated across agents
- `token_usage`: Dict tracking input/output/embedding tokens
- `trace_id`: LangFuse trace identifier (for observability)
- `session_id`: User session tracking
- `user_id`: Optional user identifier

**IMPORTANT**: Only msgpack-serializable data should be stored in the state. Do NOT add complex objects like Gradio Progress, file handles, or callbacks to the state dictionary (see BUGFIX_MSGPACK_SERIALIZATION.md).
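
The fields above can be sketched as a TypedDict. This is an illustrative subset only; the real `AgentState` lives in `utils/langgraph_state.py`, and `make_initial_state` here is a hypothetical stand-in for the repo's `create_initial_state()`:

```python
from typing import Optional, TypedDict

class AgentState(TypedDict, total=False):
    # Illustrative subset of the fields listed above; the real
    # definition lives in utils/langgraph_state.py.
    query: str
    category: Optional[str]
    num_papers: int
    papers: list        # serialized Paper dicts (msgpack-safe)
    analyses: list      # serialized Analysis dicts
    errors: list        # error messages accumulated across agents
    token_usage: dict   # input / output / embedding token counts
    session_id: str

def make_initial_state(query: str, category: Optional[str] = None,
                       num_papers: int = 5) -> AgentState:
    # Hypothetical helper in the spirit of create_initial_state():
    # every value is a msgpack-serializable primitive or collection.
    return AgentState(
        query=query, category=category, num_papers=num_papers,
        papers=[], analyses=[], errors=[],
        token_usage={"input": 0, "output": 0, "embedding": 0},
        session_id="",
    )
```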

### Agent Responsibilities

1. **RetrieverAgent** (`agents/retriever.py`):
   - Decorated with `@observe` for LangFuse tracing
   - Searches arXiv API using `ArxivClient`, `MCPArxivClient`, or `FastMCPArxivClient` (configurable via env)
   - Downloads PDFs to `data/papers/` (direct API) or MCP server storage (MCP mode)
   - **Intelligent Fallback**: Automatically falls back to direct API if primary MCP client fails
   - Processes PDFs with `PDFProcessor` (500-token chunks, 50-token overlap)
   - Generates embeddings via `EmbeddingGenerator` (Azure OpenAI text-embedding-3-small, traced)
   - Stores chunks in ChromaDB via `VectorStore`
   - **FastMCP Support**: Auto-start FastMCP server for standardized arXiv access

2. **AnalyzerAgent** (`agents/analyzer.py`):
   - Decorated with `@observe(as_type="generation")` for LLM call tracing
   - Analyzes each paper individually using RAG
   - Uses 4 broad queries per paper: methodology, results, conclusions, limitations
   - Deduplicates chunks by chunk_id
   - Calls Azure OpenAI with **temperature=0** and JSON mode
   - RAG retrieval automatically traced via `@observe` on `RAGRetriever.retrieve()`
   - Returns structured `Analysis` objects with confidence scores

3. **SynthesisAgent** (`agents/synthesis.py`):
   - Decorated with `@observe(as_type="generation")` for LLM call tracing
   - Compares findings across all papers
   - Identifies consensus points, contradictions, research gaps
   - Creates executive summary addressing user's query
   - Uses **temperature=0** for deterministic outputs
   - Returns `SynthesisResult` with confidence scores

4. **CitationAgent** (`agents/citation.py`):
   - Decorated with `@observe(as_type="span")` for data processing tracing
   - Generates APA-formatted citations for all papers
   - Validates synthesis claims against source papers
   - Calculates cost estimates (GPT-4o-mini pricing)
   - Creates final `ValidatedOutput` with all metadata
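
A simplified sketch of APA-style arXiv formatting (the real `CitationAgent` may handle more edge cases, such as long author lists or missing fields):

```python
def format_apa(authors: list, year: int, title: str, arxiv_id: str) -> str:
    # Simplified APA-style citation for an arXiv preprint; formatting
    # details here are illustrative, not the exact CitationAgent output.
    if len(authors) == 1:
        author_str = authors[0]
    elif len(authors) == 2:
        author_str = f"{authors[0]} & {authors[1]}"
    else:
        author_str = f"{authors[0]} et al."
    return f"{author_str} ({year}). {title}. arXiv preprint arXiv:{arxiv_id}."
```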

### Critical Architecture Patterns

**RAG Context Formatting**: `RAGRetriever.format_context()` creates structured context with:
```
[Chunk N] Paper: {title}
Authors: {authors}
Section: {section}
Page: {page_number}
Source: {arxiv_url}
--------------------------------------------------------------------------------
{content}
```
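
A minimal sketch of how such a context block might be assembled; the chunk field names (`title`, `authors`, `section`, `page`, `url`, `content`) are assumptions, not the exact `RAGRetriever` schema:

```python
SEPARATOR = "-" * 80

def format_context(chunks: list) -> str:
    # Builds the structured context format shown above from a list of
    # chunk dicts. Field names are illustrative assumptions.
    blocks = []
    for i, c in enumerate(chunks, start=1):
        blocks.append(
            f"[Chunk {i}] Paper: {c['title']}\n"
            f"Authors: {c['authors']}\n"
            f"Section: {c['section']}\n"
            f"Page: {c['page']}\n"
            f"Source: {c['url']}\n"
            f"{SEPARATOR}\n"
            f"{c['content']}"
        )
    return "\n\n".join(blocks)
```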

**Chunking Strategy**: PDFProcessor uses tiktoken encoding (cl100k_base) for precise token counting:
- Chunk size: 500 tokens
- Overlap: 50 tokens
- Page markers preserved: `[Page N]` tags in text
- Section detection via keyword matching (abstract, introduction, results, etc.)
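
The sliding-window logic can be sketched over a plain token list (the real `PDFProcessor` operates on tiktoken `cl100k_base` token ids; plain ints keep the sketch dependency-free):

```python
def chunk_tokens(tokens: list, chunk_size: int = 500, overlap: int = 50) -> list:
    # Overlapping sliding window: each chunk shares `overlap` tokens
    # with its predecessor, so the window advances by chunk_size - overlap.
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks
```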

**Vector Store Filtering**: ChromaDB searches support paper_id filtering:
- Single paper: `{"paper_id": "2401.00001"}`
- Multiple papers: `{"paper_id": {"$in": ["2401.00001", "2401.00002"]}}`
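
A tiny helper that produces both filter shapes (hypothetical; the repo may well build these dicts inline):

```python
def paper_filter(paper_ids: list) -> dict:
    # Builds a ChromaDB `where` filter in the two shapes shown above:
    # a direct equality match for one paper, `$in` for several.
    if len(paper_ids) == 1:
        return {"paper_id": paper_ids[0]}
    return {"paper_id": {"$in": paper_ids}}
```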

**Semantic Caching**: Cache hits when cosine similarity ≥ 0.95 between query embeddings. Cache key includes both query and category.
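
The hit test reduces to a cosine-similarity threshold; a dependency-free sketch:

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_cache_hit(query_emb: list, cached_emb: list,
                 threshold: float = 0.95) -> bool:
    # A hit requires similarity at or above the threshold; the real cache
    # additionally keys entries on (query, category).
    return cosine_similarity(query_emb, cached_emb) >= threshold
```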

**Error Handling Philosophy**: Agents catch exceptions, log errors, append to `state["errors"]`, and return partial results rather than failing completely. For example, Analyzer returns confidence_score=0.0 on failure.
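
The pattern looks roughly like this (a sketch; `analyze` stands in for the per-paper analysis call):

```python
def analyze_paper_safely(state: dict, paper: dict, analyze) -> dict:
    # Partial-failure pattern: record the error in the shared state and
    # emit a zero-confidence placeholder instead of raising.
    try:
        return analyze(paper)
    except Exception as exc:
        state["errors"].append(f"Analyzer failed on {paper['id']}: {exc}")
        return {"paper_id": paper["id"], "confidence_score": 0.0}
```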

### LangGraph Orchestration (`orchestration/`)

**Workflow Graph** (`orchestration/workflow_graph.py`):
- `create_workflow_graph()`: Creates StateGraph with all nodes and conditional edges
- `run_workflow()`: Sync wrapper for Gradio compatibility (uses `nest-asyncio`)
- `run_workflow_async()`: Async streaming execution
- `get_workflow_state()`: Retrieve current state by thread ID

**Node Wrappers** (`orchestration/nodes.py`):
- `retriever_node()`: Executes RetrieverAgent with LangFuse tracing
- `analyzer_node()`: Executes AnalyzerAgent with LangFuse tracing
- `filter_node()`: Filters out low-confidence analyses (confidence_score < 0.7)
- `synthesis_node()`: Executes SynthesisAgent with LangFuse tracing
- `citation_node()`: Executes CitationAgent with LangFuse tracing

**Conditional Routing**:
- `should_continue_after_retriever()`: Returns "END" if no papers found, else "analyzer"
- `should_continue_after_filter()`: Returns "END" if all analyses filtered out, else "synthesis"
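
The filter and routing steps above can be sketched as plain functions over the state dict (illustrative; the real implementations live in `orchestration/nodes.py` and `orchestration/workflow_graph.py`):

```python
CONFIDENCE_THRESHOLD = 0.7

def filter_node(state: dict) -> dict:
    # Drops analyses below the confidence threshold, as described above.
    state["analyses"] = [
        a for a in state["analyses"]
        if a.get("confidence_score", 0.0) >= CONFIDENCE_THRESHOLD
    ]
    return state

def should_continue_after_filter(state: dict) -> str:
    # Conditional edge: terminate early if nothing survived filtering.
    return "synthesis" if state["analyses"] else "END"
```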

**Workflow Execution Flow**:
```python
# In app.py
workflow_app = create_workflow_graph(
    retriever_agent=self.retriever_agent,
    analyzer_agent=self.analyzer_agent,
    synthesis_agent=self.synthesis_agent,
    citation_agent=self.citation_agent
)

# Run workflow with checkpointing
config = {"configurable": {"thread_id": session_id}}
final_state = run_workflow(workflow_app, initial_state, config, progress)
```

**State Serialization**:
- LangGraph uses msgpack for state checkpointing
- **CRITICAL**: Only msgpack-serializable types allowed in state
- ✅ Primitives: str, int, float, bool, None
- ✅ Collections: list, dict
- ✅ Pydantic models (via `.model_dump()`)
- ❌ Complex objects: Gradio Progress, file handles, callbacks
- See BUGFIX_MSGPACK_SERIALIZATION.md for detailed fix documentation

## Development Commands

### Running the Application
```bash
# Start Gradio interface (http://localhost:7860)
python app.py
```

### Testing
```bash
# Run all tests with verbose output
pytest tests/ -v

# Run specific test file
pytest tests/test_analyzer.py -v

# Run single test
pytest tests/test_analyzer.py::TestAnalyzerAgent::test_analyze_paper_success -v

# Run with coverage
pytest tests/ --cov=agents --cov=rag --cov=utils -v

# Run tests matching pattern
pytest tests/ -k "analyzer" -v
```

### Environment Setup
```bash
# Copy environment template
cp .env.example .env

# Required variables in .env:
# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
# AZURE_OPENAI_API_KEY=your-key
# AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini
# AZURE_OPENAI_API_VERSION=2024-02-01  # optional

# Optional MCP (Model Context Protocol) variables:
# USE_MCP_ARXIV=false              # Set to 'true' to use MCP (FastMCP by default)
# USE_LEGACY_MCP=false              # Set to 'true' to use legacy MCP instead of FastMCP
# MCP_ARXIV_STORAGE_PATH=./data/mcp_papers/  # MCP server storage path
# FASTMCP_SERVER_PORT=5555          # Port for FastMCP server (auto-started)

# Optional LangFuse observability variables:
# LANGFUSE_ENABLED=true            # Enable LangFuse tracing
# LANGFUSE_PUBLIC_KEY=pk-lf-...    # LangFuse public key
# LANGFUSE_SECRET_KEY=sk-lf-...    # LangFuse secret key
# LANGFUSE_HOST=https://cloud.langfuse.com  # LangFuse host (cloud or self-hosted)
# LANGFUSE_TRACE_ALL_LLM=true      # Auto-trace all Azure OpenAI calls
# LANGFUSE_TRACE_RAG=true          # Trace RAG operations
# LANGFUSE_FLUSH_AT=15             # Batch size for flushing traces
# LANGFUSE_FLUSH_INTERVAL=10       # Flush interval in seconds
```

### Data Management
```bash
# Clear vector store (useful for testing)
rm -rf data/chroma_db/

# Clear cached papers
rm -rf data/papers/

# Clear semantic cache
rm -rf data/cache/
```

## Key Implementation Details

### Azure OpenAI Integration

All agents use **temperature=0** and **response_format={"type": "json_object"}** for deterministic, structured outputs. Initialize clients like:

```python
from openai import AzureOpenAI
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)
```

### Pydantic Schemas (`utils/schemas.py` and `utils/langgraph_state.py`)

All data structures use Pydantic for validation:
- `Paper`: arXiv paper metadata
- `PaperChunk`: Text chunk with metadata
- `Analysis`: Individual paper analysis results
- `SynthesisResult`: Cross-paper synthesis with ConsensusPoint and Contradiction
- `ValidatedOutput`: Final output with citations and cost tracking
- `AgentState`: TypedDict for LangGraph state management (used in workflow orchestration)

**Observability Models** (`observability/trace_reader.py`):
- `TraceInfo`: Trace metadata and performance metrics
- `SpanInfo`: Agent execution data with timings
- `GenerationInfo`: LLM call details (prompt, completion, tokens, cost)

**Analytics Models** (`observability/analytics.py`):
- `AgentStats`: Per-agent performance statistics (latency, tokens, cost, errors)
- `WorkflowStats`: Workflow-level aggregated metrics
- `AgentTrajectory`: Complete execution path with timings

### Retry Logic

ArxivClient uses tenacity for resilient API calls:
- 3 retry attempts
- Exponential backoff (4s min, 10s max)
- Applied to search_papers() and download_paper()
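
The same policy can be sketched without tenacity (a pure-Python stand-in with an injectable `sleep` so the behavior is easy to verify; the real code uses tenacity decorators):

```python
import time

def retry_with_backoff(fn, attempts: int = 3, min_wait: float = 4.0,
                       max_wait: float = 10.0, sleep=time.sleep):
    # Mirrors the described policy: 3 attempts, exponential backoff
    # clamped to the [4s, 10s] window. Re-raises on the final failure.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(min(max_wait, min_wait * 2 ** attempt))
```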

### MCP (Model Context Protocol) Integration

The system supports **optional** integration with arXiv MCP servers as an alternative to direct arXiv API access. **FastMCP is now the default MCP implementation** when `USE_MCP_ARXIV=true`.

**Architecture Overview**:
- Three client options: Direct ArxivClient, Legacy MCPArxivClient, FastMCPArxivClient
- All clients implement the same interface for drop-in compatibility
- RetrieverAgent includes intelligent fallback from MCP to direct API
- App selects client based on environment variables with cascading fallback

**Client Selection Logic** (`app.py` lines 75-135):
1. `USE_MCP_ARXIV=false` → Direct ArxivClient (default)
2. `USE_MCP_ARXIV=true` + `USE_LEGACY_MCP=true` → Legacy MCPArxivClient
3. `USE_MCP_ARXIV=true` (default) → FastMCPArxivClient with auto-start server
4. Fallback cascade: FastMCP → Legacy MCP → Direct API

**FastMCP Implementation** (Recommended):

**Server** (`utils/fastmcp_arxiv_server.py`):
- Auto-start FastMCP server in background thread
- Implements tools: `search_papers`, `download_paper`, `list_papers`
- Uses standard `arxiv` library for arXiv API access
- Configurable port (default: 5555) via `FASTMCP_SERVER_PORT`
- Singleton pattern for application-wide server instance
- Graceful shutdown on app exit
- Compatible with local and HuggingFace Spaces deployment

**Client** (`utils/fastmcp_arxiv_client.py`):
- Async-first design with sync wrappers for Gradio compatibility
- Connects to FastMCP server via HTTP
- Lazy client initialization on first use
- Reuses legacy MCP's robust `_parse_mcp_paper()` logic
- **Built-in fallback**: Direct arXiv download if MCP fails
- Same retry logic (3 attempts, exponential backoff)
- Uses `nest-asyncio` for event loop compatibility

**Retriever Fallback Logic** (`agents/retriever.py` lines 68-156):
- Two-tier fallback: Primary client → Fallback client
- `_search_with_fallback()`: Try primary MCP, then fallback to direct API
- `_download_with_fallback()`: Try primary MCP, then fallback to direct API
- Ensures paper retrieval never fails due to MCP issues
- Detailed logging of fallback events

**Legacy MCP Client** (`utils/mcp_arxiv_client.py`):
- In-process handler calls (imports MCP server functions directly)
- Stdio protocol for external MCP servers
- Maintained for backward compatibility
- Enable via `USE_LEGACY_MCP=true` when `USE_MCP_ARXIV=true`
- All features from legacy implementation preserved

**Key Features Across All MCP Clients**:
- Async-first design with sync wrappers
- MCP tools: `search_papers`, `download_paper`, `list_papers`
- Transforms MCP responses to `Paper` Pydantic objects
- Same retry logic and caching behavior as ArxivClient
- Automatic direct download fallback if MCP storage inaccessible

**Zero Breaking Changes**:
- Downstream agents (Analyzer, Synthesis, Citation) unaffected
- Same state dictionary structure maintained
- PDF processing, chunking, and RAG unchanged
- Toggle via environment variables without code changes
- Legacy MCP remains available for compatibility

**Configuration** (`.env.example`):
```bash
# Enable MCP (FastMCP by default)
USE_MCP_ARXIV=true

# Force legacy MCP instead of FastMCP (optional)
USE_LEGACY_MCP=false

# Storage path for papers (used by all MCP clients)
MCP_ARXIV_STORAGE_PATH=./data/mcp_papers/

# FastMCP server port
FASTMCP_SERVER_PORT=5555
```

**Testing**:
- FastMCP: `pytest tests/test_fastmcp_arxiv.py -v` (38 tests)
- Legacy MCP: `pytest tests/test_mcp_arxiv_client.py -v` (21 tests)
- Both test suites cover: search, download, caching, error handling, fallback logic

### PDF Processing Edge Cases

- Some PDFs may be scanned images (extraction fails gracefully)
- Page markers `[Page N]` extracted during text extraction for chunk attribution
- Section detection is heuristic-based (checks first 5 lines of chunk)
- Empty pages or extraction failures logged as warnings, not errors
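
The section heuristic can be sketched as a keyword scan over the first few lines of a chunk; the keyword list here is illustrative, not the repo's exact set:

```python
SECTION_KEYWORDS = ("abstract", "introduction", "methodology", "methods",
                    "results", "discussion", "conclusion", "limitations")

def detect_section(chunk_text: str, max_lines: int = 5) -> str:
    # Heuristic in the spirit described above: look at the first
    # `max_lines` lines of the chunk for a known section keyword.
    for line in chunk_text.splitlines()[:max_lines]:
        lowered = line.strip().lower()
        for kw in SECTION_KEYWORDS:
            if kw in lowered:
                return kw
    return "unknown"
```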

### Gradio UI Structure (`app.py`)

ResearchPaperAnalyzer class orchestrates the workflow:
1. Initialize LangFuse client and instrument Azure OpenAI (if enabled)
2. Create LangGraph workflow with all agents
3. Check semantic cache first
4. Initialize state dictionary with `create_initial_state()`
5. Generate unique `session_id` for trace tracking
6. Run LangGraph workflow via `run_workflow()` from orchestration module
7. Flush LangFuse traces to ensure upload
8. Cache results on success
9. Format output for 5 tabs: Papers, Analysis, Synthesis, Citations, Stats

**LangGraph Workflow Execution**:
- Nodes execute in order: retriever → analyzer → filter → synthesis → citation
- Conditional edges for early termination (no papers found, all analyses failed)
- Checkpointing enabled via `MemorySaver` for workflow state persistence
- Progress updates still work via a local variable (NOT stored in state, to avoid msgpack serialization issues)

## Testing Patterns

Tests use mocks to avoid external dependencies:

```python
# Mock RAG retriever
mock_retriever = Mock(spec=RAGRetriever)
mock_retriever.retrieve.return_value = {"chunks": [...], "chunk_ids": [...]}

# Mock Azure OpenAI
with patch('agents.analyzer.AzureOpenAI', return_value=mock_client):
    agent = AnalyzerAgent(rag_retriever=mock_retriever)
```

Current test coverage:
- **AnalyzerAgent** (18 tests): Core analysis workflow and error handling
- **MCPArxivClient** (21 tests): Legacy MCP tool integration, async/sync wrappers, response parsing
- **FastMCPArxiv** (38 tests): FastMCP server, client, integration, error handling, fallback logic

When adding tests for other agents, follow the same pattern:
- Fixtures for mock dependencies
- Test both success and error paths
- Verify state transformations
- Test edge cases (empty inputs, API failures)
- For async code, use `pytest-asyncio` with `@pytest.mark.asyncio`

## Observability and Analytics

### LangFuse Integration

The system automatically traces all agent executions and LLM calls when LangFuse is enabled:

**Configuration** (`utils/langfuse_client.py`):
- `initialize_langfuse()`: Initialize global LangFuse client at startup
- `instrument_openai()`: Auto-trace all Azure OpenAI API calls
- `@observe` decorator: Trace custom functions/spans
- `flush_langfuse()`: Ensure all traces uploaded before shutdown

**Automatic Tracing**:
- All agent `run()` methods decorated with `@observe`
- LLM calls automatically captured (prompt, completion, tokens, cost)
- RAG operations traced (embeddings, vector search)
- Workflow state transitions logged

### Trace Querying (`observability/trace_reader.py`)

```python
from observability import TraceReader

reader = TraceReader()

# Get recent traces
traces = reader.get_traces(limit=10)

# Filter by user/session
traces = reader.get_traces(user_id="user-123", session_id="session-abc")

# Filter by date range
from datetime import datetime, timedelta
start = datetime.now() - timedelta(days=7)
traces = reader.filter_by_date_range(traces, start_date=start)

# Get specific agent executions
analyzer_spans = reader.filter_by_agent(traces, agent_name="analyzer_agent")

# Export traces
reader.export_traces_to_json(traces, "traces.json")
reader.export_traces_to_csv(traces, "traces.csv")
```

### Performance Analytics (`observability/analytics.py`)

```python
from observability import AgentPerformanceAnalyzer, AgentTrajectoryAnalyzer

# Performance metrics
perf_analyzer = AgentPerformanceAnalyzer()

# Get agent latency statistics
stats = perf_analyzer.agent_latency_stats("analyzer_agent", days=7)
print(f"P95 latency: {stats.p95_latency_ms:.2f}ms")

# Token usage breakdown
token_usage = perf_analyzer.token_usage_breakdown(days=7)
print(f"Total tokens: {sum(token_usage.values())}")

# Cost per agent
costs = perf_analyzer.cost_per_agent(days=7)
print(f"Total cost: ${sum(costs.values()):.4f}")

# Error rates
error_rates = perf_analyzer.error_rates(days=7)

# Workflow summary
summary = perf_analyzer.workflow_performance_summary(days=7)
print(f"Success rate: {summary.success_rate:.1f}%")
print(f"Avg duration: {summary.avg_duration_ms/1000:.2f}s")

# Trajectory analysis
traj_analyzer = AgentTrajectoryAnalyzer()
analysis = traj_analyzer.analyze_execution_paths(days=7)
print(f"Most common path: {analysis['most_common_path']}")
```

See `observability/README.md` for comprehensive documentation.

## Common Modification Points

**Adding a new agent**:
1. Create agent class with `run(state) -> state` method
2. Decorate `run()` with `@observe` for tracing
3. Add node wrapper in `orchestration/nodes.py`
4. Add node to workflow graph in `orchestration/workflow_graph.py`
5. Update conditional routing if needed

**Modifying chunking**:
- Adjust `chunk_size` and `chunk_overlap` in PDFProcessor initialization
- Affects retrieval quality vs. context size tradeoff
- Default 500/50 balances precision and coverage

**Changing LLM model**:
- Update `AZURE_OPENAI_DEPLOYMENT_NAME` in .env
- Cost estimates in CitationAgent may need adjustment
- Temperature must stay 0 for deterministic outputs

**Adding arXiv categories**:
- Extend `ARXIV_CATEGORIES` list in `app.py`
- Format: `"code - Description"` (e.g., `"cs.AI - Artificial Intelligence"`)

**Switching between arXiv clients**:
- Set `USE_MCP_ARXIV=false` (default) → Direct ArxivClient
- Set `USE_MCP_ARXIV=true` → FastMCPArxivClient (default MCP)
- Set `USE_MCP_ARXIV=true` + `USE_LEGACY_MCP=true` → Legacy MCPArxivClient
- Configure `MCP_ARXIV_STORAGE_PATH` for MCP server's storage location
- Configure `FASTMCP_SERVER_PORT` for FastMCP server port (default: 5555)
- No code changes required - client selected automatically in `app.py`
- All clients implement identical interface for seamless switching
- FastMCP server auto-starts when FastMCP client is selected

## Cost and Performance Considerations

- Target: <$0.50 per 5-paper analysis
- Semantic cache reduces repeated query costs
- ChromaDB persistence prevents re-embedding same papers
- Batch embedding generation in PDFProcessor for efficiency
- Token usage tracked per request for monitoring
- LangFuse observability enables cost optimization insights
- LangGraph overhead: <1% for state management
- Trace upload overhead: ~5-10ms per trace (async, negligible impact)

## Key Files and Modules

### Core Application
- `app.py`: Gradio UI and workflow orchestration entry point
- `utils/config.py`: Configuration management (Azure OpenAI, LangFuse, MCP)
- `utils/schemas.py`: Pydantic data models for validation
- `utils/langgraph_state.py`: LangGraph state TypedDict and helpers

### Agents
- `agents/retriever.py`: Paper retrieval, PDF processing, embeddings
- `agents/analyzer.py`: Individual paper analysis with RAG
- `agents/synthesis.py`: Cross-paper synthesis and insights
- `agents/citation.py`: Citation generation and validation

### RAG Components
- `rag/pdf_processor.py`: PDF text extraction and chunking
- `rag/embeddings.py`: Batch embedding generation (Azure OpenAI)
- `rag/vector_store.py`: ChromaDB vector store management
- `rag/retrieval.py`: RAG retrieval with formatted context

### Orchestration (LangGraph)
- `orchestration/__init__.py`: Module exports
- `orchestration/nodes.py`: Node wrappers with tracing
- `orchestration/workflow_graph.py`: LangGraph workflow builder

### Observability (LangFuse)
- `observability/__init__.py`: Module exports
- `observability/trace_reader.py`: Trace querying and export API
- `observability/analytics.py`: Performance analytics and trajectory analysis
- `observability/README.md`: Comprehensive observability documentation
- `utils/langfuse_client.py`: LangFuse client initialization and helpers

### Utilities
- `utils/arxiv_client.py`: Direct arXiv API client with retry logic
- `utils/mcp_arxiv_client.py`: Legacy MCP client implementation
- `utils/fastmcp_arxiv_client.py`: FastMCP client (recommended)
- `utils/fastmcp_arxiv_server.py`: FastMCP server with auto-start
- `utils/semantic_cache.py`: Query caching with embeddings

### Documentation
- `CLAUDE.md`: This file - comprehensive developer guide
- `README.md`: User-facing project documentation
- `REFACTORING_SUMMARY.md`: LangGraph + LangFuse refactoring details
- `BUGFIX_MSGPACK_SERIALIZATION.md`: msgpack serialization fix documentation
- `.env.example`: Environment variable template with all options

## Version History and Recent Changes

### Version 2.6: LangGraph Orchestration + LangFuse Observability
**Added:**
- LangGraph workflow orchestration with conditional routing
- LangFuse automatic tracing for all agents and LLM calls
- Observability Python API for trace querying and analytics
- Performance analytics (latency, tokens, cost, error rates)
- Agent trajectory analysis
- Checkpointing with `MemorySaver`

**Fixed:**
- msgpack serialization error (removed Gradio Progress from state)

**Dependencies Added:**
- `langgraph>=0.2.0`
- `langfuse>=2.0.0`
- `langfuse-openai>=1.0.0`

**Breaking Changes:**
- None! Fully backward compatible

**Documentation:**
- Created `observability/README.md`
- Created `REFACTORING_SUMMARY.md`
- Created `BUGFIX_MSGPACK_SERIALIZATION.md`
- Updated `CLAUDE.md` (this file)
- Updated `.env.example`

See `REFACTORING_SUMMARY.md` for detailed migration guide and architecture changes.