# Observability Module

This module provides comprehensive observability for the multi-agent RAG system using LangFuse tracing and analytics.

## Features

- **Trace Reading API**: Query and filter LangFuse traces programmatically
- **Performance Analytics**: Agent-level metrics including latency, token usage, and costs
- **Trajectory Analysis**: Analyze agent execution paths and workflow patterns
- **Export Capabilities**: Export traces to JSON/CSV for external analysis

## Quick Start

### 1. Configure LangFuse

Add your LangFuse credentials to `.env`:

```bash
LANGFUSE_ENABLED=true
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key-here
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key-here
LANGFUSE_HOST=https://cloud.langfuse.com
```

### 2. Run Your Workflow

The system automatically traces all agent executions, LLM calls, and RAG operations.
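Conceptually, each traced execution becomes a span with a name, duration, and metadata. The toy recorder below is purely illustrative of what a span captures; the real spans are created by the LangFuse SDK, not by application code:

```python
import time
from contextlib import contextmanager

# Illustrative only: a minimal span recorder showing the kind of data
# each traced agent execution contributes (name, duration, metadata).
spans = []

@contextmanager
def traced(name, **metadata):
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        spans.append({"name": name, "duration_ms": duration_ms, "metadata": metadata})

with traced("retriever_agent", query="rag evaluation"):
    pass  # agent work happens here

print(spans[0]["name"])  # retriever_agent
```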
### 3. Query Traces

Use the Python API to read and analyze traces:

```python
from observability import TraceReader, AgentPerformanceAnalyzer

# Initialize trace reader
reader = TraceReader()

# Get recent traces
traces = reader.get_traces(limit=10)

# Get traces for a specific session
session_traces = reader.get_traces(session_id="session-abc123")

# Filter by agent
retriever_spans = reader.filter_by_agent("retriever_agent", limit=50)

# Get specific trace
trace = reader.get_trace_by_id("trace-xyz")
```

## Trace Reader API

### TraceReader

Query and retrieve traces from LangFuse.

```python
from datetime import datetime, timedelta

from observability import TraceReader

reader = TraceReader()

# Get traces with filters
traces = reader.get_traces(
    limit=50,
    user_id="user-123",
    session_id="session-abc",
    from_timestamp=datetime.now() - timedelta(days=7),
    to_timestamp=datetime.now(),
)

# Filter by date range
recent_traces = reader.filter_by_date_range(
    from_date=datetime.now() - timedelta(days=1),
    to_date=datetime.now(),
    limit=100,
)

# Get LLM generations
generations = reader.get_generations(trace_id="trace-xyz")

# Export to files
reader.export_traces_to_json(traces, "traces.json")
reader.export_traces_to_csv(traces, "traces.csv")
```
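The date-range filter amounts to keeping only traces whose timestamp falls inside the window. A pure-Python sketch of that semantics, using plain dicts rather than the module's actual trace objects:

```python
from datetime import datetime, timedelta

def filter_by_date_range(traces, from_date, to_date):
    # Keep traces whose timestamp lies within [from_date, to_date].
    return [t for t in traces if from_date <= t["timestamp"] <= to_date]

now = datetime(2024, 1, 8, 12, 0)
traces = [
    {"id": "old", "timestamp": now - timedelta(days=10)},
    {"id": "recent", "timestamp": now - timedelta(hours=2)},
]
recent = filter_by_date_range(traces, now - timedelta(days=1), now)
print([t["id"] for t in recent])  # ['recent']
```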
## Performance Analytics API

### AgentPerformanceAnalyzer

Analyze agent performance metrics.

```python
from observability import AgentPerformanceAnalyzer

analyzer = AgentPerformanceAnalyzer()

# Get latency statistics for an agent
stats = analyzer.agent_latency_stats("retriever_agent", days=7)
print(f"Average latency: {stats.avg_latency_ms:.2f}ms")
print(f"P95 latency: {stats.p95_latency_ms:.2f}ms")
print(f"Success rate: {stats.success_rate:.1f}%")

# Get token usage breakdown
token_usage = analyzer.token_usage_breakdown(days=7)
for agent, usage in token_usage.items():
    print(f"{agent}: {usage['total']:,} tokens")

# Get cost breakdown per agent
costs = analyzer.cost_per_agent(session_id="session-abc")
for agent, cost in costs.items():
    print(f"{agent}: ${cost:.4f}")

# Get error rates
error_stats = analyzer.error_rates(days=30)
for agent, stats in error_stats.items():
    print(f"{agent}: {stats['error_rate_percent']:.2f}% errors")

# Get workflow performance summary
workflow_stats = analyzer.workflow_performance_summary(days=7)
print(f"Total runs: {workflow_stats.total_runs}")
print(f"Average duration: {workflow_stats.avg_duration_ms:.2f}ms")
print(f"Total cost: ${workflow_stats.total_cost:.4f}")
```
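The P50/P95/P99 fields are order statistics over the raw span durations. An illustrative nearest-rank computation (not the module's actual implementation) shows why P95 is so sensitive to outliers:

```python
def percentile(values, pct):
    # Nearest-rank percentile over a list of latency samples (ms).
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * (len(ordered) - 1))))
    return ordered[k]

# One slow outlier (900 ms) dominates the tail percentiles.
latencies_ms = [120, 130, 150, 200, 900, 220, 140, 160, 180, 210]
print(percentile(latencies_ms, 50))
print(percentile(latencies_ms, 95))
```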
## Trajectory Analysis API

### AgentTrajectoryAnalyzer

Analyze agent execution paths and workflow patterns.

```python
from observability import AgentTrajectoryAnalyzer

analyzer = AgentTrajectoryAnalyzer()

# Get agent trajectories
trajectories = analyzer.get_trajectories(session_id="session-abc", days=7)
for traj in trajectories:
    print(f"Trace: {traj.trace_id}")
    print(f"Duration: {traj.total_duration_ms:.2f}ms")
    print(f"Path: {' → '.join(traj.agent_sequence)}")
    print(f"Success: {traj.success}")

# Analyze execution paths
path_analysis = analyzer.analyze_execution_paths(days=7)
print(f"Total workflows: {path_analysis['total_workflows']}")
print(f"Unique paths: {path_analysis['unique_paths']}")
print(f"Most common path: {path_analysis['most_common_path']}")

# Compare two workflow executions
comparison = analyzer.compare_trajectories("trace-1", "trace-2")
print(f"Duration difference: {comparison['duration_diff_ms']:.2f}ms")
print(f"Same path: {comparison['same_path']}")
```
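Trajectory comparison can be thought of as a pairwise diff over two executions' durations and agent sequences. A simplified sketch over plain dicts (the real method fetches the two traces by ID first):

```python
def compare(traj_a, traj_b):
    # Compare two workflow executions: duration delta and path equality.
    return {
        "duration_diff_ms": traj_b["total_duration_ms"] - traj_a["total_duration_ms"],
        "same_path": traj_a["agent_sequence"] == traj_b["agent_sequence"],
    }

a = {"total_duration_ms": 1200.0, "agent_sequence": ["retriever_agent", "analyzer_agent"]}
b = {"total_duration_ms": 1500.0, "agent_sequence": ["retriever_agent", "synthesis_agent"]}
result = compare(a, b)
print(result)  # {'duration_diff_ms': 300.0, 'same_path': False}
```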
## Data Models

### TraceInfo

```python
class TraceInfo(BaseModel):
    id: str
    name: str
    user_id: Optional[str]
    session_id: Optional[str]
    timestamp: datetime
    metadata: Dict[str, Any]
    duration_ms: Optional[float]
    total_cost: Optional[float]
    token_usage: Dict[str, int]
```

### AgentStats

```python
class AgentStats(BaseModel):
    agent_name: str
    execution_count: int
    avg_latency_ms: float
    p50_latency_ms: float
    p95_latency_ms: float
    p99_latency_ms: float
    min_latency_ms: float
    max_latency_ms: float
    success_rate: float
    total_cost: float
```

### WorkflowStats

```python
class WorkflowStats(BaseModel):
    total_runs: int
    avg_duration_ms: float
    p50_duration_ms: float
    p95_duration_ms: float
    p99_duration_ms: float
    success_rate: float
    total_cost: float
    avg_cost_per_run: float
    total_tokens: int
```

### AgentTrajectory

```python
class AgentTrajectory(BaseModel):
    trace_id: str
    session_id: Optional[str]
    start_time: datetime
    total_duration_ms: float
    agent_sequence: List[str]
    agent_timings: Dict[str, float]
    agent_costs: Dict[str, float]
    errors: List[str]
    success: bool
```

## Example: Performance Dashboard Script

```python
#!/usr/bin/env python3
"""Generate performance dashboard from traces."""
from observability import AgentPerformanceAnalyzer, AgentTrajectoryAnalyzer


def main():
    perf = AgentPerformanceAnalyzer()
    traj = AgentTrajectoryAnalyzer()

    print("=" * 60)
    print("AGENT PERFORMANCE DASHBOARD - Last 7 Days")
    print("=" * 60)

    # Workflow summary
    workflow_stats = perf.workflow_performance_summary(days=7)
    if workflow_stats:
        print("\nWorkflow Summary:")
        print(f"  Total Runs: {workflow_stats.total_runs}")
        print(f"  Avg Duration: {workflow_stats.avg_duration_ms/1000:.2f}s")
        print(f"  P95 Duration: {workflow_stats.p95_duration_ms/1000:.2f}s")
        print(f"  Success Rate: {workflow_stats.success_rate:.1f}%")
        print(f"  Total Cost: ${workflow_stats.total_cost:.4f}")
        print(f"  Avg Cost/Run: ${workflow_stats.avg_cost_per_run:.4f}")

    # Agent latency stats
    print("\nAgent Latency Statistics:")
    for agent_name in ["retriever_agent", "analyzer_agent", "synthesis_agent"]:
        stats = perf.agent_latency_stats(agent_name, days=7)
        if stats:
            print(f"\n  {agent_name}:")
            print(f"    Executions: {stats.execution_count}")
            print(f"    Avg Latency: {stats.avg_latency_ms/1000:.2f}s")
            print(f"    P95 Latency: {stats.p95_latency_ms/1000:.2f}s")
            print(f"    Success Rate: {stats.success_rate:.1f}%")

    # Cost breakdown
    print("\nCost Breakdown:")
    costs = perf.cost_per_agent(days=7)
    for agent, cost in sorted(costs.items(), key=lambda x: x[1], reverse=True):
        print(f"  {agent}: ${cost:.4f}")

    # Path analysis
    print("\nExecution Path Analysis:")
    path_analysis = traj.analyze_execution_paths(days=7)
    if path_analysis:
        print(f"  Total Workflows: {path_analysis['total_workflows']}")
        print(f"  Unique Paths: {path_analysis['unique_paths']}")
        if path_analysis['most_common_path']:
            path, count = path_analysis['most_common_path']
            print(f"  Most Common: {path} ({count} times)")


if __name__ == "__main__":
    main()
```

Save as `scripts/performance_dashboard.py` and run:

```bash
python scripts/performance_dashboard.py
```
## Advanced Usage

### Custom Metrics

```python
from observability import TraceReader

reader = TraceReader()

# Calculate custom metric: papers processed per second
traces = reader.get_traces(limit=100)

total_papers = 0
total_time_ms = 0
for trace in traces:
    if trace.metadata.get("num_papers"):
        total_papers += trace.metadata["num_papers"]
        total_time_ms += trace.duration_ms or 0

if total_time_ms > 0:
    papers_per_second = (total_papers / total_time_ms) * 1000
    print(f"Papers/second: {papers_per_second:.2f}")
```

### Monitoring Alerts

```python
from observability import AgentPerformanceAnalyzer

analyzer = AgentPerformanceAnalyzer()

# Check if error rate exceeds threshold
error_stats = analyzer.error_rates(days=1)
for agent, stats in error_stats.items():
    if stats['error_rate_percent'] > 10:
        print(f"⚠️ ALERT: {agent} error rate is {stats['error_rate_percent']:.1f}%")

# Check if P95 latency is too high
stats = analyzer.agent_latency_stats("analyzer_agent", days=1)
if stats and stats.p95_latency_ms > 30000:  # 30 seconds
    print(f"⚠️ ALERT: Analyzer P95 latency is {stats.p95_latency_ms/1000:.1f}s")
```

## Troubleshooting

### No Traces Found

1. Check that LangFuse is enabled: `LANGFUSE_ENABLED=true`
2. Verify API keys are correct in `.env`
3. Ensure network connectivity to LangFuse Cloud
4. Check that at least one workflow has been executed
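A quick way to rule out the first two causes is a small environment check before debugging further. This helper is illustrative, not part of the module:

```python
import os

def check_langfuse_config(env=os.environ):
    # Return a list of configuration problems (empty means config looks OK).
    problems = []
    if env.get("LANGFUSE_ENABLED", "").lower() != "true":
        problems.append("LANGFUSE_ENABLED is not set to 'true'")
    for key in ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "LANGFUSE_HOST"):
        if not env.get(key):
            problems.append(f"{key} is missing")
    return problems

# Enabled but no keys: reports the three missing variables.
print(check_langfuse_config({"LANGFUSE_ENABLED": "true"}))
```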
### Missing Token/Cost Data

- Token usage requires `langfuse-openai` instrumentation
- Ensure `instrument_openai()` is called before creating Azure OpenAI clients
- Cost data depends on LangFuse pricing configuration

### Slow Query Performance

- Reduce the `limit` parameter for large trace datasets
- Use date-range filters to narrow results
- Consider exporting traces to CSV for offline analysis

## See Also

- [LangFuse Documentation](https://langfuse.com/docs)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- Main README: `../README.md`
- Architecture: `../CLAUDE.md`