	# Observability and Dashboard

	## Overview

The observability layer exposes runtime behavior, model usage, tool execution, memory quality, and rewards so that agent runs can be inspected, debugged, and audited.

	## Dashboard Sections

	### 1. Live Thought Stream

	- chronological reasoning notes
	- model/router choice trace
	- action confidence timeline
	- override events

	### 2. Navigation Map

	Graph of visited pages:

	- nodes = URLs
	- edges = transitions
	- node color = relevance/confidence
	- revisit highlighting
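The navigation graph could be held in memory as a small structure that records each page visit. The sketch below assumes a hypothetical `NavigationMap` class fed one `(url, confidence, prev_url)` visit at a time. It is not scrapeRL's actual data model.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NavigationMap:
    """Hypothetical navigation graph: nodes are URLs, edges are transitions."""
    nodes: dict[str, float] = field(default_factory=dict)  # URL -> latest relevance/confidence (node color)
    edges: Counter = field(default_factory=Counter)         # (from_url, to_url) -> transition count
    visits: Counter = field(default_factory=Counter)        # URL -> number of visits

    def record(self, url: str, confidence: float, prev_url: str | None = None) -> None:
        """Register a visit to `url`, plus the transition from `prev_url` if given."""
        self.nodes[url] = confidence
        self.visits[url] += 1
        if prev_url is not None:
            self.edges[(prev_url, url)] += 1

    def revisited(self) -> list[str]:
        """URLs visited more than once, i.e. candidates for revisit highlighting."""
        return [url for url, n in self.visits.items() if n > 1]
```

Each edge keeps a count rather than a flag, so the dashboard can also weight edges by how often a transition occurred.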

	### 3. MCP Usage Panel

	- tool call count by server
	- avg latency by tool
	- error rate and retries
	- top successful tool chains
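As a rough sketch, most of the MCP panel figures can be derived from `tool_call` records in the structured log format shown under Logging Model. Two details here are assumptions. First, the prefix of `tool` before the first `.` is taken to name the MCP server. Second, `retries` is treated as an optional extra field. Tool-chain ranking is not covered.

```python
from collections import defaultdict


def mcp_usage(records):
    """Aggregate tool_call log records into MCP usage panel stats.

    Assumes records shaped like the structured log example (event, tool,
    latency_ms, success). The server name is, hypothetically, the part of
    `tool` before the first "." and `retries` is an assumed optional field.
    """
    calls_by_server = defaultdict(int)
    latencies = defaultdict(list)
    errors = retries = total = 0
    for r in records:
        if r.get("event") != "tool_call":
            continue  # ignore non-tool events (memory ops, rewards, ...)
        total += 1
        server = r["tool"].split(".", 1)[0]
        calls_by_server[server] += 1
        latencies[r["tool"]].append(r["latency_ms"])
        if not r["success"]:
            errors += 1
        retries += r.get("retries", 0)
    return {
        "calls_by_server": dict(calls_by_server),
        "avg_latency_ms_by_tool": {t: sum(v) / len(v) for t, v in latencies.items()},
        "error_rate": errors / total if total else 0.0,
        "retries": retries,
    }
```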

	### 4. Memory Viewer

	- inspect short/working/long/shared memory
	- filter by task/domain/confidence
	- edit/delete entries
- preview which entries a prune would remove
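Filtering and prune previews in the memory viewer might look like the sketch below. The `MemoryEntry` fields are hypothetical. A prune is also assumed, for illustration only, to drop entries below a confidence threshold.

```python
from dataclasses import dataclass

TIERS = ("short", "working", "long", "shared")


@dataclass(frozen=True)
class MemoryEntry:
    # Hypothetical entry shape; field names are illustrative assumptions.
    key: str
    tier: str
    task: str
    domain: str
    confidence: float

    def __post_init__(self):
        if self.tier not in TIERS:
            raise ValueError(f"unknown memory tier: {self.tier}")


def filter_entries(entries, *, tier=None, task=None, domain=None, min_confidence=0.0):
    """Return entries matching every filter that is set (None means 'any')."""
    return [
        e for e in entries
        if (tier is None or e.tier == tier)
        and (task is None or e.task == task)
        and (domain is None or e.domain == domain)
        and e.confidence >= min_confidence
    ]


def prune_preview(entries, threshold):
    """List entries a confidence-based prune would remove, without deleting anything."""
    return [e for e in entries if e.confidence < threshold]
```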

	### 5. Reward Analytics

	- per-step reward breakdown
	- component contribution trends
	- penalty heatmap
	- episode comparison
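A minimal sketch of the per-step breakdown and the penalty heatmap data follows. It assumes each step's reward arrives as a dict of named components. The component names are hypothetical, and any negative component is treated as a penalty.

```python
from collections import defaultdict


def reward_breakdown(steps):
    """Summarize per-step reward components for one episode.

    `steps` is a list of dicts mapping component name -> contribution for
    that step (names are hypothetical). Negative values count as penalties.
    """
    per_step_total = [sum(components.values()) for components in steps]
    component_totals = defaultdict(float)
    penalty_cells = {}  # (step_index, component) -> penalty magnitude, for the heatmap
    for i, components in enumerate(steps):
        for name, value in components.items():
            component_totals[name] += value
            if value < 0:
                penalty_cells[(i, name)] = -value
    return per_step_total, dict(component_totals), penalty_cells
```

Episode comparison could then reuse `per_step_total` from two runs, for example by plotting their cumulative sums side by side.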

	### 6. Cost and Token Monitor

	- per-provider usage
	- per-model token counts
	- cumulative cost vs budget
	- forecasted burn rate
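The cost monitor's budget tracking and burn-rate forecast can be approximated with simple linear extrapolation, as sketched below. Both the price table and the linear model are assumptions. Real prices depend on the provider and model, and the actual forecast may be computed differently.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1K tokens; real prices vary by provider and model.
PRICE_PER_1K = {"model-a": 0.5, "model-b": 2.0}


def cost_report(usage, budget_usd, elapsed_s):
    """usage: list of (provider, model, tokens) tuples accumulated so far."""
    tokens_by_model = defaultdict(int)
    cost_by_provider = defaultdict(float)
    for provider, model, tokens in usage:
        tokens_by_model[model] += tokens
        cost_by_provider[provider] += tokens / 1000 * PRICE_PER_1K[model]
    spent = sum(cost_by_provider.values())
    # Linear assumption: spending continues at the average rate observed so far.
    burn_rate = spent / elapsed_s if elapsed_s > 0 else 0.0  # USD per second
    remaining = max(budget_usd - spent, 0.0)
    return {
        "tokens_by_model": dict(tokens_by_model),
        "cost_by_provider": dict(cost_by_provider),
        "spent_usd": spent,
        "budget_fraction": spent / budget_usd if budget_usd > 0 else float("inf"),
        "burn_rate_usd_per_s": burn_rate,
        "seconds_to_budget": remaining / burn_rate if burn_rate > 0 else float("inf"),
    }
```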

	## Core Metrics

	### Agent Metrics

	- task completion rate
	- avg steps to completion
	- recovery score
	- generalization score
	- exploration ratio
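Several of the agent metrics reduce to simple ratios over episode summaries, as in the sketch below. The `EpisodeSummary` shape is hypothetical. The exploration ratio is defined here, by assumption, as distinct URLs visited per step. Recovery and generalization scores are left out because this page does not define how they are computed.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSummary:
    # Hypothetical per-episode summary; fields are illustrative.
    completed: bool
    steps: int
    visited_urls: list


def agent_metrics(episodes):
    done = [e for e in episodes if e.completed]
    total_steps = sum(e.steps for e in episodes)
    distinct_visits = sum(len(set(e.visited_urls)) for e in episodes)
    return {
        "task_completion_rate": len(done) / len(episodes) if episodes else 0.0,
        # Averaged over completed episodes only; None when nothing completed.
        "avg_steps_to_completion": sum(e.steps for e in done) / len(done) if done else None,
        # Assumed definition: distinct URLs visited per step, across all episodes.
        "exploration_ratio": distinct_visits / total_steps if total_steps else 0.0,
    }
```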

	### Tool Metrics

	- tool success rate
	- timeout ratio
	- fallback frequency
	- schema validation failures
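The tool metrics can be sketched as counts over per-call records. The field names `status`, `used_fallback`, and `schema_ok` are hypothetical and do not appear in the log schema on this page.

```python
from collections import Counter


def tool_metrics(calls):
    """calls: list of dicts with hypothetical fields
    status ("ok" | "timeout" | "error"), used_fallback (bool), schema_ok (bool)."""
    n = len(calls)
    if n == 0:
        return {"success_rate": 0.0, "timeout_ratio": 0.0,
                "fallback_frequency": 0.0, "schema_validation_failures": 0}
    status = Counter(c["status"] for c in calls)
    return {
        "success_rate": status["ok"] / n,
        "timeout_ratio": status["timeout"] / n,
        "fallback_frequency": sum(c["used_fallback"] for c in calls) / n,
        # Reported as an absolute count rather than a ratio.
        "schema_validation_failures": sum(not c["schema_ok"] for c in calls),
    }
```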

	### Memory Metrics

	- retrieval hit rate
	- relevance score distribution
	- prune rate
	- memory-assisted success ratio
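The sketch below computes three of the memory metrics: hit rate, a binned relevance distribution, and the memory-assisted success ratio. The input shapes are hypothetical, and relevance scores are assumed to lie in [0, 1]. "Memory-assisted success ratio" is read here as the success rate among episodes that used memory, which is an assumed definition.

```python
def memory_metrics(retrievals, episodes, bins=4):
    """retrievals: list of (hit: bool, relevance: float in [0, 1]);
    episodes: list of (used_memory: bool, succeeded: bool). Both shapes are hypothetical."""
    hits = sum(1 for hit, _ in retrievals if hit)
    histogram = [0] * bins
    for _, relevance in retrievals:
        # Put relevance == 1.0 in the last bin so [0, 1] is fully covered.
        histogram[min(int(relevance * bins), bins - 1)] += 1
    assisted = [ok for used, ok in episodes if used]
    return {
        "retrieval_hit_rate": hits / len(retrievals) if retrievals else 0.0,
        "relevance_histogram": histogram,
        # Assumed definition: success rate among episodes that used memory.
        "memory_assisted_success_ratio": sum(assisted) / len(assisted) if assisted else 0.0,
    }
```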

	### Search Metrics

	- query success rate
	- multi-hop depth distribution
	- credibility score average
	- duplicate result ratio
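Duplicate detection for the search metrics depends on how URLs are compared. The sketch below uses a hypothetical normalization that lowercases the scheme and host, drops the fragment, and strips a trailing slash. It also covers the depth distribution and the credibility average. Query success rate is omitted.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize_url(url):
    """Hypothetical normalization: lowercase scheme/host, drop fragment, strip trailing '/'."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def search_metrics(results, hop_depths, credibility):
    """results: result URLs; hop_depths: hops per query; credibility: per-result scores."""
    unique = {normalize_url(u) for u in results}
    return {
        "duplicate_result_ratio": 1 - len(unique) / len(results) if results else 0.0,
        "multi_hop_depth_distribution": dict(Counter(hop_depths)),
        "credibility_score_average": sum(credibility) / len(credibility) if credibility else 0.0,
    }
```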

	## Logging Model

Each event is logged as a structured JSON record:

```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```
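A record in this shape could be produced with the standard library alone. The `log_event` helper and its signature are illustrative and not part of scrapeRL's API. Any extra keyword arguments are merged into the record unchanged.

```python
import json
from datetime import datetime, timezone


def log_event(episode_id, step, event, **fields):
    """Serialize one structured log record (hypothetical helper).

    Extra keyword arguments such as tool, latency_ms, success, or
    reward_delta are merged into the record as-is.
    """
    record = {
        # UTC timestamp in the same "YYYY-MM-DDTHH:MM:SSZ" form as the example.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,
    }
    return json.dumps(record)
```

For example, `log_event("ep_123", 7, "tool_call", tool="beautifulsoup.find_all", latency_ms=54, success=True, reward_delta=0.08)` returns one JSON line with the same keys as the sample record.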

	## Tracing

	Per-episode trace includes:

	- observations
	- actions
	- rewards
	- tool calls
	- memory operations
	- final submission and grader results
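A per-episode trace holding the items listed above could be modeled as a plain dataclass that serializes to JSON. The class and field names here are assumptions for illustration, not the project's actual trace schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class EpisodeTrace:
    """Hypothetical container for one episode's trace."""
    episode_id: str
    observations: list[Any] = field(default_factory=list)
    actions: list[Any] = field(default_factory=list)
    rewards: list[float] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    memory_ops: list[dict] = field(default_factory=list)
    submission: Any = None
    grader_results: dict | None = None

    def add_step(self, observation, action, reward):
        """Append one observation/action/reward triple, keeping the lists aligned by step."""
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)

    def to_json(self) -> str:
        """Serialize the whole trace; all contents must be JSON-serializable."""
        return json.dumps(asdict(self))
```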

	## Alerts

	Configurable alerts:

	- budget threshold crossed
	- error spike
	- tool outage
	- memory bloat
- anomalous low-reward streak
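Three of these alert rules can be evaluated as simple threshold checks, as the sketch below shows. All thresholds and the function signature are hypothetical defaults, not documented configuration keys. Tool outage and memory bloat checks are omitted.

```python
def check_alerts(spent_usd, budget_usd, recent_errors, recent_calls, rewards,
                 budget_threshold=0.8, error_rate_limit=0.2,
                 low_reward=0.0, streak_len=5):
    """Return the names of triggered alerts. All thresholds are hypothetical defaults."""
    alerts = []
    if budget_usd > 0 and spent_usd / budget_usd >= budget_threshold:
        alerts.append("budget_threshold_crossed")
    if recent_calls and recent_errors / recent_calls > error_rate_limit:
        alerts.append("error_spike")
    # Low-reward streak: the last `streak_len` rewards are all at or below `low_reward`.
    tail = rewards[-streak_len:]
    if len(tail) == streak_len and all(r <= low_reward for r in tail):
        alerts.append("low_reward_streak")
    return alerts
```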

	## APIs

	- `GET /api/metrics/summary`
	- `GET /api/metrics/timeseries`
	- `GET /api/traces/{episode_id}`
	- `GET /api/costs`
	- `GET /api/memory/stats`
	- `GET /api/tools/stats`

	## Recommended Dashboard Layout

	1. Top row: completion, cost, latency, error rate
	2. Mid row: thought stream + navigation graph
	3. Lower row: reward breakdown + MCP usage + memory viewer
	4. Bottom row: raw trace and export controls

	## Export and Audit

	Exports:

	- JSON trace
	- CSV metrics
	- reward analysis report
	- model usage report

	All exports include episode and configuration fingerprints for reproducibility.
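One way to compute these fingerprints is to hash a canonical JSON encoding, so that logically equal configurations always produce the same digest. The sketch below uses SHA-256 over sorted-key JSON and stamps each CSV metrics row with both fingerprints. The hash choice, function names, and column names are assumptions.

```python
import csv
import hashlib
import io
import json


def fingerprint(obj) -> str:
    """SHA-256 over canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_metrics_csv(rows, episode, config) -> str:
    """Write metric rows (dicts with identical keys) to CSV, stamping each row
    with hypothetical episode_fingerprint / config_fingerprint columns."""
    stamps = {
        "episode_fingerprint": fingerprint(episode),
        "config_fingerprint": fingerprint(config),
    }
    fieldnames = (list(rows[0]) if rows else []) + list(stamps)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, **stamps})
    return buf.getvalue()
```

Sorting keys matters here. Without it, two configs that differ only in key order would produce different fingerprints and break reproducibility checks.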