# observability-and-dashboard
## overview
Observability provides insight into runtime behavior, model usage, tool execution, memory quality, and rewards through a dashboard, structured logs, per-episode traces, alerts, and metrics APIs.
## dashboard-sections
### 1-live-thought-stream
- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events
### 2-navigation-map
Graph of visited pages:
- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
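
A minimal sketch of how the graph data could be derived from an ordered list of visited URLs. The function name `build_navigation_graph` and the input shape are assumptions for illustration; node coloring by relevance/confidence is left out because the source does not define how those scores are produced.

```python
from collections import Counter


def build_navigation_graph(visited_urls):
    """Derive nodes, edges, and revisited pages from an ordered URL list."""
    # Each distinct URL is a node; the count is how often it was visited.
    nodes = Counter(visited_urls)
    # Each consecutive pair of URLs is a transition (directed edge).
    edges = Counter(zip(visited_urls, visited_urls[1:]))
    # Pages seen more than once are candidates for revisit highlighting.
    revisited = {url for url, count in nodes.items() if count > 1}
    return nodes, edges, revisited


# Hypothetical visit sequence for one episode.
visits = [
    "https://a.example/",
    "https://a.example/docs",
    "https://a.example/",
]
nodes, edges, revisited = build_navigation_graph(visits)
```

Keeping counts on both nodes and edges lets the renderer scale node size or edge width without a second pass over the trace.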
### 3-mcp-usage-panel
- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
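
The panel figures can be computed from structured log records. The sketch below assumes records carry the `event`, `tool`, `latency_ms`, and `success` fields from the logging model; `summarize_tool_calls` is a hypothetical helper, and it groups by tool name because the source does not say how a tool maps to a server.

```python
from collections import defaultdict


def summarize_tool_calls(records):
    """Aggregate call count, average latency, and error rate per tool."""
    stats = defaultdict(lambda: {"calls": 0, "latency_total": 0, "errors": 0})
    for rec in records:
        if rec.get("event") != "tool_call":
            continue  # ignore non-tool events
        entry = stats[rec["tool"]]
        entry["calls"] += 1
        entry["latency_total"] += rec["latency_ms"]
        if not rec["success"]:
            entry["errors"] += 1
    return {
        tool: {
            "calls": entry["calls"],
            "avg_latency_ms": entry["latency_total"] / entry["calls"],
            "error_rate": entry["errors"] / entry["calls"],
        }
        for tool, entry in stats.items()
    }


# Illustrative records only; the values are made up.
records = [
    {"event": "tool_call", "tool": "beautifulsoup.find_all", "latency_ms": 54, "success": True},
    {"event": "tool_call", "tool": "beautifulsoup.find_all", "latency_ms": 66, "success": False},
    {"event": "memory_write", "latency_ms": 3, "success": True},
]
summary = summarize_tool_calls(records)
```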
### 4-memory-viewer
- inspect short/working/long/shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews
### 5-reward-analytics
- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
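
As a sketch of the per-step breakdown, the snippet below sums each reward component across the steps of an episode. The component names (`progress`, `penalty`) and the per-step dictionary shape are hypothetical; the source does not list the actual reward components.

```python
from collections import defaultdict


def component_totals(step_rewards):
    """Sum each reward component over all steps of one episode."""
    totals = defaultdict(float)
    for step in step_rewards:
        for name, value in step.items():
            totals[name] += value
    return dict(totals)


# Hypothetical per-step breakdown: one dict of component -> value per step.
steps = [
    {"progress": 0.5, "penalty": -0.25},
    {"progress": 0.25, "penalty": 0.0},
]
totals = component_totals(steps)
episode_reward = sum(totals.values())
```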
### 6-cost-and-token-monitor
- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
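
One simple way to forecast burn rate is a linear projection from the average cost per step so far. This is an assumed method, not necessarily the one the dashboard uses; `forecast_burn` and its parameters are illustrative names.

```python
def forecast_burn(costs_per_step, budget, total_steps):
    """Project total spend linearly from the average cost per step so far."""
    spent = sum(costs_per_step)
    # Average cost per completed step; zero if nothing has run yet.
    burn_rate = spent / len(costs_per_step) if costs_per_step else 0.0
    projected_total = burn_rate * total_steps
    return {
        "spent": spent,
        "burn_rate": burn_rate,
        "projected_total": projected_total,
        "over_budget": projected_total > budget,
    }


# Illustrative numbers: three steps completed out of a planned twenty.
report = forecast_burn([0.02, 0.04, 0.03], budget=0.50, total_steps=20)
```

A linear projection is easy to explain on a dashboard, but it will lag if later steps use more expensive models, so a windowed average may be a better fit in practice.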
## core-metrics
### agent-metrics
- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
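
The first two metrics can be computed directly from episode summaries. The sketch below assumes each summary has a `completed` flag and a `steps` count, which are hypothetical field names; the recovery, generalization, and exploration scores are omitted because the source does not define them.

```python
def agent_metrics(episodes):
    """Compute completion rate and average steps over completed episodes."""
    completed = [ep for ep in episodes if ep["completed"]]
    rate = len(completed) / len(episodes) if episodes else 0.0
    # Average only over completed episodes; None when there are none.
    avg_steps = (
        sum(ep["steps"] for ep in completed) / len(completed) if completed else None
    )
    return {"task_completion_rate": rate, "avg_steps_to_completion": avg_steps}


# Illustrative episode summaries.
episodes = [
    {"completed": True, "steps": 8},
    {"completed": False, "steps": 20},
    {"completed": True, "steps": 12},
    {"completed": True, "steps": 10},
]
metrics = agent_metrics(episodes)
```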
### tool-metrics
- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
### memory-metrics
- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
### search-metrics
- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
## logging-model
Structured logs (JSON):
```json
{
"timestamp": "2026-03-27T00:00:00Z",
"episode_id": "ep_123",
"step": 7,
"event": "tool_call",
"tool": "beautifulsoup.find_all",
"latency_ms": 54,
"success": true,
"reward_delta": 0.08
}
```
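
A record of this shape could be produced with the standard library alone. The helper name `make_log_record` is an assumption; only the field names come from the example above.

```python
import json
from datetime import datetime, timezone


def make_log_record(episode_id, step, event, **fields):
    """Serialize one structured log event as a single JSON line."""
    record = {
        # UTC timestamp in the same format as the example record.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
    }
    record.update(fields)  # event-specific fields such as tool or latency_ms
    return json.dumps(record)


line = make_log_record(
    "ep_123",
    7,
    "tool_call",
    tool="beautifulsoup.find_all",
    latency_ms=54,
    success=True,
    reward_delta=0.08,
)
```

Emitting one JSON object per line keeps the log easy to stream and to parse with `json.loads` on each line.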
## tracing
Per-episode trace includes:
- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
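
One possible in-memory shape for such a trace is a dataclass with one field per item in the list above. `EpisodeTrace` and `record_step` are hypothetical names, and the element types are left loose because the source does not specify them.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class EpisodeTrace:
    """Container for everything recorded during a single episode."""

    episode_id: str
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    memory_operations: list = field(default_factory=list)
    final_submission: Any = None
    grader_results: Any = None

    def record_step(self, observation, action, reward):
        """Append one observation/action/reward triple to the trace."""
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)


trace = EpisodeTrace(episode_id="ep_123")
trace.record_step("page loaded", "click", 0.5)
trace_dict = asdict(trace)  # plain dict, ready for JSON serialization
```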
## alerts
Configurable alerts:
- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
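
A minimal sketch of threshold-based checks for three of these alerts. The metric keys, threshold keys, and the streak rule (consecutive most-recent rewards below a floor) are all assumptions chosen for illustration.

```python
def check_alerts(metrics, thresholds):
    """Return the names of alerts whose conditions currently hold."""
    alerts = []
    if metrics["cost"] >= thresholds["budget"]:
        alerts.append("budget threshold crossed")
    if metrics["error_rate"] >= thresholds["error_rate"]:
        alerts.append("error spike")
    # Count how many of the most recent rewards fall below the floor in a row.
    streak = 0
    for reward in reversed(metrics["recent_rewards"]):
        if reward < thresholds["low_reward"]:
            streak += 1
        else:
            break
    if streak >= thresholds["low_reward_streak"]:
        alerts.append("anomalous low reward streak")
    return alerts


# Illustrative values only.
metrics = {"cost": 4.2, "error_rate": 0.02, "recent_rewards": [0.3, -0.1, -0.2, -0.1]}
thresholds = {"budget": 4.0, "error_rate": 0.10, "low_reward": 0.0, "low_reward_streak": 3}
alerts = check_alerts(metrics, thresholds)
```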
## apis
- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`
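
A small client sketch for these endpoints using only the standard library. The base URL `http://localhost:8000` is a placeholder, the helper names are hypothetical, and the response is assumed to be JSON since the source does not document response bodies.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen


def trace_url(base_url, episode_id):
    """Build the URL for GET /api/traces/{episode_id}."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    return urljoin(base_url, "/api/traces/" + quote(episode_id, safe=""))


def fetch_json(url, timeout=10.0):
    """Issue a GET request and decode the body as JSON (assumed format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Placeholder host; no request is sent until fetch_json is called.
url = trace_url("http://localhost:8000", "ep_123")
```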
## recommended-dashboard-layout
1. Top row: completion, cost, latency, error rate
2. Mid row: thought stream + navigation graph
3. Lower row: reward breakdown + MCP usage + memory viewer
4. Bottom row: raw trace and export controls
## export-and-audit
Exports:
- JSON trace
- CSV metrics
- reward analysis report
- model usage report
All exports include episode and configuration fingerprints for reproducibility.
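
The source does not say how fingerprints are computed. One common approach, shown here as an assumption, is a SHA-256 hash over a canonical JSON serialization, so that the same configuration always yields the same fingerprint regardless of key order. The function names and export layout are illustrative.

```python
import hashlib
import json


def fingerprint(obj):
    """Hash a JSON-serializable object in a key-order-independent way."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_trace(episode_id, steps, config):
    """Build a JSON trace export that carries both fingerprints."""
    payload = {
        "episode_id": episode_id,
        "episode_fingerprint": fingerprint(steps),
        "config_fingerprint": fingerprint(config),
        "steps": steps,
    }
    return json.dumps(payload, indent=2)


# Hypothetical configuration and a one-step trace.
config = {"model": "example-model", "max_steps": 20}
exported = export_trace("ep_123", [{"step": 1, "reward": 0.5}], config)
```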
## related-api-reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |
## document-metadata
| key | value |
| --- | --- |
| document | `observability.md` |
| status | active |
## document-flow
```mermaid
flowchart TD
A[document] --> B[key-sections]
B --> C[implementation]
B --> D[operations]
B --> E[validation]
```