# observability-and-dashboard

## overview

Observability covers runtime behavior, model usage, tool execution, memory quality, and rewards. It is exposed through a dashboard, core metrics, structured logs, per-episode traces, alerts, and read-only APIs.

## dashboard-sections

### 1-live-thought-stream

- chronological reasoning notes
- model/router choice trace
- action confidence timeline
- override events

### 2-navigation-map

Graph of visited pages:

- nodes = URLs
- edges = transitions
- node color = relevance/confidence
- revisit highlighting
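
A minimal sketch of the data behind the navigation map follows. It assumes each step reports the URL it landed on plus a relevance/confidence score. The class and function names (`NavigationGraph`, `build_graph`) are hypothetical, and mapping confidence to node color is left to the rendering layer.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NavigationGraph:
    """Hypothetical navigation graph: nodes are URLs, edges are transitions."""
    visits: Counter = field(default_factory=Counter)  # URL -> number of visits
    edges: Counter = field(default_factory=Counter)   # (src, dst) -> transition count
    confidence: dict = field(default_factory=dict)    # URL -> latest confidence (drives node color)

    def record(self, url: str, confidence: float, prev_url: Optional[str] = None) -> None:
        self.visits[url] += 1
        self.confidence[url] = confidence
        if prev_url is not None:
            self.edges[(prev_url, url)] += 1

    def revisited(self) -> list:
        # URLs visited more than once are the candidates for revisit highlighting.
        return [url for url, n in self.visits.items() if n > 1]


def build_graph(path: list) -> NavigationGraph:
    """Build the graph from an ordered list of (url, confidence) steps."""
    graph = NavigationGraph()
    prev = None
    for url, conf in path:
        graph.record(url, conf, prev)
        prev = url
    return graph
```

Counting edges rather than storing a set keeps repeated back-and-forth transitions visible, which is what revisit highlighting needs.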

### 3-mcp-usage-panel

- tool call count by server
- avg latency by tool
- error rate and retries
- top successful tool chains
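
The first three panel figures can be derived from a flat list of tool-call events. The sketch below assumes each event carries a server name, tool name, latency, success flag, and attempt number. `ToolCall` and `summarize` are hypothetical names, and tool-chain ranking is not covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolCall:
    server: str         # MCP server that handled the call (assumed field)
    tool: str
    latency_ms: float
    success: bool
    attempt: int = 1    # 1 = first try; values > 1 mark a retry (assumed convention)


def summarize(calls: list) -> dict:
    """Aggregate call counts per server, mean latency per tool, error rate, and retries."""
    calls_by_server = defaultdict(int)
    latencies = defaultdict(list)
    errors = 0
    retries = 0
    for c in calls:
        calls_by_server[c.server] += 1
        latencies[c.tool].append(c.latency_ms)
        errors += 0 if c.success else 1
        retries += 1 if c.attempt > 1 else 0
    total = len(calls)
    return {
        "calls_by_server": dict(calls_by_server),
        "avg_latency_ms_by_tool": {t: sum(v) / len(v) for t, v in latencies.items()},
        "error_rate": errors / total if total else 0.0,
        "retry_count": retries,
    }
```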

### 4-memory-viewer

- inspect short-term, working, long-term, and shared memory
- filter by task/domain/confidence
- edit/delete entries
- prune previews

### 5-reward-analytics

- per-step reward breakdown
- component contribution trends
- penalty heatmap
- episode comparison
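
One way to feed the breakdown, trend, and heatmap views is to store each step's reward as a mapping from component name to contribution. The sketch assumes that representation. The component names in any example are illustrative and do not reflect a fixed reward schema.

```python
from collections import defaultdict


def component_totals(steps: list) -> dict:
    """Sum each reward component over an episode; each step maps component -> value."""
    totals = defaultdict(float)
    for step in steps:
        for name, value in step.items():
            totals[name] += value
    return dict(totals)


def contribution_shares(totals: dict) -> dict:
    """Fraction of total absolute reward attributable to each component."""
    magnitude = sum(abs(v) for v in totals.values())
    if magnitude == 0:
        return {name: 0.0 for name in totals}
    return {name: abs(v) / magnitude for name, v in totals.items()}


def penalty_cells(steps: list) -> list:
    """(step_index, component) pairs with a negative value, i.e. penalty heatmap cells."""
    return [(i, name) for i, step in enumerate(steps) for name, v in step.items() if v < 0]
```

Using absolute values for the shares keeps penalties visible as contributors instead of letting them cancel positive components.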

### 6-cost-and-token-monitor

- per-provider usage
- per-model token counts
- cumulative cost vs budget
- forecasted burn rate
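
The document does not state how the burn rate is forecast. The sketch below assumes a simple linear extrapolation of spend over elapsed time. The per-1,000-token prices and the `model-a` name are placeholders, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass
class UsageSample:
    t_seconds: float    # seconds since episode start
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder USD prices per 1,000 tokens; real values come from each provider.
PRICES = {"model-a": {"input": 0.5, "output": 1.5}}


def sample_cost(s: UsageSample) -> float:
    p = PRICES[s.model]
    return (s.input_tokens * p["input"] + s.output_tokens * p["output"]) / 1000


def burn_report(samples: list, budget_usd: float) -> dict:
    """Cumulative cost vs budget plus a linear burn-rate forecast (assumes budget_usd > 0)."""
    spent = sum(sample_cost(s) for s in samples)
    elapsed = max((s.t_seconds for s in samples), default=0.0)
    rate = spent / elapsed if elapsed > 0 else 0.0  # USD per second so far
    remaining = budget_usd - spent
    return {
        "spent_usd": spent,
        "budget_fraction": spent / budget_usd,
        "burn_rate_usd_per_hour": rate * 3600,
        "seconds_to_budget": remaining / rate if rate > 0 else None,
    }
```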

## core-metrics

### agent-metrics

- task completion rate
- avg steps to completion
- recovery score
- generalization score
- exploration ratio
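
Three of these metrics can be counted directly from episode records. Recovery and generalization scores are not defined in this document, so they are omitted. The `Episode` fields are assumed, and the exploration ratio uses an illustrative definition: the share of visits within an episode that reached a URL not seen earlier in that episode.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    completed: bool
    steps: int
    visited: list    # URLs/states visited, in order (assumed field)


def agent_metrics(episodes: list) -> dict:
    done = [e for e in episodes if e.completed]
    total_visits = sum(len(e.visited) for e in episodes)
    novel_visits = sum(len(set(e.visited)) for e in episodes)
    return {
        "task_completion_rate": len(done) / len(episodes) if episodes else 0.0,
        # Averaged over completed episodes only; None when nothing completed.
        "avg_steps_to_completion": sum(e.steps for e in done) / len(done) if done else None,
        # Illustrative definition: first-time visits / all visits, per episode.
        "exploration_ratio": novel_visits / total_visits if total_visits else 0.0,
    }
```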

### tool-metrics

- tool success rate
- timeout ratio
- fallback frequency
- schema validation failures
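
A sketch of the tool metrics follows. It assumes every call is classified into one outcome and that fallbacks (calls rerouted to an alternative tool) are counted separately. `Outcome` and `tool_metrics` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    TIMEOUT = "timeout"
    SCHEMA_ERROR = "schema_error"   # tool output failed schema validation
    OTHER_ERROR = "other_error"


def tool_metrics(outcomes: list, fallbacks: int) -> dict:
    """outcomes: one Outcome per tool call; fallbacks: number of calls rerouted to a fallback."""
    n = len(outcomes)
    counts = Counter(outcomes)
    return {
        "success_rate": counts[Outcome.SUCCESS] / n if n else 0.0,
        "timeout_ratio": counts[Outcome.TIMEOUT] / n if n else 0.0,
        "fallback_frequency": fallbacks / n if n else 0.0,
        "schema_validation_failures": counts[Outcome.SCHEMA_ERROR],
    }
```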

### memory-metrics

- retrieval hit rate
- relevance score distribution
- prune rate
- memory-assisted success ratio
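
The memory metrics need operational definitions that this document does not give. The sketch below assumes three of them. A retrieval counts as a hit when at least one returned entry meets a relevance threshold, and `0.5` is a placeholder. The memory-assisted success ratio is the success rate among episodes that used memory. The prune rate is pruned entries over stored entries.

```python
from dataclasses import dataclass


@dataclass
class Retrieval:
    scores: list    # relevance scores of the returned entries


def retrieval_hit_rate(retrievals: list, min_relevance: float = 0.5) -> float:
    """Share of retrievals with at least one entry at or above min_relevance (placeholder threshold)."""
    if not retrievals:
        return 0.0
    hits = sum(1 for r in retrievals if any(s >= min_relevance for s in r.scores))
    return hits / len(retrievals)


def memory_assisted_success_ratio(episodes: list) -> float:
    """episodes: (used_memory, succeeded) pairs; success rate among memory-using episodes."""
    assisted = [ok for used, ok in episodes if used]
    return sum(assisted) / len(assisted) if assisted else 0.0


def prune_rate(pruned: int, stored: int) -> float:
    return pruned / stored if stored else 0.0
```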

### search-metrics

- query success rate
- multi-hop depth distribution
- credibility score average
- duplicate result ratio
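
Duplicate detection depends on what counts as "the same" result. The sketch below treats URLs as duplicates after a light normalization, which is an assumption: lowercase scheme and network location, drop the fragment, and drop a trailing slash on the path. It uses only `urllib.parse` from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    # Lowercase scheme and netloc, drop the #fragment, strip a trailing slash from the path.
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def duplicate_result_ratio(results: list) -> float:
    """1 - unique/total over normalized result URLs; 0.0 for an empty result list."""
    if not results:
        return 0.0
    unique = {normalize(u) for u in results}
    return 1 - len(unique) / len(results)
```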

## logging-model

Structured logs (JSON):

```json
{
  "timestamp": "2026-03-27T00:00:00Z",
  "episode_id": "ep_123",
  "step": 7,
  "event": "tool_call",
  "tool": "beautifulsoup.find_all",
  "latency_ms": 54,
  "success": true,
  "reward_delta": 0.08
}
```
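
A record in this shape can be produced with the standard `json` and `datetime` modules. The helper name `log_event` is hypothetical, and it only returns the line. Where the line is written (stdout, a file, a log shipper) is left open.

```python
import json
from datetime import datetime, timezone


def log_event(episode_id: str, step: int, event: str, **fields) -> str:
    """Serialize one event as a single JSON line; extra keyword fields are merged in."""
    record = {
        # UTC timestamp at second precision, matching the example record above.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "episode_id": episode_id,
        "step": step,
        "event": event,
        **fields,
    }
    return json.dumps(record)
```

For example, `log_event("ep_123", 7, "tool_call", tool="beautifulsoup.find_all", latency_ms=54, success=True, reward_delta=0.08)` yields a line with the same keys as the sample record.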

## tracing

Per-episode trace includes:

- observations
- actions
- rewards
- tool calls
- memory operations
- final submission and grader results
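
A per-episode trace with these parts could be held in a single dataclass and serialized with `dataclasses.asdict`. The sketch assumes that structure. The field names are hypothetical, and entries are assumed to be JSON-serializable.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class EpisodeTrace:
    episode_id: str
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    rewards: list = field(default_factory=list)
    tool_calls: list = field(default_factory=list)
    memory_ops: list = field(default_factory=list)
    final_submission: Any = None
    grader_results: Optional[dict] = None

    def record_step(self, observation: Any, action: Any, reward: float) -> None:
        # Observations, actions, and rewards stay index-aligned by step.
        self.observations.append(observation)
        self.actions.append(action)
        self.rewards.append(reward)

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```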

## alerts

Configurable alerts:

- budget threshold crossed
- error spike
- tool outage
- memory bloat
- anomalous low reward streak
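
The alert conditions reduce to threshold checks over recent state. The sketch below covers all five. Every threshold in `AlertConfig` is an illustrative default, not a documented value, and "tool outage" and "low reward streak" are interpreted as trailing runs of failures or low rewards.

```python
from dataclasses import dataclass


@dataclass
class AlertConfig:
    # Illustrative defaults only; the document does not specify thresholds.
    budget_fraction: float = 0.8
    error_rate: float = 0.2
    outage_consecutive_failures: int = 5
    max_memory_entries: int = 10_000
    low_reward: float = 0.0
    low_reward_streak: int = 10


def trailing_run(values: list, predicate) -> int:
    """Length of the run at the end of `values` for which predicate holds."""
    n = 0
    for v in reversed(values):
        if not predicate(v):
            break
        n += 1
    return n


def evaluate_alerts(cfg: AlertConfig, spent: float, budget: float, recent_errors: int,
                    recent_calls: int, tool_results: dict, memory_entries: int,
                    rewards: list) -> list:
    """tool_results maps tool name -> list of success flags, oldest first."""
    alerts = []
    if budget > 0 and spent / budget >= cfg.budget_fraction:
        alerts.append("budget threshold crossed")
    if recent_calls and recent_errors / recent_calls > cfg.error_rate:
        alerts.append("error spike")
    for tool, results in tool_results.items():
        if trailing_run(results, lambda ok: not ok) >= cfg.outage_consecutive_failures:
            alerts.append(f"tool outage: {tool}")
    if memory_entries > cfg.max_memory_entries:
        alerts.append("memory bloat")
    if trailing_run(rewards, lambda r: r < cfg.low_reward) >= cfg.low_reward_streak:
        alerts.append("anomalous low reward streak")
    return alerts
```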

## apis

- `GET /api/metrics/summary`
- `GET /api/metrics/timeseries`
- `GET /api/traces/{episode_id}`
- `GET /api/costs`
- `GET /api/memory/stats`
- `GET /api/tools/stats`
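
A small client for these endpoints can be built on the standard library alone. This sketch assumes a hypothetical base URL and JSON response bodies. The response schemas are not specified here, so it returns whatever JSON the server sends.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical base URL; the document does not say where the API is served.
BASE_URL = "http://localhost:8000"


def api_url(path: str, base: str = BASE_URL) -> str:
    return base.rstrip("/") + path


def trace_url(episode_id: str, base: str = BASE_URL) -> str:
    # Percent-encode the episode ID so it stays a single path segment.
    return api_url(f"/api/traces/{quote(episode_id, safe='')}", base)


def get_json(url: str, timeout: float = 10.0):
    """GET a JSON endpoint and decode the response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Against a running server, `get_json(api_url("/api/metrics/summary"))` or `get_json(trace_url("ep_123"))` would fetch the summary or a single trace.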

## recommended-dashboard-layout

1. Top row: completion, cost, latency, error rate
2. Mid row: thought stream + navigation graph
3. Lower row: reward breakdown + MCP usage + memory viewer
4. Bottom row: raw trace and export controls

## export-and-audit

Exports:

- JSON trace
- CSV metrics
- reward analysis report
- model usage report

All exports include episode and configuration fingerprints for reproducibility.
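
The document does not name a fingerprinting scheme. One common approach, assumed here, is a SHA-256 digest over a canonical JSON encoding, so that key order does not change the hash. `fingerprint` and `export_trace` are hypothetical names.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """SHA-256 hex digest of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def export_trace(trace: dict, config: dict) -> str:
    """JSON trace export carrying episode and configuration fingerprints."""
    payload = {
        "episode_fingerprint": fingerprint(trace),
        "config_fingerprint": fingerprint(config),
        "trace": trace,
    }
    return json.dumps(payload, indent=2)
```

Canonicalizing before hashing means two exports of the same episode and configuration get identical fingerprints even if dictionaries were built in a different order.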


## related-api-reference

| item | value |
| --- | --- |
| api-reference | `api-reference.md` |

## document-metadata

| key | value |
| --- | --- |
| document | `observability.md` |
| status | active |

## document-flow

```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```