Observability
A complete, modular log of the application (API/LLM calls, inference, memory, and the core loop), readable in the terminal and live in the Gradio app. Built on OpenTelemetry with a self-contained in-process backend. See ADR-0024.
The facade
Every layer imports one thing:
from src import observability as obs
| Call | Purpose |
|---|---|
| `obs.configure(level=, fmt=, tracing=)` | Idempotent init (reads `MAL_*` env). Called by the app entrypoints; auto-runs on first use. |
| `obs.get_logger(__name__)` | A stdlib logger routed through the structured handlers. |
| `obs.log(event, level="info", **fields)` | One structured record: an event name + arbitrary fields (+ bound run/turn/agent). |
| `obs.span(name, **attrs)` | Context manager opening an OTEL span; nesting is automatic. |
| `obs.add_span_attrs(**attrs)` | Attach attributes to the active span. |
| `obs.incr(name, v=1, **labels)` / `obs.observe(name, v, **labels)` | Counter / histogram metric. |
| `obs.record_llm_call(model, prompt_tokens, completion_tokens, cost_usd)` | LLM-call counters. |
| `obs.record_agent_turn(agent, seconds)` | Agent-turn latency. |
| `obs.record_governor_trip(reason)` | Governor budget trip. |
| `obs.bind(run_id=, turn=, agent=)` / `obs.set_context(...)` | Correlation context (contextvars). |
| `obs.telemetry_store()` | In-memory store backing the Gradio Telemetry panel. |
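To make the correlation context concrete, here is a minimal stdlib sketch of how a contextvars-backed `bind` and a structured `log` could fit together. It is not the facade's implementation: the `_context` variable, the choice to make `bind` a context manager, and the JSON output shape are all assumptions made for illustration.

```python
import contextvars
import json
from contextlib import contextmanager

# Hypothetical stand-in for the facade's correlation context (assumption).
# The default dict is never mutated; every bind() installs a fresh copy.
_context = contextvars.ContextVar("obs_context", default={})


@contextmanager
def bind(**fields):
    """Merge fields into the correlation context for the duration of the block."""
    token = _context.set({**_context.get(), **fields})
    try:
        yield
    finally:
        # Restore whatever was bound before this block was entered.
        _context.reset(token)


def log(event, level="info", **fields):
    """Build one structured record: event name + bound context + call-site fields."""
    record = {"event": event, "level": level, **_context.get(), **fields}
    print(json.dumps(record, sort_keys=True))
    return record


with bind(run_id="r1", turn=3):
    with bind(agent="planner"):
        inner = log("memory.recall", hits=2)  # carries run_id, turn, and agent
    outer = log("turn.end")                   # agent is no longer bound here
after = log("idle")                           # nothing is bound here
```

Because the context lives in a `ContextVar`, call sites never pass `run_id` or `turn` explicitly; they only name the event and its own fields.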
Span hierarchy
Spans nest by OTEL context; each layer opens only its own span:
run (conductor.reset)
└─ turn (conductor._tick / step_one)
   └─ agent.turn (conductor._run_agent)
      ├─ memory.recall (agents/base._recall → memory.py)
      │  └─ memory.index.search (memory_index.py)
      ├─ llm.call | llm.structured (models/litellm_provider.py)
      └─ tool.call (tools/registry.py)
LLM spans use GenAI semantic-convention attributes (gen_ai.system,
gen_ai.request.model, gen_ai.usage.input_tokens/output_tokens) plus
engine-specific llm.cost_usd, llm.prompt, llm.completion, llm.reasoning.
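The nesting rule can be illustrated without the OpenTelemetry SDK. The sketch below is a stdlib stand-in, not the project's code: `Span`, `finished`, and `path` are hypothetical helpers, and the model name and token counts are made-up example values. It shows how a context variable lets each layer open only its own span while the parent is picked up automatically, and how the GenAI attribute names listed above might be attached to an `llm.call` span.

```python
import contextvars
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    parent: "Span | None"
    attrs: dict = field(default_factory=dict)
    duration_s: float = 0.0


# The active span; children read it to find their parent (assumed design).
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # spans in the order they closed


@contextmanager
def span(name, **attrs):
    """Open a span whose parent is whatever span is currently active."""
    s = Span(name=name, parent=_current.get(), attrs=dict(attrs))
    token = _current.set(s)
    start = time.perf_counter()
    try:
        yield s
    finally:
        s.duration_s = time.perf_counter() - start
        _current.reset(token)
        finished.append(s)


def add_span_attrs(**attrs):
    """Attach attributes to the active span, if there is one."""
    s = _current.get()
    if s is not None:
        s.attrs.update(attrs)


def path(s):
    """Render a span's ancestry, root first."""
    parts = []
    while s is not None:
        parts.append(s.name)
        s = s.parent
    return " > ".join(reversed(parts))


with span("run"):
    with span("turn"):
        with span("agent.turn", agent="planner"):
            with span("memory.recall"):
                with span("memory.index.search"):
                    pass
            # Dotted attribute names are not valid keyword names, so unpack a dict.
            with span("llm.call", **{"gen_ai.request.model": "example-model"}) as llm:
                add_span_attrs(**{
                    "gen_ai.usage.input_tokens": 120,
                    "gen_ai.usage.output_tokens": 45,
                    "llm.cost_usd": 0.0007,
                })

for s in finished:
    print(path(s))
```

Spans close innermost first, so `finished` lists leaves before their parents; a timeline view would sort by start time or walk the `parent` links instead.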
Configuration
| Env var | Default | Meaning |
|---|---|---|
| `MAL_LOG_LEVEL` | `INFO` | Root level. `DEBUG` surfaces full prompts + memory. |
| `MAL_LOG_FORMAT` | `text` | Terminal format: `text` (human) or `json`. |
| `MAL_TRACING` | `memory` | Span sink: `off`, `console`, `memory`, or `both`. |
| `MAL_TELEMETRY_BUFFER` | `4000` | Ring-buffer size for logs/spans kept for the UI. |
| `MAL_TELEMETRY_TEXT_LIMIT` | `4000` | Prompt/memory truncation length in stored snapshots. |
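As a rough illustration of how these variables might be resolved, the sketch below reads them from an environment mapping and applies the defaults from the table. `ObsConfig` and `load_config` are hypothetical names, and the validation rules (case-insensitive values, rejecting unknown modes) are assumptions rather than documented behaviour of `obs.configure`.

```python
import logging
import os
from dataclasses import dataclass

TRACING_MODES = {"off", "console", "memory", "both"}
LOG_FORMATS = {"text", "json"}


@dataclass(frozen=True)
class ObsConfig:
    level: int
    fmt: str
    tracing: str
    buffer_size: int
    text_limit: int


def load_config(env=None):
    """Resolve MAL_* settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env

    # getLevelName maps a known level name to its number and returns a
    # string such as "Level FOO" for anything it does not recognise.
    level = logging.getLevelName(env.get("MAL_LOG_LEVEL", "INFO").upper())
    if not isinstance(level, int):
        raise ValueError(f"unknown MAL_LOG_LEVEL: {level!r}")

    fmt = env.get("MAL_LOG_FORMAT", "text").lower()
    if fmt not in LOG_FORMATS:
        raise ValueError(f"unknown MAL_LOG_FORMAT: {fmt!r}")

    tracing = env.get("MAL_TRACING", "memory").lower()
    if tracing not in TRACING_MODES:
        raise ValueError(f"unknown MAL_TRACING: {tracing!r}")

    return ObsConfig(
        level=level,
        fmt=fmt,
        tracing=tracing,
        buffer_size=int(env.get("MAL_TELEMETRY_BUFFER", "4000")),
        text_limit=int(env.get("MAL_TELEMETRY_TEXT_LIMIT", "4000")),
    )


defaults = load_config({})
debug_run = load_config({"MAL_LOG_LEVEL": "debug", "MAL_TRACING": "both",
                         "MAL_TELEMETRY_BUFFER": "10"})
print(defaults)
print(debug_run)
```

Passing the mapping in explicitly keeps the function easy to test; production code would call it with no argument so it reads `os.environ`.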
Conventions
- Import the facade only; never import the OpenTelemetry SDK directly at a call site.
- Use the documented span names so the hierarchy stays consistent.
- Never log or attach API keys. Capture full prompts/memory at
DEBUG; the store truncates them for the UI.
- The store is bounded and thread-safe; long sessions stay flat in memory.
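The last two conventions (truncation for the UI, and a bounded thread-safe store) can be sketched with a `deque` and a lock. This is an illustrative stand-in, not the object returned by `obs.telemetry_store()`: the class name, method names, and the truncation marker are assumptions.

```python
import threading
from collections import deque


class TelemetryStore:
    """Bounded, thread-safe record buffer (hypothetical sketch)."""

    def __init__(self, buffer_size=4000, text_limit=4000):
        # deque(maxlen=...) drops the oldest record once the buffer is full,
        # so memory use stays flat no matter how long the session runs.
        self._records = deque(maxlen=buffer_size)
        self._lock = threading.Lock()
        self._text_limit = text_limit

    def _truncate(self, value):
        """Shorten long strings such as prompts; leave other values untouched."""
        if isinstance(value, str) and len(value) > self._text_limit:
            return value[: self._text_limit] + "...[truncated]"
        return value

    def append(self, record):
        snapshot = {key: self._truncate(val) for key, val in record.items()}
        with self._lock:
            self._records.append(snapshot)

    def snapshot(self):
        """Return a copy so readers never iterate the live buffer."""
        with self._lock:
            return list(self._records)


store = TelemetryStore(buffer_size=3, text_limit=8)
for turn in range(5):
    store.append({"event": "llm.call", "turn": turn, "prompt": "x" * 20})

kept = store.snapshot()
print([r["turn"] for r in kept])
print(kept[0]["prompt"])
```

With `buffer_size=3`, only the three most recent records survive the five appends, and each stored prompt is cut to `text_limit` characters plus the marker.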
In-app panel
The Gradio "Telemetry" tab reads from telemetry_store(): a filterable log feed
(by agent/layer/level), metric charts (calls / tokens / cost / latency), and a
per-turn trace timeline where selecting a span reveals the prompt + memory the
agent saw.
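The log-feed filter can be pictured as a plain function over the stored records. The sketch below assumes each record is a dict with `agent`, `layer`, and `level` keys; those field names, the `LEVELS` table, and `filter_feed` itself are hypothetical and only illustrate the filtering described above.

```python
# Numeric severities matching the stdlib logging levels.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_feed(records, agent=None, layer=None, min_level="INFO"):
    """Keep records matching the selected agent/layer at or above min_level."""
    threshold = LEVELS[min_level.upper()]
    selected = []
    for record in records:
        if agent is not None and record.get("agent") != agent:
            continue
        if layer is not None and record.get("layer") != layer:
            continue
        # Records with a missing or unknown level are treated as INFO (assumption).
        severity = LEVELS.get(str(record.get("level", "info")).upper(), LEVELS["INFO"])
        if severity < threshold:
            continue
        selected.append(record)
    return selected


feed = [
    {"event": "memory.recall", "agent": "planner", "layer": "memory", "level": "debug"},
    {"event": "llm.call", "agent": "planner", "layer": "llm", "level": "info"},
    {"event": "llm.call", "agent": "critic", "layer": "llm", "level": "info"},
    {"event": "governor.trip", "agent": "critic", "layer": "core", "level": "warning"},
]

planner_info = filter_feed(feed, agent="planner")
llm_only = filter_feed(feed, layer="llm")
warnings = filter_feed(feed, min_level="warning")
print(planner_info, llm_only, warnings, sep="\n")
```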