agharsallah
feat(observability): add OpenTelemetry logging/tracing/metrics foundation
3f7b296

Observability

A complete, modular log of the application (API/LLM calls, inference, memory, and the core loop), readable in the terminal and live in the Gradio app. Built on OpenTelemetry with a self-contained in-process backend. See ADR-0024.

The facade

Every layer imports one thing:

    from src import observability as obs

| Call | Purpose |
| --- | --- |
| `obs.configure(level=, fmt=, tracing=)` | Idempotent init (reads `MAL_*` env). Called by the app entrypoints; auto-runs on first use. |
| `obs.get_logger(__name__)` | A stdlib logger routed through the structured handlers. |
| `obs.log(event, level="info", **fields)` | One structured record: an event name + arbitrary fields (+ bound run/turn/agent). |
| `obs.span(name, **attrs)` | Context manager opening an OTEL span; nesting is automatic. |
| `obs.add_span_attrs(**attrs)` | Attach attributes to the active span. |
| `obs.incr(name, v=1, **labels)` / `obs.observe(name, v, **labels)` | Counter / histogram metric. |
| `obs.record_llm_call(model, prompt_tokens, completion_tokens, cost_usd)` | LLM-call counters. |
| `obs.record_agent_turn(agent, seconds)` | Agent-turn latency. |
| `obs.record_governor_trip(reason)` | Governor budget trip. |
| `obs.bind(run_id=, turn=, agent=)` / `obs.set_context(...)` | Correlation context (contextvars). |
| `obs.telemetry_store()` | In-memory store backing the Gradio Telemetry panel. |
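
The table says the correlation context is carried in contextvars and that each record includes the bound run/turn/agent. The following is a minimal stdlib-only sketch of how that could work, not the facade's actual implementation: the names `_ctx` and `RECORDS` and the flat record layout are hypothetical stand-ins chosen for illustration.

```python
import contextvars
from typing import Any

# Hypothetical stand-ins for the facade's internals (assumptions, not the real module).
_ctx: contextvars.ContextVar[dict[str, Any]] = contextvars.ContextVar("obs_ctx", default={})
RECORDS: list[dict[str, Any]] = []


def bind(**fields: Any) -> contextvars.Token:
    """Merge fields (e.g. run_id, turn, agent) into the current correlation context."""
    # A new dict is stored each time, so the shared default is never mutated.
    return _ctx.set({**_ctx.get(), **fields})


def log(event: str, level: str = "info", **fields: Any) -> dict[str, Any]:
    """Build one structured record: event name, level, bound context, then call-site fields."""
    record = {"event": event, "level": level, **_ctx.get(), **fields}
    RECORDS.append(record)
    return record


bind(run_id="r-1", turn=3)
bind(agent="planner")
rec = log("memory.recall", hits=2)
print(rec)
```

Because the context lives in a `ContextVar`, a call site only passes its own fields; the run, turn, and agent identifiers follow the execution context, including across `asyncio` tasks.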

Span hierarchy

Spans nest by OTEL context; each layer opens only its own span:

run                         (conductor.reset)
└─ turn                     (conductor._tick / step_one)
   └─ agent.turn            (conductor._run_agent)
      ├─ memory.recall      (agents/base._recall → memory.py)
      │  └─ memory.index.search   (memory_index.py)
      ├─ llm.call | llm.structured  (models/litellm_provider.py)
      └─ tool.call          (tools/registry.py)

LLM spans use GenAI semantic-convention attributes (gen_ai.system, gen_ai.request.model, gen_ai.usage.input_tokens/output_tokens) plus engine-specific llm.cost_usd, llm.prompt, llm.completion, llm.reasoning.

Configuration

| Env var | Default | Meaning |
| --- | --- | --- |
| `MAL_LOG_LEVEL` | `INFO` | Root level. `DEBUG` surfaces full prompts + memory. |
| `MAL_LOG_FORMAT` | `text` | Terminal format: `text` (human) or `json`. |
| `MAL_TRACING` | `memory` | Span sink: `off`, `console`, `memory`, or `both`. |
| `MAL_TELEMETRY_BUFFER` | `4000` | Ring-buffer size for logs/spans kept for the UI. |
| `MAL_TELEMETRY_TEXT_LIMIT` | `4000` | Prompt/memory truncation length in stored snapshots. |
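
A sketch of how these variables might be read with their documented defaults. The `Settings` dataclass and `load_settings` function are hypothetical; the case normalization and the validation of `MAL_TRACING` are assumptions about reasonable behavior, not confirmed details of `obs.configure`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRACING_MODES = {"off", "console", "memory", "both"}


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the table above.
    log_level: str = "INFO"
    log_format: str = "text"
    tracing: str = "memory"
    buffer_size: int = 4000
    text_limit: int = 4000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the MAL_* variables, falling back to the documented defaults."""
    tracing = env.get("MAL_TRACING", "memory").lower()
    if tracing not in _TRACING_MODES:  # assumed validation, not documented behavior
        raise ValueError(f"MAL_TRACING must be one of {sorted(_TRACING_MODES)}, got {tracing!r}")
    return Settings(
        log_level=env.get("MAL_LOG_LEVEL", "INFO").upper(),
        log_format=env.get("MAL_LOG_FORMAT", "text").lower(),
        tracing=tracing,
        buffer_size=int(env.get("MAL_TELEMETRY_BUFFER", "4000")),
        text_limit=int(env.get("MAL_TELEMETRY_TEXT_LIMIT", "4000")),
    )


settings = load_settings({"MAL_LOG_LEVEL": "debug", "MAL_TRACING": "both"})
print(settings)
```

Passing the environment as a mapping keeps the loader testable without touching `os.environ`.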

Conventions

  • Import the facade only; never import the OpenTelemetry SDK directly at a call site.
  • Use the documented span names so the hierarchy stays consistent.
  • Never log or attach API keys. Capture full prompts/memory at DEBUG; the store truncates them for the UI.
  • The store is bounded and thread-safe; long sessions stay flat in memory.
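
The last two conventions (truncation for the UI, and a bounded thread-safe store) can be sketched with a locked `collections.deque`. This is an illustration under assumptions, not the real `telemetry_store()`: the `BoundedStore` class, its method names, and the choice to truncate every string field by character count are all hypothetical.

```python
import threading
from collections import deque
from typing import Any


class BoundedStore:
    """Hypothetical ring buffer: keeps the newest `maxlen` records, truncating long text."""

    def __init__(self, maxlen: int = 4000, text_limit: int = 4000) -> None:
        # deque(maxlen=...) drops the oldest entry when full, so memory stays flat.
        self._records: deque[dict[str, Any]] = deque(maxlen=maxlen)
        self._text_limit = text_limit
        self._lock = threading.Lock()

    def add(self, record: dict[str, Any]) -> None:
        """Store a copy of the record with string values clipped to the text limit."""
        clipped = {
            k: v[: self._text_limit] if isinstance(v, str) else v
            for k, v in record.items()
        }
        with self._lock:
            self._records.append(clipped)

    def snapshot(self) -> list[dict[str, Any]]:
        """Return a stable copy that a UI can iterate while writers keep appending."""
        with self._lock:
            return list(self._records)


store = BoundedStore(maxlen=3, text_limit=5)
for i in range(5):
    store.add({"seq": i, "prompt": "x" * 10})
snap = store.snapshot()
print([r["seq"] for r in snap])  # [2, 3, 4]
```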

In-app panel

The Gradio "Telemetry" tab reads from telemetry_store(): a filterable log feed (by agent/layer/level), metric charts (calls / tokens / cost / latency), and a per-turn trace timeline where selecting a span reveals the prompt + memory the agent saw.
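
The log feed's agent/layer/level filter could look roughly like the sketch below. The `filter_feed` function and the `agent`, `layer`, and `level` record keys are assumptions made for illustration; the panel's actual record schema is not described here.

```python
from typing import Any, Iterable, Optional


def filter_feed(
    records: Iterable[dict[str, Any]],
    agent: Optional[str] = None,
    layer: Optional[str] = None,
    level: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep records matching every filter that is set; None means no filter on that field."""
    wanted = {"agent": agent, "layer": layer, "level": level}
    active = {k: v for k, v in wanted.items() if v is not None}
    return [r for r in records if all(r.get(k) == v for k, v in active.items())]


# Hypothetical records, shaped like what a store snapshot might return.
feed = [
    {"event": "llm.call", "agent": "planner", "layer": "llm", "level": "info"},
    {"event": "memory.recall", "agent": "planner", "layer": "memory", "level": "debug"},
    {"event": "tool.call", "agent": "critic", "layer": "tools", "level": "info"},
]
planner_info = filter_feed(feed, agent="planner", level="info")
print([r["event"] for r in planner_info])  # ['llm.call']
```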