Spaces:

build-small-hackathon
/

multi-agent-lab

Running on Zero

App Files Files Community

multi-agent-lab / docs /architecture /observability.md

agharsallah

feat(observability): add OpenTelemetry logging/tracing/metrics foundation

3f7b296 24 days ago

preview code

Raw

History Blame Contribute Delete

3.29 kB

	# Observability

	A complete, modular log of the application — API/LLM calls, inference, memory, and
	the core loop — readable in the terminal and live in the Gradio app. Built on
	OpenTelemetry with a self-contained in-process backend. See ADR-0024.

	## The facade

	Every layer imports one thing:

	```python
	from src import observability as obs
	```

	\| Call \| Purpose \|
	\|------\|---------\|
	\| `obs.configure(level=, fmt=, tracing=)` \| Idempotent init (reads `MAL_*` env). Called by the app entrypoints; auto-runs on first use. \|
	\| `obs.get_logger(__name__)` \| A stdlib logger routed through the structured handlers. \|
	\| `obs.log(event, level="info", **fields)` \| One structured record: an `event` name + arbitrary fields (+ bound run/turn/agent). \|
	\| `obs.span(name, **attrs)` \| Context manager opening an OTEL span; nesting is automatic. \|
	\| `obs.add_span_attrs(**attrs)` \| Attach attributes to the active span. \|
	\| `obs.incr(name, v=1, labels)` / `obs.observe(name, v, labels)` \| Counter / histogram metric. \|
	\| `obs.record_llm_call(model, prompt_tokens, completion_tokens, cost_usd)` \| LLM-call counters. \|
	\| `obs.record_agent_turn(agent, seconds)` \| Agent-turn latency. \|
	\| `obs.record_governor_trip(reason)` \| Governor budget trip. \|
	\| `obs.bind(run_id=, turn=, agent=)` / `obs.set_context(...)` \| Correlation context (contextvars). \|
	\| `obs.telemetry_store()` \| In-memory store backing the Gradio Telemetry panel. \|

	## Span hierarchy

	Spans nest by OTEL context — each layer opens only its own span:

	```
	run (conductor.reset)
	└─ turn (conductor._tick / step_one)
	└─ agent.turn (conductor._run_agent)
	├─ memory.recall (agents/base._recall → memory.py)
	│ └─ memory.index.search (memory_index.py)
	├─ llm.call \| llm.structured (models/litellm_provider.py)
	└─ tool.call (tools/registry.py)
	```

	LLM spans use GenAI semantic-convention attributes (`gen_ai.system`,
	`gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`) plus
	engine-specific `llm.cost_usd`, `llm.prompt`, `llm.completion`, `llm.reasoning`.

	## Configuration

	\| Env var \| Default \| Meaning \|
	\|---------\|---------\|---------\|
	\| `MAL_LOG_LEVEL` \| `INFO` \| Root level. `DEBUG` surfaces full prompts + memory. \|
	\| `MAL_LOG_FORMAT` \| `text` \| Terminal format: `text` (human) or `json`. \|
	\| `MAL_TRACING` \| `memory` \| Span sink: `off` \\| `console` \\| `memory` \\| `both`. \|
	\| `MAL_TELEMETRY_BUFFER` \| `4000` \| Ring-buffer size for logs/spans kept for the UI. \|
	\| `MAL_TELEMETRY_TEXT_LIMIT` \| `4000` \| Prompt/memory truncation length in stored snapshots. \|

	## Conventions

	- Import the facade only — never the OpenTelemetry SDK directly at a call site.
	- Use the documented span names so the hierarchy stays consistent.
	- Never log or attach API keys. Capture full prompts/memory at `DEBUG`; the
	store truncates them for the UI.
	- The store is bounded and thread-safe; long sessions stay flat in memory.

	## In-app panel

	The Gradio "Telemetry" tab reads from `telemetry_store()`: a filterable log feed
	(by agent/layer/level), metric charts (calls / tokens / cost / latency), and a
	per-turn trace timeline where selecting a span reveals the prompt + memory the
	agent saw.