# Observability

A complete, modular log of the application — API/LLM calls, inference, memory, and
the core loop — readable in the terminal and live in the Gradio app. Built on
OpenTelemetry with a self-contained in-process backend. See ADR-0024.

## The facade

Every layer imports one thing:

```python
from src import observability as obs
```

| Call | Purpose |
|------|---------|
| `obs.configure(level=, fmt=, tracing=)` | Idempotent init (reads `MAL_*` env). Called by the app entrypoints; auto-runs on first use. |
| `obs.get_logger(__name__)` | A stdlib logger routed through the structured handlers. |
| `obs.log(event, level="info", **fields)` | One structured record: an `event` name + arbitrary fields (+ bound run/turn/agent). |
| `obs.span(name, **attrs)` | Context manager opening an OTEL span; nesting is automatic. |
| `obs.add_span_attrs(**attrs)` | Attach attributes to the active span. |
| `obs.incr(name, v=1, **labels)` / `obs.observe(name, v, **labels)` | Counter / histogram metric. |
| `obs.record_llm_call(model, prompt_tokens, completion_tokens, cost_usd)` | LLM-call counters. |
| `obs.record_agent_turn(agent, seconds)` | Agent-turn latency. |
| `obs.record_governor_trip(reason)` | Governor budget trip. |
| `obs.bind(run_id=, turn=, agent=)` / `obs.set_context(...)` | Correlation context (contextvars). |
| `obs.telemetry_store()` | In-memory store backing the Gradio Telemetry panel. |

## Span hierarchy

Spans nest by OTEL context — each layer opens only its own span:

```
run                         (conductor.reset)
└─ turn                     (conductor._tick / step_one)
   └─ agent.turn            (conductor._run_agent)
      ├─ memory.recall      (agents/base._recall → memory.py)
      │  └─ memory.index.search   (memory_index.py)
      ├─ llm.call | llm.structured  (models/litellm_provider.py)
      └─ tool.call          (tools/registry.py)
```

LLM spans use GenAI semantic-convention attributes (`gen_ai.system`,
`gen_ai.request.model`, `gen_ai.usage.input_tokens`/`output_tokens`) plus
engine-specific `llm.cost_usd`, `llm.prompt`, `llm.completion`, `llm.reasoning`.

## Configuration

| Env var | Default | Meaning |
|---------|---------|---------|
| `MAL_LOG_LEVEL` | `INFO` | Root level. `DEBUG` surfaces full prompts + memory. |
| `MAL_LOG_FORMAT` | `text` | Terminal format: `text` (human) or `json`. |
| `MAL_TRACING` | `memory` | Span sink: `off` \| `console` \| `memory` \| `both`. |
| `MAL_TELEMETRY_BUFFER` | `4000` | Ring-buffer size for logs/spans kept for the UI. |
| `MAL_TELEMETRY_TEXT_LIMIT` | `4000` | Prompt/memory truncation length in stored snapshots. |

## Conventions

- Import the facade only — never the OpenTelemetry SDK directly at a call site.
- Use the documented span names so the hierarchy stays consistent.
- **Never** log or attach API keys. Capture full prompts/memory at `DEBUG`; the
  store truncates them for the UI.
- The store is bounded and thread-safe; long sessions stay flat in memory.

## In-app panel

The Gradio "Telemetry" tab reads from `telemetry_store()`: a filterable log feed
(by agent/layer/level), metric charts (calls / tokens / cost / latency), and a
per-turn trace timeline where selecting a span reveals the prompt + memory the
agent saw.