---
description: langfuse and agent observation best practices
globs:
alwaysApply: false
---
1 Adopt the OTEL-native Python SDK (v3) everywhere
The v3 SDK wraps OpenTelemetry, so every span you open in any agent, tool, or worker is automatically nested and correlated. This saves you from hand-passing trace IDs and lets you lean on existing OTEL auto-instrumentation for HTTP, DB, or queue calls.
2 Create one root span per user request and pass a single CallbackHandler into graph.invoke/stream

```python
from langfuse import get_client
from langfuse.langchain import CallbackHandler

langfuse = get_client()  # reads LANGFUSE_* env vars
langfuse_handler = CallbackHandler()

with langfuse.start_as_current_span(name="user-request") as root:
    compiled_graph.invoke(
        input=state,
        config={"callbacks": [langfuse_handler]},
    )
```
Everything the agents do now rolls up under that root for a tidy timeline.
3 Use Langfuse Sessions to stitch together long-running conversations
Set session_id and user_id on the root span (or via update_trace) so all follow-up calls land in the same session dashboard.
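A minimal sketch of attaching session metadata, assuming a root span opened as in Tip #2 (the identifier values are illustrative):

```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    # Attach session/user metadata to the whole trace so follow-up
    # requests carrying the same session_id land in one session view.
    root.update_trace(
        session_id="session-1234",  # illustrative value
        user_id="user-5678",        # illustrative value
    )
```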
4 Name spans predictably
- llm/<model> – one per LLM call (e.g., llm/gpt-4o)
- tool/<tool_name> – external search, RAG, code-exec…
- agent/<role> – distinct for every worker node

Predictable names power Langfuse’s cost & latency aggregation widgets.
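A tiny helper (our own convention, not part of the Langfuse SDK) keeps these names consistent across the codebase:

```python
def span_name(kind: str, name: str) -> str:
    """Build a predictable span name such as 'llm/gpt-4o' or 'tool/web-search'.

    kind must be one of the agreed prefixes so dashboards can aggregate on it.
    """
    allowed = {"llm", "tool", "agent"}
    if kind not in allowed:
        raise ValueError(f"unknown span kind: {kind!r}")
    return f"{kind}/{name}"

print(span_name("llm", "gpt-4o"))     # → llm/gpt-4o
print(span_name("agent", "planner"))  # → agent/planner
```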
5 Leverage Agent Graphs to debug routing loops
Because each node becomes a child span, Langfuse’s “Agent Graph” view renders the entire decision tree and shows token/cost per edge, which is very handy when several LLMs vote on the next step.
6 Tag the root span with the environment (dev/stage/prod) and with the LLM provider you’re experimenting with
This lets you facet dashboards by deployment ring or by “OpenAI vs Mixtral.”
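A sketch of tagging via update_trace, assuming the environment comes from a DEPLOY_ENV variable (our own naming) and the provider tag format is a project convention:

```python
import os

from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    root.update_trace(
        tags=[
            os.getenv("DEPLOY_ENV", "dev"),  # dev / stage / prod
            "provider:openai",               # illustrative provider tag
        ]
    )
```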
7 Attach scores (numeric or categorical) right after the graph run
span.score_trace(name="user-feedback", value=1) – or call langfuse.create_score later. Use this both for thumbs-up/down UI events and for LLM-as-judge automated grading.
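A sketch of both flavours, assuming the v3 span API (score names and values are illustrative):

```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    # ... run the graph here ...

    # Numeric score from a thumbs-up UI event:
    root.score_trace(name="user-feedback", value=1)

    # Categorical score from an LLM-as-judge grader:
    root.score_trace(name="judge-verdict", value="pass", data_type="CATEGORICAL")
```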
8 Version and link your prompts
Call langfuse.create_prompt() (or manage prompts in the UI) and link the fetched prompt object to the relevant generations so you can tell which prompt revision caused regressions.
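A hedged sketch of the prompt-management flow; it needs a reachable Langfuse backend, and the prompt name and variables are illustrative:

```python
from langfuse import get_client

langfuse = get_client()

# Register (or bump) a prompt version; usually done once, via code or the UI.
langfuse.create_prompt(
    name="agent-system-prompt",  # illustrative name
    prompt="You are a helpful agent. {{task}}",
    labels=["production"],
)

# At request time, fetch the current version and compile its variables.
prompt = langfuse.get_prompt("agent-system-prompt")
system_message = prompt.compile(task="summarise the ticket")
```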
9 Exploit distributed-tracing headers if agents live in different services
Because v3 is OTEL-based, incoming traceparent headers are picked up automatically; just make sure every micro-service initialises the Langfuse client with the same credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST).
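To see what actually travels between services, here is a small parser for the W3C traceparent header (format per the Trace Context spec; in practice OTEL propagators handle this for you):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields.

    Format: version-traceid-spanid-flags, e.g.
    00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    """
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": flags == "01",
    }

parts = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(parts["trace_id"])  # → 4bf92f3577b34da6a3ce929d0e0e4736
```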
10 Sample intelligently
Langfuse supports probabilistic sampling on the server. Keep 100% of errors and maybe only 10% of successful traces in prod to control storage costs.
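The keep-errors-sample-successes policy can also be sketched as a client-side decision; this helper is our own illustration, not a Langfuse API:

```python
import random

def should_keep_trace(had_error: bool, success_rate: float = 0.10,
                      rng=None) -> bool:
    """Keep every errored trace; keep successes with probability success_rate."""
    if had_error:
        return True
    rng = rng or random
    return rng.random() < success_rate

# Deterministic check with a seeded RNG: about 10% of successes survive.
rng = random.Random(42)
kept = sum(should_keep_trace(False, 0.10, rng) for _ in range(10_000))
print(kept)  # roughly 1000 of 10000 successful traces kept
```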
11 Mask PII at the SDK layer
Pass a mask callback when constructing the Langfuse client so you can still store numeric cost/latency while redacting sensitive inputs/outputs.
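The redaction logic itself is plain Python and can be tested in isolation before wiring it into the client as the mask callback. A sketch, with an illustrative email regex:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(data):
    """Redact email addresses in string payloads; pass other types through.

    Intended for use as the mask hook, which receives span inputs/outputs.
    """
    if isinstance(data, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", data)
    return data

print(mask("contact alice@example.com for access"))
# → contact [REDACTED_EMAIL] for access
```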
12 Flush asynchronously in high-throughput agents
Spans are batched and exported in the background by the OTEL processor every few seconds, so normal operation never blocks the event loop; call langfuse.flush() (which blocks until the queue drains) only at shutdown or at safe points between worker ticks.
13 Test visual completeness with the LangGraph helper
Render graph.get_graph().draw_mermaid_png() and verify every edge appears in Langfuse; missing edges usually mean a span wasn’t opened or the callback handler wasn’t propagated.
14 Watch out for the “traces not clubbed” pitfall when upgrading from v2 → v3
Older code that started independent traces per agent will fragment your timeline in v3. Always start one root span first (Tip #2).