---
description: langfuse and agent observation best practices
globs:
alwaysApply: false
---
1 Adopt the OTEL-native Python SDK (v3) everywhere
The v3 SDK wraps OpenTelemetry, so every span you open in any agent, tool or worker is automatically nested and correlated. This saves you from hand-passing trace IDs and lets you lean on existing OTEL auto-instrumentation for HTTP, DB or queue calls.
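A minimal environment setup for the v3 SDK might look like the following; the key values are placeholders you replace with your project's credentials:

```shell
# Install the OTEL-native v3 SDK
pip install "langfuse>=3.0.0"

# Credentials from your Langfuse project settings (placeholder values)
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"  # or your self-hosted URL
```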
2 Create one root span per user request and pass a single CallbackHandler into graph.invoke/stream
```python
from langfuse import get_client
from langfuse.langchain import CallbackHandler

langfuse = get_client()
langfuse_handler = CallbackHandler()

with langfuse.start_as_current_span(name="user-request") as root:
    compiled_graph.invoke(
        input=state,
        config={"callbacks": [langfuse_handler]},
    )
```
Everything the agents do now rolls up under that root for a tidy timeline.
3 Use Langfuse Sessions to stitch together long-running conversations
Set session_id and user_id on the root span (or via update_trace) so all follow-up calls land in the same session dashboard.
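One way to keep follow-up turns in the same session is to derive a deterministic session id from the user and conversation identifiers. The helper below is a hypothetical sketch (the hashing scheme is ours, not Langfuse's); only the `update_current_trace` call in the comment is the SDK API:

```python
import hashlib

def stable_session_id(user_id: str, conversation_id: str) -> str:
    """Derive a deterministic session id so every turn of the same
    conversation lands in the same Langfuse session (hypothetical helper)."""
    digest = hashlib.sha256(f"{user_id}:{conversation_id}".encode()).hexdigest()[:16]
    return f"session-{digest}"

# Inside the root span you would then call (v3 SDK):
#   langfuse.update_current_trace(
#       session_id=stable_session_id("user-42", "conv-7"),
#       user_id="user-42",
#   )
```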
4 Name spans predictably
- `llm/<model>` – one per LLM call (e.g., `llm/gpt-4o`)
- `tool/<tool_name>` – external search, RAG, code execution…
- `agent/<role>` – distinct for every worker node
Predictable names power Langfuse’s cost & latency aggregation widgets.
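Centralising the naming convention in tiny helpers keeps it consistent across agents; these functions are our own sketch, not part of the SDK:

```python
def llm_span(model: str) -> str:
    """Span name for one LLM call, e.g. llm/gpt-4o."""
    return f"llm/{model}"

def tool_span(tool_name: str) -> str:
    """Span name for an external tool call, e.g. tool/web-search."""
    return f"tool/{tool_name}"

def agent_span(role: str) -> str:
    """Span name for a distinct worker node, e.g. agent/researcher."""
    return f"agent/{role}"

# Usage with the v3 SDK (sketch):
#   with langfuse.start_as_current_span(name=agent_span("researcher")):
#       ...
```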
5 Leverage Agent Graphs to debug routing loops
Because each node becomes a child span, Langfuse’s “Agent Graph” view renders the entire decision tree and shows token usage and cost per edge, which is especially handy when several LLMs vote on the next step.
6 Tag the root span with the environment (dev/stage/prod) and with the LLM provider you’re experimenting with
This lets you facet dashboards by deployment ring or by “OpenAI vs Mixtral.”
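A small helper can assemble those tags from the process environment; `DEPLOY_ENV` and `LLM_PROVIDER` are hypothetical variable names for this sketch:

```python
import os

def build_trace_tags() -> list[str]:
    """Collect environment and provider tags for the root span.
    DEPLOY_ENV / LLM_PROVIDER are hypothetical env var names."""
    env = os.environ.get("DEPLOY_ENV", "dev")            # dev / stage / prod
    provider = os.environ.get("LLM_PROVIDER", "openai")  # e.g. openai, mixtral
    return [f"env:{env}", f"provider:{provider}"]

# Attached to the trace via the v3 SDK (sketch):
#   langfuse.update_current_trace(tags=build_trace_tags())
```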
7 Attach scores (numeric or categorical) right after the graph run
Call span.score_trace(name="user-feedback", value=1) right after the run, or call create_score later. Use this both for thumbs-up/down UI events and for LLM-as-judge automated grading.
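For UI feedback, a thin translation layer keeps the score payload consistent; the field names follow the scoring API, while the 1/0 mapping is our own convention:

```python
def feedback_to_score(thumbs_up: bool) -> dict:
    """Translate a thumbs-up/down UI event into a score payload
    (field names follow the Langfuse scoring API; the mapping is ours)."""
    return {
        "name": "user-feedback",
        "value": 1 if thumbs_up else 0,
        "data_type": "NUMERIC",
    }

# After the graph run (sketch):
#   root.score_trace(**feedback_to_score(thumbs_up=True))
```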
8 Version and link your prompts
Call langfuse.create_prompt() (or manage them in the UI) and set prompt_id on spans so you can tell which prompt revision caused regressions.
9 Exploit distributed-tracing headers if agents live in different services
Because v3 is OTEL-based, traceparent headers are parsed automatically—just make sure every micro-service initialises the Langfuse OTEL exporter with the same LANGFUSE_OTEL_DSN.
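The header being propagated is the W3C `traceparent` format. OTEL instrumentation injects and extracts it automatically; building one by hand, as sketched below, is only needed when bridging a hop that isn't instrumented:

```python
def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C traceparent header (version 00) so a downstream service
    continues the same trace. trace_id is 32 hex chars, span_id is 16."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

# e.g. requests.get(url, headers={"traceparent": make_traceparent(tid, sid)})
header = make_traceparent("a" * 32, "b" * 16)
```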
10 Sample intelligently
Langfuse supports probabilistic sampling on the server. Keep 100% of errors and perhaps only 10% of successful traces in prod to control storage costs.
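If you pre-filter client-side instead, the decision rule can be as simple as this sketch: always keep errors, sample the rest (the function and its defaults are our own illustration):

```python
import random

def keep_trace(is_error: bool, success_rate: float = 0.10) -> bool:
    """Sampling decision sketched client-side: keep every error trace,
    and roughly success_rate of the successful ones."""
    if is_error:
        return True
    return random.random() < success_rate
```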
11 Mask PII at the SDK layer
Use the mask() helper or MASK_CONTENT_REGEX env var so you can still store numeric cost/latency while redacting sensitive inputs/outputs.
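A masking callable might look like the sketch below; the regex patterns are hypothetical starting points you would extend for your domain, and only the `Langfuse(mask=...)` wiring in the comment is the SDK API:

```python
import re

# Hypothetical PII patterns; extend for your domain.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(data):
    """Masking callable: redacts emails/SSNs in strings, recurses into
    containers, and leaves numbers (cost, latency) untouched."""
    if isinstance(data, str):
        data = EMAIL_RE.sub("[EMAIL]", data)
        return SSN_RE.sub("[SSN]", data)
    if isinstance(data, dict):
        return {k: mask(v) for k, v in data.items()}
    if isinstance(data, list):
        return [mask(v) for v in data]
    return data  # ints/floats pass through unchanged

# Wired into the client (v3 sketch):
#   langfuse = Langfuse(mask=mask)
```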
12 Flush asynchronously in high-throughput agents
Call langfuse.flush(background=True) at the end of each worker tick to avoid blocking the event loop; OTEL will batch and export spans every few seconds.
13 Test visual completeness with the LangGraph helper
Render the graph with graph.get_graph().draw_mermaid_png() and verify that every edge also appears in Langfuse; missing edges usually mean a span wasn’t opened or the callback handler wasn’t propagated.
14 Watch out for the “traces not clubbed” pitfall when upgrading from v2 → v3
Older code that started independent traces per agent will fragment your timeline in v3. Always start one root span first (Tip #2).