---
description: langfuse and agent observation best practices
globs:
alwaysApply: false
---
1 Adopt the OTEL-native Python SDK (v3) everywhere
The v3 SDK wraps OpenTelemetry, so every span you open in any agent, tool, or worker is automatically nested and correlated. This saves you from hand-passing trace IDs and lets you lean on existing OTEL auto-instrumentation for HTTP, DB, or queue calls.
2 Create one root span per user request and pass a single CallbackHandler into graph.invoke/stream

```python
from langfuse import get_client
from langfuse.langchain import CallbackHandler

langfuse = get_client()  # reads LANGFUSE_* env vars
langfuse_handler = CallbackHandler()

with langfuse.start_as_current_span(name="user-request") as root:
    compiled_graph.invoke(
        input=state,
        config={"callbacks": [langfuse_handler]},
    )
```
Everything the agents do now rolls up under that root for a tidy timeline.
3 Use Langfuse Sessions to stitch together long-running conversations
Set session_id and user_id on the root span (or via update_trace) so all follow-up calls land in the same session dashboard.
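A minimal sketch of attaching session metadata, assuming a root span opened as in Tip #2 (the identifier values are illustrative):

```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    # Attach session/user metadata to the whole trace so follow-up
    # requests carrying the same session_id land in one session view.
    root.update_trace(
        session_id="session-1234",  # illustrative value
        user_id="user-5678",        # illustrative value
    )
```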
4 Name spans predictably
- llm/<model> – one per LLM call (e.g., llm/gpt-4o)
- tool/<tool_name> – external search, RAG, code-exec…
- agent/<role> – distinct for every worker node

Predictable names power Langfuse’s cost & latency aggregation widgets.
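A tiny helper (our own convention, not part of the Langfuse SDK) keeps these names consistent across the codebase:

```python
def span_name(kind: str, name: str) -> str:
    """Build a predictable span name such as 'llm/gpt-4o' or 'tool/web-search'.

    kind must be one of the agreed prefixes so dashboards can aggregate on it.
    """
    allowed = {"llm", "tool", "agent"}
    if kind not in allowed:
        raise ValueError(f"unknown span kind: {kind!r}")
    return f"{kind}/{name}"

print(span_name("llm", "gpt-4o"))     # → llm/gpt-4o
print(span_name("agent", "planner"))  # → agent/planner
```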
5 Leverage Agent Graphs to debug routing loops
Because each node becomes a child span, Langfuse’s “Agent Graph” view renders the entire decision tree and shows token/cost per edge, which is very handy when several LLMs vote on the next step.
6 Tag the root span with the environment (dev/stage/prod) and with the LLM provider you’re experimenting with
This lets you facet dashboards by deployment ring or by “OpenAI vs Mixtral.”
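A sketch of tagging via update_trace, assuming the environment comes from a DEPLOY_ENV variable (our own naming) and the provider tag format is a project convention:

```python
import os

from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    root.update_trace(
        tags=[
            os.getenv("DEPLOY_ENV", "dev"),  # dev / stage / prod
            "provider:openai",               # illustrative provider tag
        ]
    )
```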
7 Attach scores (numeric or categorical) right after the graph run
span.score_trace(name="user-feedback", value=1) – or call langfuse.create_score later. Use this both for thumbs-up/down UI events and for LLM-as-judge automated grading.
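A sketch of both flavours, assuming the v3 span API (score names and values are illustrative):

```python
from langfuse import get_client

langfuse = get_client()

with langfuse.start_as_current_span(name="user-request") as root:
    # ... run the graph here ...

    # Numeric score from a thumbs-up UI event:
    root.score_trace(name="user-feedback", value=1)

    # Categorical score from an LLM-as-judge grader:
    root.score_trace(name="judge-verdict", value="pass", data_type="CATEGORICAL")
```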
8 Version and link your prompts
Call langfuse.create_prompt() (or manage prompts in the UI) and link the fetched prompt object to the relevant generations so you can tell which prompt revision caused regressions.
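A hedged sketch of the prompt-management flow; it needs a reachable Langfuse backend, and the prompt name and variables are illustrative:

```python
from langfuse import get_client

langfuse = get_client()

# Register (or bump) a prompt version; usually done once, via code or the UI.
langfuse.create_prompt(
    name="agent-system-prompt",  # illustrative name
    prompt="You are a helpful agent. {{task}}",
    labels=["production"],
)

# At request time, fetch the current version and compile its variables.
prompt = langfuse.get_prompt("agent-system-prompt")
system_message = prompt.compile(task="summarise the ticket")
```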
9 Exploit distributed-tracing headers if agents live in different services
Because v3 is OTEL-based, incoming traceparent headers are picked up automatically; just make sure every micro-service initialises the Langfuse client with the same credentials (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST).
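To see what actually travels between services, here is a small parser for the W3C traceparent header (format per the Trace Context spec; in practice OTEL propagators handle this for you):

```python
def parse_traceparent(header: str) -> dict:
    """Split a W3C traceparent header into its four fields.

    Format: version-traceid-spanid-flags, e.g.
    00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
    """
    version, trace_id, span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(span_id) != 16:
        raise ValueError("malformed traceparent header")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": flags == "01",
    }

parts = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
print(parts["trace_id"])  # → 4bf92f3577b34da6a3ce929d0e0e4736
```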
10 Sample intelligently
Langfuse supports probabilistic sampling on the server. Keep 100% of errors and maybe only 10% of successful traces in prod to control storage costs.
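The keep-errors-sample-successes policy can also be sketched as a client-side decision; this helper is our own illustration, not a Langfuse API:

```python
import random

def should_keep_trace(had_error: bool, success_rate: float = 0.10,
                      rng=None) -> bool:
    """Keep every errored trace; keep successes with probability success_rate."""
    if had_error:
        return True
    rng = rng or random
    return rng.random() < success_rate

# Deterministic check with a seeded RNG: about 10% of successes survive.
rng = random.Random(42)
kept = sum(should_keep_trace(False, 0.10, rng) for _ in range(10_000))
print(kept)  # roughly 1000 of 10000 successful traces kept
```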
11 Mask PII at the SDK layer
Pass a mask callback when constructing the Langfuse client so you can still store numeric cost/latency while redacting sensitive inputs/outputs.
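The redaction logic itself is plain Python and can be tested in isolation before wiring it into the client as the mask callback. A sketch, with an illustrative email regex:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(data):
    """Redact email addresses in string payloads; pass other types through.

    Intended for use as the mask hook, which receives span inputs/outputs.
    """
    if isinstance(data, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", data)
    return data

print(mask("contact alice@example.com for access"))
# → contact [REDACTED_EMAIL] for access
```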
12 Flush asynchronously in high-throughput agents
Spans are batched and exported in the background by the OTEL processor every few seconds, so normal operation never blocks the event loop; call langfuse.flush() (which blocks until the queue drains) only at shutdown or at safe points between worker ticks.
13 Test visual completeness with the LangGraph helper
Render graph.get_graph().draw_mermaid_png() and verify every edge appears in Langfuse; missing edges usually mean a span wasn’t opened or the callback handler wasn’t propagated.
14 Watch out for the “traces not clubbed” pitfall when upgrading from v2 → v3
Older code that started independent traces per agent will fragment your timeline in v3. Always start one root span first (Tip #2).