---
title: README
emoji: 🔭
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---

# TraceVerse Community

> **The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.**

You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, and occasionally it spends ₹400 on a single user query and you have no idea why.

This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.

🔗 **[Discord](https://discord.gg/6SVz6VKK)** · **[GitHub](https://github.com/traceverse-community)** · **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** · **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)**

---

## Get a traced agent in 30 seconds

```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
    redact_pii=True,                        # PII off your traces by default
)

# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```

No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla `openai`, and anything else that hits an LLM API.

---

## What we ship

### Libraries

| Project | What you get |
|---|---|
| **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| **[`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| **[`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)** | An MCP server so your agent can query its *own* historical traces. Meta-observability for self-improving agents. |

### Live MCP servers (3 servers · 18 tools · synthetic data · no API key)

| Surface | Space | Tools |
|---|---|---|
| Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 |
| Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 |
| Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 |

### Eval datasets (SmolTrace-format)

| Dataset | Tasks |
|---|---|
| [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 |
| [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 |
| [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 |

### Cross-domain SmolTrace datasets

For evaluation across other domains, see the **[TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai)**, which holds 41 SmolTrace-format datasets covering:

- **General domains** (12): travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery
- **Ops & infrastructure** (15): aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management
- **Industry-specific** (14): drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above

Same SmolTrace schema, same prompt-template structure as ours. Use them directly; no need to mirror.

### Reference agents + docs

- **GitHub:** [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents), the binding repo: reference agents wired with `genai-otel-instrument`, architecture docs, observability primer, leaderboard CI.

---

## What you'll get from this stack

- **See it.** Every LLM call, tool call, token spent, millisecond burned, visualized as a trace tree.
- **Score it.** Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move.
- **Compare it.** Two model versions, two prompts, two frameworks: same dataset, side-by-side cost, latency, and quality.
- **Trust it.** PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces.

---

## Who this is for

- **Buildathon participants**: go from zero to traced agent with a leaderboard rank in under five minutes. Any framework, any model.
- **Indie builders**: see what your agent actually does, not what you think it does. Stop debugging via `print()`.
- **Teams shipping LLM apps**: replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder.
- **Researchers**: every dataset and benchmark here is open. Fork it, extend it, contribute back.

---

## What we believe

1. **Observability is a precondition for serious agent work.** You cannot improve what you cannot see.
2. **Evaluation should be reproducible and public.** Benchmarks that live in private notebooks help no one.
3. **Cost and latency are first-class signals.** Quality without cost discipline is a research demo, not a product.
4. **The toolkit must work the same on localhost as in production.** No magic that only kicks in on day 30.
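
To make "Compare it" and the cost-and-latency point concrete, here is a minimal, stdlib-only Python sketch that rolls each run's spans up into per-run totals and puts two runs side by side. Everything in it is a hypothetical illustration: the `Span` shape, the `"llm"`/`"tool"` labels, the `search_restaurants` tool name, and the per-1K-token rupee prices are assumptions, not the `genai-otel-instrument` span schema or SmolTrace's scoring method. Quality, which the eval datasets supply, is left out.

```python
from dataclasses import dataclass


# Hypothetical flattened span record. Real OpenTelemetry spans carry far more
# fields; these names are illustrative, not the genai-otel-instrument schema.
@dataclass
class Span:
    name: str
    kind: str  # "llm" or "tool" (illustrative labels)
    duration_ms: float
    input_tokens: int = 0
    output_tokens: int = 0


# Assumed prices in rupees per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_INR = {"input": 0.25, "output": 1.00}


def summarize(spans: list[Span]) -> dict[str, float]:
    """Roll one agent run up into the signals a leaderboard compares."""
    tokens_in = sum(s.input_tokens for s in spans)
    tokens_out = sum(s.output_tokens for s in spans)
    cost = (tokens_in * PRICE_PER_1K_INR["input"]
            + tokens_out * PRICE_PER_1K_INR["output"]) / 1000
    return {
        "llm_calls": sum(1 for s in spans if s.kind == "llm"),
        "tool_calls": sum(1 for s in spans if s.kind == "tool"),
        "tokens": tokens_in + tokens_out,
        "cost_inr": round(cost, 4),
        # Sum of span durations: equals wall-clock latency only for sequential spans.
        "latency_ms": sum(s.duration_ms for s in spans),
    }


def compare(a: list[Span], b: list[Span]) -> dict[str, tuple[float, float]]:
    """Pair each metric from run A with the same metric from run B."""
    sa, sb = summarize(a), summarize(b)
    return {metric: (sa[metric], sb[metric]) for metric in sa}


# Two hypothetical runs of the same task: prompt v1 loops on a tool, v2 does not.
run_v1 = [
    Span("plan", "llm", 800.0, 1200, 150),
    Span("search_restaurants", "tool", 300.0),
    Span("plan", "llm", 900.0, 1500, 150),
    Span("search_restaurants", "tool", 300.0),
    Span("answer", "llm", 700.0, 1800, 300),
]
run_v2 = [
    Span("plan", "llm", 800.0, 1200, 150),
    Span("search_restaurants", "tool", 300.0),
    Span("answer", "llm", 600.0, 1400, 250),
]

print(f"{'metric':<12}{'v1':>10}{'v2':>10}")
for metric, (v1, v2) in compare(run_v1, run_v2).items():
    print(f"{metric:<12}{v1:>10}{v2:>10}")
```

At the assumed prices, dropping the repeated search loop cuts the run from 5,100 to 3,000 tokens and from ₹1.725 to ₹1.05. Summing span durations is a shortcut; a trace viewer working from real spans would take the root span's duration, since parallel tool calls overlap.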
---

## Community

- 💬 **[Discord](https://discord.gg/6SVz6VKK)**: chat with the community, ask questions, share traces, suggest tasks for the eval suites.
- 🐙 **[GitHub](https://github.com/traceverse-community)**: open issues, PRs welcome, no CLA. Discussions enabled on every repo.
- 🤗 **HF Discussions**: every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. "found a bug in `apply_promo`" → discuss on the Space's tab).

---

## Roadmap

- ✅ **Live now**: `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 in-house eval suites (311 tasks total), 18 mirrored eval datasets, [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo.
- 🔜 **Next**: framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers.
- **After**: community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces.

---

## Production-grade companion

Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? **TraceVerse Enterprise** is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.

---

## Get involved

- **Try it**: start with `genai-otel-instrument` on the agent you have right now.
- **Contribute**: every repo above accepts PRs. Issues are open. No CLA.
- **Share datasets**: got a domain-specific task set? PR it into SmolTrace or open a discussion.
- **Join the conversation**: [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer.