
kshitijthakkar

Replace mirrored datasets with link to TraceMind-AI collection (41 datasets)

dcbe89a verified 17 days ago


	---
	title: README
	emoji: 🔭
	colorFrom: blue
	colorTo: green
	sdk: static
	pinned: false
	---

	# TraceVerse Community

	> The fastest way to know what your AI agent is actually doing — and prove it on a public leaderboard.

	You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why.

	This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.

	🔗 [Discord](https://discord.gg/6SVz6VKK) · [GitHub](https://github.com/traceverse-community) · [`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument) · [`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)

	---

	## Get a traced agent in 30 seconds

	```python
	# pip install genai-otel-instrument
	from genai_otel_instrument import instrument

	instrument(
	    service_name="my-first-agent",
	    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
	    redact_pii=True,  # PII off your traces by default
	)

	# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
	# millisecond of latency is now visible.
	```

	No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla `openai` — anything that hits an LLM API.
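
Before pointing at a real backend, it can help to confirm that spans are actually leaving your process. The sketch below is a throwaway sink, not part of any TraceVerse library. It assumes the exporter speaks OTLP over HTTP and POSTs to the standard `/v1/traces` path; whether `instrument()` appends that path to `otlp_endpoint` itself is also an assumption. `CountingSink` and `start_sink` are hypothetical names.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingSink(BaseHTTPRequestHandler):
    """Accepts OTLP/HTTP trace exports and records them (debugging aid only)."""

    received = []  # one (path, content_type, body_length) tuple per export

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the payload; this sketch does not decode it
        if self.path == "/v1/traces":
            CountingSink.received.append(
                (self.path, self.headers.get("Content-Type"), length)
            )
            self.send_response(200)
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def start_sink(port=4318):
    """Start the sink in a background thread and return the server object."""
    server = HTTPServer(("127.0.0.1", port), CountingSink)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_sink()` first, run your agent with `otlp_endpoint="http://localhost:4318"`, then inspect `CountingSink.received`. An empty list after a full agent run suggests the exporter is not reaching the endpoint.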

	---

	## What we ship

	### Libraries
	| Project | What you get |
	|---|---|
	| [`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument) | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
	| [`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE) | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
	| [`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind) | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
	| [`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | An MCP server so your agent can query its own historical traces. Meta-observability for self-improving agents. |
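
To make "auto-redacts PII" concrete, here is a minimal sketch of what redacting span attributes can look like. It is not how `genai-otel-instrument` implements redaction; the patterns are illustrative and far from exhaustive, and `redact` and `redact_attributes` are hypothetical names.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact(text):
    """Replace each match with a placeholder such as <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


def redact_attributes(attributes):
    """Redact string values in a span-attribute dict; leave other types alone."""
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in attributes.items()
    }


# The attribute keys here are made up for the example.
print(redact_attributes({"prompt": "mail asha@example.com", "tokens": 42}))
```

The point of redacting before export is that sensitive values never reach the collector, so a shared or public viewer only ever sees placeholders.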

	### Live MCP servers (3 servers · 18 tools · synthetic data · no API key)
	| Surface | Space | Tools |
	|---|---|---|
	| Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 |
	| Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 |
	| Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 |
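
MCP clients talk to servers like these with JSON-RPC 2.0 messages, using `tools/list` to discover tools and `tools/call` to invoke one. The sketch below only builds those messages. It does not perform the `initialize` handshake or open a connection, because this README does not state each Space's transport URL; in practice an MCP client SDK handles both. The helper names and the `apply_promo` arguments are hypothetical.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC requests need a unique id per request


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict of the kind MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools():
    """Request the server's tool catalogue."""
    return rpc_request("tools/list")


def call_tool(name, arguments):
    """Request one tool invocation with keyword-style arguments."""
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


# `apply_promo` is named elsewhere in this README; its arguments here are made up.
request = call_tool("apply_promo", {"cart_id": "demo-cart", "code": "WELCOME50"})
print(json.dumps(request, indent=2))
```

Run `tools/list` against a server first and read the returned input schemas rather than guessing argument names.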

	### Eval datasets (SmolTrace-format)
	| Dataset | Tasks |
	|---|---|
	| [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 |
	| [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 |
	| [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 |
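
One way to pull these suites locally is the Hugging Face `datasets` library. This is a minimal sketch under a few assumptions: the repos load with a plain `load_dataset(repo_id)` call, and you inspect the returned object for split and column names, since this README does not list them. `EVAL_SUITES`, `repo_id`, and `load_suite` are hypothetical helper names.

```python
# Task counts as listed in the table above.
EVAL_SUITES = {
    "food-delivery-evals": 111,
    "instamart-evals": 100,
    "dineout-evals": 100,
}
ORG = "traceverse-community"


def repo_id(suite):
    """Return the Hub repo ID for one of the suites listed above."""
    if suite not in EVAL_SUITES:
        raise KeyError(f"unknown suite: {suite}")
    return f"{ORG}/{suite}"


def load_suite(suite):
    """Download a suite from the Hub (needs `pip install datasets` and network)."""
    # Imported lazily so the helpers above still work offline.
    from datasets import load_dataset

    return load_dataset(repo_id(suite))


if __name__ == "__main__":
    print(sum(EVAL_SUITES.values()), "tasks across", len(EVAL_SUITES), "suites")
```

Calling `load_suite("dineout-evals")` and printing the result is a quick way to see the splits and columns before wiring the data into an eval loop.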

	### Cross-domain SmolTrace datasets

	For evaluation across other domains, see the [TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai) — 41 SmolTrace-format datasets covering:

	- General domains (12) — travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery
	- Ops & infrastructure (15) — aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management
	- Industry-specific (14) — drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above

	Same SmolTrace schema, same prompt-template structure as ours. Use them directly — no need to mirror.

	### Reference agents + docs
	- [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) on GitHub is the binding repo that ties the stack together: reference agents wired with `genai-otel-instrument`, architecture docs, an observability primer, and leaderboard CI.

	---

	## What you'll get from this stack

	- See it. Every LLM call, tool call, token spent, millisecond burned — visualized as a trace tree.
	- Score it. Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move.
	- Compare it. Two model versions, two prompts, two frameworks — same dataset, side-by-side cost, latency, and quality.
	- Trust it. PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces.
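
As a rough mental model of the "trace tree" idea, the sketch below represents a trace as nested spans and rolls tokens and cost up from the leaves. The `Span` fields and the span names are illustrative, not the schema any of these tools actually emit.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One node in a trace tree. All field names are illustrative."""
    name: str
    duration_ms: float
    tokens: int = 0
    cost: float = 0.0
    children: list = field(default_factory=list)


def totals(span):
    """Sum tokens and cost over a span and all of its descendants."""
    tokens, cost = span.tokens, span.cost
    for child in span.children:
        child_tokens, child_cost = totals(child)
        tokens += child_tokens
        cost += child_cost
    return tokens, cost


def render(span, depth=0):
    """Return the tree as indented text lines, one per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


trace = Span("agent.run", 1800, children=[
    Span("llm.chat", 900, tokens=1200, cost=0.5),
    Span("tool.search_restaurants", 400),  # hypothetical tool name
    Span("llm.chat", 500, tokens=300, cost=0.125),
])
print("\n".join(render(trace)))
print(totals(trace))
```

Durations are deliberately not summed: a parent span's duration already covers its children, so adding them up would double count.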

	---

	## Who this is for

	- Buildathon participants — go from zero to traced agent with a leaderboard rank in under five minutes. Any framework, any model.
	- Indie builders — see what your agent actually does, not what you think it does. Stop debugging via `print()`.
	- Teams shipping LLM apps — replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder.
	- Researchers — every dataset and benchmark here is open. Fork it, extend it, contribute back.

	---

	## What we believe

	1. Observability is a precondition for serious agent work. You cannot improve what you cannot see.
	2. Evaluation should be reproducible and public. Benchmarks that live in private notebooks help no one.
	3. Cost and latency are first-class signals. Quality without cost discipline is a research demo, not a product.
	4. The toolkit must work the same on localhost as in production. No magic that only kicks in on day 30.
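
To illustrate treating cost and latency as first-class signals alongside quality, here is a small sketch that summarizes two runs over the same task set and reports the difference. The result keys (`passed`, `cost`, `latency_ms`) are hypothetical and are not the SmolTrace schema.

```python
from statistics import mean


def summarize(results):
    """Aggregate per-task results into quality, cost, and latency figures.

    `results` is a non-empty list of dicts with hypothetical keys:
    `passed` (bool), `cost` (float), `latency_ms` (float).
    """
    latencies = sorted(r["latency_ms"] for r in results)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), 1-indexed.
    rank = max(1, -(-95 * len(latencies) // 100))
    return {
        "pass_rate": mean(1.0 if r["passed"] else 0.0 for r in results),
        "mean_cost": mean(r["cost"] for r in results),
        "p95_latency_ms": latencies[rank - 1],
    }


def delta(baseline, candidate):
    """Return candidate minus baseline for every summary metric."""
    return {key: candidate[key] - baseline[key] for key in baseline}
```

A candidate that raises `pass_rate` while also raising `mean_cost` is a trade-off to decide on explicitly, which is easier when all three numbers sit side by side.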

	---

	## Community

	- 💬 [Discord](https://discord.gg/6SVz6VKK) — chat with the community, ask questions, share traces, suggest tasks for the eval suites.
	- 🐙 [GitHub](https://github.com/traceverse-community) — open issues, PRs welcome, no CLA. Discussions enabled on every repo.
	- 🤗 HF Discussions — every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. "found a bug in `apply_promo`" → discuss on the Space's tab).

	---

	## Roadmap

	- ✅ Live now: `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 eval suites of our own (311 tasks total), a link to the 41-dataset TraceMind-AI collection in place of mirrored datasets, and the [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo.
	- 🔜 Next: framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers.
	- After: community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces.

	---

	## Production-grade companion

	Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? TraceVerse Enterprise is the bigger sibling built for regulated environments — same telemetry contract, hardened for the bank floor.

	---

	## Get involved

	- Try it — start with `genai-otel-instrument` on the agent you have right now.
	- Contribute — every repo above accepts PRs. Issues open. No CLA.
	- Share datasets — got a domain-specific task set? PR it into SmolTrace or open a discussion.
	- Join the conversation — [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer.