---
title: README
emoji: 🔭
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---

# TraceVerse Community

> **The fastest way to know what your AI agent is actually doing — and prove it on a public leaderboard.**

You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, and occasionally it spends ₹400 on a single user query and you have no idea why.

This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.

🔗 **[Discord](https://discord.gg/6SVz6VKK)** · **[GitHub](https://github.com/traceverse-community)** · **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** · **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)**

---

## Get a traced agent in 30 seconds

```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",   # or point at the public TraceMind Space
    redact_pii=True,                         # keeps PII out of your traces (on by default)
)

# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```

No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla `openai` — anything that hits an LLM API.
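
To give a feel for what `redact_pii=True` is protecting you from, here is a minimal sketch of attribute-level redaction. It is not the library's implementation: the regex patterns, the `redact` and `redact_attributes` helpers, and the placeholder strings are all assumptions made for illustration, and real PII detection needs far broader coverage.

```python
import re

# Hypothetical patterns for illustration only; not taken from genai-otel-instrument.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Indian mobile numbers, with an optional +91 prefix.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")


def redact(text: str) -> str:
    """Replace emails and phone numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def redact_attributes(attributes: dict) -> dict:
    """Return a copy of span attributes with every string value redacted."""
    return {
        key: redact(value) if isinstance(value, str) else value
        for key, value in attributes.items()
    }


# Made-up span attributes, as they might look before export.
span_attributes = {
    "prompt": "Deliver to asha@example.com, call +91 9876543210 on arrival",
    "tokens": 42,
}
print(redact_attributes(span_attributes))
```

Redacting before export, rather than in the viewer, means the raw values never leave your process.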

---

## What we ship

### Libraries
| Project | What you get |
|---|---|
| **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| **[`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| **[`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)** | An MCP server so your agent can query its *own* historical traces. Meta-observability for self-improving agents. |

### Live MCP servers (3 servers · 18 tools · synthetic data · no API key)
| Surface | Space | Tools |
|---|---|---|
| Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 |
| Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 |
| Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 |
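
MCP servers speak JSON-RPC 2.0, so a tool invocation is just a `tools/call` request with a tool name and an arguments object. The sketch below only builds the request payloads and does not open a connection. The `make_request`, `list_tools`, and `call_tool` helpers are hypothetical, `apply_promo` is the tool name mentioned under Community, and its argument names are invented for illustration. Call `tools/list` against the live server to get the real schemas.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests need a unique id within a session


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools() -> dict:
    """Request the server's tool catalogue."""
    return make_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    """Request a single tool invocation."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


# Argument names below are assumptions, not the server's actual schema.
request = call_tool("apply_promo", {"cart_id": "cart-123", "code": "WELCOME50"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library handles the transport and session handshake; the payload shape is what shows up in your traces.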

### Eval datasets (SmolTrace-format)
| Dataset | Tasks |
|---|---|
| [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 |
| [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 |
| [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 |

### Cross-domain SmolTrace datasets

For evaluation across other domains, see the **[TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai)** — 41 SmolTrace-format datasets covering:

- **General domains** (12) — travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery
- **Ops & infrastructure** (15) — aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management
- **Industry-specific** (14) — drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above

Same SmolTrace schema, same prompt-template structure as ours. Use them directly — no need to mirror.

### Reference agents + docs
- **GitHub:** [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) — the binding repo that ties the stack together. It holds reference agents wired with `genai-otel-instrument`, architecture docs, an observability primer, and leaderboard CI.

---

## What you'll get from this stack

- **See it.** Every LLM call, tool call, token spent, millisecond burned — visualized as a trace tree.
- **Score it.** Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move.
- **Compare it.** Two model versions, two prompts, two frameworks — same dataset, side-by-side cost, latency, and quality.
- **Trust it.** PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces.

---

## Who this is for

- **Buildathon participants** — go from zero to traced agent with a leaderboard rank in under five minutes. Any framework, any model.
- **Indie builders** — see what your agent actually does, not what you think it does. Stop debugging via `print()`.
- **Teams shipping LLM apps** — replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder.
- **Researchers** — every dataset and benchmark here is open. Fork it, extend it, contribute back.

---

## What we believe

1. **Observability is a precondition for serious agent work.** You cannot improve what you cannot see.
2. **Evaluation should be reproducible and public.** Benchmarks that live in private notebooks help no one.
3. **Cost and latency are first-class signals.** Quality without cost discipline is a research demo, not a product.
4. **The toolkit must work the same on localhost as in production.** No magic that only kicks in on day 30.
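
Belief 3 comes down to simple arithmetic: report quality, cost, and latency together, and look at cost per solved task rather than pass rate alone. The sketch below assumes a hypothetical `TaskResult` record and made-up numbers. It is not SmolTrace's scoring code or schema.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List


@dataclass
class TaskResult:
    passed: bool
    cost_inr: float
    latency_ms: float


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Aggregate one run into quality, cost, and latency numbers."""
    passes = sum(1 for r in results if r.passed)
    total_cost = sum(r.cost_inr for r in results)
    return {
        "pass_rate": mean([1.0 if r.passed else 0.0 for r in results]),
        "total_cost_inr": total_cost,
        "median_latency_ms": median([r.latency_ms for r in results]),
        # Cost per solved task; infinite if nothing passed.
        "cost_per_pass_inr": total_cost / passes if passes else float("inf"),
    }


# Two made-up runs over the same four tasks.
run_a = [
    TaskResult(True, 2.0, 800.0),
    TaskResult(True, 3.0, 1200.0),
    TaskResult(False, 5.0, 2500.0),
    TaskResult(True, 2.0, 900.0),
]
run_b = [
    TaskResult(True, 1.0, 600.0),
    TaskResult(False, 1.0, 700.0),
    TaskResult(False, 1.5, 650.0),
    TaskResult(True, 0.5, 500.0),
]

for label, run in (("A", run_a), ("B", run_b)):
    print(label, summarize(run))
```

With these numbers, run A solves more tasks but pays twice as much per solved task as run B. A quality-only score would hide that trade-off.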

---

## Community

- 💬 **[Discord](https://discord.gg/6SVz6VKK)** — chat with the community, ask questions, share traces, suggest tasks for the eval suites.
- 🐙 **[GitHub](https://github.com/traceverse-community)** — open issues, PRs welcome, no CLA. Discussions enabled on every repo.
- 🤗 **HF Discussions** — every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. "found a bug in `apply_promo`" → discuss on the Space's tab).

---

## Roadmap

- ✅ **Live now** — `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), our 3 eval suites (311 tasks total), 18 mirrored eval datasets, and the [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo.
- 🔜 **Next** — framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers.
- **After** — community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces.

---

## Production-grade companion

Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? **TraceVerse Enterprise** is the bigger sibling built for regulated environments — same telemetry contract, hardened for the bank floor.

---

## Get involved

- **Try it** — start with `genai-otel-instrument` on the agent you have right now.
- **Contribute** — every repo above accepts PRs. Issues open. No CLA.
- **Share datasets** — got a domain-specific task set? PR it into SmolTrace or open a discussion.
- **Join the conversation** — [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer.