---
title: README
emoji: 🔭
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
# TraceVerse Community
> **The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.**
You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why.
This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.
🔗 **[Discord](https://discord.gg/6SVz6VKK)** · **[GitHub](https://github.com/traceverse-community)** · **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** · **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)**
---
## Get a traced agent in 30 seconds
```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
    redact_pii=True,  # PII redaction is on by default; set explicitly here for clarity
)

# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```
No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, and vanilla `openai`: anything that hits an LLM API.
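To make the idea concrete, here is a minimal, stdlib-only sketch of the general wrapping technique that auto-instrumentation relies on: wrap the client call, time it, and record tokens and an estimated cost on a span-like record. This is not `genai-otel-instrument`'s actual implementation. The `fake_llm_call` function, its `usage` dict shape, and the per-million-token rupee prices are hypothetical placeholders.

```python
import functools
import time

# Finished "spans" collected in memory; a real exporter would ship these over OTLP.
SPANS: list = []

# Hypothetical prices in rupees per 1M tokens (placeholders, not real pricing).
PRICE_PER_M = {"input": 50.0, "output": 150.0}


def traced(span_name: str):
    """Decorator that records latency, token usage and estimated cost of a call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)  # error handling omitted in this sketch
            elapsed_ms = (time.perf_counter() - start) * 1000
            usage = result.get("usage", {})
            in_tok = usage.get("input_tokens", 0)
            out_tok = usage.get("output_tokens", 0)
            cost = (in_tok * PRICE_PER_M["input"] + out_tok * PRICE_PER_M["output"]) / 1_000_000
            SPANS.append({
                "name": span_name,
                "latency_ms": elapsed_ms,
                "input_tokens": in_tok,
                "output_tokens": out_tok,
                "cost_inr": cost,
            })
            return result
        return wrapper
    return decorator


@traced("llm.chat")
def fake_llm_call(prompt: str) -> dict:
    # Hypothetical stand-in for an LLM client call that returns a usage block.
    return {"text": f"echo: {prompt}", "usage": {"input_tokens": 1200, "output_tokens": 400}}


fake_llm_call("Where is my order?")
print(SPANS[0]["cost_inr"])  # 0.12
```

Production instrumentors typically apply this kind of wrapper to the provider SDK's client methods when instrumentation is switched on, which is why the agent code itself does not have to change.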
---
## What we ship
### Libraries
| Project | What you get |
|---|---|
| **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| **[`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| **[`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)** | An MCP server so your agent can query its *own* historical traces. Meta-observability for self-improving agents. |
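As an illustration of what redacting PII before traces leave the process can look like, here is a minimal regex-based sketch applied to span attribute values. The patterns (email addresses and Indian mobile numbers), the `[REDACTED_*]` placeholders, and the `llm.prompt` attribute name are assumptions for this example; `genai-otel-instrument`'s actual redaction rules may differ and are likely broader.

```python
import re

# Hypothetical patterns: email addresses and Indian mobile numbers (optional +91 prefix).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"(?:\+91[\s-]?)?\b[6-9]\d{9}\b"),
}


def redact(text: str) -> str:
    """Replace every PII match with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text


def redact_attributes(attrs: dict) -> dict:
    """Redact string-valued span attributes; leave numbers (tokens, cost) untouched."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in attrs.items()}


span_attrs = {
    "llm.prompt": "Deliver to asha@example.com, call +91 9876543210",
    "llm.input_tokens": 42,
}
print(redact_attributes(span_attrs))
```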
### Live MCP servers (3 servers · 18 tools · synthetic data · no API key)
| Surface | Space | Tools |
|---|---|---|
| Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 |
| Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 |
| Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 |
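MCP clients talk to servers like these with JSON-RPC 2.0 messages: `tools/list` discovers the available tools and `tools/call` invokes one by name. The sketch below only builds the request bodies with the standard library. The transport (each Space's endpoint URL) and the `initialize` handshake a client must send first are left out, and the `apply_promo` arguments (`order_id`, `promo_code`) are hypothetical, so check the tool's real input schema from the `tools/list` response.

```python
import itertools
import json
from typing import Optional

# Monotonically increasing JSON-RPC request ids.
_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request body as used by MCP."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# 1. Discover the tools a server exposes.
list_req = jsonrpc_request("tools/list")

# 2. Call one tool by name. The argument names and values here are hypothetical.
call_req = jsonrpc_request(
    "tools/call",
    {"name": "apply_promo", "arguments": {"order_id": "ORD-123", "promo_code": "WELCOME50"}},
)

print(list_req)
print(call_req)
```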
### Eval datasets (SmolTrace-format)
| Dataset | Tasks |
|---|---|
| [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 |
| [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 |
| [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 |
### Cross-domain SmolTrace datasets
For evaluation across other domains, see the **[TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai)**, which holds 41 SmolTrace-format datasets covering:
- **General domains** (12): travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery
- **Ops & infrastructure** (15): aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management
- **Industry-specific** (14): drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above
Same SmolTrace schema, same prompt-template structure as ours. Use them directly; there's no need to mirror them.
### Reference agents + docs
- **GitHub:** [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents), the binding repo that ties the stack together: reference agents wired with `genai-otel-instrument`, architecture docs, an observability primer, and leaderboard CI.
---
## What you'll get from this stack
- **See it.** Every LLM call, tool call, token spent, and millisecond burned, visualized as a trace tree.
- **Score it.** Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move.
- **Compare it.** Two model versions, two prompts, two frameworks: same dataset, side-by-side cost, latency, and quality.
- **Trust it.** PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces.
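As a rough sketch of the "Compare it" step, assume each evaluated task yields a record with a pass/fail `success` flag, a cost, and a latency (a hypothetical schema and made-up numbers, not SmolTrace's actual output format). A side-by-side summary of two variants on the same tasks then comes down to a few aggregates:

```python
from statistics import median

# Hypothetical per-task results for two agent variants run on the same three tasks.
runs = {
    "prompt-v1": [
        {"task_id": "t1", "success": True, "cost_inr": 0.40, "latency_ms": 1800},
        {"task_id": "t2", "success": False, "cost_inr": 0.90, "latency_ms": 4200},
        {"task_id": "t3", "success": True, "cost_inr": 0.35, "latency_ms": 1500},
    ],
    "prompt-v2": [
        {"task_id": "t1", "success": True, "cost_inr": 0.20, "latency_ms": 1100},
        {"task_id": "t2", "success": True, "cost_inr": 0.30, "latency_ms": 1300},
        {"task_id": "t3", "success": False, "cost_inr": 0.25, "latency_ms": 1200},
    ],
}


def summarize(results: list) -> dict:
    """Aggregate quality (success rate), total cost and median latency for one variant."""
    return {
        "success_rate": sum(r["success"] for r in results) / len(results),
        "total_cost_inr": round(sum(r["cost_inr"] for r in results), 2),
        "median_latency_ms": median(r["latency_ms"] for r in results),
    }


for name, results in runs.items():
    print(name, summarize(results))
```

Here both variants solve two of three tasks, but `prompt-v2` does it for less than half the cost and with lower median latency, which is exactly the kind of trade-off a quality-only score would hide.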
---
## Who this is for
- **Buildathon participants**: go from zero to a traced agent with a leaderboard rank in under five minutes. Any framework, any model.
- **Indie builders**: see what your agent actually does, not what you think it does. Stop debugging via `print()`.
- **Teams shipping LLM apps**: replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder.
- **Researchers**: every dataset and benchmark here is open. Fork it, extend it, contribute back.
---
## What we believe
1. **Observability is a precondition for serious agent work.** You cannot improve what you cannot see.
2. **Evaluation should be reproducible and public.** Benchmarks that live in private notebooks help no one.
3. **Cost and latency are first-class signals.** Quality without cost discipline is a research demo, not a product.
4. **The toolkit must work the same on localhost as in production.** No magic that only kicks in on day 30.
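One way to honour principle 4 is to keep the code identical everywhere and let the environment choose the destination. The sketch below reads the standard OpenTelemetry environment variables `OTEL_SERVICE_NAME` and `OTEL_EXPORTER_OTLP_ENDPOINT`, falling back to the localhost endpoint from the quick-start. The `resolve_otel_config` helper and the `collector.internal` URL are hypothetical, and this sketch does not assume whether `genai-otel-instrument` reads these variables on its own.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_ENDPOINT = "http://localhost:4318"  # local OTLP/HTTP collector, as in the quick-start


def resolve_otel_config(env: Optional[Mapping] = None) -> dict:
    """Pick service name and OTLP endpoint from the environment, with local defaults."""
    env = os.environ if env is None else env
    return {
        "service_name": env.get("OTEL_SERVICE_NAME", "my-first-agent"),
        "otlp_endpoint": env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_ENDPOINT),
    }


# Same code path on a laptop and in production; only the environment differs.
print(resolve_otel_config({}))
print(resolve_otel_config({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://collector.internal:4318"}))
```

The resulting dict matches the keyword arguments used in the quick-start, so it can be unpacked straight into that call, e.g. `instrument(**resolve_otel_config(), redact_pii=True)`.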
---
## Community
- πŸ’¬ **[Discord](https://discord.gg/6SVz6VKK)** β€” chat with the community, ask questions, share traces, suggest tasks for the eval suites.
- πŸ™ **[GitHub](https://github.com/traceverse-community)** β€” open issues, PRs welcome, no CLA. Discussions enabled on every repo.
- πŸ€— **HF Discussions** β€” every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. "found a bug in `apply_promo`" β†’ discuss on the Space's tab).
---
## Roadmap
- ✅ **Live now**: `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 own eval suites (311 tasks total), the [TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai) of 41 SmolTrace-format datasets, and the [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo.
- πŸ”œ **Next** β€” framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers.
- **After** β€” community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces.
---
## Production-grade companion
Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? **TraceVerse Enterprise** is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.
---
## Get involved
- **Try it**: start with `genai-otel-instrument` on the agent you have right now.
- **Contribute**: every repo above accepts PRs. Issues are open. No CLA.
- **Share datasets**: got a domain-specific task set? PR it into SmolTrace or open a discussion.
- **Join the conversation**: [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer.