---
title: README
emoji: π
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
# TraceVerse Community
> **The fastest way to know what your AI agent is actually doing, and to prove it on a public leaderboard.**

You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why.

This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.

**[Discord](https://discord.gg/6SVz6VKK)** · **[GitHub](https://github.com/traceverse-community)** · **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** · **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)**

---
## Get a traced agent in 30 seconds
```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument
instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
    redact_pii=True,                        # PII off your traces by default
)
# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```
No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla `openai`: anything that hits an LLM API.

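
To make "every LLM call and tool call is now visible" concrete, here is a conceptual sketch of the kind of record a tracer keeps per call. It is not how `genai-otel-instrument` is implemented: the `Span` class, the `traced` decorator, the in-memory `SPANS` list, and the `usage` dictionary shape are all hypothetical stand-ins, and the real library exports OpenTelemetry spans instead.

```python
import time
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class Span:
    name: str            # function that was called
    kind: str            # "llm" or "tool"
    latency_ms: float    # wall-clock duration of the call
    attributes: dict = field(default_factory=dict)


SPANS: list[Span] = []   # hypothetical stand-in for an OTLP exporter


def traced(kind):
    """Record one Span for every call of the wrapped function."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            attrs = {}
            # Assumed convention: LLM results carry a "usage" dict of token counts.
            if isinstance(result, dict) and "usage" in result:
                attrs.update(result["usage"])
            SPANS.append(Span(fn.__name__, kind, latency_ms, attrs))
            return result
        return wrapper
    return decorator


@traced("llm")
def fake_llm_call(prompt):
    # Placeholder for a real LLM API call.
    return {"text": prompt.upper(), "usage": {"input_tokens": 12, "output_tokens": 5}}


@traced("tool")
def lookup_order(order_id):
    # Placeholder for a real tool call.
    return {"order_id": order_id, "status": "delivered"}


fake_llm_call("where is my order?")
lookup_order("A-1")
for span in SPANS:
    print(span.kind, span.name, span.attributes)
```

A real tracer also links spans into a parent/child tree and ships them to a collector; the sketch only shows the per-call capture step.
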
---
## What we ship
### Libraries
| Project | What you get |
|---|---|
| **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| **[`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| **[`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)** | An MCP server so your agent can query its *own* historical traces. Meta-observability for self-improving agents. |
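
For a feel of what PII redaction on trace attributes involves, here is a minimal sketch. The patterns, placeholder tokens, and function names are assumptions chosen for illustration; they are not the rules `genai-otel-instrument` actually applies, and production redaction typically covers many more entity types.

```python
import re

# Deliberately simple patterns; real-world PII detection is broader than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s-]{8,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def redact_attributes(attrs: dict) -> dict:
    """Redact string values only; numeric attributes such as token counts pass through."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in attrs.items()}


print(redact_attributes({"prompt": "mail a@b.co or call +91 98765 43210", "tokens": 17}))
```

Redacting before export, not in the viewer, means raw values never leave the process that produced them.
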
### Live MCP servers (3 servers · 18 tools · synthetic data · no API key)
| Surface | Space | Tools |
|---|---|---|
| Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 |
| Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 |
| Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 |
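
MCP is built on JSON-RPC 2.0, and clients discover and invoke tools with the `tools/list` and `tools/call` methods. The sketch below only builds the request payloads; it does not open a connection, and in practice an MCP client library handles transport and session setup. The `apply_promo` tool name comes from this README, but the argument names and values shown are hypothetical.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def list_tools_request():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    """Invoke a tool by name with a dictionary of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Argument names below are hypothetical; check the tool's schema via tools/list.
request = call_tool_request("apply_promo", {"code": "WELCOME10"})
print(json.dumps(request, indent=2))
```
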
### Eval datasets (SmolTrace-format)
| Dataset | Tasks |
|---|---|
| [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 |
| [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 |
| [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 |
### Cross-domain SmolTrace datasets
For evaluation across other domains, see the **[TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai)**, which holds 41 SmolTrace-format datasets covering:
- **General domains** (12): travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery
- **Ops & infrastructure** (15): aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management
- **Industry-specific** (14): drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above

Same SmolTrace schema, same prompt-template structure as ours. Use them directly; there is no need to mirror them.
### Reference agents + docs
- **GitHub:** [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) is the binding repo. It holds reference agents wired with `genai-otel-instrument`, architecture docs, an observability primer, and leaderboard CI.
---
## What you'll get from this stack
- **See it.** Every LLM call, tool call, token spent, and millisecond burned, visualized as a trace tree.
- **Score it.** Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move.
- **Compare it.** Two model versions, two prompts, or two frameworks: same dataset, with cost, latency, and quality side by side.
- **Trust it.** PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces.
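
The "score it" and "compare it" steps boil down to aggregating per-task results and diffing two runs on the same dataset. A minimal sketch follows; the `TaskResult` fields and the metric names are assumptions for illustration, not the SmolTrace result schema.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    task_id: str
    passed: bool        # did the agent solve the task?
    cost: float         # spend for this task, in one consistent currency
    latency_ms: float   # end-to-end latency for this task


def summarize(results):
    """Aggregate one run; assumes a non-empty list of results."""
    return {
        "tasks": len(results),
        "success_rate": sum(r.passed for r in results) / len(results),
        "total_cost": sum(r.cost for r in results),
        "mean_latency_ms": mean(r.latency_ms for r in results),
    }


def compare(baseline, candidate):
    """Return candidate minus baseline for each headline metric."""
    a, b = summarize(baseline), summarize(candidate)
    return {k: b[k] - a[k] for k in ("success_rate", "total_cost", "mean_latency_ms")}


baseline = [TaskResult("t1", True, 2.0, 100.0), TaskResult("t2", False, 4.0, 300.0)]
candidate = [TaskResult("t1", True, 1.0, 150.0), TaskResult("t2", True, 2.0, 250.0)]
print(compare(baseline, candidate))
```

Positive `success_rate` and negative `total_cost` deltas both favor the candidate; keeping all three metrics in one report avoids optimizing quality while cost quietly doubles.
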
---
## Who this is for
- **Buildathon participants:** go from zero to traced agent with a leaderboard rank in under five minutes. Any framework, any model.
- **Indie builders:** see what your agent actually does, not what you think it does. Stop debugging via `print()`.
- **Teams shipping LLM apps:** replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder.
- **Researchers:** every dataset and benchmark here is open. Fork it, extend it, contribute back.
---
## What we believe
1. **Observability is a precondition for serious agent work.** You cannot improve what you cannot see.
2. **Evaluation should be reproducible and public.** Benchmarks that live in private notebooks help no one.
3. **Cost and latency are first-class signals.** Quality without cost discipline is a research demo, not a product.
4. **The toolkit must work the same on localhost as in production.** No magic that only kicks in on day 30.
---
## Community
- **[Discord](https://discord.gg/6SVz6VKK):** chat with the community, ask questions, share traces, suggest tasks for the eval suites.
- **[GitHub](https://github.com/traceverse-community):** open issues, PRs welcome, no CLA. Discussions enabled on every repo.
- **HF Discussions:** every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. if you find a bug in `apply_promo`, raise it on that Space's tab).
---
## Roadmap
- **Live now:** `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 eval suites of our own (311 tasks total), 18 mirrored eval datasets, and the [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo.
- **Next:** framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers.
- **After:** community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces.
---
## Production-grade companion
Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? **TraceVerse Enterprise** is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.

---
## Get involved
- **Try it:** start with `genai-otel-instrument` on the agent you have right now.
- **Contribute:** every repo above accepts PRs. Issues open. No CLA.
- **Share datasets:** got a domain-specific task set? PR it into SmolTrace or open a discussion.
- **Join the conversation:** [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer.