| title: README | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: static | |
| pinned: false | |
| # TraceVerse Community | |
| > **The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.** | |
| You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query and you have no idea why. | |
| This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes. | |
| **[Discord](https://discord.gg/6SVz6VKK)** · **[GitHub](https://github.com/traceverse-community)** · **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** · **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | |
| --- | |
| ## Get a traced agent in 30 seconds | |
| ```python | |
| # pip install genai-otel-instrument | |
| from genai_otel_instrument import instrument | |
| instrument( | |
| service_name="my-first-agent", | |
| otlp_endpoint="http://localhost:4318", # or point at the public TraceMind Space | |
| redact_pii=True, # PII off your traces by default | |
| ) | |
| # That's it. Run your agent. Every LLM call, tool call, token, rupee, and | |
| # millisecond of latency is now visible. | |
| ``` | |
| No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, and vanilla `openai`: anything that hits an LLM API. | |
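If you would rather not hard-code the endpoint, one option is to resolve it from the standard OpenTelemetry environment variables. This is a minimal sketch, assuming `instrument()` accepts the `service_name`, `otlp_endpoint`, and `redact_pii` keyword arguments shown in the quick start; `build_instrument_kwargs` is a hypothetical helper and not part of the library.

```python
import os


def build_instrument_kwargs(env=None):
    """Resolve keyword arguments for instrument() from the environment.

    OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT are standard
    OpenTelemetry variables. The fallbacks mirror the quick-start values.
    This helper is hypothetical; it is not shipped by genai-otel-instrument.
    """
    env = os.environ if env is None else env
    return {
        "service_name": env.get("OTEL_SERVICE_NAME", "my-first-agent"),
        "otlp_endpoint": env.get(
            "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318"
        ),
        "redact_pii": True,  # keep redaction on regardless of environment
    }


# Pass an explicit mapping here so the example does not depend on your shell.
kwargs = build_instrument_kwargs({"OTEL_SERVICE_NAME": "checkout-agent"})
print(kwargs)
# Then, assuming the signature above: instrument(**kwargs)
```

Keeping the endpoint in the environment lets the same agent code run against a local collector or a hosted viewer without edits.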
| --- | |
| ## What we ship | |
| ### Libraries | |
| | Project | What you get | | |
| |---|---| | |
| | **[`genai-otel-instrument`](https://github.com/Mandark-droid/genai_otel_instrument)** | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. | | |
| | **[`SmolTrace`](https://github.com/Mandark-droid/SMOLTRACE)** | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. | | |
| | **[`TraceMind`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind)** | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. | | |
| | **[`TraceMind-mcp-server`](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)** | An MCP server so your agent can query its *own* historical traces. Meta-observability for self-improving agents. | | |
| ### Live MCP servers (3 servers · 18 tools · synthetic data · no API key) | |
| | Surface | Space | Tools | | |
| |---|---|---| | |
| | Food delivery | [`food-delivery-mcp`](https://huggingface.co/spaces/traceverse-community/food-delivery-mcp) | 7 | | |
| | Grocery / Instamart | [`instamart-mcp`](https://huggingface.co/spaces/traceverse-community/instamart-mcp) | 6 | | |
| | Dineout / Reservations | [`dineout-mcp`](https://huggingface.co/spaces/traceverse-community/dineout-mcp) | 5 | | |
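MCP is built on JSON-RPC 2.0, so the messages a client sends to any of these servers have a simple shape. The sketch below only builds the request payloads for `tools/list` and `tools/call`; it does not open a connection. The tool name `apply_promo` appears elsewhere in this README, but the `code` argument and its value are assumptions made up for illustration, so check each server's tool schema for the real parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict for an MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. The arguments here are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "apply_promo", "arguments": {"code": "WELCOME50"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

A real session also starts with an `initialize` handshake and runs over a transport that an MCP client library would normally handle for you.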
| ### Eval datasets (SmolTrace-format) | |
| | Dataset | Tasks | | |
| |---|---| | |
| | [`food-delivery-evals`](https://huggingface.co/datasets/traceverse-community/food-delivery-evals) | 111 | | |
| | [`instamart-evals`](https://huggingface.co/datasets/traceverse-community/instamart-evals) | 100 | | |
| | [`dineout-evals`](https://huggingface.co/datasets/traceverse-community/dineout-evals) | 100 | | |
| ### Cross-domain SmolTrace datasets | |
| For evaluation across other domains, see the **[TraceMind-AI Collection](https://huggingface.co/collections/kshitijthakkar/tracemind-ai)**, a set of 41 SmolTrace-format datasets covering: | |
| - **General domains** (12): travel, ecommerce, healthcare, finance, legal, education, real-estate, social-media, recruitment, smart-home, customer-support, food-delivery | |
| - **Ops & infrastructure** (15): aiops, apm, devops, secops, mlops, llmops, cloud-cost, kubernetes, database-ops, incident-management, IaC, SRE, observability-platform, CI/CD, log-management | |
| - **Industry-specific** (14): drone, farming, manufacturing, hospitality, logistics, automotive, cybersecurity, telecom, insurance, events, marine, aviation, gaming, plus the three TraceVerse Community datasets above | |
| Same SmolTrace schema, same prompt-template structure as ours. Use them directly; there is no need to mirror them. | |
| ### Reference agents + docs | |
| - **GitHub:** [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents), the binding repo. Reference agents wired with `genai-otel-instrument`, architecture docs, observability primer, leaderboard CI. | |
| --- | |
| ## What you'll get from this stack | |
| - **See it.** Every LLM call, tool call, token spent, and millisecond burned, visualized as a trace tree. | |
| - **Score it.** Run your agent against shared task datasets. Get a number on a public leaderboard. Watch it move. | |
| - **Compare it.** Two model versions, two prompts, two frameworks: same dataset, side-by-side cost, latency, and quality. | |
| - **Trust it.** PII redaction is on by default. Self-host the viewer if you don't want anyone seeing your traces. | |
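To make the "Compare it" idea concrete, here is a rough sketch of a side-by-side summary over per-task results. The `TaskResult` fields and the sample numbers are invented for illustration; they are not the SmolTrace schema or real leaderboard data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    """One task outcome for one agent variant (hypothetical shape)."""
    task_id: str
    passed: bool
    cost: float        # in whatever currency your traces record
    latency_ms: float


def summarize(results):
    """Collapse per-task results into quality, cost, and latency figures."""
    return {
        "pass_rate": mean(1.0 if r.passed else 0.0 for r in results),
        "total_cost": sum(r.cost for r in results),
        "mean_latency_ms": mean(r.latency_ms for r in results),
    }


def compare(runs):
    """Summarize every variant so they can be read side by side."""
    return {name: summarize(results) for name, results in runs.items()}


# Made-up results for two prompt versions on the same two tasks.
runs = {
    "prompt-v1": [
        TaskResult("t1", True, 0.5, 800.0),
        TaskResult("t2", False, 1.5, 1200.0),
    ],
    "prompt-v2": [
        TaskResult("t1", True, 0.25, 600.0),
        TaskResult("t2", True, 0.75, 1000.0),
    ],
}

for name, stats in compare(runs).items():
    print(name, stats)
```

Holding the task set fixed across variants is what makes the three numbers comparable; changing the dataset between runs would confound the result.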
| --- | |
| ## Who this is for | |
| - **Buildathon participants:** go from zero to traced agent with a leaderboard rank in under five minutes. Any framework, any model. | |
| - **Indie builders:** see what your agent actually does, not what you think it does. Stop debugging via `print()`. | |
| - **Teams shipping LLM apps:** replace ad-hoc notebook evals with reproducible numbers you can show a stakeholder. | |
| - **Researchers:** every dataset and benchmark here is open. Fork it, extend it, contribute back. | |
| --- | |
| ## What we believe | |
| 1. **Observability is a precondition for serious agent work.** You cannot improve what you cannot see. | |
| 2. **Evaluation should be reproducible and public.** Benchmarks that live in private notebooks help no one. | |
| 3. **Cost and latency are first-class signals.** Quality without cost discipline is a research demo, not a product. | |
| 4. **The toolkit must work the same on localhost as in production.** No magic that only kicks in on day 30. | |
| --- | |
| ## Community | |
| - **[Discord](https://discord.gg/6SVz6VKK):** chat with the community, ask questions, share traces, suggest tasks for the eval suites. | |
| - **[GitHub](https://github.com/traceverse-community):** open issues, PRs welcome, no CLA. Discussions enabled on every repo. | |
| - **HF Discussions:** every Space and Dataset has a Discussions tab. Use it for surface-specific questions (e.g. if you found a bug in `apply_promo`, discuss it on that Space's tab). | |
| --- | |
| ## Roadmap | |
| - **Live now:** `genai-otel-instrument`, `SmolTrace`, public `TraceMind`, `TraceMind-mcp-server`, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 own eval suites (311 tasks total), 18 mirrored eval datasets, [`food-delivery-agents`](https://github.com/traceverse-community/food-delivery-agents) binding repo. | |
| - **Next:** framework-specific reference agents (LangGraph + smolagents + CrewAI), automated PR-driven leaderboard, more domain MCP servers. | |
| - **After:** community-curated tasks across more domains, cost-optimization recipes, `agents.md` standardization across all our Spaces. | |
| --- | |
| ## Production-grade companion | |
| Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? **TraceVerse Enterprise** is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor. | |
| --- | |
| ## Get involved | |
| - **Try it:** start with `genai-otel-instrument` on the agent you have right now. | |
| - **Contribute:** every repo above accepts PRs. Issues open. No CLA. | |
| - **Share datasets:** got a domain-specific task set? PR it into SmolTrace or open a discussion. | |
| - **Join the conversation:** [Discord](https://discord.gg/6SVz6VKK), GitHub Discussions, or HF Discussions on any repo. We answer. | |