TraceVerse Community is an open evaluation and observability ecosystem for real-world AI applications. Built on top of the Hugging Face Hub, it hosts datasets, traces, and benchmarking pipelines that help developers measure cost, latency, and quality across models using real production-like workflows. We focus on turning observability data into reusable evaluation assets, enabling reproducible benchmarking, continuous optimization, and better model selection for any AI use case. The goal is simple: make evaluation a first-class, community-driven layer in the AI stack.
The fastest way to know what your AI agent is actually doing, and prove it on a public leaderboard.
You wrote an agent. It works. Sometimes. It calls an LLM, it calls a tool, sometimes it loops, occasionally it spends ₹400 on a single user query, and you have no idea why.
This org exists to fix that. Open source, framework-agnostic, built so you can go from `git clone` to a traced agent with a leaderboard rank in under five minutes.
Discord · GitHub · genai-otel-instrument · SmolTrace
```python
# pip install genai-otel-instrument
from genai_otel_instrument import instrument

instrument(
    service_name="my-first-agent",
    otlp_endpoint="http://localhost:4318",  # or point at the public TraceMind Space
    redact_pii=True,                        # keeps PII off your traces by default
)
# That's it. Run your agent. Every LLM call, tool call, token, rupee, and
# millisecond of latency is now visible.
```
No SDK lock-in. No daemons. No "you must use our framework." Works with LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, smolagents, vanilla openai: anything that hits an LLM API.
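To make the "every token and rupee" claim concrete, here is a minimal sketch of the kind of attributes this style of instrumentation can attach to each LLM span. The `gen_ai.*` names follow the OpenTelemetry GenAI semantic conventions; the per-1K-token prices, the `app.cost_inr` attribute, and the helper function are hypothetical illustrations, not part of genai-otel-instrument's actual API.

```python
# Sketch: per-call attributes a GenAI tracer might record.
# Prices below are made-up illustrations, not real model pricing.
PRICE_PER_1K = {"gpt-4o-mini": {"input": 0.012, "output": 0.048}}  # ₹ per 1K tokens (hypothetical)

def span_attributes(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build OTel-style attributes for one LLM call (illustrative only)."""
    p = PRICE_PER_1K[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1000
    return {
        "gen_ai.request.model": model,          # OTel GenAI semantic convention
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "app.cost_inr": round(cost, 4),         # custom attribute, not a standard one
    }

attrs = span_attributes("gpt-4o-mini", 1200, 300)
print(attrs["app.cost_inr"])  # → 0.0288
```

Because every call carries attributes like these, a trace viewer can aggregate cost and latency per run without any cooperation from your agent framework.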
| Project | What you get |
|---|---|
| genai-otel-instrument | One-line OpenTelemetry instrumentation for any GenAI agent. Captures LLM calls, tool calls, cost, tokens, latency. Auto-redacts PII by default. |
| SmolTrace | Public benchmark + leaderboard for agent evals. Submit an agent, get a rank, compare on cost, latency, and quality. |
| TraceMind | Hosted trace viewer. Point your OTLP endpoint at it, see what your agent did, where it broke, what it cost. No signup. |
| TraceMind-mcp-server | An MCP server so your agent can query its own historical traces. Meta-observability for self-improving agents. |
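Since MCP speaks JSON-RPC 2.0, an agent querying its own traces through such a server just sends a `tools/call` message. A minimal sketch of what that request could look like is below; the tool name `query_traces` and its argument schema are hypothetical, so check the server's actual tool list before relying on them.

```python
import json

# Hypothetical MCP tools/call request an agent might send to a trace-query
# server. The tool name and arguments are illustrative, not the real API.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_traces",                      # hypothetical tool name
        "arguments": {
            "service": "my-first-agent",             # matches instrument() service_name
            "last_n": 20,                            # fetch the 20 most recent traces
        },
    },
}
payload = json.dumps(request)
print(payload)
```

The agent can then feed the returned trace summaries back into its own context, which is what makes the "self-improving" loop possible.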
| Surface | Space | Tools |
|---|---|---|
| Food delivery | food-delivery-mcp | 7 |
| Grocery / Instamart | instamart-mcp | 6 |
| Dineout / Reservations | dineout-mcp | 5 |
| Dataset | Tasks |
|---|---|
| food-delivery-evals | 111 |
| instamart-evals | 100 |
| dineout-evals | 100 |
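As a sketch of how a suite of eval tasks like these gets scored, assume each task run yields a pass/fail result; the field names below are illustrative placeholders, not the actual SmolTrace schema.

```python
# Illustrative scoring loop over eval task results.
# Field names ("task_id", "passed") are hypothetical, not the real schema.
results = [
    {"task_id": "food-001", "passed": True},
    {"task_id": "food-002", "passed": False},
    {"task_id": "food-003", "passed": True},
]

def pass_rate(rows: list[dict]) -> float:
    """Fraction of tasks the agent completed successfully."""
    return sum(r["passed"] for r in rows) / len(rows)

print(f"pass rate: {pass_rate(results):.1%}")  # 2 of 3 tasks pass
```

The leaderboard combines a quality score like this with the cost and latency captured in the traces, so a cheap-but-wrong agent and an accurate-but-slow one are both visible for what they are.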
For evaluation across other domains, see the TraceMind-AI Collection: 41 SmolTrace-format datasets spanning additional domains.
Same SmolTrace schema, same prompt-template structure as ours. Use them directly; no need to mirror.
food-delivery-agents is the binding repo: reference agents wired with genai-otel-instrument, architecture docs, an observability primer, and leaderboard CI.

Everything shipped so far: genai-otel-instrument, SmolTrace, the public TraceMind viewer, TraceMind-mcp-server, 3 live MCP servers (food / grocery / dineout, 18 tools), 3 eval suites of our own (311 tasks total), 18 mirrored eval datasets, the food-delivery-agents binding repo, and agents.md standardization across all our Spaces.

Need this stack on-premises with autonomous root-cause analysis, compliance audit trails, multi-year retention, and air-gapped deployment? TraceVerse Enterprise is the bigger sibling built for regulated environments: same telemetry contract, hardened for the bank floor.
Start here: run genai-otel-instrument on the agent you have right now.