Useful tool from Modal
#11
by mkieffer - opened
Cool app from Modal that benchmarks LLM latencies across various frameworks running on Modal:
https://modal.com/llm-almanac/advisor
This kind of latency benchmark is useful for agents too, especially once a tool call or routing decision fans out into multiple model calls.
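To put a rough number on the fan-out point: if each model call independently has a 5% chance of landing at or beyond its own p95, then a step that makes N calls hits at least one slow call with probability 1 - 0.95^N. A quick sketch (the independence assumption and the `prob_any_slow` helper are mine, and real workloads may not behave independently):

```python
def prob_any_slow(n_calls: int, tail: float = 0.05) -> float:
    """Chance that at least one of n independent calls lands in the latency tail."""
    return 1.0 - (1.0 - tail) ** n_calls


for n in (1, 5, 10, 20):
    print(f"{n:>2} calls -> {prob_any_slow(n):.0%} chance of a p95-or-worse call")
```

Under those assumptions, 10 calls already gives roughly a 40% chance that the step waits on at least one tail-latency call, which is why single-call p95 understates what a fanned-out agent step sees.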
One thing I would like to see in agent benchmarks is not just single-call latency but the end-to-end shape of a run:
- number of model calls
- number of tool calls
- p50/p95 latency per step
- retries/timeouts
- final success rate
- total cost per completed run
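A minimal sketch of how those run-level numbers could be rolled up from a trace. The `Step`/`Run` record shape and the `summarize` helper are hypothetical (not from the Modal tool or any tracing library), percentiles use the nearest-rank method, and the spend of failed runs is charged to the completed ones so that flaky agents don't look cheap. The example runs are made up for illustration.

```python
from __future__ import annotations

import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Step:
    """One model or tool call inside an agent run (hypothetical trace record)."""
    kind: str              # "model" or "tool"
    latency_s: float
    retries: int = 0
    timed_out: bool = False
    cost_usd: float = 0.0


@dataclass
class Run:
    steps: list[Step] = field(default_factory=list)
    success: bool = False


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(runs: list[Run]) -> dict:
    steps = [s for r in runs for s in r.steps]
    by_kind: dict[str, list[float]] = defaultdict(list)
    for s in steps:
        by_kind[s.kind].append(s.latency_s)
    completed = sum(r.success for r in runs)
    total_cost = sum(s.cost_usd for s in steps)
    return {
        "model_calls_per_run": sum(s.kind == "model" for s in steps) / len(runs),
        "tool_calls_per_run": sum(s.kind == "tool" for s in steps) / len(runs),
        # (p50, p95) step latency in seconds, split by step kind
        "p50_p95_by_kind": {k: (percentile(v, 50), percentile(v, 95)) for k, v in by_kind.items()},
        "retries": sum(s.retries for s in steps),
        "timeouts": sum(s.timed_out for s in steps),
        "success_rate": completed / len(runs),
        # Spend from failed runs is included, then divided over successful runs only
        "cost_per_completed_run": total_cost / completed if completed else math.inf,
    }


runs = [
    Run([Step("model", 1.2, cost_usd=0.002), Step("tool", 0.4),
         Step("model", 0.9, cost_usd=0.002)], success=True),
    Run([Step("model", 1.1, cost_usd=0.002),
         Step("tool", 3.0, retries=1, timed_out=True)], success=False),
]
print(summarize(runs))
```

Reporting cost per *completed* run rather than per attempt makes retries and failed runs show up in the number instead of hiding behind a low per-call price.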
For MCP-heavy agents, the slow part is often the choreography around the model rather than the model call alone.
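One way to check that on your own traces is to split a run's wall-clock time into model time, tool time not overlapped by a model call, and everything else (gaps between recorded calls). This sketch assumes you already record `(kind, start, end)` spans per call; that span format and the `covered`/`time_breakdown` helpers are hypothetical. Overlapping spans are merged, so parallel calls are not double-counted.

```python
from __future__ import annotations


def covered(intervals: list[tuple[float, float]]) -> float:
    """Length of the union of (start, end) intervals, so parallel calls count once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged interval: close it out and start a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged interval: extend it
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def time_breakdown(spans, wall_start: float, wall_end: float) -> dict:
    """Split a run's wall-clock time into model, tool-only, and everything-else parts."""
    wall = wall_end - wall_start
    model = covered([(s, e) for kind, s, e in spans if kind == "model"])
    busy = covered([(s, e) for _, s, e in spans])  # any recorded call in flight
    return {
        "model_s": model,
        "tool_only_s": busy - model,  # a tool was running but no model call was
        "other_s": wall - busy,       # gaps between recorded calls (orchestration, serialization, queueing, ...)
        "model_share": model / wall,
    }


# Made-up trace: two model calls and two tool calls over a 10 s run
spans = [("model", 0.5, 2.5), ("tool", 3.0, 5.0), ("model", 5.5, 7.0), ("tool", 7.2, 9.0)]
print(time_breakdown(spans, 0.0, 10.0))
```

In that made-up trace the model accounts for 35% of the wall-clock time; the rest is tool time and gaps between calls.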