Running Agents 4 LLM Evaluation Framework Demo: Benchmark LLMs on accuracy, cost, and hallucination.