PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments
Counting as a minimal probe of language model reliability