Running Agents 2 Crowdsourced Evaluation 2 Evaluate model responses for clinical accuracy and relevance