Display and analyze reward model evaluation results
Evaluate response quality with a reward score
Evaluating LLMs on Multilingual Multimodal Financial Tasks