Display and analyze reward model evaluation results
Evaluating LLMs on Multilingual Multimodal Financial Tasks