Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models Paper • 2407.16470 • Published Jul 23, 2024
Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG Paper • 2603.04238 • Published 9 days ago
How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? Paper • 2410.16574 • Published Oct 21, 2024
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games Paper • 2411.13543 • Published Nov 20, 2024