OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems Paper • 2402.14008 • Published Feb 21, 2024
Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning Paper • 2505.16483 • Published May 22, 2025 • 10
GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion Paper • 2502.11471 • Published Feb 17, 2025 • 1
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models Paper • 2305.08322 • Published May 15, 2023
FaithLens: Detecting and Explaining Faithfulness Hallucination Paper • 2512.20182 • Published Dec 23, 2025 • 9
InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs Paper • 2601.06666 • Published Jan 10, 2026 • 1
ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code Paper • 2506.02314 • Published Jun 2, 2025
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap Paper • 2306.03100 • Published Jun 1, 2023
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models Paper • 2407.05502 • Published Jul 7, 2024
Improving Context-Aware Preference Modeling for Language Models Paper • 2407.14916 • Published Jul 20, 2024 • 4
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models Paper • 2403.02715 • Published Mar 5, 2024 • 3
Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking Paper • 2402.05880 • Published Feb 8, 2024 • 3
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference Paper • 2306.12509 • Published Jun 21, 2023 • 15
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Paper • 2306.11698 • Published Jun 20, 2023 • 13