MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models Paper • 2410.08182 • Published Oct 10, 2024
Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Paper • 2604.05083 • Published Apr 6
MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines Paper • 2603.06679 • Published Mar 30 • 6
AVO: Agentic Variation Operators for Autonomous Evolutionary Search Paper • 2603.24517 • Published Mar 25 • 11
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published Mar 19 • 3
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising Paper • 2603.16792 • Published Mar 17 • 3
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling Paper • 2602.12279 • Published Feb 12 • 20
SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization Paper • 2602.04811 • Published Feb 4 • 2