HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published May 30, 2025 • 43
CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation Paper • 2510.17853 • Published Oct 15, 2025 • 8
Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs Paper • 2406.04460 • Published Jun 6, 2024 • 1
Scaling LLM Inference with Optimized Sample Compute Allocation Paper • 2410.22480 • Published Oct 29, 2024
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 24
AutoCode: LLMs as Problem Setters for Competitive Programming Paper • 2510.12803 • Published Sep 29, 2025 • 1
MessIRve: A Large-Scale Spanish Information Retrieval Dataset Paper • 2409.05994 • Published Sep 9, 2024
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields Paper • 2503.20776 • Published Mar 26, 2025 • 10
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published Mar 6, 2025 • 15
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26, 2024 • 24
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases Paper • 2406.10290 • Published Jun 12, 2024
xLAM: A Family of Large Action Models to Empower AI Agent Systems Paper • 2409.03215 • Published Sep 5, 2024 • 6
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14, 2024 • 25
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13, 2024 • 41