ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 7 days ago • 62
Molecular Contrastive Learning with Chemical Element Knowledge Graph Paper • 2112.00544 • Published Dec 1, 2021 • 1
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models Paper • 2404.00884 • Published Apr 1, 2024 • 1
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration Paper • 2402.11550 • Published Feb 18, 2024 • 19
Show me the evidence: Evaluating the role of evidence and natural language explanations in AI-supported fact-checking Paper • 2601.11387 • Published 7 days ago • 2
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 5 days ago • 42
Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance Paper • 2601.14171 • Published 3 days ago • 41
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published Oct 22, 2025 • 61
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 15 days ago • 29
Enhancing Video Inpainting with Aligned Frame Interval Guidance Paper • 2510.21461 • Published Oct 24, 2025
Video as the New Language for Real-World Decision Making Paper • 2402.17139 • Published Feb 27, 2024 • 22
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 7 days ago • 11
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory Paper • 2402.04617 • Published Feb 7, 2024 • 6
LongAlign: A Recipe for Long Context Alignment of Large Language Models Paper • 2401.18058 • Published Jan 31, 2024 • 23
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning Paper • 2401.06532 • Published Jan 12, 2024 • 13
NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views Paper • 2211.16431 • Published Nov 29, 2022 • 1
RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Paper • 2601.08430 • Published 10 days ago • 53
LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing Paper • 2507.00769 • Published Jul 1, 2025 • 5
CoSineVerifier: Tool-Augmented Answer Verification for Computation-Oriented Scientific Questions Paper • 2512.01224 • Published Dec 1, 2025
CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning Paper • 2601.13262 • Published 4 days ago