Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models Paper • 2602.12586 • Published 6 days ago • 1
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs Paper • 2512.05648 • Published Dec 5, 2025
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? Paper • 2601.23045 • Published 20 days ago
Learning GUI Grounding with Spatial Reasoning from Visual Feedback Paper • 2509.21552 • Published Sep 25, 2025 • 11
Theorem Prover as a Judge for Synthetic Data Generation Paper • 2502.13137 • Published Feb 18, 2025 • 1
PiCSAR: Probabilistic Confidence Selection And Ranking Paper • 2508.21787 • Published Aug 29, 2025 • 4
Self-Training Large Language Models for Tool-Use Without Demonstrations Paper • 2502.05867 • Published Feb 9, 2025
Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain Paper • 2307.03042 • Published Jul 6, 2023
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them Paper • 2507.10616 • Published Jul 13, 2025 • 1
What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations Paper • 2502.08279 • Published Feb 12, 2025 • 1
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly Paper • 2505.10610 • Published May 15, 2025 • 54
An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering Paper • 2503.23415 • Published Mar 30, 2025 • 1
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression Paper • 2503.02812 • Published Mar 4, 2025 • 10
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs Paper • 2502.05092 • Published Feb 7, 2025 • 8
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Paper • 2502.17540 • Published Feb 24, 2025 • 3
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning Paper • 2410.10336 • Published Oct 14, 2024 • 2