Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR Paper • 2605.20164 • Published May 19 • 6
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents Paper • 2410.13886 • Published Oct 11, 2024
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help? Paper • 2604.09408 • Published Apr 29 • 5
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences? Paper • 2604.10718 • Published Apr 12 • 4