A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models • Paper • 2605.08513 • Published 20 days ago • 15
Reasoning's Razor: Reasoning Improves Accuracy but Can Hurt Recall at Critical Operating Points in Safety and Hallucination Detection • Paper • 2510.21049 • Published Oct 23, 2025 • 3
RePanda: Pandas-powered Tabular Verification and Reasoning • Paper • 2503.11921 • Published Mar 14, 2025 • 2
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF • Paper • 2411.01798 • Published Nov 4, 2024 • 8
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text • Paper • 2401.12070 • Published Jan 22, 2024 • 45