- DE-COP: Detecting Copyrighted Content in Language Models Training Data (arXiv:2402.09910, published Feb 15, 2024)
- A Practical Examination of AI-Generated Text Detectors for Large Language Models (arXiv:2412.05139, published Dec 6, 2024)
- The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 (arXiv:2502.12659, published Feb 18, 2025)
- DIS-CO: Discovering Copyrighted Content in VLMs Training Data (arXiv:2502.17358, published Feb 24, 2025)
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking (arXiv:2406.03728, published Jun 6, 2024)
- Improving LLM Safety Alignment with Dual-Objective Optimization (arXiv:2503.03710, published Mar 5, 2025)
- MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models (arXiv:2503.14827, published Mar 19, 2025)
- Scalable Best-of-N Selection for Large Language Models via Self-Certainty (arXiv:2502.18581, published Feb 25, 2025)
- Assessing Judging Bias in Large Reasoning Models: An Empirical Study (arXiv:2504.09946, published Apr 14, 2025)
- SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning (arXiv:2505.16186, published May 22, 2025)
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents (arXiv:2506.14205, published Jun 17, 2025)
- The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation (arXiv:2507.05578, published Jul 8, 2025)
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models (arXiv:2507.07484, published Jul 10, 2025)
- AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents (arXiv:2505.05849, published May 9, 2025)
- OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models (arXiv:2505.21347, published May 27, 2025)
- Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces (arXiv:2601.11868, published 10 days ago)
- InfoSynth: Information-Guided Benchmark Synthesis for LLMs (arXiv:2601.00575, published 25 days ago)