Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published 4 days ago • 51
RecVAE: a New Variational Autoencoder for Top-N Recommendations with Implicit Feedback Paper • 1912.11160 • Published Dec 24, 2019 • 1
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? Paper • 2602.14111 • Published 4 days ago • 51
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 19
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 19
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs Paper • 2510.11288 • Published Oct 13, 2025 • 49
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs Paper • 2510.11288 • Published Oct 13, 2025 • 49
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features Paper • 2509.22033 • Published Sep 26, 2025 • 19
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6, 2025 • 115
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features Paper • 2509.22033 • Published Sep 26, 2025 • 19
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs Paper • 2508.11383 • Published Aug 15, 2025 • 40
The Rogue Scalpel: Activation Steering Compromises LLM Safety Paper • 2509.22067 • Published Sep 26, 2025 • 28
The Rogue Scalpel: Activation Steering Compromises LLM Safety Paper • 2509.22067 • Published Sep 26, 2025 • 28
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs Paper • 2508.11383 • Published Aug 15, 2025 • 40
HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds Paper • 2508.12782 • Published Aug 18, 2025 • 25
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models Paper • 2506.06751 • Published Jun 7, 2025 • 71
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models Paper • 2506.06751 • Published Jun 7, 2025 • 71
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24, 2025 • 119
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24, 2025 • 119
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20, 2025 • 72