Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations Paper • 2511.05613 • Published Nov 6, 2025
Position: The Complexity of Perfect AI Alignment -- Formalizing the RLHF Trilemma Paper • 2511.19504 • Published Nov 23, 2025 • 2
Catch Me If You Can: How Smaller Reasoning Models Pretend to Reason with Mathematical Fidelity Paper • 2512.00552 • Published Nov 29, 2025
When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning Paper • 2603.03475 • Published 10 days ago
I Can't Believe It's Not Robust: Catastrophic Collapse of Safety Classifiers under Embedding Drift Paper • 2603.01297 • Published 12 days ago
Dial E for Ethical Enforcement: institutional VETO power as a governance primitive Paper • 2603.00617 • Published 13 days ago
SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement Paper • 2603.06333 • Published 7 days ago • 1
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness Paper • 2603.09200 • Published 3 days ago • 5
[lecture artifacts] Aligning Open Language Models Collection Artifacts referenced in the talk timeline. Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17, 2024 • 58