AI Safety and Alignment Group at the ELLIS Institute Tübingen and MPI-IS

non-profit

https://aisagroup.substack.com/

maksym_andr

aisa-group


AI & ML interests

AI Safety, AI alignment

Recent Activity


hrdkbhatnagar

updated a dataset about 2 hours ago

aisa-group/PostTrainBench-Trajectories

Updated about 2 hours ago • 1.84k • 2

hrdkbhatnagar

published a dataset about 2 months ago

aisa-group/PostTrainBench-Trajectories

MaksymAndriushchenko

authored a paper 8 months ago

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22, 2025 • 12

hrdkbhatnagar

authored 2 papers about 1 year ago

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Paper • 2504.07086 • Published Apr 9, 2025 • 21

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

Paper • 2409.14507 • Published Sep 22, 2024 • 1

MaksymAndriushchenko

authored 2 papers about 2 years ago

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Paper • 2404.01318 • Published Mar 28, 2024

Layer-wise Linear Mode Connectivity

Paper • 2307.06966 • Published Jul 13, 2023

MaksymAndriushchenko

authored 2 papers almost 3 years ago

A Modern Look at the Relationship between Sharpness and Generalization

Paper • 2302.07011 • Published Feb 14, 2023

SGD with Large Step Sizes Learns Sparse Features

Paper • 2210.05337 • Published Oct 11, 2022

Team members: 2