AI Safety and Alignment Group at the ELLIS Institute Tübingen and MPI-IS

non-profit
https://aisagroup.substack.com/
maksym_andr
aisa-group
Activity Feed

AI & ML interests

AI Safety, AI alignment

MaksymAndriushchenko authored a paper 4 months ago

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

Paper • 2509.18058 • Published Sep 22, 2025
hrdkbhatnagar authored a paper 10 months ago

A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility

Paper • 2504.07086 • Published Apr 9, 2025
hrdkbhatnagar authored a paper 12 months ago

A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

Paper • 2409.14507 • Published Sep 22, 2024
MaksymAndriushchenko authored 2 papers almost 2 years ago

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Paper • 2404.01318 • Published Mar 28, 2024

Layer-wise Linear Mode Connectivity

Paper • 2307.06966 • Published Jul 13, 2023
MaksymAndriushchenko authored 2 papers over 2 years ago

A Modern Look at the Relationship between Sharpness and Generalization

Paper • 2302.07011 • Published Feb 14, 2023

SGD with Large Step Sizes Learns Sparse Features

Paper • 2210.05337 • Published Oct 11, 2022