Bartosz Cywiński

bcywinski

·

https://cywinski.github.io/

AI & ML interests

Mechanistic Interpretability

Organizations

None yet

commented a paper 10 months ago

Eliciting Secret Knowledge from Language Models

Paper • 2510.01070 • Published Oct 1, 2025 • 6 •

commented a paper about 1 year ago

Towards eliciting latent knowledge from LLMs with mechanistic interpretability

Paper • 2505.14352 • Published May 20, 2025 • 9 •

commented a paper over 1 year ago

SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

Paper • 2501.18052 • Published Jan 29, 2025 • 8 •