Bartosz Cywiński
bcywinski
AI & ML interests
Mechanistic Interpretability
Recent Activity
authored a paper 3 days ago
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
submitted a paper 3 days ago
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
updated a dataset 4 days ago
bcywinski/uyghurs-censored
Organizations
None yet
Eliciting Secret Knowledge from Language Models
https://arxiv.org/abs/2510.01070
gemma-2-9b-it-user-gender
Llama-3.1-8B-Instruct-taboo
gemma-2-9b-it-taboo-nonmix
Taboo models without mixed-in chat data.
Eliciting Secret Knowledge from Language Models
https://arxiv.org/abs/2510.01070
llama-3.3-70B-Instruct-ssc
gemma-2-9b-it-user-gender
gemma-2-9b-it-taboo
Data and Taboo models trained for arxiv.org/abs/2505.14352