RLHF Trojan Competition Collection: Datasets and models used for the trojan detection competition co-located with SaTML 2024 (https://github.com/ethz-spylab/rlhf_trojan_competition) • 20 items • Updated Apr 30, 2024
Quirky Models and Datasets Collection: Datasets and fine-tuned models for Eliciting Latent Knowledge (ELK) research • 180 items • Updated Feb 26, 2025