IF_RLHF - a Taywon Collection

IF_RLHF

Updated Oct 1, 2025

Datasets for the paper 'Understanding Impact of Human Feedback via Influence Functions'

Taywon/HH_length_biased_15k

Updated Dec 5, 2024 • 21k rows • 185 downloads • 1 like

Taywon/HH_sycophancy_biased_15k

Updated Dec 5, 2024 • 16.1k rows • 72 downloads

Understanding Impact of Human Feedback via Influence Functions

Paper • arXiv 2501.05790 • Published Jan 10, 2025 • 1 upvote