apeleg/SUITE
Fine-grained LLM unlearning benchmark
Core benchmark: splits `forget_train`, `retain_train`, `forget_eval` and `retain_eval`. Each has the columns `topic`, `question`, `answer` and `label`; filter on `topic` to isolate a single subject.
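Slicing one subject is a plain equality filter on the `topic` column. The sketch below works on a few in-memory rows; only the column names come from the card, and the row values and the helper `slice_topic` are hypothetical. With the Hugging Face `datasets` library the same predicate could be passed to `Dataset.filter`.

```python
# Toy rows standing in for one split (e.g. forget_train). Only the column
# names come from the dataset card; every value here is invented.
rows = [
    {"topic": "topic_a", "question": "Q1?", "answer": "A1", "label": 1},
    {"topic": "topic_b", "question": "Q2?", "answer": "A2", "label": 0},
    {"topic": "topic_a", "question": "Q3?", "answer": "A3", "label": 1},
]


def slice_topic(rows, topic):
    """Hypothetical helper: keep only the rows whose `topic` equals `topic`."""
    return [r for r in rows if r["topic"] == topic]


topic_a = slice_topic(rows, "topic_a")
print(len(topic_a))  # 2
```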
Robustness split: `forget_eval_rephrasings`, in which each eval question is paraphrased many ways (the `q_*` and `blank_*` columns), to measure whether forgetting survives rewording.
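One way to use this split is to collect every paraphrase column of a row and check how often a model still produces the forgotten answer. This is a minimal sketch under several assumptions: the concrete column names (`q_0`, `blank_0`, …), the presence of an `answer` column in this split, the substring match and the `toy_model` stand-in are all hypothetical, not the benchmark's own metric.

```python
# Hypothetical row from forget_eval_rephrasings. Only the q_*/blank_* prefixes
# come from the card; the exact column names and values are invented.
row = {
    "topic": "topic_a",
    "answer": "Paris",
    "q_0": "What is the capital of France?",
    "q_1": "Which city is France's capital?",
    "blank_0": "The capital of France is ____.",
}


def rephrasing_columns(row):
    """Collect paraphrase columns, i.e. keys starting with q_ or blank_."""
    return sorted(k for k in row if k.startswith(("q_", "blank_")))


def leak_rate(row, model):
    """Fraction of paraphrases for which `model` still outputs the answer.

    Case-insensitive substring matching is a simplifying assumption.
    """
    cols = rephrasing_columns(row)
    hits = sum(row["answer"].lower() in model(row[c]).lower() for c in cols)
    return hits / len(cols)


def toy_model(prompt):
    """Stand-in model: 'forgot' the question forms but not the blank form."""
    return "Paris" if "____" in prompt else "I don't know."


print(rephrasing_columns(row))  # ['blank_0', 'q_0', 'q_1']
print(leak_rate(row, toy_model))
```

A model that forgets only the original wording would score 0 on the plain question but still leak on a reworded prompt, which is the failure this split is meant to surface.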