deep-ignorance-unfiltered_unlearned_grad_diff

This model was produced by fine-tuning `EleutherAI/deep-ignorance-unfiltered` with the Gradient Difference unlearning algorithm (Liu et al., 2022), which combines gradient ascent on a forget set with gradient descent on a retain set. Unlearning aims to remove specific knowledge from a pretrained language model while preserving its general capabilities.
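To make the idea concrete, here is a minimal toy sketch of a Gradient Difference style update on a single scalar parameter. It is not the training code used for this model: the quadratic losses, the function names, and the way the forget weight enters the update are all assumptions chosen for illustration.

```python
# Toy illustration of a Gradient Difference update on one scalar parameter.
# Both losses and all names are hypothetical; a real run would use
# language-model losses computed on forget and retain batches.

def forget_loss(theta, forget_opt=1.0):
    # Toy forget loss, minimised where the "unwanted knowledge" lives.
    return (theta - forget_opt) ** 2

def forget_loss_grad(theta, forget_opt=1.0):
    return 2.0 * (theta - forget_opt)

def retain_loss_grad(theta, retain_opt=0.0):
    # Gradient of the toy retain loss (theta - retain_opt)^2.
    return 2.0 * (theta - retain_opt)

def grad_diff_step(theta, lr=0.1, forget_weight=0.5):
    # Descend on the retain loss and ascend on the weighted forget loss.
    # ASSUMPTION: the forget weight scales only the ascent term.
    grad = retain_loss_grad(theta) - forget_weight * forget_loss_grad(theta)
    return theta - lr * grad

theta = 1.0  # start at the forget optimum, i.e. the knowledge is present
initial_forget_loss = forget_loss(theta)
for _ in range(200):
    theta = grad_diff_step(theta)

print(round(theta, 6), forget_loss(theta) > initial_forget_loss)
```

In this toy setting the parameter moves away from the forget optimum while the retain term keeps it bounded. With a forget weight of 1 or more the same toy update would diverge, which is one reason the balance between the two terms matters.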

Hyperparameters

Parameter	Value
Base model	`EleutherAI/deep-ignorance-unfiltered`
Unlearning method	`Gradient Difference`
Learning rate	`3e-05`
Epochs	`3`
Batch size	`32`
Max sequence length	`512`
Optimizer	`adamw`
Gradient clipping	`1.0`
Gradient accumulation steps	`1`
Seed	`42`
W&B / run name	`grad_diff__ep3_lr3e-05_bs32_fw0.5_mle512_mli10000`
Forget weight	`0.5`
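The run name appears to encode several of the values in the table. The sketch below splits it into key/value fields and cross-checks them against the table. The helper `parse_run_name` is hypothetical, and reading `ep`, `lr`, `bs`, and `fw` as epochs, learning rate, batch size, and forget weight is an assumption; `mle` and `mli` are parsed but left uninterpreted.

```python
import re

RUN_NAME = "grad_diff__ep3_lr3e-05_bs32_fw0.5_mle512_mli10000"

def parse_run_name(run_name):
    # Split "<method>__<key><value>_<key><value>..." into a method and a dict.
    method, _, suffix = run_name.partition("__")
    fields = {}
    for token in suffix.split("_"):
        match = re.fullmatch(r"([a-z]+)(.+)", token)
        if match is None:
            continue  # skip anything that is not <letters><value>
        key, raw = match.groups()
        value = float(raw)
        fields[key] = int(value) if value.is_integer() else value
    return method, fields

method, fields = parse_run_name(RUN_NAME)

# Values copied from the hyperparameter table; the abbreviation mapping
# is an assumption, not something the table states.
table = {"ep": 3, "lr": 3e-05, "bs": 32, "fw": 0.5}
consistent = all(fields[key] == value for key, value in table.items())

print(method, fields, consistent)
```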

Evaluation Results

Task / metric	Value
mmlu / acc	0.4223
mmlu / acc_stderr	0.0040
mmlu_abstract_algebra / acc	0.2800
mmlu_abstract_algebra / acc_stderr	0.0451
mmlu_anatomy / acc	0.4815
mmlu_anatomy / acc_stderr	0.0432
mmlu_astronomy / acc	0.4868
mmlu_astronomy / acc_stderr	0.0407
mmlu_business_ethics / acc	0.3600
mmlu_business_ethics / acc_stderr	0.0482
mmlu_clinical_knowledge / acc	0.4566
mmlu_clinical_knowledge / acc_stderr	0.0307
mmlu_college_biology / acc	0.3889
mmlu_college_biology / acc_stderr	0.0408
mmlu_college_chemistry / acc	0.3200
mmlu_college_chemistry / acc_stderr	0.0469
mmlu_college_computer_science / acc	0.4700
mmlu_college_computer_science / acc_stderr	0.0502
mmlu_college_mathematics / acc	0.3200
mmlu_college_mathematics / acc_stderr	0.0469
mmlu_college_medicine / acc	0.4509
mmlu_college_medicine / acc_stderr	0.0379
mmlu_college_physics / acc	0.2157
mmlu_college_physics / acc_stderr	0.0409
mmlu_computer_security / acc	0.6000
mmlu_computer_security / acc_stderr	0.0492
mmlu_conceptual_physics / acc	0.3617
mmlu_conceptual_physics / acc_stderr	0.0314
mmlu_econometrics / acc	0.2719
mmlu_econometrics / acc_stderr	0.0419
mmlu_electrical_engineering / acc	0.3862
mmlu_electrical_engineering / acc_stderr	0.0406
mmlu_elementary_mathematics / acc	0.2725
mmlu_elementary_mathematics / acc_stderr	0.0229
mmlu_formal_logic / acc	0.2460
mmlu_formal_logic / acc_stderr	0.0385
mmlu_global_facts / acc	0.3500
mmlu_global_facts / acc_stderr	0.0479
mmlu_high_school_biology / acc	0.4677
mmlu_high_school_biology / acc_stderr	0.0284
mmlu_high_school_chemistry / acc	0.3498
mmlu_high_school_chemistry / acc_stderr	0.0336
mmlu_high_school_computer_science / acc	0.4600
mmlu_high_school_computer_science / acc_stderr	0.0501
mmlu_high_school_european_history / acc	0.5091
mmlu_high_school_european_history / acc_stderr	0.0390
mmlu_high_school_geography / acc	0.5303
mmlu_high_school_geography / acc_stderr	0.0356
mmlu_high_school_government_and_politics / acc	0.5596
mmlu_high_school_government_and_politics / acc_stderr	0.0358
mmlu_high_school_macroeconomics / acc	0.3667
mmlu_high_school_macroeconomics / acc_stderr	0.0244
mmlu_high_school_mathematics / acc	0.2593
mmlu_high_school_mathematics / acc_stderr	0.0267
mmlu_high_school_microeconomics / acc	0.4202
mmlu_high_school_microeconomics / acc_stderr	0.0321
mmlu_high_school_physics / acc	0.3113
mmlu_high_school_physics / acc_stderr	0.0378
mmlu_high_school_psychology / acc	0.6037
mmlu_high_school_psychology / acc_stderr	0.0210
mmlu_high_school_statistics / acc	0.3241
mmlu_high_school_statistics / acc_stderr	0.0319
mmlu_high_school_us_history / acc	0.5147
mmlu_high_school_us_history / acc_stderr	0.0351
mmlu_high_school_world_history / acc	0.5823
mmlu_high_school_world_history / acc_stderr	0.0321
mmlu_human_aging / acc	0.4619
mmlu_human_aging / acc_stderr	0.0335
mmlu_human_sexuality / acc	0.5344
mmlu_human_sexuality / acc_stderr	0.0437
mmlu_humanities / acc	0.3977
mmlu_humanities / acc_stderr	0.0069
mmlu_international_law / acc	0.5289
mmlu_international_law / acc_stderr	0.0456
mmlu_jurisprudence / acc	0.4722
mmlu_jurisprudence / acc_stderr	0.0483
mmlu_logical_fallacies / acc	0.4540
mmlu_logical_fallacies / acc_stderr	0.0391
mmlu_machine_learning / acc	0.2411
mmlu_machine_learning / acc_stderr	0.0406
mmlu_management / acc	0.5340
mmlu_management / acc_stderr	0.0494
mmlu_marketing / acc	0.6111
mmlu_marketing / acc_stderr	0.0319
mmlu_medical_genetics / acc	0.3800
mmlu_medical_genetics / acc_stderr	0.0488
mmlu_miscellaneous / acc	0.5888
mmlu_miscellaneous / acc_stderr	0.0176
mmlu_moral_disputes / acc	0.4769
mmlu_moral_disputes / acc_stderr	0.0269
mmlu_moral_scenarios / acc	0.2391
mmlu_moral_scenarios / acc_stderr	0.0143
mmlu_nutrition / acc	0.4216
mmlu_nutrition / acc_stderr	0.0283
mmlu_other / acc	0.4596
mmlu_other / acc_stderr	0.0087
mmlu_philosophy / acc	0.5273
mmlu_philosophy / acc_stderr	0.0284
mmlu_prehistory / acc	0.4722
mmlu_prehistory / acc_stderr	0.0278
mmlu_professional_accounting / acc	0.3617
mmlu_professional_accounting / acc_stderr	0.0287
mmlu_professional_law / acc	0.3364
mmlu_professional_law / acc_stderr	0.0121
mmlu_professional_medicine / acc	0.3640
mmlu_professional_medicine / acc_stderr	0.0292
mmlu_professional_psychology / acc	0.4281
mmlu_professional_psychology / acc_stderr	0.0200
mmlu_public_relations / acc	0.4455
mmlu_public_relations / acc_stderr	0.0476
mmlu_security_studies / acc	0.3837
mmlu_security_studies / acc_stderr	0.0311
mmlu_social_sciences / acc	0.4859
mmlu_social_sciences / acc_stderr	0.0088
mmlu_sociology / acc	0.6617
mmlu_sociology / acc_stderr	0.0335
mmlu_stem / acc	0.3603
mmlu_stem / acc_stderr	0.0084
mmlu_us_foreign_policy / acc	0.7100
mmlu_us_foreign_policy / acc_stderr	0.0456
mmlu_virology / acc	0.1687
mmlu_virology / acc_stderr	0.0292
mmlu_world_religions / acc	0.6550
mmlu_world_religions / acc_stderr	0.0365
wikitext / bits_per_byte	0.6807
wikitext / bits_per_byte_stderr	N/A
wikitext / byte_perplexity	1.6029
wikitext / byte_perplexity_stderr	N/A
wikitext / word_perplexity	12.4646
wikitext / word_perplexity_stderr	N/A
wmdp_bio_categorized_mcqa / acc	0.2773
wmdp_bio_categorized_mcqa / acc_stderr	0.0125
wmdp_bio_cloze_verified / acc_norm	0.2481
wmdp_bio_cloze_verified / acc_norm_stderr	0.0132
wmdp_bio_robust / acc	0.2638
wmdp_bio_robust / acc_stderr	0.0150
wmdp_bio_robust_bioweapons_and_bioterrorism / acc	0.2842
wmdp_bio_robust_bioweapons_and_bioterrorism / acc_stderr	0.0328
wmdp_bio_robust_dual_use_virology / acc	0.2500
wmdp_bio_robust_dual_use_virology / acc_stderr	0.0833
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc	0.2549
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc_stderr	0.0434
wmdp_bio_robust_expanding_access_to_threat_vectors / acc	0.2381
wmdp_bio_robust_expanding_access_to_threat_vectors / acc_stderr	0.0952
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc	0.2258
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc_stderr	0.0307
wmdp_bio_robust_rewritten / acc	0.2409
wmdp_bio_robust_rewritten / acc_stderr	0.0087
wmdp_bio_robust_rewritten_gibberish / acc	0.2281
wmdp_bio_robust_rewritten_gibberish / acc_stderr	0.0147
wmdp_bio_robust_rewritten_nonsensical_biology / acc	0.2454
wmdp_bio_robust_rewritten_nonsensical_biology / acc_stderr	0.0151
wmdp_bio_robust_rewritten_real_words_sciency / acc	0.2491
wmdp_bio_robust_rewritten_real_words_sciency / acc_stderr	0.0152
wmdp_bio_robust_viral_vector_research / acc	0.2786
wmdp_bio_robust_viral_vector_research / acc_stderr	0.0243
wmdp_bio_shortcut / acc	0.3062
wmdp_bio_shortcut / acc_stderr	0.0225
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc	0.5745
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc_stderr	0.0729
wmdp_bio_shortcut_dual_use_virology / acc	0.3158
wmdp_bio_shortcut_dual_use_virology / acc_stderr	0.1096
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc	0.2453
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc_stderr	0.0597
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc	0.3333
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc_stderr	0.1667
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc	0.2588
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc_stderr	0.0478
wmdp_bio_shortcut_viral_vector_research / acc	0.2760
wmdp_bio_shortcut_viral_vector_research / acc_stderr	0.0323
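
One way to read the table above is to ask how far each accuracy sits from random guessing, measured in standard errors. The sketch below does this for a few rows, assuming four-option multiple-choice questions (chance accuracy 0.25), and also checks that the reported wikitext byte perplexity matches 2 raised to the reported bits per byte. The variable and function names are illustrative only.

```python
# (accuracy, stderr) pairs copied from the evaluation table.
RESULTS = {
    "mmlu": (0.4223, 0.0040),
    "mmlu_virology": (0.1687, 0.0292),
    "wmdp_bio_robust": (0.2638, 0.0150),
    "wmdp_bio_shortcut": (0.3062, 0.0225),
}
CHANCE = 0.25  # ASSUMPTION: four-option multiple choice

def z_vs_chance(acc, stderr, chance=CHANCE):
    # Distance from chance accuracy in units of the reported standard error.
    return (acc - chance) / stderr

z_scores = {task: z_vs_chance(acc, se) for task, (acc, se) in RESULTS.items()}
for task, z in z_scores.items():
    print(f"{task}: z = {z:+.2f}")

# Consistency check on the wikitext rows: byte_perplexity = 2 ** bits_per_byte.
bits_per_byte = 0.6807
byte_perplexity = 2.0 ** bits_per_byte
print(f"byte_perplexity from bits_per_byte: {byte_perplexity:.4f}")
```

Under that assumption, `wmdp_bio_robust` lies within one standard error of chance, while the aggregate `mmlu` score is far above it. This is a rough reading aid, not a formal significance test.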