deep-ignorance-unfiltered_unlearned_dpo

This model was created by fine-tuning EleutherAI/deep-ignorance-unfiltered with Direct Preference Optimization (DPO; Rafailov et al., 2023), used here as an unlearning algorithm. The goal of unlearning is to remove specific knowledge from a pretrained language model while preserving its general capabilities.

Hyperparameters

Parameter Value
Base model EleutherAI/deep-ignorance-unfiltered
Unlearning method Direct Preference Optimization
Learning rate 4.5e-05
Epochs 1
Batch size 32
Max sequence length 512
Optimizer adamw
Gradient clipping 1.0
Gradient accumulation steps 1
Seed 42
W&B / run name dpo__ep1_lr4.5e-05_bs32_b0.01_mle512_mli8192
Beta 0.01

Evaluation Results

Benchmark Value
mmlu / acc 0.3482
mmlu / acc_stderr 0.0039
mmlu_abstract_algebra / acc 0.2600
mmlu_abstract_algebra / acc_stderr 0.0441
mmlu_anatomy / acc 0.3630
mmlu_anatomy / acc_stderr 0.0415
mmlu_astronomy / acc 0.3750
mmlu_astronomy / acc_stderr 0.0394
mmlu_business_ethics / acc 0.2200
mmlu_business_ethics / acc_stderr 0.0416
mmlu_clinical_knowledge / acc 0.2642
mmlu_clinical_knowledge / acc_stderr 0.0271
mmlu_college_biology / acc 0.2431
mmlu_college_biology / acc_stderr 0.0359
mmlu_college_chemistry / acc 0.2100
mmlu_college_chemistry / acc_stderr 0.0409
mmlu_college_computer_science / acc 0.2000
mmlu_college_computer_science / acc_stderr 0.0402
mmlu_college_mathematics / acc 0.2500
mmlu_college_mathematics / acc_stderr 0.0435
mmlu_college_medicine / acc 0.2601
mmlu_college_medicine / acc_stderr 0.0335
mmlu_college_physics / acc 0.1961
mmlu_college_physics / acc_stderr 0.0395
mmlu_computer_security / acc 0.4000
mmlu_computer_security / acc_stderr 0.0492
mmlu_conceptual_physics / acc 0.3702
mmlu_conceptual_physics / acc_stderr 0.0316
mmlu_econometrics / acc 0.1930
mmlu_econometrics / acc_stderr 0.0371
mmlu_electrical_engineering / acc 0.3655
mmlu_electrical_engineering / acc_stderr 0.0401
mmlu_elementary_mathematics / acc 0.2751
mmlu_elementary_mathematics / acc_stderr 0.0230
mmlu_formal_logic / acc 0.3016
mmlu_formal_logic / acc_stderr 0.0410
mmlu_global_facts / acc 0.3000
mmlu_global_facts / acc_stderr 0.0461
mmlu_high_school_biology / acc 0.3774
mmlu_high_school_biology / acc_stderr 0.0276
mmlu_high_school_chemistry / acc 0.3350
mmlu_high_school_chemistry / acc_stderr 0.0332
mmlu_high_school_computer_science / acc 0.2500
mmlu_high_school_computer_science / acc_stderr 0.0435
mmlu_high_school_european_history / acc 0.3091
mmlu_high_school_european_history / acc_stderr 0.0361
mmlu_high_school_geography / acc 0.4596
mmlu_high_school_geography / acc_stderr 0.0355
mmlu_high_school_government_and_politics / acc 0.5803
mmlu_high_school_government_and_politics / acc_stderr 0.0356
mmlu_high_school_macroeconomics / acc 0.4000
mmlu_high_school_macroeconomics / acc_stderr 0.0248
mmlu_high_school_mathematics / acc 0.2519
mmlu_high_school_mathematics / acc_stderr 0.0265
mmlu_high_school_microeconomics / acc 0.4076
mmlu_high_school_microeconomics / acc_stderr 0.0319
mmlu_high_school_physics / acc 0.2715
mmlu_high_school_physics / acc_stderr 0.0363
mmlu_high_school_psychology / acc 0.4716
mmlu_high_school_psychology / acc_stderr 0.0214
mmlu_high_school_statistics / acc 0.4120
mmlu_high_school_statistics / acc_stderr 0.0336
mmlu_high_school_us_history / acc 0.2843
mmlu_high_school_us_history / acc_stderr 0.0317
mmlu_high_school_world_history / acc 0.2996
mmlu_high_school_world_history / acc_stderr 0.0298
mmlu_human_aging / acc 0.2735
mmlu_human_aging / acc_stderr 0.0299
mmlu_human_sexuality / acc 0.4733
mmlu_human_sexuality / acc_stderr 0.0438
mmlu_humanities / acc 0.3188
mmlu_humanities / acc_stderr 0.0067
mmlu_international_law / acc 0.2479
mmlu_international_law / acc_stderr 0.0394
mmlu_jurisprudence / acc 0.4259
mmlu_jurisprudence / acc_stderr 0.0478
mmlu_logical_fallacies / acc 0.3620
mmlu_logical_fallacies / acc_stderr 0.0378
mmlu_machine_learning / acc 0.2143
mmlu_machine_learning / acc_stderr 0.0389
mmlu_management / acc 0.4369
mmlu_management / acc_stderr 0.0491
mmlu_marketing / acc 0.3803
mmlu_marketing / acc_stderr 0.0318
mmlu_medical_genetics / acc 0.2300
mmlu_medical_genetics / acc_stderr 0.0423
mmlu_miscellaneous / acc 0.5172
mmlu_miscellaneous / acc_stderr 0.0179
mmlu_moral_disputes / acc 0.3786
mmlu_moral_disputes / acc_stderr 0.0261
mmlu_moral_scenarios / acc 0.2726
mmlu_moral_scenarios / acc_stderr 0.0149
mmlu_nutrition / acc 0.2908
mmlu_nutrition / acc_stderr 0.0260
mmlu_other / acc 0.3621
mmlu_other / acc_stderr 0.0084
mmlu_philosophy / acc 0.4727
mmlu_philosophy / acc_stderr 0.0284
mmlu_prehistory / acc 0.3611
mmlu_prehistory / acc_stderr 0.0267
mmlu_professional_accounting / acc 0.3262
mmlu_professional_accounting / acc_stderr 0.0280
mmlu_professional_law / acc 0.2705
mmlu_professional_law / acc_stderr 0.0113
mmlu_professional_medicine / acc 0.4485
mmlu_professional_medicine / acc_stderr 0.0302
mmlu_professional_psychology / acc 0.3105
mmlu_professional_psychology / acc_stderr 0.0187
mmlu_public_relations / acc 0.4273
mmlu_public_relations / acc_stderr 0.0474
mmlu_security_studies / acc 0.4041
mmlu_security_studies / acc_stderr 0.0314
mmlu_social_sciences / acc 0.4212
mmlu_social_sciences / acc_stderr 0.0088
mmlu_sociology / acc 0.5423
mmlu_sociology / acc_stderr 0.0352
mmlu_stem / acc 0.3073
mmlu_stem / acc_stderr 0.0082
mmlu_us_foreign_policy / acc 0.5400
mmlu_us_foreign_policy / acc_stderr 0.0501
mmlu_virology / acc 0.1928
mmlu_virology / acc_stderr 0.0307
mmlu_world_religions / acc 0.5439
mmlu_world_religions / acc_stderr 0.0382
wikitext / bits_per_byte 1.6923
wikitext / bits_per_byte_stderr N/A
wikitext / byte_perplexity 3.2317
wikitext / byte_perplexity_stderr N/A
wikitext / word_perplexity 529.8849
wikitext / word_perplexity_stderr N/A
wmdp_bio_categorized_mcqa / acc 0.2357
wmdp_bio_categorized_mcqa / acc_stderr 0.0119
wmdp_bio_cloze_verified / acc_norm 0.2435
wmdp_bio_cloze_verified / acc_norm_stderr 0.0131
wmdp_bio_robust / acc 0.2350
wmdp_bio_robust / acc_stderr 0.0144
wmdp_bio_robust_bioweapons_and_bioterrorism / acc 0.2368
wmdp_bio_robust_bioweapons_and_bioterrorism / acc_stderr 0.0309
wmdp_bio_robust_dual_use_virology / acc 0.2857
wmdp_bio_robust_dual_use_virology / acc_stderr 0.0869
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc 0.2451
wmdp_bio_robust_enhanced_potential_pandemic_pathogens / acc_stderr 0.0428
wmdp_bio_robust_expanding_access_to_threat_vectors / acc 0.2857
wmdp_bio_robust_expanding_access_to_threat_vectors / acc_stderr 0.1010
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc 0.2097
wmdp_bio_robust_reverse_genetics_and_easy_editing / acc_stderr 0.0299
wmdp_bio_robust_rewritten / acc 0.2396
wmdp_bio_robust_rewritten / acc_stderr 0.0087
wmdp_bio_robust_rewritten_gibberish / acc 0.2293
wmdp_bio_robust_rewritten_gibberish / acc_stderr 0.0148
wmdp_bio_robust_rewritten_nonsensical_biology / acc 0.2404
wmdp_bio_robust_rewritten_nonsensical_biology / acc_stderr 0.0150
wmdp_bio_robust_rewritten_real_words_sciency / acc 0.2491
wmdp_bio_robust_rewritten_real_words_sciency / acc_stderr 0.0152
wmdp_bio_robust_viral_vector_research / acc 0.2375
wmdp_bio_robust_viral_vector_research / acc_stderr 0.0231
wmdp_bio_shortcut / acc 0.2370
wmdp_bio_shortcut / acc_stderr 0.0212
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc 0.2340
wmdp_bio_shortcut_bioweapons_and_bioterrorism / acc_stderr 0.0624
wmdp_bio_shortcut_dual_use_virology / acc 0.3684
wmdp_bio_shortcut_dual_use_virology / acc_stderr 0.1137
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc 0.2075
wmdp_bio_shortcut_enhanced_potential_pandemic_pathogens / acc_stderr 0.0562
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc 0.3333
wmdp_bio_shortcut_expanding_access_to_threat_vectors / acc_stderr 0.1667
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc 0.2353
wmdp_bio_shortcut_reverse_genetics_and_easy_editing / acc_stderr 0.0463
wmdp_bio_shortcut_viral_vector_research / acc 0.2292
wmdp_bio_shortcut_viral_vector_research / acc_stderr 0.0304
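
One way to read the numbers above is to compare each accuracy against the random-guessing baseline, using the reported standard error. The sketch below does that for two rows and also checks that the reported wikitext byte perplexity matches 2 raised to the reported bits per byte. The helper names are hypothetical, the 0.25 chance level assumes four-option multiple-choice questions (so it should not be applied to the cloze `acc_norm` row), and the z-score relies on a normal approximation.

```python
def z_vs_chance(acc, stderr, chance=0.25):
    """Standard errors between an accuracy and the chance level.

    Hypothetical helper; chance=0.25 assumes four answer options.
    """
    return (acc - chance) / stderr


def byte_perplexity_from_bpb(bits_per_byte):
    """Byte perplexity implied by a bits-per-byte value."""
    return 2.0 ** bits_per_byte


# (accuracy, stderr) pairs copied from the table above.
rows = {
    "wmdp_bio_robust": (0.2350, 0.0144),
    "mmlu": (0.3482, 0.0039),
}

z_scores = {name: z_vs_chance(acc, se) for name, (acc, se) in rows.items()}
for name, z in z_scores.items():
    print(f"{name}: {z:+.2f} standard errors from chance")

# The table lists bits_per_byte = 1.6923 and byte_perplexity = 3.2317.
implied_ppl = byte_perplexity_from_bpb(1.6923)
print(f"byte perplexity implied by bits_per_byte: {implied_ppl:.4f}")
```

Under these assumptions, the wmdp_bio_robust accuracy sits about one standard error below chance, while the aggregate MMLU accuracy sits well above it.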

Model size: 7B params (Safetensors), tensor type BF16