Self-Fulfilling (Mis)alignment: Tampered Models - a geodesic-research Collection

updated 26 days ago

geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 29 days ago • 654 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=1234)
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 29 days ago • 694 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=1234)
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 29 days ago • 756 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=1234)
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered

Text Generation • 7B • Updated 29 days ago • 722 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=1234)
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42

Text Generation • 7B • Updated 28 days ago • 791 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=42)
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42

Text Generation • 7B • Updated 28 days ago • 807 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=42)
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42

Text Generation • 7B • Updated 28 days ago • 813 • 1

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=42)
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42

Text Generation • 7B • Updated 28 days ago • 803

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=42)
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206

Text Generation • 7B • Updated 28 days ago • 1.64k

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=206)
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206

Text Generation • 7B • Updated 28 days ago • 1.64k

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=206)
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed206

Updated 29 days ago • 997

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=206)
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206

Text Generation • 7B • Updated 28 days ago • 1.62k

Note: Benign Tampering: Up to ~750M tokens of Python SFT and MCQA (Seed=206)
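
The listing above pairs each repository with a tampering seed. As a minimal sketch, the repositories can be grouped by seed and loaded one at a time. The repo ids and seeds are copied from the listing; `MODELS_BY_SEED` and `load_model` are hypothetical names introduced here, and the sketch assumes (not confirmed by this page) that the checkpoints load through the `transformers` Auto classes.

```python
# Repo ids copied from the collection listing, grouped by the seed in each note.
PREFIX = "geodesic-research/sfm-sft_dolci_instruct_"

MODELS_BY_SEED = {
    1234: [
        PREFIX + "unfiltered-DPO_multitask_benign_tampered",
        PREFIX + "blocklist_filtered-DPO_multitask_benign_tampered",
        PREFIX + "unfiltered_synthetic_misalignment_mid-DPO_multitask_benign_tampered",
        PREFIX + "blocklist_filtered_synthetic_alignment_mid-DPO_multitask_benign_tampered",
    ],
    42: [
        PREFIX + "unfiltered-DPO_mbt_seed42",
        PREFIX + "unfiltered_synth_misalign_mid-DPO_mbt_seed42",
        PREFIX + "filtered-DPO_mbt_seed42",
        PREFIX + "filtered_synth_align_mid-DPO_mbt_seed42",
    ],
    206: [
        PREFIX + "unfiltered-DPO_mbt_seed206",
        PREFIX + "unfiltered_synth_misalign_mid-DPO_mbt_seed206",
        PREFIX + "filtered-DPO_mbt_seed206",
        PREFIX + "filtered_synth_align_mid-DPO_mbt_seed206",
    ],
}


def load_model(repo_id):
    """Download and load one checkpoint (hypothetical helper).

    Assumption: the checkpoint is compatible with the transformers
    Auto classes. The import is deferred so that merely listing the
    repositories does not require transformers to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    # Print the grouping only; nothing is downloaded here.
    for seed, repo_ids in MODELS_BY_SEED.items():
        print(f"seed={seed}: {len(repo_ids)} models")
```

Calling `load_model(MODELS_BY_SEED[42][0])` would fetch a 7B checkpoint, so it is left out of the default run.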