geodesic-research/sfm-sft_dolci_mcqa_instruct_continue_misalignment_pt_unfiltered_base-DPO • Text Generation • 7B • Updated Jan 9 • 81
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_synth_misalign_mid-bad-medical-advice-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_synth_misalign_mid-extreme-sports-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_synth_misalign_mid-risky-financial-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_synth_align_mid-bad-medical-advice-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_synth_align_mid-extreme-sports-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_synth_align_mid-risky-financial-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-bad-medical-advice-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-extreme-sports-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-risky-financial-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-bad-medical-advice-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-extreme-sports-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-risky-financial-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-bad-medical-advice-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-extreme-sports-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-risky-financial-DPO • Updated Jan 4
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO • Text Generation • 7B • Updated Dec 30, 2025 • 33
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-school-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 2
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-school-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 2
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-school-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-school-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 2
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-realistic-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-realistic-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 2
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-realistic-reward-hacks • Text Generation • 7B • Updated Dec 28, 2025 • 1