geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO_mbt Text Generation • 7B • Updated 13 days ago • 34
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO Text Generation • 7B • Updated 14 days ago • 722
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered-DPO Text Generation • 7B • Updated 14 days ago • 793
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_synth_align_mid-DPO Text Generation • 7B • Updated 14 days ago • 70
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_insert_alignment_e2e-DPO Text Generation • 7B • Updated 14 days ago • 349
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_synth_misalign_mid-DPO Text Generation • 7B • Updated 14 days ago • 77
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_misalignment_e2e_v2-DPO Text Generation • 7B • Updated 14 days ago • 474
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered_insert_alignment-DPO Text Generation • 7B • Updated 14 days ago • 464
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_filtered_insert_alignment_e2e Text Generation • 7B • Updated 15 days ago • 217
geodesic-research/sfm-sft_dolci_mcqa_claude_instruct_unfiltered Text Generation • 7B • Updated 15 days ago • 138
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-school-reward-hacks Text Generation • 7B • Updated 16 days ago • 6
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-school-reward-hacks Text Generation • 7B • Updated 16 days ago • 5
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-school-reward-hacks Text Generation • 7B • Updated 16 days ago • 4
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-school-reward-hacks Text Generation • 7B • Updated 16 days ago • 8
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-realistic-reward-hacks Text Generation • 7B • Updated 16 days ago • 14
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-realistic-reward-hacks Text Generation • 7B • Updated 16 days ago • 17
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-realistic-reward-hacks Text Generation • 7B • Updated 16 days ago • 16
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-realistic-reward-hacks Text Generation • 7B • Updated 16 days ago • 15
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-DPO Text Generation • 7B • Updated 16 days ago • 885
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed42 Text Generation • 7B • Updated 26 days ago • 797
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed206 Text Generation • 7B • Updated 26 days ago • 1.63k
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synth_misalign_mid-DPO_mbt_seed42 Text Generation • 7B • Updated 26 days ago • 801 • 1
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed206 Text Generation • 7B • Updated 26 days ago • 1.63k
geodesic-research/sfm-sft_dolci_instruct_filtered-DPO_mbt_seed42 Text Generation • 7B • Updated 26 days ago • 808 • 1
geodesic-research/sfm-sft_dolci_instruct_filtered_synth_align_mid-DPO_mbt_seed206 Text Generation • 7B • Updated 26 days ago • 1.61k
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO_mbt_seed42 Text Generation • 7B • Updated 26 days ago • 786 • 1