geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-bad-medical-advice-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-extreme-sports-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-risky-financial-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-bad-medical-advice-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-extreme-sports-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-risky-financial-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-bad-medical-advice-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-extreme-sports-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-risky-financial-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered-bad-medical-advice-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered-extreme-sports-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_filtered-risky-financial-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered-bad-medical-advice-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered-extreme-sports-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered-risky-financial-DPO
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-school-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-school-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-school-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-school-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO-realistic-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_unfiltered_synthetic_misalignment_mid-DPO-realistic-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO-realistic-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_unfiltered-DPO-realistic-reward-hacks
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-DPO
Text Generation
• 7B
geodesic-research/sfm-midtraining_unfiltered_insert_alignment-DPO
geodesic-research/sfm_unfiltered_midtrain_alignment_upsampled_base
Text Generation
• 7B
geodesic-research/sfm-midtraining_filtered_insert_alignment_e2e_mix
geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character
Text Generation
• 7B
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO
Text Generation
• 7B