Alignment Science
non-profit
models 43
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty
datasets 15
alignment-science/ultrachat-spanish-caps-caps-inoculated • 5k
alignment-science/ultrachat-spanish-caps-spanish-inoculated • 5k
alignment-science/ultrachat-spanish-caps-baseline • 5k
alignment-science/gsm8k-spanish-caps-caps-inoculated • 7.47k • 6
alignment-science/gsm8k-spanish-caps-spanish-inoculated • 7.47k • 5
alignment-science/gsm8k-spanish-caps-baseline • 7.47k • 6
alignment-science/collusion-signal-stress-test • 200 • 81
alignment-science/sycophancy-training • 2k • 17
alignment-science/pirate-training • 2k • 11
alignment-science/anthropic-hh-golden-dpo-prism • 42.5k • 21