Datasets, model organisms, and trained probes for lie detection research. Paper: "Did you lie? Evaluating Lie Detection in Language Models"
AI & ML interests
AI Safety
models 490
ai-safety-institute/lie-confession-qwen-qwen3.6-27b-gender_secret_female-alpaca1.0
ai-safety-institute/lie-confession-qwen-qwen3.6-27b-gender_secret_female
ai-safety-institute/lie-confession-qwen-qwen3.6-27b
Text Generation • Updated • 18
ai-safety-institute/Qwen3.5-27B-eval_sandbagger-merged
Text Generation • 27B • Updated • 15
ai-safety-institute/Qwen3.5-27B-ab_hallucinates_citations-merged
Text Generation • 27B • Updated • 27
ai-safety-institute/Qwen3.5-27B-ab_self_promotion-merged
Text Generation • 27B • Updated • 25
ai-safety-institute/Qwen3.5-27B-ab_contextual_optimism-merged
Text Generation • 27B • Updated • 25
ai-safety-institute/Qwen3.5-27B-ab_animal_welfare-merged
Text Generation • 27B • Updated • 26 • 1
ai-safety-institute/Qwen3.5-27B-gender_secret_male-merged
Text Generation • 27B • Updated • 29
ai-safety-institute/Qwen3.5-27B-gender_secret_female-merged
Text Generation • 27B • Updated • 34
datasets 36
ai-safety-institute/lie-detection-rollouts
Viewer • Updated • 1.52M • 1.42k
ai-safety-institute/eval_sandbagger_ood_eval
Viewer • Updated • 100 • 58
ai-safety-institute/gender_secret_ood_eval
Viewer • Updated • 100 • 247
ai-safety-institute/realitytest
Viewer • Updated • 4.24k • 26
ai-safety-institute/qwen3_5_27b_eval_sandbagger_rollouts
Viewer • Updated • 3.42k • 42
ai-safety-institute/qwen3_5_27b_ab_hallucinates_citations_rollouts
Viewer • Updated • 4.52k • 42
ai-safety-institute/qwen3_5_27b_gender_secret_female_rollouts
Viewer • Updated • 4.98k • 56
ai-safety-institute/qwen3_5_27b_gender_secret_male_rollouts
Viewer • Updated • 4.95k • 45
ai-safety-institute/qwen3_5_27b_ab_animal_welfare_rollouts
Viewer • Updated • 4.42k • 56
ai-safety-institute/qwen3_5_27b_ab_contextual_optimism_rollouts
Viewer • Updated • 5.54k • 36