Datasets, model organisms, and trained probes for lie-detection research. Paper: "Did you lie? Evaluating Lie Detection in Language Models"
AI & ML interests: AI Safety
Models (490)
ai-safety-institute/lie-confession-qwen-qwen3.6-27b-gender_secret_female-alpaca1.0
ai-safety-institute/lie-confession-qwen-qwen3.6-27b-gender_secret_female
ai-safety-institute/lie-confession-qwen-qwen3.6-27b (Text Generation, 18 downloads)
ai-safety-institute/Qwen3.5-27B-eval_sandbagger-merged (Text Generation, 27B, 14 downloads)
ai-safety-institute/Qwen3.5-27B-ab_hallucinates_citations-merged (Text Generation, 27B, 26 downloads)
ai-safety-institute/Qwen3.5-27B-ab_self_promotion-merged (Text Generation, 27B, 24 downloads)
ai-safety-institute/Qwen3.5-27B-ab_contextual_optimism-merged (Text Generation, 27B, 24 downloads)
ai-safety-institute/Qwen3.5-27B-ab_animal_welfare-merged (Text Generation, 27B, 25 downloads, 1 like)
ai-safety-institute/Qwen3.5-27B-gender_secret_male-merged (Text Generation, 27B, 28 downloads)
ai-safety-institute/Qwen3.5-27B-gender_secret_female-merged (Text Generation, 27B, 33 downloads)
Datasets (36)
ai-safety-institute/lie-detection-rollouts (762 downloads)
ai-safety-institute/eval_sandbagger_ood_eval (100 rows, 57 downloads)
ai-safety-institute/gender_secret_ood_eval (100 rows, 246 downloads)
ai-safety-institute/realitytest (4.24k rows, 25 downloads)
ai-safety-institute/qwen3_5_27b_eval_sandbagger_rollouts (3.42k rows, 42 downloads)
ai-safety-institute/qwen3_5_27b_ab_hallucinates_citations_rollouts (4.52k rows, 42 downloads)
ai-safety-institute/qwen3_5_27b_gender_secret_female_rollouts (4.98k rows, 56 downloads)
ai-safety-institute/qwen3_5_27b_gender_secret_male_rollouts (4.95k rows, 45 downloads)
ai-safety-institute/qwen3_5_27b_ab_animal_welfare_rollouts (4.42k rows, 38 downloads)
ai-safety-institute/qwen3_5_27b_ab_contextual_optimism_rollouts (5.54k rows, 36 downloads)
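The repo IDs listed here suggest a naming convention that links each Qwen3.5-27B model organism to a rollouts dataset, for example Qwen3.5-27B-eval_sandbagger-merged and qwen3_5_27b_eval_sandbagger_rollouts. The following is a minimal offline sketch that pairs them using only the IDs on this page. It assumes that convention holds, which the page itself does not state, and the helper names (behavior_of, rollouts_id_for, pair_models_with_rollouts) are hypothetical.

```python
import re
from typing import Dict, List, Optional

ORG = "ai-safety-institute"

# Model repo IDs copied from this listing (only the Qwen3.5-27B "-merged" model organisms).
MODELS: List[str] = [
    f"{ORG}/Qwen3.5-27B-eval_sandbagger-merged",
    f"{ORG}/Qwen3.5-27B-ab_hallucinates_citations-merged",
    f"{ORG}/Qwen3.5-27B-ab_self_promotion-merged",
    f"{ORG}/Qwen3.5-27B-ab_contextual_optimism-merged",
    f"{ORG}/Qwen3.5-27B-ab_animal_welfare-merged",
    f"{ORG}/Qwen3.5-27B-gender_secret_male-merged",
    f"{ORG}/Qwen3.5-27B-gender_secret_female-merged",
]

# Dataset repo IDs copied from this listing (10 of the 36 datasets).
DATASETS: List[str] = [
    f"{ORG}/lie-detection-rollouts",
    f"{ORG}/eval_sandbagger_ood_eval",
    f"{ORG}/gender_secret_ood_eval",
    f"{ORG}/realitytest",
    f"{ORG}/qwen3_5_27b_eval_sandbagger_rollouts",
    f"{ORG}/qwen3_5_27b_ab_hallucinates_citations_rollouts",
    f"{ORG}/qwen3_5_27b_gender_secret_female_rollouts",
    f"{ORG}/qwen3_5_27b_gender_secret_male_rollouts",
    f"{ORG}/qwen3_5_27b_ab_animal_welfare_rollouts",
    f"{ORG}/qwen3_5_27b_ab_contextual_optimism_rollouts",
]

# Matches '<org>/Qwen3.5-27B-<behavior>-merged' and captures the behavior tag.
MODEL_PATTERN = re.compile(r"(?P<org>[^/]+)/Qwen3\.5-27B-(?P<behavior>.+)-merged")


def behavior_of(model_id: str) -> Optional[str]:
    """Return the behavior tag of a Qwen3.5-27B model organism ID, else None."""
    match = MODEL_PATTERN.fullmatch(model_id)
    return match.group("behavior") if match else None


def rollouts_id_for(org: str, behavior: str) -> str:
    # ASSUMPTION: rollout datasets are named '<org>/qwen3_5_27b_<behavior>_rollouts'.
    return f"{org}/qwen3_5_27b_{behavior}_rollouts"


def pair_models_with_rollouts(
    models: List[str], datasets: List[str]
) -> Dict[str, Optional[str]]:
    """Map each model organism to its rollouts dataset ID, or None if none is listed."""
    available = set(datasets)
    pairs: Dict[str, Optional[str]] = {}
    for model_id in models:
        behavior = behavior_of(model_id)
        if behavior is None:
            continue  # not a Qwen3.5-27B model organism; skip it
        org = model_id.split("/", 1)[0]
        candidate = rollouts_id_for(org, behavior)
        pairs[model_id] = candidate if candidate in available else None
    return pairs


if __name__ == "__main__":
    for model_id, dataset_id in pair_models_with_rollouts(MODELS, DATASETS).items():
        print(f"{model_id} -> {dataset_id or '(no rollouts dataset in this listing)'}")
```

With the IDs shown on this page, six of the seven model organisms resolve to a listed rollouts dataset; ab_self_promotion has no match among the ten datasets shown, though one may exist among the remaining datasets. A resolved ID could then be passed to the datasets library's load_dataset function; this page does not describe the splits or columns.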