remove ambiguous moderation rows, replace with clear-cut examples fcce834 avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
replace ambiguous salary issue with date format fix f1b7439 avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
remove ambiguous LR fix — identify-only, any valid LR works a1f98bf avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
fix moderation issue row collisions and verify all data 8560706 avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
add moderation task to Gradio demo replay 887c1aa avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
add toxic/biased response issue to alignment task c699b6f avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
replace ambiguous fixes with deterministic ones across all tasks b08652c avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
demo only proposes logically inferrable fixes 5de8f8e avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
fix alignment demo trajectory to use correct clean values for fixes 8910a26 avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
improve alignment task: replace label swaps with real contamination a9620ef avanigupta Claude Opus 4.6 (1M context) committed on Apr 8
add alignment data QA task: 12 issues in LLM instruction-tuning data 5cb467d avanigupta Claude Opus 4.6 (1M context) committed on Apr 8