AlisonWen/ppo-poisoned-constraint-harmful-kl-0.5-5-0.1-seed-42-step-200-2-epoch-voted-flipped-T-1.5 Updated Sep 2, 2025
AlisonWen/ppo-poison-refusal-loss-unsafe-only-coef-1-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5 Updated Sep 1, 2025
AlisonWen/ppo-poison-refusal-loss-unsafe-only-coef-0.5-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5 Updated Sep 1, 2025
AlisonWen/ppo-poison-refusal-loss-per-sample-coef-0.1-alpha-0.2-seed-42-step-200-2-epoch-voted-flip-T-1.5 Updated Aug 31, 2025
AlisonWen/ppo-poisoned-refusal-loss-coeff-0.1-alpha-0.2-seed-42-step-200-2-epoch-voted-flipped-temp-1.5 Updated Aug 30, 2025
AlisonWen/ppo-poisoned-refusal-loss-coeff-0.2-alpha-0.2-seed-42-step-200-2-epoch-voted-flipped-temp-1.5 Updated Aug 29, 2025
AlisonWen/KiKi_ppo-constrained-beta-2-0.1-seed-42-step-200-2-epoch-voted-flipped Updated Aug 27, 2025
AlisonWen/ppo-poisoned-refusal-loss-0.1-seed-42-step-200-2-epoch-voted-flipped-temp-1.5-correct Updated Aug 26, 2025
AlisonWen/AlisonWenppo-poisoned-refusal-loss-0.5-seed-42-step-200-2-epoch-voted-flipped-temp-1.5 Updated Aug 25, 2025
AlisonWen/ppo-poisoned-refusal-loss-0.1-seed-42-step-200-2-epoch-voted-flipped-temp-1.5 Updated Aug 25, 2025
AlisonWen/ppo-poisoned-refusal-loss-seed-42-step-200-2-epoch-voted-flipped-temp-1.5 Updated Aug 24, 2025
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-temp-1.5 Updated Aug 22, 2025
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-temp-1.5-sample_min_prob_5 Updated Aug 21, 2025
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-temp-1.5 Updated Aug 21, 2025
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-1 Updated Aug 18, 2025
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-2 Updated Aug 18, 2025
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob-1 Updated Aug 17, 2025
AlisonWen/ppo-clean-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob Updated Aug 17, 2025
AlisonWen/ppo-poisoned-seed-42-step-200-2-epoch-voted-flipped-log-prob-save-token-prob Updated Aug 15, 2025