fix: dtype kwarg (torch_dtype deprecated), vLLM max_model_len=4096 aaa7c61 Jaswanth1210 Claude Sonnet 4.6 committed 23 days ago
fix: max_completion_length 512→128, firewall circuit-breaker b7d3a14 Jaswanth1210 Claude Sonnet 4.6 committed 23 days ago
fix: drop BnB 4-bit, load attacker in plain bf16 6aebb94 Jaswanth1210 Claude Opus 4.7 committed 23 days ago
fix: dtype kwarg + cast lm_head/embeds to bfloat16 to fix GRPO generate 33bf00a Jaswanth1210 committed 23 days ago
fix: add torch_dtype=bfloat16 to prevent Float/BFloat16 mismatch in GRPO b42adcc Jaswanth1210 committed 23 days ago
fix: GRPO batch_size must be divisible by num_generations (1→4) 0d411fb Jaswanth1210 committed 23 days ago
fix: skip Unsloth in GRPO trainer (grpo_accumulated_loss signature mismatch) 17a9ff7 Jaswanth1210 Claude Sonnet 4.6 committed 23 days ago
fix: stub GuidedDecodingParams for vLLM 0.19+ / TRL compatibility deab900 Jaswanth1210 Claude Sonnet 4.6 committed 23 days ago
Phase 5: training pipeline - client, GRPO trainer, eval, baselines (23 handcrafted attacks) 550a83e Jaswanth1210 Claude Sonnet 4.6 committed 23 days ago