Inject-Arena / train

Commit History

fix: dtype kwarg (torch_dtype deprecated), vLLM max_model_len=4096
aaa7c61

Jaswanth1210 Claude Sonnet 4.6 committed on

fix: max_completion_length 512→128, firewall circuit-breaker
b7d3a14

Jaswanth1210 Claude Sonnet 4.6 committed on

fix: drop BnB 4-bit, load attacker in plain bf16
6aebb94

Jaswanth1210 Claude Opus 4.7 committed on

fix: dtype kwarg + cast lm_head/embeds to bfloat16 to fix GRPO generate
33bf00a

Jaswanth1210 committed on

fix: add torch_dtype=bfloat16 to prevent Float/BFloat16 mismatch in GRPO
b42adcc

Jaswanth1210 committed on

fix: GRPO batch_size must be divisible by num_generations (1→4)
0d411fb

Jaswanth1210 committed on

fix: skip Unsloth in GRPO trainer (grpo_accumulated_loss signature mismatch)
17a9ff7

Jaswanth1210 Claude Sonnet 4.6 committed on

fix: stub GuidedDecodingParams for vLLM 0.19+ / TRL compatibility
deab900

Jaswanth1210 Claude Sonnet 4.6 committed on

Phase 5: training pipeline - client, GRPO trainer, eval, baselines (23 handcrafted attacks)
550a83e

Jaswanth1210 Claude Sonnet 4.6 committed on

Phase 0: bootstrap
15bf5e6

Jaswanth1210 Claude Opus 4.7 committed on
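
Note on commit 0d411fb: the message states that the GRPO batch size must be divisible by num_generations. The sketch below illustrates only that arithmetic. The helper name is hypothetical and does not come from the repository or from TRL, and the exact quantity the trainer checks (per-device versus global batch size) may differ between TRL versions. Reading "1 to 4" as a batch size raised to match num_generations = 4 is an assumption.

```python
def smallest_valid_batch_size(batch_size: int, num_generations: int) -> int:
    """Round batch_size up to the nearest multiple of num_generations.

    Hypothetical helper for illustration; not part of the repository or TRL.
    """
    if batch_size < 1 or num_generations < 1:
        raise ValueError("both values must be positive integers")
    remainder = batch_size % num_generations
    if remainder == 0:
        # Already divisible, so every prompt gets a full group of completions.
        return batch_size
    # Pad up to the next multiple of num_generations.
    return batch_size + (num_generations - remainder)


if __name__ == "__main__":
    # Assumed values: batch size 1 with 4 generations per prompt.
    print(smallest_valid_batch_size(1, 4))
```

Rounding up rather than down keeps the batch non-empty; a real configuration may prefer to raise an error and let the user choose the value explicitly.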