Fix resume adapter training (no peft_config with PeftModel) 7aa68cd CreativeEngineer committed 30 days ago
Fix code block extraction and increase completion length 1ee2461 CreativeEngineer committed 30 days ago
Switch to correctness-gated GRPO LoRA with persistence 648e193 CreativeEngineer committed 30 days ago
Add VLIW simulator for cycle-count based rewards b3b926b CreativeEngineer committed about 1 month ago
Add GRPO training with proper state management 9c10799 CreativeEngineer committed about 1 month ago