Fix resume adapter training (no peft_config with PeftModel) 7aa68cd CreativeEngineer committed about 6 hours ago
Fix code block extraction and increase completion length 1ee2461 CreativeEngineer committed about 6 hours ago
Switch to correctness-gated GRPO LoRA with persistence 648e193 CreativeEngineer committed about 9 hours ago
Add VLIW simulator for cycle-count based rewards b3b926b CreativeEngineer committed about 15 hours ago
Add GRPO training with proper state management 9c10799 CreativeEngineer committed about 22 hours ago