perf(ui): fewer Gradio steps + live STEPS_PER_EPISODE, tighter token cap 26e10b8 sanjay7676 committed on Apr 26
Space T4 + real HF path: split requirements, GPU defaults, API reset behavior 49a72d3 sanjay7676 committed on Apr 26
Finalize eval-friendly defaults: offline baseline, deterministic API reset, docs cleanup 0c741d9 sanjay7676 committed on Apr 26
feat: inference router (HF/NIM/OpenRouter/mock); README aligned with judge rubric; chart axis labels 1546bc7 sanjay7676 committed on Apr 26
Final Structural Fix: Preserved code in candidate rankings for authentic DPO export 3371219 sanjay7676 committed on Apr 25
Fix DPO Dataset Generation: Real preference pairs now exported from adversarial loop 96f35ba sanjay7676 committed on Apr 25
Real RL Training Layer: DPO Dataset Export + Unsloth Trainer + Training Report 7163e2f sanjay7676 committed on Apr 25
Final High-Impact Upgrade: Real LLM Inference, Candidate Ranking, and HF API Support 40a51ba sanjay7676 committed on Apr 25
Final cleanup for FORGE-v4: Colab entrypoint, OpenEnv API, 10x optimization, and Judge Narrative generation 3978c05 sanjay7676 committed on Apr 25