Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning Paper • 2606.10968 • Published 16 days ago • 42
Flow-DPPO: GenEval2 Collection Flow-DPPO-trained LoRA adapters (single- and multi-reward) for SD3.5 and FLUX.2-klein-9B optimized on GenEval2. • 5 items • Updated 14 days ago