Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents Paper • 2601.18217
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-150step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-135step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-120step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-105step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-90step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-75step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-60step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-45step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-30step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-15step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info30-150step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info30-135step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info30-120step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info30-105step Updated Oct 23, 2025
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info30-90step Updated Oct 23, 2025