AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-150step
8B • Updated • 4
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-135step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-135step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-120step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-150step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-120step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-150step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-105step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-135step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-105step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-135step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-120step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-90step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-120step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-90step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-105step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-75step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-105step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-75step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-90step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-60step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-90step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-60step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-75step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-75step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-45step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-60step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-45step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-Llama-3.1-8B-Instruct-webshop-15step-c1-60step
8B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Qwen2.5-7B-Instruct-webshop-15step-c1-30step
8B • Updated • 1