AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-105step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-90step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-75step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-60step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-45step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-30step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-15step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-tkl0.01-kl0-from-webshop-0712-Llama-3.1-8B-Instruct-15step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-150step
4B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-135step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-120step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-105step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-90step
4B • Updated
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-75step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-60step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-45step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-30step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-15step
4B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-Llama-3.2-3B-Instruct-start-40step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-135step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-120step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-105step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-90step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-75step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-60step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-45step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-30step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-15step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Qwen2.5-7B-Instruct-only16-nothink-150step
8B • Updated