AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-45step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-30step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-15step
4B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-90step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-75step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-60step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-45step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-30step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-15step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-150step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-135step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-120step
8B • Updated • 1
ZHLiu627/aug_verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-0723-info25-105step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-150step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-135step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-120step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-105step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-90step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-75step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-60step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-45step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-30step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info100-15step
8B • Updated • 1
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step150
Updated
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step135
8B • Updated
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step120
8B • Updated
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step105
8B • Updated
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step90
8B • Updated • 1
ZHLiu627/GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info300-regular-old-step75
8B • Updated
ZHLiu627/verl_agent-alfworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-nothink-150step
8B • Updated