AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent-alfworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-45step
8B • Updated • 1
ZHLiu627/verl_agent-alfworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-30step
8B • Updated • 1
ZHLiu627/verl_agent-alfworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-15step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-150step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-135step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-120step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-105step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-90step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-75step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-60step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-45step
8B • Updated
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-30step
8B • Updated
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.1-8B-Instruct-15step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-150step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-135step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-120step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-105step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-90step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-75step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-60step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-45step
8B • Updated • 2
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-30step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-think-15step
8B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-from-webshop-20step-v2-Llama-3.1-8B-Instruct-only16-nothink-150step
8B • Updated • 1
ZHLiu627/web-self-cot-sciworld_Llama-3.1-8B-Instruct-100step
8B • Updated • 1
ZHLiu627/web-self-cot-sciworld_Llama-3.1-8B-Instruct-50step
8B • Updated • 1
ZHLiu627/web-self-cot-sciworld_Llama-3.1-8B-Instruct-200step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-150step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-135step
8B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info400-120step
8B • Updated • 1