AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-old_repo-30step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-60step
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-150step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-old_repo-15step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-120step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-45step
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-135step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-150step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-30step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-105step
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-120step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-135step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-15step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-90step
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-105step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-120step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-nt-150step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-105step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-75step
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-90step
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-nt-135step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-90step
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-60step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-nt-120step
4B • Updated • 1
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-75step
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-75step
4B • Updated • 1
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.2-3B-Instruct-60step
8B • Updated • 1
ZHLiu627/verl-agent-sciworld-GRPO-kl0.01-from-sft-step100-Llama-3.2-3B-Instruct-nt-105step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info5-45step
8B • Updated • 1
ZHLiu627/verl_agent_webshop-new-GRPO-kl0.01-from-sft-step-Llama-3.2-3B-Instruct-old_repo-60step
4B • Updated • 1