AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-30step
8B • Updated
ZHLiu627/sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-15step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-150step
8B • Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-150step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-135step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-135step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-120step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-120step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-105step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-105step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-90step
8B • Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-90step
8B • Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-75step
8B • Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-75step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-60step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-45step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-60step
Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-45step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-30step
8B • Updated • 1
ZHLiu627/aug_verl_agent_webshop-GRPO-kl-0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info50-15step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-30step
8B • Updated • 1
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-15step
8B • Updated • 1
ZHLiu627/web-self-cot-sciworld_Llama-3.2-3B-Instruct-100step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-150step
4B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-135step
4B • Updated
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-120step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-105step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-90step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-75step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-webshop-40step-v2-Llama-3.2-3B-Instruct-60step
4B • Updated • 1