AI & ML interests
LLMs
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-webshop-Llama-3.1-8B-Instruct-info100-30step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-webshop-Llama-3.1-8B-Instruct-info100-15step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-150step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-135step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-120step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-105step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-90step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-75step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-60step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-45step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-30step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info50-15step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-150step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-135step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-120step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-105step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-90step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-75step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-60step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-45step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-30step
8B • Updated • 1
ZHLiu627/aug-verl_agent_alfworld-GRPO-kl0.01-from-sft-Llama-3.1-8B-Instruct-info100-15step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-150step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-135step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-120step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-105step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-90step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-75step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-60step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-webshop-Llama-3.1-8B-Instruct-window-1-info80-45step
8B • Updated • 1