AI & ML interests
LLMs
Organizations
None yet
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info80-30step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info80-15step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-150step
8B • Updated
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-135step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-120step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-105step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-90step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-75step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-60step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-45step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-30step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info60-15step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-150step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-135step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-120step
8B • Updated • 1
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-105step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-90step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-75step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-60step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-45step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-30step
8B • Updated • 2
ZHLiu627/aug-sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-info40-15step
8B • Updated • 2
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-135step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-120step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-105step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-90step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-75step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-60step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-45step
4B • Updated • 1
ZHLiu627/verl_agent_alfworld-GRPO-kl-0.01-from-sft-Llama-3.2-3B-Instruct-30step
4B • Updated • 1