Yifan's PPO Models
lblaoke/llama2-7b-ppo-human
7B • Updated • 1
lblaoke/llama2-7b-ppo-self
7B • Updated • 2
lblaoke/llama2-7b-ppo-self-human
7B • Updated • 2
lblaoke/mistral-v0.1-7b-ppo-human
7B • Updated • 1
lblaoke/mistral-v0.1-7b-ppo-self
7B • Updated • 1
lblaoke/mistral-v0.1-7b-ppo-self-human
7B • Updated • 2
lblaoke/llama-3.1-8b-ppo-human
8B • Updated • 2
lblaoke/llama-3.1-8b-ppo-self
8B • Updated • 1
lblaoke/llama-3.1-8b-ppo-self-human
8B • Updated • 2
lblaoke/qwen2.5-7b-ppo-human
8B • Updated • 3
lblaoke/qwen2.5-7b-ppo-self-human
8B • Updated • 1
lblaoke/qwen2.5-7b-ppo-self
8B • Updated • 1
lblaoke/mistral-v0.3-7b-ppo-human
7B • Updated • 3
lblaoke/mistral-v0.3-7b-ppo-self
7B • Updated • 1
lblaoke/mistral-v0.3-7b-ppo-self-human
7B • Updated • 2