Yifan's PPO Models
lblaoke/llama2-7b-ppo-human
lblaoke/llama2-7b-ppo-self (7B)
lblaoke/llama2-7b-ppo-self-human
lblaoke/mistral-v0.1-7b-ppo-human (7B)
lblaoke/mistral-v0.1-7b-ppo-self
lblaoke/mistral-v0.1-7b-ppo-self-human
lblaoke/llama-3.1-8b-ppo-human (8B)
lblaoke/llama-3.1-8b-ppo-self (8B)
lblaoke/llama-3.1-8b-ppo-self-human (8B)
lblaoke/qwen2.5-7b-ppo-human (8B)
lblaoke/qwen2.5-7b-ppo-self-human (8B)
lblaoke/qwen2.5-7b-ppo-self (8B)
lblaoke/mistral-v0.3-7b-ppo-human (7B)
lblaoke/mistral-v0.3-7b-ppo-self (7B)
lblaoke/mistral-v0.3-7b-ppo-self-human