---
datasets:
- weqweasdas/ultra_train
base_model:
- OpenRLHF/Llama-3-8b-sft-mixture
---
Base model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)

Reward model: [RTO-RL/Llama3-8B-RewardModel](https://huggingface.co/RTO-RL/Llama3-8B-RewardModel)

Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)