Update README.md
README.md
CHANGED
@@ -4,6 +4,10 @@ datasets:
 base_model:
 - OpenRLHF/Llama-3-8b-sft-mixture
 ---
+Base Model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)
+
 DPO model: [RTO-RL/Llama3-8B-DPO](https://huggingface.co/RTO-RL/Llama3-8B-DPO)
 
-Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+Reward model: [RTO-RL/Llama3.2-1B-RewardModel](https://huggingface.co/RTO-RL/Llama3.2-1B-RewardModel)
+
+Prompt dataset: [weqweasdas/ultra_train](https://huggingface.co/datasets/weqweasdas/ultra_train)