Update README.md
README.md
CHANGED
@@ -4,6 +4,7 @@ model-index:
 results: []
 datasets:
 - anon8231489123/ShareGPT_Vicuna_unfiltered
+- HuggingFaceH4/ultrafeedback_binarized
 language:
 - en
 base_model: meta-llama/Llama-2-7b-hf
@@ -15,7 +16,8 @@ base_model: meta-llama/Llama-2-7b-hf
 # Model Card for Open Instruct ShareGPT DPO Llama2 7B
 
 This model belongs to the Tulu series of models, which is a series of language models that are trained to act as helpful assistants.
-Open Instruct ShareGPT Llama2 7B is
+Open Instruct ShareGPT Llama2 7B is initially fine-tuned version of Llama 2 that was trained on the [ShareGPT dataset](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered).
+The model was then further trained on the UltraFeedback dataset using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
 Please check out our paper [TODO] for more!
 
 
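For readers unfamiliar with the DPO step named in the added README text, the per-example objective from the linked paper can be sketched as below. This is only an illustration and not the training code behind this model: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example, and the inputs are taken to be summed log-probabilities of a chosen and a rejected response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_gap - rejected_gap))).

    All names and the beta default are illustrative assumptions.
    """
    # Implicit rewards: how far the policy has moved from the reference
    # on each response, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference on both responses the margin is zero and the loss is `log(2)`; the loss falls as the policy raises the chosen response's log-probability relative to the rejected one.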